From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal
Date: Sat, 29 Jun 2024 11:11:09 -0300
Message-ID: <c7d79fc6-2d54-47d6-9316-a9c9f3ad3ee0@gmail.com>
In-Reply-To: <20240629160557.GB37436@haasn.xyz>

On 6/29/2024 11:05 AM, Niklas Haas wrote:
> On Sat, 29 Jun 2024 14:35:32 +0200 Michael Niedermayer <michael@niedermayer.cc> wrote:
>> On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
>>> On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffmpeg@haasn.xyz> wrote:
>>>> Hey,
>>>>
>>>> As some of you know, I got contracted (by STF 2024) to work on
>>>> improving swscale over the course of the next couple of months. I want
>>>> to share my current plans and gather feedback + measure sentiment.
>>>>
>>>> ## Problem statement
>>>>
>>>> The two issues I'd like to focus on for now are:
>>>>
>>>> 1. Lack of support for many modern formats and conversions (HDR,
>>>>    ICtCp, IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
>>>> 2. Complicated context management, with cascaded contexts, threading,
>>>>    stateful configuration, multi-step init procedures, etc., and
>>>>    related bugs
>>>>
>>>> To make these feasible, some amount of internal reorganization of
>>>> duties inside swscale is prudent.
>>>>
>>>> ## Proposed approach
>>>>
>>>> The first step is to create a new API, which will (tentatively) live
>>>> in <libswscale/avscale.h>. This API will initially start off as a
>>>> near-copy of the current swscale public API, but with the major
>>>> difference that I want it to be state-free and only access metadata in
>>>> terms of AVFrame properties. So there will be no independent
>>>> configuration of the input chroma location etc. like there is
>>>> currently, and no need to re-configure or re-init the context when
>>>> feeding it frames with different properties. The goal is for users to
>>>> be able to just feed it AVFrame pairs and have it internally cache
>>>> expensive pre-processing steps as needed. Finally, avscale_* should
>>>> ultimately also support hardware frames directly, in which case it
>>>> will dispatch to some equivalent of scale_vulkan/vaapi/cuda or
>>>> possibly even libplacebo. (But I will defer this to a future
>>>> milestone.)
>>>
>>> So, I've spent the past days implementing this API and hooking it up to
>>> swscale internally. (For testing, I am also replacing `vf_scale` with
>>> the equivalent AVScale-based implementation to see how the new API
>>> impacts existing users.) It mostly works so far, with some left-over
>>> translation issues that I have to address before it can be sent
>>> upstream.
>>>
>>> ------
>>>
>>> One of the things I was thinking about was how to configure
>>> scalers/dither modes, which sws currently, somewhat clunkily, controls
>>> with flags. IMO, flags are not the right design here - if anything, it
>>> should be a separate enum/int, controllable separately for chroma
>>> resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
>>>
>>> That said, I think that for most end users, having such fine-grained
>>> options does not really provide any end value - unless you're already
>>> knee-deep in signal theory, the actual differences between, say,
>>> "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
>>> worst.
>>>
>>> My idea was to provide a single `int quality`, which the user can set
>>> to tune the speed <-> quality trade-off on an arbitrary numeric scale
>>> from 0 to 10, with 0 being the fastest (alias everything, nearest
>>> neighbour, drop half the chroma samples, etc.), the default being
>>> something in the vicinity of 3-5, and 10 being the maximum quality
>>> (full linear downscaling, anti-aliasing, error diffusion, etc.).
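To make the idea concrete, I imagine such a knob would boil down to
something like this (purely illustrative; none of these names exist
anywhere yet):

    /* Purely illustrative sketch of the 0-10 quality knob described
     * above; the names and mappings are made up and not part of any
     * actual or proposed API. */
    enum AVScaleQuality {
        AVSCALE_QUALITY_ULTRAFAST = 0,  /* alias everything, nearest
                                         * neighbour, drop half the
                                         * chroma samples */
        AVSCALE_QUALITY_DEFAULT   = 4,  /* balanced speed/quality
                                         * trade-off */
        AVSCALE_QUALITY_PLACEBO   = 10, /* full linear downscaling,
                                         * anti-aliasing, error
                                         * diffusion */
    };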
>> I think 10 levels is not fine-grained enough: when there are more than
>> 10 features to switch on/off, we would have to switch more than one at
>> a time.
>>
>> The scale also has an issue that becomes obvious when you consider the
>> extremes: memset(0) at level 0 and not converting chroma at level 1, or
>> using a neural net at 9 and hiring a human artist to paint a matching
>> upscaled image at 10.
>>
>> The quality factor would thus probably have at least 3 ranges:
>> 1. as fast as possible, with noticeable quality issues
>> 2. the normal range
>> 3. as good as possible, disregarding the computation needed
>>
>> Some encoders (like x264) use words like UltraFast and Placebo for the
>> ends of this curve.
>
> I like the idea of using explicit names instead of numbers. It
> translates well onto the human-facing API anyway.
>
> I don't think 10 levels is too few if we also pair it with a granular
> API for controlling exactly which scalers etc. you want.
>
> In particular, if we want human-compatible names for them (ranging from
> "ultrafast" to "placebo" as discussed), you would be hard-pressed to
> find many more sensible names than 10.
>
> Especially if we treat these just as presets and not as the only way to
> configure them.
>
>> It would also be possible to use a more formal definition of how much
>> quality one wants to trade per unit of time spent, but that makes it
>> harder to decide which features to actually turn on when one requests a
>> given ratio between PSNR and seconds.
>
>>> The upside of this approach is that it would be vastly simpler for
>>> most end users. It would also track newly added functionality
>>> automatically; e.g. if we get a higher-quality tone-mapping mode, it
>>> can be retroactively added to the higher quality presets. The biggest
>>> downside I can think of is that doing this would arguably violate the
>>> semantics of a "bitexact" flag, since it would break results relative
>>> to a previous version of libswscale - unless we maybe also force a
>>> specific quality level in bitexact mode?
>>>
>>> Open questions:
>>>
>>> 1. Is this a good idea, or do the downsides outweigh the benefits?
>>>
>>> 2. Is an "advanced configuration" API still needed, in addition to the
>>>    quality presets?
>>
>> For regression testing and debugging it is very useful to be able to
>> turn features on one at a time. A failure could then be quickly
>> isolated to a single feature.
>
> Very strong argument in favor of granular control. I'll find a way to
> support it while still having "presets".
>
>> [...]
>>
>>> /**
>>>  * Statically test if a conversion is supported. Values of
>>>  * (respectively) NONE/UNSPECIFIED are ignored.
>>>  *
>>>  * Returns 1 if the conversion is supported, or 0 otherwise.
>>>  */
>>> int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
>>> int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
>>> int avscale_test_primaries(enum AVColorPrimaries dst,
>>>                            enum AVColorPrimaries src);
>>> int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
>>>                           enum AVColorTransferCharacteristic src);

I think it'd be best if you define > 0 as supported for an API like this,
so it becomes extensible. Also maybe < 0 for failure, like
AVERROR(EINVAL), so invalid enum values are not simply treated as "not
supported".
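Something along these lines - a rough sketch of the convention with a
made-up body, just to show the intent:

    #include <libavutil/error.h>
    #include <libavutil/pixfmt.h>

    /* Rough sketch of the suggested return-value convention; the body is
     * made up for illustration.
     *   > 0: supported, 0: not supported, < 0: AVERROR on bad input. */
    int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src)
    {
        if (dst < AV_PIX_FMT_NONE || dst >= AV_PIX_FMT_NB ||
            src < AV_PIX_FMT_NONE || src >= AV_PIX_FMT_NB)
            return AVERROR(EINVAL); /* invalid enum value: a real error,
                                     * not just "unsupported" */
        if (dst == AV_PIX_FMT_NONE || src == AV_PIX_FMT_NONE)
            return 1;               /* NONE is ignored, per the doc
                                     * comment */
        /* ... the actual support check would go here ... */
        return 1;
    }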
>> If we support A for any input and support B for any output, then we
>> should support converting from A to B.
>>
>> I don't think this API is a good idea. It allows supporting random
>> subsets, which would cause confusion and weird bugs in code using it.
>> (For example, removal of an intermediate filter could lead to failure.)
>
> Good point, will change. The prototypical use case for this API is
> setting up format lists inside vf_scale, which need to be set up
> independently anyway.
>
> I was planning on adding another _test_frames() function that takes two
> AVFrames and returns in a tri-state manner whether conversion is
> supported, unsupported, or a no-op. If an exception to the input/output
> independence does ever arise, we can test for it in this function.
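For what it's worth, I picture that tri-state looking something like this
(just a sketch; neither the enum nor the function exist yet, and the
names are made up):

    #include <libavutil/frame.h>

    /* Hypothetical sketch of the tri-state check described above. */
    enum AVScaleTestResult {
        AVSCALE_TEST_UNSUPPORTED = 0, /* conversion not possible */
        AVSCALE_TEST_SUPPORTED   = 1, /* conversion possible */
        AVSCALE_TEST_NOOP        = 2, /* src already matches dst exactly */
    };

    /* Would return one of the above, or an AVERROR code on invalid
     * input. */
    int avscale_test_frames(const AVFrame *dst, const AVFrame *src);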