From: Niklas Haas <ffmpeg@haasn.xyz>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal
Date: Sat, 29 Jun 2024 16:05:57 +0200
Message-ID: <20240629160557.GB37436@haasn.xyz>
In-Reply-To: <20240629123532.GZ4991@pb2>

On Sat, 29 Jun 2024 14:35:32 +0200 Michael Niedermayer <michael@niedermayer.cc> wrote:
> On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
> > On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffmpeg@haasn.xyz> wrote:
> > > Hey,
> > >
> > > As some of you know, I got contracted (by STF 2024) to work on improving
> > > swscale over the course of the next couple of months. I want to share my
> > > current plans and gather feedback + measure sentiment.
> > >
> > > ## Problem statement
> > >
> > > The two issues I'd like to focus on for now are:
> > >
> > > 1. Lack of support for many modern formats and conversions (HDR, ICtCp,
> > >    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > > 2. Complicated context management, with cascaded contexts, threading,
> > >    stateful configuration, multi-step init procedures, etc., and
> > >    related bugs
> > >
> > > To make these feasible, some amount of internal reorganization of
> > > duties inside swscale is prudent.
> > >
> > > ## Proposed approach
> > >
> > > The first step is to create a new API, which will (tentatively) live in
> > > <libswscale/avscale.h>. This API will initially start off as a near-copy
> > > of the current swscale public API, but with the major difference that
> > > I want it to be state-free and to access metadata only in terms of
> > > AVFrame properties. So there will be no independent configuration of
> > > the input chroma location etc. as there is currently, and no need to
> > > re-configure or re-init the context when feeding it frames with
> > > different properties.
> > > The goal is for users to be able to just feed it AVFrame pairs and
> > > have it internally cache expensive pre-processing steps as needed.
> > > Finally, avscale_* should ultimately also support hardware frames
> > > directly, in which case it will dispatch to some equivalent of
> > > scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> > > defer this to a future milestone.)
> >
> > So, I've spent the past days implementing this API and hooking it up to
> > swscale internally. (For testing, I am also replacing `vf_scale` with
> > the equivalent AVScale-based implementation to see how the new API
> > impacts existing users.) It mostly works so far, with some left-over
> > translation issues that I have to address before it can be sent
> > upstream.
> >
> > ------
> >
> > One of the things I was thinking about was how to configure
> > scalers/dither modes, which sws currently, somewhat clunkily, controls
> > with flags. IMO, flags are not the right design here - if anything, it
> > should be a separate enum/int, and controllable separately for chroma
> > resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
> >
> > That said, I think that for most end users, having such fine-grained
> > options does not really provide any end value - unless you're already
> > knee-deep in signal theory, the actual differences between, say,
> > "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
> > worst.
> >
> > My idea was to provide a single `int quality`, which the user can set
> > to tune the speed <-> quality trade-off on an arbitrary numeric scale
> > from 0 to 10, with 0 being the fastest (alias everything, nearest
> > neighbour, drop half the chroma samples, etc.), the default being
> > something in the vicinity of 3-5, and 10 being the maximum quality
> > (full linear downscaling, anti-aliasing, error diffusion, etc.).
>
> I think 10 levels is not fine-grained enough;
> when there are more than 10 features to switch on/off, we would have
> to switch more than one at a time.
>
> Also, the scale has an issue that becomes obvious when you consider the
> extremes, like memset(0) at level 0, not converting chroma at level 1,
> hiring a human artist to paint a matching upscaled image at 10, and
> using a neural net at 9.
>
> The quality factor would thus probably have at least 3 ranges:
> 1. as fast as possible, with noticeable quality issues
> 2. the normal range
> 3. as good as possible, disregarding the computation needed
>
> Some encoders (like x264) use words like UltraFast and Placebo for the
> ends of this curve.

I like the idea of using explicit names instead of numbers. It translates
well onto the human-facing API anyway.

I don't think 10 levels is too few if we also pair it with a granular API
for controlling exactly which scalers etc. you want. In particular, if we
want human-compatible names for them (ranging from "ultrafast" to
"placebo" as discussed), you would be hard-pressed to find many more
sensible names than 10. Especially if we treat these just as presets and
not the only way to configure them.

> It would also be possible to use a more formal definition of how much
> one wants to trade quality per time spent, but that then makes it harder
> to decide which feature to actually turn on when one requests a ratio
> between PSNR and seconds.

> > The upside of this approach is that it would be vastly simpler for most
> > end users. It would also track newly added functionality automatically;
> > e.g. if we get a higher-quality tone mapping mode, it can be
> > retroactively added to the higher quality presets.
> > The biggest downside I can think of is that doing this would arguably
> > violate the semantics of a "bitexact" flag, since it would break
> > results relative to a previous version of libswscale - unless we maybe
> > also force a specific quality level in bitexact mode?
> >
> > Open questions:
> >
> > 1. Is this a good idea, or do the downsides outweigh the benefits?
> >
> > 2. Is an "advanced configuration" API still needed, in addition to the
> >    quality presets?

> For regression testing and debugging it is very useful to be able to
> turn features on one at a time. A failure could then be quickly isolated
> to a feature.

Very strong argument in favor of granular control. I'll find a way to
support it while still having "presets".

> > [...]
> >
> > /**
> >  * Statically test if a conversion is supported. Values of (respectively)
> >  * NONE/UNSPECIFIED are ignored.
> >  *
> >  * Returns 1 if the conversion is supported, or 0 otherwise.
> >  */
> > int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
> > int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
> > int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
> > int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
> >                           enum AVColorTransferCharacteristic src);

> If we support A for any input and support B for any output, then we
> should support converting from A to B.
>
> I don't think this API is a good idea. It allows supporting random
> subsets, which would cause confusion and weird bugs in code using it.
> (For example, removal of an intermediate filter could lead to failure.)

Good point, will change. The prototypical use case for this API is setting
up format lists inside vf_scale, which need to be set up independently
anyway.

I was planning on adding another _test_frames() function that takes two
AVFrames and returns in a tri-state manner whether conversion is
supported, unsupported, or a no-op.
If an exception to the input/output independence does ever arise, we can
test for it in this function.

> [...]
>
> thx
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Elect your leaders based on what they did after the last election, not
> based on what they say before an election.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".