From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal
Date: Sat, 29 Jun 2024 11:11:09 -0300
Message-ID: <c7d79fc6-2d54-47d6-9316-a9c9f3ad3ee0@gmail.com>
In-Reply-To: <20240629160557.GB37436@haasn.xyz>

On 6/29/2024 11:05 AM, Niklas Haas wrote:
> On Sat, 29 Jun 2024 14:35:32 +0200 Michael Niedermayer <michael@niedermayer.cc> wrote:
>> On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
>>> On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffmpeg@haasn.xyz> wrote:
>>>> Hey,
>>>>
>>>> As some of you know, I got contracted (by STF 2024) to work on
>>>> improving swscale over the course of the next couple of months. I want
>>>> to share my current plans and gather feedback + measure sentiment.
>>>>
>>>> ## Problem statement
>>>>
>>>> The two issues I'd like to focus on for now are:
>>>>
>>>> 1. Lack of support for many modern formats and conversions (HDR,
>>>>    ICtCp, IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
>>>> 2. Complicated context management, with cascaded contexts, threading,
>>>>    stateful configuration, multi-step init procedures, etc., and
>>>>    related bugs
>>>>
>>>> To make these feasible, some amount of internal reorganization of
>>>> duties inside swscale is prudent.
>>>>
>>>> ## Proposed approach
>>>>
>>>> The first step is to create a new API, which will (tentatively) live
>>>> in <libswscale/avscale.h>. This API will initially start off as a
>>>> near-copy of the current swscale public API, but with the major
>>>> difference that I want it to be state-free and only access metadata in
>>>> terms of AVFrame properties. So there will be no independent
>>>> configuration of the input chroma location etc. like there is
>>>> currently, and no need to re-configure or re-init the context when
>>>> feeding it frames with different properties. The goal is for users to
>>>> be able to just feed it AVFrame pairs and have it internally cache
>>>> expensive pre-processing steps as needed. Finally, avscale_* should
>>>> ultimately also support hardware frames directly, in which case it
>>>> will dispatch to some equivalent of scale_vulkan/vaapi/cuda or
>>>> possibly even libplacebo. (But I will defer this to a future
>>>> milestone.)
>>>
>>> So, I've spent the past days implementing this API and hooking it up to
>>> swscale internally. (For testing, I am also replacing `vf_scale` with
>>> the equivalent AVScale-based implementation to see how the new API
>>> impacts existing users.) It mostly works so far, with some left-over
>>> translation issues that I have to address before it can be sent
>>> upstream.
>>>
>>> ------
>>>
>>> One of the things I was thinking about was how to configure
>>> scalers/dither modes, which sws currently, somewhat clunkily, controls
>>> with flags. IMO, flags are not the right design here - if anything, it
>>> should be a separate enum/int, controllable separately for chroma
>>> resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
>>>
>>> That said, I think that for most end users, having such fine-grained
>>> options does not really provide any end value - unless you're already
>>> knee-deep in signal theory, the actual differences between, say,
>>> "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
>>> worst.
>>>
>>> My idea was to provide a single `int quality`, which the user can set
>>> to tune the speed <-> quality trade-off on an arbitrary numeric scale
>>> from 0 to 10, with 0 being the fastest (alias everything, nearest
>>> neighbour, drop half the chroma samples, etc.), the default being
>>> something in the vicinity of 3-5, and 10 being the maximum quality
>>> (full linear downscaling, anti-aliasing, error diffusion, etc.).
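To make the idea concrete, I imagine such a knob would boil down to
something like this (purely illustrative; none of these names exist
anywhere yet):

    /* Purely illustrative sketch of the 0-10 quality knob described
     * above; the names and mappings are made up and not part of any
     * actual or proposed API. */
    enum AVScaleQuality {
        AVSCALE_QUALITY_ULTRAFAST = 0,  /* alias everything, nearest
                                         * neighbour, drop half the
                                         * chroma samples */
        AVSCALE_QUALITY_DEFAULT   = 4,  /* balanced speed/quality
                                         * trade-off */
        AVSCALE_QUALITY_PLACEBO   = 10, /* full linear downscaling,
                                         * anti-aliasing, error
                                         * diffusion */
    };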
>> I think 10 levels is not fine-grained enough: when there are more than
>> 10 features to switch on/off, we would have to switch more than one at
>> a time.
>>
>> The scale also has an issue that becomes obvious when you consider the
>> extremes: memset(0) at level 0 and not converting chroma at level 1, or
>> using a neural net at 9 and hiring a human artist to paint a matching
>> upscaled image at 10.
>>
>> The quality factor would thus probably have at least 3 ranges:
>> 1. as fast as possible, with noticeable quality issues
>> 2. the normal range
>> 3. as good as possible, disregarding the computation needed
>>
>> Some encoders (like x264) use words like UltraFast and Placebo for the
>> ends of this curve.
>
> I like the idea of using explicit names instead of numbers. It
> translates well onto the human-facing API anyway.
>
> I don't think 10 levels is too few if we also pair it with a granular
> API for controlling exactly which scalers etc. you want.
>
> In particular, if we want human-compatible names for them (ranging from
> "ultrafast" to "placebo" as discussed), you would be hard-pressed to
> find many more sensible names than 10.
>
> Especially if we treat these just as presets and not as the only way to
> configure them.
>
>> It would also be possible to use a more formal definition of how much
>> quality one wants to trade per unit of time spent, but that makes it
>> harder to decide which features to actually turn on when one requests a
>> given ratio between PSNR and seconds.
>
>>> The upside of this approach is that it would be vastly simpler for
>>> most end users. It would also track newly added functionality
>>> automatically; e.g. if we get a higher-quality tone-mapping mode, it
>>> can be retroactively added to the higher quality presets. The biggest
>>> downside I can think of is that doing this would arguably violate the
>>> semantics of a "bitexact" flag, since it would break results relative
>>> to a previous version of libswscale - unless we maybe also force a
>>> specific quality level in bitexact mode?
>>>
>>> Open questions:
>>>
>>> 1. Is this a good idea, or do the downsides outweigh the benefits?
>>>
>>> 2. Is an "advanced configuration" API still needed, in addition to the
>>>    quality presets?
>>
>> For regression testing and debugging it is very useful to be able to
>> turn features on one at a time. A failure could then be quickly
>> isolated to a single feature.
>
> Very strong argument in favor of granular control. I'll find a way to
> support it while still having "presets".
>
>> [...]
>>
>>> /**
>>>  * Statically test if a conversion is supported. Values of
>>>  * (respectively) NONE/UNSPECIFIED are ignored.
>>>  *
>>>  * Returns 1 if the conversion is supported, or 0 otherwise.
>>>  */
>>> int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
>>> int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
>>> int avscale_test_primaries(enum AVColorPrimaries dst,
>>>                            enum AVColorPrimaries src);
>>> int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
>>>                           enum AVColorTransferCharacteristic src);

I think it'd be best if you define > 0 as supported for an API like this,
so it becomes extensible. Also maybe < 0 for failure, like
AVERROR(EINVAL), so invalid enum values are not simply treated as "not
supported".
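Something along these lines - a rough sketch of the convention with a
made-up body, just to show the intent:

    #include <libavutil/error.h>
    #include <libavutil/pixfmt.h>

    /* Rough sketch of the suggested return-value convention; the body is
     * made up for illustration.
     *   > 0: supported, 0: not supported, < 0: AVERROR on bad input. */
    int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src)
    {
        if (dst < AV_PIX_FMT_NONE || dst >= AV_PIX_FMT_NB ||
            src < AV_PIX_FMT_NONE || src >= AV_PIX_FMT_NB)
            return AVERROR(EINVAL); /* invalid enum value: a real error,
                                     * not just "unsupported" */
        if (dst == AV_PIX_FMT_NONE || src == AV_PIX_FMT_NONE)
            return 1;               /* NONE is ignored, per the doc
                                     * comment */
        /* ... the actual support check would go here ... */
        return 1;
    }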
>> If we support A for any input and support B for any output, then we
>> should support converting from A to B.
>>
>> I don't think this API is a good idea. It allows supporting random
>> subsets, which would cause confusion and weird bugs in code using it.
>> (For example, removal of an intermediate filter could lead to failure.)
>
> Good point, will change. The prototypical use case for this API is
> setting up format lists inside vf_scale, which need to be set up
> independently anyway.
>
> I was planning on adding another _test_frames() function that takes two
> AVFrames and returns in a tri-state manner whether conversion is
> supported, unsupported, or a no-op. If an exception to the input/output
> independence does ever arise, we can test for it in this function.
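For what it's worth, I picture that tri-state looking something like this
(just a sketch; neither the enum nor the function exist yet, and the
names are made up):

    #include <libavutil/frame.h>

    /* Hypothetical sketch of the tri-state check described above. */
    enum AVScaleTestResult {
        AVSCALE_TEST_UNSUPPORTED = 0, /* conversion not possible */
        AVSCALE_TEST_SUPPORTED   = 1, /* conversion possible */
        AVSCALE_TEST_NOOP        = 2, /* src already matches dst exactly */
    };

    /* Would return one of the above, or an AVERROR code on invalid
     * input. */
    int avscale_test_frames(const AVFrame *dst, const AVFrame *src);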