On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas wrote:
> Hey,
>
> As some of you know, I got contracted (by STF 2024) to work on improving
> swscale over the course of the next couple of months. I want to share my
> current plans and gather feedback + measure sentiment.
>
> ## Problem statement
>
> The two issues I'd like to focus on for now are:
>
> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> 2. Complicated context management, with cascaded contexts, threading,
>    stateful configuration, multi-step init procedures, etc., and related bugs
>
> To make these feasible, some amount of internal re-organization of duties
> inside swscale is prudent.
>
> ## Proposed approach
>
> The first step is to create a new API, which will (tentatively) live in
> <libswscale/avscale.h>. This API will initially start off as a near-copy of
> the current swscale public API, but with the major difference that I want it
> to be state-free and to access metadata only in terms of AVFrame properties.
> So there will be no independent configuration of the input chroma location
> etc. as there is currently, and no need to re-configure or re-init the
> context when feeding it frames with different properties. The goal is for
> users to be able to just feed it AVFrame pairs and have it internally cache
> expensive pre-processing steps as needed. Finally, avscale_* should
> ultimately also support hardware frames directly, in which case it will
> dispatch to some equivalent of scale_vulkan/vaapi/cuda or possibly even
> libplacebo. (But I will defer this to a future milestone.)

So, I've spent the past few days implementing this API and hooking it up to
swscale internally. (For testing, I am also replacing `vf_scale` with an
equivalent AVScale-based implementation, to see how the new API impacts
existing users.) It mostly works so far, with some leftover translation issues
that I have to address before it can be sent upstream.
------

One of the things I was thinking about was how to configure scalers and dither
modes, which sws currently, somewhat clunkily, controls with flags. IMO, flags
are not the right design here; if anything, this should be a separate enum/int,
controllable separately for chroma resampling (4:4:4 <-> 4:2:0) and main
scaling (e.g. 50x50 <-> 80x80).

That said, I think that for most end users, having such fine-grained options
does not really provide any end value; unless you're already knee-deep in
signal theory, the actual differences between, say, "natural bicubic spline"
and "Lanczos" are obtuse at best and alien at worst.

My idea was to provide a single `int quality`, which the user can set to tune
the speed <-> quality trade-off on an arbitrary numeric scale from 0 to 10,
with 0 being the fastest (alias everything, nearest neighbour, drop half the
chroma samples, etc.), the default being something in the vicinity of 3-5, and
10 being the maximum quality (full linear downscaling, anti-aliasing, error
diffusion, etc.).

The upside of this approach is that it would be vastly simpler for most end
users. It would also track newly added functionality automatically; e.g. if we
get a higher-quality tone mapping mode, it can be retroactively added to the
higher quality presets.

The biggest downside I can think of is that doing this would arguably violate
the semantics of a "bitexact" flag, since it would change results relative to a
previous version of libswscale - unless we maybe also force a specific quality
level in bitexact mode?

Open questions:

1. Is this a good idea, or do the downsides outweigh the benefits?
2. Is an "advanced configuration" API still needed, in addition to the quality
   presets?

------

I have attached my current working draft of the public half of
<libswscale/avscale.h>, for reference. You can also find my implementation
draft at the time of writing here:

https://github.com/haasn/FFmpeg/blob/avscale/libswscale/avscale.h