On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas wrote:
> Hey,
>
> As some of you know, I got contracted (by STF 2024) to work on improving
> swscale over the course of the next couple of months. I want to share my
> current plans and gather feedback + measure sentiment.
>
> ## Problem statement
>
> The two issues I'd like to focus on for now are:
>
> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> 2. Complicated context management, with cascaded contexts, threading,
>    stateful configuration, multi-step init procedures, etc., and related bugs
>
> To make these feasible, some amount of internal re-organization of duties
> inside swscale is prudent.
>
> ## Proposed approach
>
> The first step is to create a new API, which will (tentatively) live in
> <libswscale/avscale.h>. This API will initially start off as a near-copy of
> the current swscale public API, but with the major difference that I want it
> to be state-free and to access metadata only in terms of AVFrame properties.
> So there will be no independent configuration of the input chroma location
> etc. as there is currently, and no need to re-configure or re-init the
> context when feeding it frames with different properties. The goal is for
> users to be able to just feed it AVFrame pairs and have it internally cache
> expensive pre-processing steps as needed. Finally, avscale_* should
> ultimately also support hardware frames directly, in which case it will
> dispatch to some equivalent of scale_vulkan/vaapi/cuda or possibly even
> libplacebo. (But I will defer this to a future milestone.)

So, I've spent the past few days implementing this API and hooking it up to
swscale internally. (For testing, I am also replacing `vf_scale` with an
equivalent AVScale-based implementation, to see how the new API impacts
existing users.) It mostly works so far, with some leftover translation issues
that I have to address before it can be sent upstream.
------

One of the things I was thinking about was how to configure scalers and dither
modes, which sws currently, somewhat clunkily, controls with flags. IMO, flags
are not the right design here; if anything, this should be a separate enum/int,
controllable separately for chroma resampling (4:4:4 <-> 4:2:0) and main
scaling (e.g. 50x50 <-> 80x80).

That said, I think that for most end users, having such fine-grained options
does not really provide any end value; unless you're already knee-deep in
signal theory, the actual differences between, say, "natural bicubic spline"
and "Lanczos" are obtuse at best and alien at worst.

My idea was to provide a single `int quality`, which the user can set to tune
the speed <-> quality trade-off on an arbitrary numeric scale from 0 to 10,
with 0 being the fastest (alias everything, nearest neighbour, drop half the
chroma samples, etc.), the default being something in the vicinity of 3-5, and
10 being the maximum quality (full linear downscaling, anti-aliasing, error
diffusion, etc.).

The upside of this approach is that it would be vastly simpler for most end
users. It would also track newly added functionality automatically; e.g. if we
get a higher-quality tone mapping mode, it can be retroactively added to the
higher quality presets.

The biggest downside I can think of is that doing this would arguably violate
the semantics of a "bitexact" flag, since it would change results relative to a
previous version of libswscale - unless we maybe also force a specific quality
level in bitexact mode?

Open questions:

1. Is this a good idea, or do the downsides outweigh the benefits?
2. Is an "advanced configuration" API still needed, in addition to the quality
   presets?

------

I have attached my current working draft of the public half of
<libswscale/avscale.h>, for reference. You can also find my implementation
draft at the time of writing here:

https://github.com/haasn/FFmpeg/blob/avscale/libswscale/avscale.h