Date: Sat, 29 Jun 2024 16:05:57 +0200
Message-ID: <20240629160557.GB37436@haasn.xyz>
From: Niklas Haas
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
In-Reply-To: <20240629123532.GZ4991@pb2>
References: <20240622151334.GD14140@haasn.xyz> <20240629134743.GD4857@haasn.xyz> <20240629123532.GZ4991@pb2>
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal

On Sat, 29 Jun 2024 14:35:32 +0200 Michael Niedermayer wrote:
> On Sat, Jun 29, 2024 at
01:47:43PM +0200, Niklas Haas wrote:
> > On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas wrote:
> > > Hey,
> > >
> > > As some of you know, I got contracted (by STF 2024) to work on
> > > improving swscale over the course of the next couple of months. I want
> > > to share my current plans and gather feedback + measure sentiment.
> > >
> > > ## Problem statement
> > >
> > > The two issues I'd like to focus on for now are:
> > >
> > > 1. Lack of support for a lot of modern formats and conversions (HDR,
> > >    ICtCp, IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > > 2. Complicated context management, with cascaded contexts, threading,
> > >    stateful configuration, multi-step init procedures, etc.; and
> > >    related bugs
> > >
> > > In order to make these feasible, some amount of internal
> > > re-organization of duties inside swscale is prudent.
> > >
> > > ## Proposed approach
> > >
> > > The first step is to create a new API, which will (tentatively) live
> > > in a new public header. This API will initially start off as a
> > > near-copy of the current swscale public API, but with the major
> > > difference that I want it to be state-free and only access metadata
> > > in terms of AVFrame properties. So there will be no independent
> > > configuration of the input chroma location etc. like there is
> > > currently, and no need to re-configure or re-init the context when
> > > feeding it frames with different properties. The goal is for users to
> > > be able to just feed it AVFrame pairs and have it internally cache
> > > expensive pre-processing steps as needed. Finally, avscale_* should
> > > ultimately also support hardware frames directly, in which case it
> > > will dispatch to some equivalent of scale_vulkan/vaapi/cuda or
> > > possibly even libplacebo. (But I will defer this to a future
> > > milestone.)
> >
> > So, I've spent the past days implementing this API and hooking it up to
> > swscale internally.
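To make the "state-free, keyed on AVFrame properties" idea concrete, here is a minimal self-contained sketch of the caching pattern described above. All identifiers (FrameProps, ScaleContext, scale_frame) are invented stand-ins for illustration, not the actual proposed avscale API:

```c
/* Hypothetical sketch: a state-free scaling call that (re)configures
 * itself from the frames it is given, caching expensive derived state
 * (filter kernels, LUTs, ...) until the frame properties change. */
#include <assert.h>
#include <string.h>

/* Minimal stand-in for the AVFrame properties the context would key on. */
typedef struct FrameProps {
    int width, height, format, colorspace;
} FrameProps;

typedef struct ScaleContext {
    FrameProps src, dst;   /* last-seen configuration */
    int have_cache;        /* whether the derived state is valid */
    int reinit_count;      /* for demonstration: how often we re-derived */
} ScaleContext;

/* The user never configures the context explicitly; repeated calls with
 * identical frame properties reuse the cached setup. */
static void scale_frame(ScaleContext *ctx, const FrameProps *dst,
                        const FrameProps *src)
{
    if (!ctx->have_cache ||
        memcmp(&ctx->src, src, sizeof(*src)) ||
        memcmp(&ctx->dst, dst, sizeof(*dst))) {
        /* Expensive setup (kernels, dither matrices, ...) would go here. */
        ctx->src = *src;
        ctx->dst = *dst;
        ctx->have_cache = 1;
        ctx->reinit_count++;
    }
    /* ... actual conversion using the cached state ... */
}
```

Feeding the same frame pair twice would trigger the expensive setup only once; changing any property transparently re-derives it.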
> > (For testing, I am also replacing `vf_scale` by the equivalent
> > AVScale-based implementation to see how the new API impacts existing
> > users.) It mostly works so far, with some left-over translation issues
> > that I have to address before it can be sent upstream.
> >
> > ------
> >
> > One of the things I was thinking about was how to configure
> > scalers/dither modes, which sws currently, somewhat clunkily, controls
> > with flags. IMO, flags are not the right design here - if anything, it
> > should be a separate enum/int, and controllable separately for chroma
> > resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
> >
> > That said, I think that for most end users, having such fine-grained
> > options is not really providing any end value - unless you're already
> > knee-deep in signal theory, the actual differences between, say,
> > "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
> > worst.
> >
> > My idea was to provide a single `int quality`, which the user can set
> > to tune the speed <-> quality trade-off on an arbitrary numeric scale
> > from 0 to 10, with 0 being the fastest (alias everything, nearest
> > neighbour, drop half chroma samples, etc.), the default being something
> > in the vicinity of 3-5, and 10 being the maximum quality (full linear
> > downscaling, anti-aliasing, error diffusion, etc.).
>
> I think 10 levels is not fine-grained enough; when there are more than
> 10 features to switch on/off, we would have to switch more than one at
> a time.
>
> The scale also has an issue that becomes obvious when you consider the
> extremes: memset(0) at level 0, not converting chroma at level 1, using
> a neural net at 9, and hiring a human artist to paint a matching
> upscaled image at 10.
>
> The quality factor would thus probably have at least 3 ranges:
> 1. as fast as possible, with noticeable quality issues
> 2. the normal range
> 3.
>    as good as possible, disregarding the computation needed
>
> Some encoders (like x264) use words like UltraFast and Placebo for the
> ends of this curve.

I like the idea of using explicit names instead of numbers. It translates
well onto the human-facing API anyway.

I don't think 10 levels is too few if we also pair it with a granular API
for controlling exactly which scalers etc. you want. In particular, if we
want human-compatible names for them (ranging from "ultrafast" to
"placebo" as discussed), you would be hard-pressed to find many more
sensible names than 10. Especially if we treat these just as presets, and
not the only way to configure them.

> It would also be possible to use a more formal definition of how much
> quality one wants to trade per unit of time spent, but that then makes
> it harder to decide which features to actually turn on when one
> requests a ratio between PSNR and seconds.
>
> > The upside of this approach is that it would be vastly simpler for
> > most end users. It would also track newly added functionality
> > automatically; e.g. if we get a higher-quality tone mapping mode, it
> > can be retroactively added to the higher quality presets. The biggest
> > downside I can think of is that doing this would arguably violate the
> > semantics of a "bitexact" flag, since it would break results relative
> > to a previous version of libswscale - unless we maybe also force a
> > specific quality level in bitexact mode?
> >
> > Open questions:
> >
> > 1. Is this a good idea, or do the downsides outweigh the benefits?
> >
> > 2. Is an "advanced configuration" API still needed, in addition to the
> >    quality presets?
>
> For regression testing and debugging it is very useful to be able to
> turn features on one at a time. A failure could then be quickly
> isolated to a feature.

Very strong argument in favor of granular control. I'll find a way to
support it while still having "presets".

> > [...]
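As a rough illustration of "presets on top of granular options", here is a self-contained sketch. Every identifier here (ScalePreset, ScaleOpts, opts_from_quality, the feature thresholds) is invented for demonstration and is not part of any proposed API:

```c
/* Hypothetical sketch: named presets as aliases onto a 0-10 quality
 * scale; a preset merely pre-fills a granular options struct, which the
 * user (or a test harness) can still override one field at a time. */
#include <assert.h>

enum ScalePreset {
    PRESET_ULTRAFAST = 0,
    PRESET_FAST      = 2,
    PRESET_DEFAULT   = 4,
    PRESET_HIGH      = 7,
    PRESET_PLACEBO   = 10,
};

typedef struct ScaleOpts {
    int scaler;        /* e.g. 0=nearest, 1=bilinear, 2=bicubic, 3=lanczos */
    int dither;        /* e.g. 0=none, 1=ordered, 2=error diffusion */
    int chroma_full;   /* resample chroma properly instead of dropping it */
} ScaleOpts;

/* Expand a quality level into granular options; thresholds are made up. */
static ScaleOpts opts_from_quality(int quality)
{
    ScaleOpts o = {0};
    o.scaler      = quality >= 8 ? 3 : quality >= 4 ? 2 :
                    quality >= 2 ? 1 : 0;
    o.dither      = quality >= 9 ? 2 : quality >= 3 ? 1 : 0;
    o.chroma_full = quality >= 2;
    return o;
}
```

This shape also serves the regression-testing argument: a debug run can start from a preset and flip exactly one field to isolate a failing feature.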
>
> > /**
> >  * Statically test if a conversion is supported. Values of
> >  * (respectively) NONE/UNSPECIFIED are ignored.
> >  *
> >  * Returns 1 if the conversion is supported, or 0 otherwise.
> >  */
> > int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
> > int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
> > int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
> > int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
> >                           enum AVColorTransferCharacteristic src);
>
> If we support A for any input and support B for any output, then we
> should support converting from A to B.
>
> I don't think this API is a good idea. It allows supporting random
> subsets, which would cause confusion and weird bugs in code using it.
> (For example, removal of an intermediate filter could lead to failure.)

Good point, will change. The prototypical use case for this API is setting
up format lists inside vf_scale, which need to be set up independently
anyway.

I was planning on adding another _test_frames() function that takes two
AVFrames and returns, in a tri-state manner, whether conversion is
supported, unsupported, or a no-op. If an exception to the input/output
independence does ever arise, we can test for it in this function.

> [...]
>
> thx
>
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Elect your leaders based on what they did after the last election, not
> based on what they say before an election.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
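For what the tri-state _test_frames() idea could look like, here is a self-contained sketch. The types and names (Frame, test_frames, CONV_*) are invented stand-ins, not the proposed avscale signatures:

```c
/* Hypothetical sketch of a tri-state conversion test: given a pair of
 * (minimal, invented) frame descriptors, report whether the conversion
 * is unsupported, supported, or a no-op. */
#include <assert.h>
#include <string.h>

enum { CONV_UNSUPPORTED = -1, CONV_SUPPORTED = 0, CONV_NOOP = 1 };

typedef struct Frame {
    int width, height, format, colorspace;
} Frame;

/* Placeholder support check standing in for per-format capability. */
static int format_supported(int format) { return format >= 0; }

static int test_frames(const Frame *dst, const Frame *src)
{
    /* Input support and output support stay independent, as argued above;
     * any future exception to that independence would be tested here. */
    if (!format_supported(src->format) || !format_supported(dst->format))
        return CONV_UNSUPPORTED;
    if (!memcmp(dst, src, sizeof(*src)))
        return CONV_NOOP;      /* caller may skip the conversion entirely */
    return CONV_SUPPORTED;
}
```

The no-op case is what lets a caller (e.g. a filter graph) drop a redundant conversion step safely.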