Date: Sat, 29 Jun 2024 14:35:32 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20240629123532.GZ4991@pb2>
References: <20240622151334.GD14140@haasn.xyz> <20240629134743.GD4857@haasn.xyz>
In-Reply-To: <20240629134743.GD4857@haasn.xyz>
Subject: Re: [FFmpeg-devel] [RFC]] swscale modernization proposal

On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
> On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas wrote:
> > Hey,
> >
> > As some of you know, I got contracted (by STF 2024) to work on improving
> > swscale, over the course of the next couple of months. I want to share my
> > current plans and gather feedback + measure sentiment.
> >
> > ## Problem statement
> >
> > The two issues I'd like to focus on for now are:
> >
> > 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
> >    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > 2. Complicated context management, with cascaded contexts, threading, stateful
> >    configuration, multi-step init procedures, etc; and related bugs
> >
> > In order to make these feasible, some amount of internal re-organization of
> > duties inside swscale is prudent.
> >
> > ## Proposed approach
> >
> > The first step is to create a new API, which will (tentatively) live in
> > . This API will initially start off as a near-copy of the
> > current swscale public API, but with the major difference that I want it to be
> > state-free and only access metadata in terms of AVFrame properties. So there
> > will be no independent configuration of the input chroma location etc. like
> > there is currently, and no need to re-configure or re-init the context when
> > feeding it frames with different properties.
> > The goal is for users to be able
> > to just feed it AVFrame pairs and have it internally cache expensive
> > pre-processing steps as needed. Finally, avscale_* should ultimately also
> > support hardware frames directly, in which case it will dispatch to some
> > equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> > defer this to a future milestone)
>
> So, I've spent the past days implementing this API and hooking it up to
> swscale internally. (For testing, I am also replacing `vf_scale` by the
> equivalent AVScale-based implementation to see how the new API impacts
> existing users). It mostly works so far, with some left-over translation
> issues that I have to address before it can be sent upstream.
>
> ------
>
> One of the things I was thinking about was how to configure
> scalers/dither modes, which sws currently, somewhat clunkily, controls
> with flags. IMO, flags are not the right design here - if anything, it
> should be a separate enum/int, and controllable separately for chroma
> resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
>
> That said, I think that for most end users, having such fine-grained
> options is not really providing any end value - unless you're already
> knee-deep in signal theory, the actual differences between, say,
> "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
> worst.
>
> My idea was to provide a single `int quality`, which the user can set to
> tune the speed <-> quality trade-off on an arbitrary numeric scale from
> 0 to 10, with 0 being the fastest (alias everything, nearest neighbour,
> drop half chroma samples, etc.), the default being something in the
> vicinity of 3-5, and 10 being the maximum quality (full linear
> downscaling, anti-aliasing, error diffusion, etc.).

I think 10 levels is not fine-grained enough: when there are more than 10
features to switch on/off, we would have to switch more than 1 at a time.
Also, the scale has an issue that becomes obvious when you consider the
extremes, like memset(0) at level 0, not converting chroma at level 1,
using a neural net at 9, and hiring a human artist to paint a matching
upscaled image at 10.

The quality factor would thus probably have at least 3 ranges:
1. as fast as possible, with noticeable quality issues
2. the normal range
3. as good as possible, disregarding the computation needed

Some encoders (like x264) use words like UltraFast and Placebo for the ends
of this curve.

It would also be possible to use a more formal definition of how much
quality one wants to trade per time spent, but that then makes it harder
to decide which feature to actually turn on when one requests a ratio
between PSNR and seconds.

> The upside of this approach is that it would be vastly simpler for most
> end users. It would also track newly added functionality automatically;
> e.g. if we get a higher-quality tone mapping mode, it can be
> retroactively added to the higher quality presets. The biggest downside
> I can think of is that doing this would arguably violate the semantics
> of a "bitexact" flag, since it would break results relative to
> a previous version of libswscale - unless we maybe also force a specific
> quality level in bitexact mode?
>
> Open questions:
>
> 1. Is this a good idea, or do the downsides outweigh the benefits?
> 2. Is an "advanced configuration" API still needed, in addition to the
>    quality presets?

For regression testing and debugging it is very useful to be able to turn
features on one at a time. A failure could then be quickly isolated to
a feature.

[...]

> /**
>  * Statically test if a conversion is supported. Values of (respectively)
>  * NONE/UNSPECIFIED are ignored.
>  *
>  * Returns 1 if the conversion is supported, or 0 otherwise.
>  */
> int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
> int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
> int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
> int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
>                           enum AVColorTransferCharacteristic src);

If we support A for any input and support B for any output, then we
should support converting from A to B.

I don't think this API is a good idea. It allows supporting random subsets,
which would cause confusion and weird bugs in code using it
(for example, removal of an intermediate filter could lead to failure).

[...]

thx

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Elect your leaders based on what they did after the last election, not
based on what they say before an election.