From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 29 Jun 2024 11:11:09 -0300
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal
In-Reply-To: <20240629160557.GB37436@haasn.xyz>
References: <20240622151334.GD14140@haasn.xyz> <20240629134743.GD4857@haasn.xyz>
 <20240629123532.GZ4991@pb2> <20240629160557.GB37436@haasn.xyz>
List-Id: FFmpeg development discussions and patches

On 6/29/2024 11:05 AM, Niklas Haas wrote:
> On Sat, 29 Jun 2024 14:35:32 +0200 Michael Niedermayer wrote:
>> On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
>>> On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas wrote:
>>>> Hey,
>>>>
>>>> As some of you know, I got contracted (by STF 2024) to work on improving
>>>> swscale over the course of the next couple of months. I want to share my
>>>> current plans and gather feedback + measure sentiment.
>>>>
>>>> ## Problem statement
>>>>
>>>> The two issues I'd like to focus on for now are:
>>>>
>>>> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
>>>>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
>>>> 2. Complicated context management, with cascaded contexts, threading, stateful
>>>>    configuration, multi-step init procedures, etc.; and related bugs
>>>>
>>>> In order to make these feasible, some amount of internal re-organization of
>>>> duties inside swscale is prudent.
>>>>
>>>> ## Proposed approach
>>>>
>>>> The first step is to create a new API, which will (tentatively) live in
>>>> a new public header. This API will initially start off as a near-copy of the
>>>> current swscale public API, but with the major difference that I want it to be
>>>> state-free and only access metadata in terms of AVFrame properties. So there
>>>> will be no independent configuration of the input chroma location etc. like
>>>> there is currently, and no need to re-configure or re-init the context when
>>>> feeding it frames with different properties. The goal is for users to be able
>>>> to just feed it AVFrame pairs and have it internally cache expensive
>>>> pre-processing steps as needed. Finally, avscale_* should ultimately also
>>>> support hardware frames directly, in which case it will dispatch to some
>>>> equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
>>>> defer this to a future milestone.)
>>>
>>> So, I've spent the past days implementing this API and hooking it up to
>>> swscale internally. (For testing, I am also replacing `vf_scale` by the
>>> equivalent AVScale-based implementation to see how the new API impacts
>>> existing users.) It mostly works so far, with some left-over translation
>>> issues that I have to address before it can be sent upstream.
>>>
>>> ------
>>>
>>> One of the things I was thinking about was how to configure
>>> scalers/dither modes, which sws currently, somewhat clunkily, controls
>>> with flags. IMO, flags are not the right design here - if anything, it
>>> should be a separate enum/int, and controllable separately for chroma
>>> resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
>>>
>>> That said, I think that for most end users, having such fine-grained
>>> options is not really providing any end value - unless you're already
>>> knee-deep in signal theory, the actual differences between, say,
>>> "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
>>> worst.
>>>
>>> My idea was to provide a single `int quality`, which the user can set to
>>> tune the speed <-> quality trade-off on an arbitrary numeric scale from
>>> 0 to 10, with 0 being the fastest (alias everything, nearest neighbour,
>>> drop half chroma samples, etc.), the default being something in the
>>> vicinity of 3-5, and 10 being the maximum quality (full linear
>>> downscaling, anti-aliasing, error diffusion, etc.).
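
To make the single-knob idea above a bit more concrete, here is a rough sketch
of what it could look like from the caller's side. This is purely hypothetical:
neither the struct, the "quality" field, nor the function names below exist in
libswscale; they are placeholders for the proposal being discussed.

/* Hypothetical sketch only: AVScaleContext, avscale_alloc_context(),
 * avscale_frame() and the "quality" field are illustrative names taken
 * from this proposal, not an existing FFmpeg API. */
#include <libavutil/error.h>
#include <libavutil/frame.h>

typedef struct AVScaleContext {
    int quality;  /* 0 = fastest ... 10 = best; default somewhere around 3-5 */
    /* granular overrides (scaler, dither mode, ...) could live here as well */
} AVScaleContext;

AVScaleContext *avscale_alloc_context(void);
int  avscale_frame(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src);
void avscale_free_context(AVScaleContext **ctx);

static int convert(AVFrame *dst, const AVFrame *src)
{
    AVScaleContext *ctx = avscale_alloc_context();
    int ret;

    if (!ctx)
        return AVERROR(ENOMEM);

    ctx->quality = 7;   /* single speed <-> quality trade-off knob */

    /* All format, colorspace and chroma-location metadata is read from the
     * AVFrame properties themselves; nothing is (re)configured per frame. */
    ret = avscale_frame(ctx, dst, src);

    avscale_free_context(&ctx);
    return ret;
}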
>>
>> I think 10 levels is not fine-grained enough; when there are more than
>> 10 features to switch on/off, we would have to switch more than one at
>> a time.
>>
>> Also, the scale has an issue that becomes obvious when you consider the
>> extremes: like memset(0) at level 0, not converting chroma at level 1,
>> hiring a human artist to paint a matching upscaled image at 10, and
>> using a neural net at 9.
>>
>> The quality factor would thus probably have at least 3 ranges:
>> 1. the as-fast-as-possible range, with noticeable quality issues
>> 2. the normal range
>> 3. the as-good-as-possible range, disregarding the computation needed
>>
>> Some encoders (like x264) use words like UltraFast and Placebo for the
>> ends of this curve.
>
> I like the idea of using explicit names instead of numbers. It
> translates well onto the human-facing API anyway.
>
> I don't think 10 levels is too few if we also pair it with a granular
> API for controlling exactly which scalers etc. you want.
>
> In particular, if we want human-compatible names for them (ranging from
> "ultrafast" to "placebo" as discussed), you would be hard-pressed to
> find many more sensible names than 10.
>
> Especially if we treat these just as presets and not the only way to
> configure them.
>
>> It would also be possible to use a more formal definition of how much
>> one wants to trade quality per time spent, but that then makes it
>> harder to decide which feature to actually turn on when one requests a
>> ratio between PSNR and seconds.
>>
>>> The upside of this approach is that it would be vastly simpler for most
>>> end users. It would also track newly added functionality automatically;
>>> e.g. if we get a higher-quality tone mapping mode, it can be
>>> retroactively added to the higher quality presets. The biggest downside
>>> I can think of is that doing this would arguably violate the semantics
>>> of a "bitexact" flag, since it would break results relative to
>>> a previous version of libswscale - unless we maybe also force a specific
>>> quality level in bitexact mode?
>>>
>>> Open questions:
>>>
>>> 1. Is this a good idea, or do the downsides outweigh the benefits?
>>
>>> 2. Is an "advanced configuration" API still needed, in addition to the
>>>    quality presets?
>>
>> For regression testing and debugging it is very useful to be able to
>> turn features on one at a time. A failure could then be quickly
>> isolated to a feature.
>
> Very strong argument in favor of granular control. I'll find a way to
> support it while still having "presets".
>
>> [...]
>>
>>> /**
>>>  * Statically test if a conversion is supported. Values of (respectively)
>>>  * NONE/UNSPECIFIED are ignored.
>>>  *
>>>  * Returns 1 if the conversion is supported, or 0 otherwise.
>>>  */
>>> int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
>>> int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
>>> int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
>>> int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
>>>                           enum AVColorTransferCharacteristic src);

I think it'd be best if you define > 0 as supported for an API like this,
so it becomes extensible. Also maybe < 0 for failure, like AVERROR(EINVAL),
so invalid enum values are not simply treated as "not supported".
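
To illustrate, a caller built around that convention could look like this — a
sketch assuming the > 0 / 0 / < 0 semantics suggested above, where
avscale_test_format() is the proposed prototype quoted earlier, not an
existing function:

#include <libavutil/error.h>
#include <libavutil/pixfmt.h>

/* Proposed prototype as quoted above; not part of any released header. */
int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);

static int formats_compatible(enum AVPixelFormat dst, enum AVPixelFormat src)
{
    int ret = avscale_test_format(dst, src);
    if (ret < 0)
        return ret;   /* e.g. AVERROR(EINVAL) for an invalid enum value */
    return ret > 0;   /* any positive value keeps meaning "supported",
                         leaving room for more detailed results later */
}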
>> If we support A for any input and support B for any output, then we
>> should support converting from A to B.
>>
>> I don't think this API is a good idea. It allows supporting random
>> subsets, which would cause confusion and weird bugs in code using it
>> (for example, removal of an intermediate filter could lead to failure).
>
> Good point, will change. The prototypal use case for this API is setting
> up format lists inside vf_scale, which need to be set up independently
> anyway.
>
> I was planning on adding another _test_frames() function that takes two
> AVFrames and returns in a tri-state manner whether conversion is
> supported, unsupported, or a no-op. If an exception to the input/output
> independence does ever arise, we can test for it in this function.
>
>> [...]
>>
>> thx
>>
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> Elect your leaders based on what they did after the last election, not
>> based on what they say before an election.
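
For reference, the tri-state _test_frames() check described above could be
shaped roughly as follows. This is a hypothetical sketch: the constants and
the name avscale_test_frames() are placeholders, not code from the proposal.

#include <libavutil/frame.h>

/* Hypothetical return values for the tri-state check discussed above. */
enum {
    AVSCALE_TEST_UNSUPPORTED = 0,  /* conversion between the two frames not possible */
    AVSCALE_TEST_SUPPORTED   = 1,  /* conversion possible */
    AVSCALE_TEST_NOOP        = 2,  /* dst and src properties already match */
};

/* Takes both frames at once, so any interdependency between input and
 * output properties (format, colorspace, primaries, transfer, chroma
 * location, ...) can be caught here rather than in the per-property
 * test functions. */
int avscale_test_frames(const AVFrame *dst, const AVFrame *src);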