From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 29 Jun 2024 11:11:09 -0300
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal
In-Reply-To: <20240629160557.GB37436@haasn.xyz>
References: <20240622151334.GD14140@haasn.xyz> <20240629134743.GD4857@haasn.xyz>
 <20240629123532.GZ4991@pb2> <20240629160557.GB37436@haasn.xyz>
List-Id: FFmpeg development discussions and patches

On 6/29/2024 11:05 AM, Niklas Haas wrote:
> On Sat, 29 Jun 2024 14:35:32 +0200 Michael Niedermayer wrote:
>> On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
>>> On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas wrote:
>>>> Hey,
>>>>
>>>> As some of you know, I got contracted (by STF 2024) to work on improving
>>>> swscale over the course of the next couple of months. I want to share my
>>>> current plans and gather feedback + measure sentiment.
>>>>
>>>> ## Problem statement
>>>>
>>>> The two issues I'd like to focus on for now are:
>>>>
>>>> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
>>>>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
>>>> 2. Complicated context management, with cascaded contexts, threading, stateful
>>>>    configuration, multi-step init procedures, etc.; and related bugs
>>>>
>>>> In order to make these feasible, some amount of internal re-organization of
>>>> duties inside swscale is prudent.
>>>>
>>>> ## Proposed approach
>>>>
>>>> The first step is to create a new API, which will (tentatively) live in
>>>> a new public header. This API will initially start off as a near-copy of the
>>>> current swscale public API, but with the major difference that I want it to be
>>>> state-free and only access metadata in terms of AVFrame properties. So there
>>>> will be no independent configuration of the input chroma location etc. like
>>>> there is currently, and no need to re-configure or re-init the context when
>>>> feeding it frames with different properties. The goal is for users to be able
>>>> to just feed it AVFrame pairs and have it internally cache expensive
>>>> pre-processing steps as needed. Finally, avscale_* should ultimately also
>>>> support hardware frames directly, in which case it will dispatch to some
>>>> equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
>>>> defer this to a future milestone.)
>>>
>>> So, I've spent the past days implementing this API and hooking it up to
>>> swscale internally. (For testing, I am also replacing `vf_scale` by the
>>> equivalent AVScale-based implementation to see how the new API impacts
>>> existing users.) It mostly works so far, with some left-over translation
>>> issues that I have to address before it can be sent upstream.
>>>
>>> ------
>>>
>>> One of the things I was thinking about was how to configure
>>> scalers/dither modes, which sws currently, somewhat clunkily, controls
>>> with flags. IMO, flags are not the right design here - if anything, it
>>> should be a separate enum/int, and controllable separately for chroma
>>> resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
>>>
>>> That said, I think that for most end users, having such fine-grained
>>> options is not really providing any end value - unless you're already
>>> knee-deep in signal theory, the actual differences between, say,
>>> "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
>>> worst.
>>>
>>> My idea was to provide a single `int quality`, which the user can set to
>>> tune the speed <-> quality trade-off on an arbitrary numeric scale from
>>> 0 to 10, with 0 being the fastest (alias everything, nearest neighbour,
>>> drop half chroma samples, etc.), the default being something in the
>>> vicinity of 3-5, and 10 being the maximum quality (full linear
>>> downscaling, anti-aliasing, error diffusion, etc.).
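
To make the single-knob idea above a bit more concrete, here is a rough sketch
of what it could look like from the caller's side. This is purely hypothetical:
neither the struct, the "quality" field, nor the function names below exist in
libswscale; they are placeholders for the proposal being discussed.

/* Hypothetical sketch only: AVScaleContext, avscale_alloc_context(),
 * avscale_frame() and the "quality" field are illustrative names taken
 * from this proposal, not an existing FFmpeg API. */
#include <libavutil/error.h>
#include <libavutil/frame.h>

typedef struct AVScaleContext {
    int quality;  /* 0 = fastest ... 10 = best; default somewhere around 3-5 */
    /* granular overrides (scaler, dither mode, ...) could live here as well */
} AVScaleContext;

AVScaleContext *avscale_alloc_context(void);
int  avscale_frame(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src);
void avscale_free_context(AVScaleContext **ctx);

static int convert(AVFrame *dst, const AVFrame *src)
{
    AVScaleContext *ctx = avscale_alloc_context();
    int ret;

    if (!ctx)
        return AVERROR(ENOMEM);

    ctx->quality = 7;   /* single speed <-> quality trade-off knob */

    /* All format, colorspace and chroma-location metadata is read from the
     * AVFrame properties themselves; nothing is (re)configured per frame. */
    ret = avscale_frame(ctx, dst, src);

    avscale_free_context(&ctx);
    return ret;
}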
>>
>> I think 10 levels is not fine-grained enough; when there are more than
>> 10 features to switch on/off, we would have to switch more than one at
>> a time.
>>
>> Also, the scale has an issue that becomes obvious when you consider the
>> extremes: like memset(0) at level 0, not converting chroma at level 1,
>> hiring a human artist to paint a matching upscaled image at 10, and
>> using a neural net at 9.
>>
>> The quality factor would thus probably have at least 3 ranges:
>> 1. the as-fast-as-possible range, with noticeable quality issues
>> 2. the normal range
>> 3. the as-good-as-possible range, disregarding the computation needed
>>
>> Some encoders (like x264) use words like UltraFast and Placebo for the
>> ends of this curve.
>
> I like the idea of using explicit names instead of numbers. It
> translates well onto the human-facing API anyway.
>
> I don't think 10 levels is too few if we also pair it with a granular
> API for controlling exactly which scalers etc. you want.
>
> In particular, if we want human-compatible names for them (ranging from
> "ultrafast" to "placebo" as discussed), you would be hard-pressed to
> find many more sensible names than 10.
>
> Especially if we treat these just as presets and not the only way to
> configure them.
>
>> It would also be possible to use a more formal definition of how much
>> one wants to trade quality per time spent, but that then makes it
>> harder to decide which feature to actually turn on when one requests a
>> ratio between PSNR and seconds.
>>
>>> The upside of this approach is that it would be vastly simpler for most
>>> end users. It would also track newly added functionality automatically;
>>> e.g. if we get a higher-quality tone mapping mode, it can be
>>> retroactively added to the higher quality presets. The biggest downside
>>> I can think of is that doing this would arguably violate the semantics
>>> of a "bitexact" flag, since it would break results relative to
>>> a previous version of libswscale - unless we maybe also force a specific
>>> quality level in bitexact mode?
>>>
>>> Open questions:
>>>
>>> 1. Is this a good idea, or do the downsides outweigh the benefits?
>>
>>> 2. Is an "advanced configuration" API still needed, in addition to the
>>>    quality presets?
>>
>> For regression testing and debugging it is very useful to be able to
>> turn features on one at a time. A failure could then be quickly
>> isolated to a feature.
>
> Very strong argument in favor of granular control. I'll find a way to
> support it while still having "presets".
>
>> [...]
>>
>>> /**
>>>  * Statically test if a conversion is supported. Values of (respectively)
>>>  * NONE/UNSPECIFIED are ignored.
>>>  *
>>>  * Returns 1 if the conversion is supported, or 0 otherwise.
>>>  */
>>> int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
>>> int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
>>> int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
>>> int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
>>>                           enum AVColorTransferCharacteristic src);

I think it'd be best if you define > 0 as supported for an API like this,
so it becomes extensible. Also maybe < 0 for failure, like AVERROR(EINVAL),
so invalid enum values are not simply treated as "not supported".
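
To illustrate, a caller built around that convention could look like this — a
sketch assuming the > 0 / 0 / < 0 semantics suggested above, where
avscale_test_format() is the proposed prototype quoted earlier, not an
existing function:

#include <libavutil/error.h>
#include <libavutil/pixfmt.h>

/* Proposed prototype as quoted above; not part of any released header. */
int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);

static int formats_compatible(enum AVPixelFormat dst, enum AVPixelFormat src)
{
    int ret = avscale_test_format(dst, src);
    if (ret < 0)
        return ret;   /* e.g. AVERROR(EINVAL) for an invalid enum value */
    return ret > 0;   /* any positive value keeps meaning "supported",
                         leaving room for more detailed results later */
}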
>> If we support A for any input and support B for any output, then we
>> should support converting from A to B.
>>
>> I don't think this API is a good idea. It allows supporting random
>> subsets, which would cause confusion and weird bugs in code using it
>> (for example, removal of an intermediate filter could lead to failure).
>
> Good point, will change. The prototypal use case for this API is setting
> up format lists inside vf_scale, which need to be set up independently
> anyway.
>
> I was planning on adding another _test_frames() function that takes two
> AVFrames and returns in a tri-state manner whether conversion is
> supported, unsupported, or a no-op. If an exception to the input/output
> independence does ever arise, we can test for it in this function.
>
>> [...]
>>
>> thx
>>
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> Elect your leaders based on what they did after the last election, not
>> based on what they say before an election.
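
For reference, the tri-state _test_frames() check described above could be
shaped roughly as follows. This is a hypothetical sketch: the constants and
the name avscale_test_frames() are placeholders, not code from the proposal.

#include <libavutil/frame.h>

/* Hypothetical return values for the tri-state check discussed above. */
enum {
    AVSCALE_TEST_UNSUPPORTED = 0,  /* conversion between the two frames not possible */
    AVSCALE_TEST_SUPPORTED   = 1,  /* conversion possible */
    AVSCALE_TEST_NOOP        = 2,  /* dst and src properties already match */
};

/* Takes both frames at once, so any interdependency between input and
 * output properties (format, colorspace, primaries, transfer, chroma
 * location, ...) can be caught here rather than in the per-property
 * test functions. */
int avscale_test_frames(const AVFrame *dst, const AVFrame *src);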