Date: Sun, 23 Jun 2024 19:46:19 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20240623174619.GA4991@pb2>
References: <20240622151334.GD14140@haasn.xyz>
MIME-Version: 1.0
Subject: Re: [FFmpeg-devel] [RFC]] swscale modernization proposal
Content-Type: multipart/mixed; boundary="===============0810577217065947050=="

--===============0810577217065947050==
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="SLk8PSyqxe/ugcCT"
Content-Disposition: inline

--SLk8PSyqxe/ugcCT
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Sun, Jun 23, 2024 at 12:19:13AM +0200, Vittorio Giovara wrote:
> On Sat, Jun 22, 2024 at 3:22 PM Niklas Haas wrote:
>
> > Hey,
> >
> > As some of you know, I got contracted (by STF 2024) to work on improving
> > swscale, over the course of the next couple of months. I want to share my
> > current plans and gather feedback + measure sentiment.
> >
> > ## Problem statement
> >
> > The two issues I'd like to focus on for now are:
> >
> > 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
> >    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > 2. Complicated context management, with cascaded contexts, threading, stateful
> >    configuration, multi-step init procedures, etc; and related bugs
> >
> > In order to make these feasible, some amount of internal re-organization of
> > duties inside swscale is prudent.
> >
> > ## Proposed approach
> >
> > The first step is to create a new API, which will (tentatively) live in
> > . This API will initially start off as a near-copy of the
> > current swscale public API, but with the major difference that I want it to be
> > state-free and only access metadata in terms of AVFrame properties. So there
> > will be no independent configuration of the input chroma location etc. like
> > there is currently, and no need to re-configure or re-init the context when
> > feeding it frames with different properties.
> > The goal is for users to be able to just feed it AVFrame pairs and have it
> > internally cache expensive pre-processing steps as needed. Finally, avscale_*
> > should ultimately also support hardware frames directly, in which case it will
> > dispatch to some equivalent of scale_vulkan/vaapi/cuda or possibly even
> > libplacebo. (But I will defer this to a future milestone)
> >
> > After this API is established, I want to start expanding the functionality in
> > the following manner:
> >
> > ### Phase 1
> >
> > For basic operation, avscale_* will just dispatch to a sequence of swscale_*
> > invocations. In the basic case, it will just directly invoke swscale with
> > minimal overhead. In more advanced cases, it might resolve to a *sequence* of
> > swscale operations, with other operations (e.g. colorspace conversions a la
> > vf_colorspace) mixed in.
> >
> > This will allow us to gain new functionality in a minimally invasive way, and
> > will let API users start porting to the new API. This will also serve as a good
> > "selling point" for the new API, allowing us to hopefully break up the legacy
> > swscale API afterwards.
> >
> > ### Phase 2
> >
> > After this is working, I want to cleanly separate swscale into two distinct
> > components:
> >
> > 1. vertical/horizontal scaling
> > 2. input/output conversions
> >
> > Right now, these operations both live inside the main SwsContext, even though
> > they are conceptually orthogonal. Input handling is done entirely by the
> > abstract callbacks lumToYV12 etc., while output conversion is currently
> > "merged" with vertical scaling (yuv2planeX etc.).
> >
> > I want to cleanly separate these components so they can live inside independent
> > contexts, and be considered as semantically distinct steps.
> > (In particular, there should ideally be no more "unscaled special
> > converters", instead this can be seen as a special case where there simply
> > is no vertical/horizontal scaling step)
> >
> > The idea is for the colorspace conversion layer to sit in between the
> > input/output converters and the horizontal/vertical scalers. This all would be
> > orchestrated by the avscale_* abstraction.
> >
> > ## Implementation details
> >
> > To avoid performance loss from separating "merged" functions into their
> > constituents, care needs to be taken such that all intermediate data, in
> > addition to all involved look-up tables, will fit comfortably inside the L1
> > cache. The approach I propose, which is also (afaict) used by zscale, is to
> > loop over line segments, applying each operation in sequence, on a small
> > temporary buffer.
> >
> > e.g.
> >
> > hscale_row(pixel *dst, const pixel *src, int img_width)
> > {
> >     const int SIZE = 256; // or some other small-ish figure, possibly a design
> >                           // constant of the API so that SIMD implementations
> >                           // can be appropriately unrolled
> >
> >     pixel tmp[SIZE];
> >     for (i = 0; i < img_width; i += SIZE) {
> >         int pixels = min(SIZE, img_width - i);
> >
> >         { /* inside read input callback */
> >             unpack_input(tmp, src, pixels);
> >             // the amount of separation here will depend on the performance
> >             apply_matrix3x3(tmp, yuv2rgb, pixels);
> >             apply_lut3x1d(tmp, gamma_lut, pixels);
> >             ...
> >         }
> >
> >         hscale(dst, tmp, filter, pixels);
> >
> >         src += pixels;
> >         dst += scale_factor(pixels);
> >     }
> > }
> >
> > This function can then output rows into a ring buffer for use inside the
> > vertical scaler, after which the same procedure happens (in reverse) for the
> > final output pass.
> >
> > Possibly, we also want to additionally limit the size of a row for the
> > horizontal scaler, to allow arbitrary large input images.
> >
> > ## Comments / feedback?
> >
> > Does the above approach seem reasonable? How do people feel about introducing
> > a new API vs. trying to hammer the existing API into the shape I want it to be?
> >
> > I've attached an example of what could end up looking like. If
> > there is broad agreement on this design, I will move on to an implementation.
> >
>
> What do you think of the concept of kernels like
> https://github.com/lu-zero/avscale/blob/master/kernels/rgb2yuv.c
> The idea is that there is a bit of analysis on input and output format
> requested, and either a specialized kernel is used, or a chain of kernels
> is built and data is passed along.
> Among the design goals of that library, there was also readability (so that
> the flow was always under control) and the ease of writing assembly and/or
> shader for any single kernel.

I think I have not looked at Luca's work before, so I cannot comment on it
specifically. But I think what you suggest is what Niklas intends to do.

swscale has evolved over a long time from code with a very small subset of
the current features. The code is in need of being "refactored" into some
cleaner kernel / modular design.

Also, as you mention lu_zero: I talked with him very briefly, and he will be
on the next extra member vote for the GA (whoever initiates it, I'll try to
make sure Luca is not forgotten). Just saying, I have not forgotten him; I
just wanted to accumulate more people before bringing that up.

>
> Needless to say I support the plan of renaming the library so that it can

As the main author of libswscale, I find this quite offensive.

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein

--SLk8PSyqxe/ugcCT--

--===============0810577217065947050==--