From: Zhao Zhili <quinkblack@foxmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] swscale modernization proposal
Date: Sat, 29 Jun 2024 15:41:58 +0800
Message-ID: <tencent_427A7CA6AC523AC5C1F1B56C69CA02F5680A@qq.com>
In-Reply-To: <20240622151334.GD14140@haasn.xyz>

> On Jun 22, 2024, at 21:13, Niklas Haas <ffmpeg@haasn.xyz> wrote:
>
> Hey,
>
> As some of you know, I got contracted (by STF 2024) to work on improving
> swscale over the course of the next couple of months. I want to share my
> current plans and gather feedback + measure sentiment.
>
> ## Problem statement
>
> The two issues I'd like to focus on for now are:
>
> 1. Lack of support for many modern formats and conversions (HDR, ICtCp,
>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> 2. Complicated context management, with cascaded contexts, threading,
>    stateful configuration, multi-step init procedures, etc.; and related bugs
>
> In order to make these feasible, some amount of internal re-organization of
> duties inside swscale is prudent.
>
> ## Proposed approach
>
> The first step is to create a new API, which will (tentatively) live in
> <libswscale/avscale.h>. This API will start off as a near-copy of the current
> swscale public API, but with the major difference that I want it to be
> state-free and to access metadata only in terms of AVFrame properties. So
> there will be no independent configuration of the input chroma location etc.
> as there is currently, and no need to re-configure or re-init the context
> when feeding it frames with different properties. The goal is for users to be
> able to just feed it AVFrame pairs and have it internally cache expensive
> pre-processing steps as needed. Finally, avscale_* should ultimately also
> support hardware frames directly, in which case it will dispatch to some
> equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I
> will defer this to a future milestone.)
>
> After this API is established, I want to start expanding the functionality
> in the following manner:
>
> ### Phase 1
>
> For basic operation, avscale_* will just dispatch to a sequence of swscale_*
> invocations. In the basic case, it will directly invoke swscale with minimal
> overhead. In more advanced cases, it might resolve to a *sequence* of
> swscale operations, with other operations (e.g. colorspace conversions a la
> vf_colorspace) mixed in.
>
> This will allow us to gain new functionality in a minimally invasive way,
> and will let API users start porting to the new API. This will also serve as
> a good "selling point" for the new API, allowing us to hopefully break up
> the legacy swscale API afterwards.
>
> ### Phase 2
>
> After this is working, I want to cleanly separate swscale into two distinct
> components:
>
> 1. vertical/horizontal scaling
> 2. input/output conversions
>
> Right now, these operations both live inside the main SwsContext, even
> though they are conceptually orthogonal. Input handling is done entirely by
> the abstract callbacks lumToYV12 etc., while output conversion is currently
> "merged" with vertical scaling (yuv2planeX etc.).
>
> I want to cleanly separate these components so they can live inside
> independent contexts, and be considered as semantically distinct steps.
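If I understand the phase 2 plan correctly, the split could be expressed as
something like the sketch below. To be clear, this is only my reading of the
proposal; every name in it is hypothetical, and none of these types exist in
the tree:

    /* Hypothetical sketch of the phase 2 split, not an actual API. */
    typedef struct AVScaleConvert AVScaleConvert; /* input/output conversion   */
    typedef struct AVScaleFilter  AVScaleFilter;  /* horiz./vert. scaling pass */

    typedef struct AVScalePipeline {
        AVScaleConvert *unpack; /* packed input -> planar intermediate  */
        AVScaleFilter  *hscale; /* horizontal scaler, NULL if unscaled  */
        AVScaleFilter  *vscale; /* vertical scaler, NULL if unscaled    */
        AVScaleConvert *pack;   /* planar intermediate -> packed output */
    } AVScalePipeline;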
> (In particular, there should ideally be no more "unscaled special
> converters"; instead, this can be seen as a special case where there simply
> is no vertical/horizontal scaling step.)
>
> The idea is for the colorspace conversion layer to sit in between the
> input/output converters and the horizontal/vertical scalers. This all would
> be orchestrated by the avscale_* abstraction.
>
> ## Implementation details
>
> To avoid performance loss from separating "merged" functions into their
> constituents, care needs to be taken that all intermediate data, in addition
> to all involved look-up tables, fits comfortably inside the L1 cache. The
> approach I propose, which is (afaict) also used by zscale, is to loop over
> line segments, applying each operation in sequence on a small temporary
> buffer.
>
> e.g.
>
> static void hscale_row(pixel *dst, const pixel *src, int img_width)
> {
>     const int SIZE = 256; // or some other small-ish figure, possibly a design
>                           // constant of the API so that SIMD implementations
>                           // can be appropriately unrolled
>
>     pixel tmp[SIZE];
>     for (int i = 0; i < img_width; i += SIZE) {
>         int pixels = FFMIN(SIZE, img_width - i);
>
>         { /* inside the read-input callback */
>             unpack_input(tmp, src, pixels);
>             // the amount of separation here will depend on the performance
>             apply_matrix3x3(tmp, yuv2rgb, pixels);
>             apply_lut3x1d(tmp, gamma_lut, pixels);
>             ...
>         }
>
>         hscale(dst, tmp, filter, pixels);
>
>         src += pixels;
>         dst += scale_factor(pixels);
>     }
> }
>
> This function can then output rows into a ring buffer for use inside the
> vertical scaler, after which the same procedure happens (in reverse) for the
> final output pass.
>
> Possibly, we also want to limit the size of a row for the horizontal scaler,
> to allow arbitrarily large input images.

I ran a simple benchmark comparing libswscale and libyuv. On an Apple M1
(arm64), libyuv is about 10 times faster than libswscale for unscaled rgba to
yuv420p conversion. Even after the recent aarch64 NEON optimizations in
libswscale, libyuv is still about 5 times faster. The situation isn't much
better for scaled conversion. Of course, libswscale has more features and can
be more precise than libyuv, but I hope we can catch up on performance after
your refactor.

> ## Comments / feedback?
>
> Does the above approach seem reasonable? How do people feel about
> introducing a new API vs. trying to hammer the existing API into the shape I
> want it to be?
>
> I've attached an example of what <avscale.h> could end up looking like. If
> there is broad agreement on this design, I will move on to an implementation.
>
> <avscale.h>

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".