Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: Niklas Haas <ffmpeg@haasn.xyz>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC]] swscale modernization proposal
Date: Sat, 29 Jun 2024 12:58:03 +0200
Message-ID: <20240629125803.GB4857@haasn.xyz>
In-Reply-To: <tencent_427A7CA6AC523AC5C1F1B56C69CA02F5680A@qq.com>

On Sat, 29 Jun 2024 15:41:58 +0800 Zhao Zhili <quinkblack@foxmail.com> wrote:
> 
> 
> > On Jun 22, 2024, at 21:13, Niklas Haas <ffmpeg@haasn.xyz> wrote:
> > 
> > Hey,
> > 
> > As some of you know, I got contracted (by STF 2024) to work on improving
> > swscale, over the course of the next couple of months. I want to share my
> > current plans and gather feedback + measure sentiment.
> > 
> > ## Problem statement
> > 
> > The two issues I'd like to focus on for now are:
> > 
> > 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
> >   IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > 2. Complicated context management, with cascaded contexts, threading, stateful
> >   configuration, multi-step init procedures, etc; and related bugs
> > 
> > In order to make these feasible, some amount of internal re-organization of
> > duties inside swscale is prudent.
> > 
> > ## Proposed approach
> > 
> > The first step is to create a new API, which will (tentatively) live in
> > <libswscale/avscale.h>. This API will initially start off as a near-copy of the
> > current swscale public API, but with the major difference that I want it to be
> > state-free and only access metadata in terms of AVFrame properties. So there
> > will be no independent configuration of the input chroma location etc. like
> > there is currently, and no need to re-configure or re-init the context when
> > feeding it frames with different properties. The goal is for users to be able
> > to just feed it AVFrame pairs and have it internally cache expensive
> > pre-processing steps as needed. Finally, avscale_* should ultimately also
> > support hardware frames directly, in which case it will dispatch to some
> > equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> > defer this to a future milestone)
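
To make the caching idea concrete, here is a minimal sketch of how a context
could decide when to redo expensive setup. FrameProps, ScaleCache and
cache_update() are hypothetical names made up for illustration; none of this
is the proposed avscale API, and a real implementation would have to compare
many more AVFrame fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the frame properties that affect conversion;
 * a real AVFrame carries many more fields. */
typedef struct FrameProps {
    int width, height;
    int format;          /* pixel format id */
    int colorspace;      /* YUV matrix id */
    int chroma_location;
} FrameProps;

/* Hypothetical cache: remembers which src/dst pair the state was built for. */
typedef struct ScaleCache {
    bool valid;
    FrameProps src, dst;
    int rebuilds;        /* how many times the expensive setup ran */
} ScaleCache;

static bool props_equal(const FrameProps *a, const FrameProps *b)
{
    return a->width == b->width && a->height == b->height &&
           a->format == b->format && a->colorspace == b->colorspace &&
           a->chroma_location == b->chroma_location;
}

/* Returns true if the cached state had to be rebuilt for this frame pair. */
static bool cache_update(ScaleCache *c, const FrameProps *src,
                         const FrameProps *dst)
{
    assert(c && src && dst);
    if (c->valid && props_equal(&c->src, src) && props_equal(&c->dst, dst))
        return false;    /* same properties as last time: reuse everything */

    /* Expensive work (filter coefficients, LUTs, ...) would happen here. */
    c->src = *src;
    c->dst = *dst;
    c->valid = true;
    c->rebuilds++;
    return true;
}
```

With this shape the caller never re-inits anything: feeding a frame pair with
a different colorspace simply triggers one rebuild on the next call.
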
> > 
> > After this API is established, I want to start expanding the functionality in
> > the following manner:
> > 
> > ### Phase 1
> > 
> > For basic operation, avscale_* will just dispatch to a sequence of swscale_*
> > invocations. In the basic case, it will just directly invoke swscale with
> > minimal overhead. In more advanced cases, it might resolve to a *sequence* of
> > swscale operations, with other operations (e.g. colorspace conversions a la
> > vf_colorspace) mixed in.
> > 
> > This will allow us to gain new functionality in a minimally invasive way, and
> > will let API users start porting to the new API. This will also serve as a good
> > "selling point" for the new API, allowing us to hopefully break up the legacy
> > swscale API afterwards.
> > 
> > ### Phase 2
> > 
> > After this is working, I want to cleanly separate swscale into two distinct
> > components:
> > 
> > 1. vertical/horizontal scaling
> > 2. input/output conversions
> > 
> > Right now, these operations both live inside the main SwsContext, even though
> > they are conceptually orthogonal. Input handling is done entirely by the
> > abstract callbacks lumToYV12 etc., while output conversion is currently
> > "merged" with vertical scaling (yuv2planeX etc.).
> > 
> > I want to cleanly separate these components so they can live inside independent
> > contexts, and be considered as semantically distinct steps. (In particular,
> > there should ideally be no more "unscaled special converters", instead this can
> > be seen as a special case where there simply is no vertical/horizontal scaling
> > step)
> > 
> > The idea is for the colorspace conversion layer to sit in between the
> > input/output converters and the horizontal/vertical scalers. This all would be
> > orchestrated by the avscale_* abstraction.
> > 
> > ## Implementation details
> > 
> > To avoid performance loss from separating "merged" functions into their
> > constituents, care needs to be taken such that all intermediate data, in
> > addition to all involved look-up tables, will fit comfortably inside the L1
> > cache. The approach I propose, which is also (afaict) used by zscale, is to
> > loop over line segments, applying each operation in sequence, on a small
> > temporary buffer.
> > 
> > e.g.
> > 
> > void hscale_row(pixel *dst, const pixel *src, int img_width)
> > {
> >    const int SIZE = 256; // or some other small-ish figure, possibly a design
> >                          // constant of the API so that SIMD implementations
> >                          // can be appropriately unrolled
> > 
> >    pixel tmp[SIZE];
> >    for (int i = 0; i < img_width; i += SIZE) {
> >        int pixels = min(SIZE, img_width - i);
> > 
> >        { /* inside read input callback */
> >            unpack_input(tmp, src, pixels);
> >            // the amount of separation here will depend on the performance
> >            apply_matrix3x3(tmp, yuv2rgb, pixels);
> >            apply_lut3x1d(tmp, gamma_lut, pixels);
> >            ...
> >        }
> > 
> >        hscale(dst, tmp, filter, pixels);
> > 
> >        src += pixels;
> >        dst += scale_factor(pixels);
> >    }
> > }
> > 
> > This function can then output rows into a ring buffer for use inside the
> > vertical scaler, after which the same procedure happens (in reverse) for the
> > final output pass.
> > 
> > Possibly, we also want to limit the size of a row for the horizontal
> > scaler, to allow arbitrarily large input images.
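
As a self-contained illustration of the chunked approach, the following sketch
runs a row through pointwise stages on a small stack buffer. unpack_u8(),
apply_gain() and pack_u8() are hypothetical stand-ins for the real input/output
callbacks, CHUNK mirrors SIZE from the pseudocode above, and horizontal scaling
is deliberately left out because a real filter needs pixels from neighbouring
chunks.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK 256  /* hypothetical chunk size, small enough to stay in L1 */

/* Hypothetical pointwise stages; stand-ins, not swscale functions. */
static void unpack_u8(float *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] / 255.0f;
}

static void apply_gain(float *buf, float gain, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] *= gain;
}

static void pack_u8(uint8_t *dst, const float *src, int n)
{
    for (int i = 0; i < n; i++) {
        float v = src[i] * 255.0f + 0.5f;   /* round to nearest */
        dst[i] = v < 0.0f ? 0 : v > 255.0f ? 255 : (uint8_t) v;
    }
}

/* Process one row in CHUNK-sized segments, applying every stage to a
 * segment before moving on, so the intermediate data stays small. */
static void process_row_chunked(uint8_t *dst, const uint8_t *src,
                                int width, float gain)
{
    float tmp[CHUNK];
    assert(width >= 0);
    for (int x = 0; x < width; x += CHUNK) {
        int n = width - x < CHUNK ? width - x : CHUNK;
        unpack_u8(tmp, src + x, n);
        apply_gain(tmp, gain, n);
        pack_u8(dst + x, tmp, n);
    }
}
```

Because every stage here is pointwise, the chunked result matches what you
would get by running each stage over the whole row, just with a much smaller
working set.
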
> 
> I did a simple benchmark to compare the performance of libswscale and
> libyuv. On an Apple M1 (arm64), libyuv is about 10 times faster than libswscale
> for unscaled rgba to yuv420p. Even after the recent aarch64 NEON optimizations,
> libyuv is still 5 times faster than libswscale. The situation isn’t much better
> with scaled conversion.
> 
> Sure, libswscale has more features and can be more precise than libyuv. I hope
> we can catch up on performance after your refactor.

AFAICT, libyuv does not do any dithering or advanced filtering, both of
which swscale is capable of. They also do processing at a pretty low bit
depth, e.g. converting from 8-bit yuv420p straight to 8-bit rgba before
scaling at, you guessed it, 8-bit resolution. (libswscale would use
15-bit here)

That said, there's probably still something we can/should learn from
this implementation.
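
To illustrate why the intermediate bit depth matters, here is a toy sketch
(not how libyuv or libswscale actually process pixels): a two-stage round
trip that divides by 3 and multiplies back, with the intermediate kept at
8 + shift bits. roundtrip() and max_roundtrip_error() are hypothetical
helpers invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy two-stage round trip on one 8-bit sample, with the intermediate
 * value stored at (8 + shift) bits of precision. */
static int roundtrip(int x, int shift)
{
    assert(x >= 0 && x <= 255 && shift >= 0 && shift <= 7);
    int v = x << shift;              /* promote to the intermediate depth */
    int a = (v + 1) / 3;             /* stage 1: divide by 3, round to nearest */
    int b = a * 3;                   /* stage 2: undo the division */
    int half = shift ? 1 << (shift - 1) : 0;
    return (b + half) >> shift;      /* round back down to 8 bits */
}

/* Largest absolute round-trip error over all 8-bit inputs. */
static int max_roundtrip_error(int shift)
{
    int worst = 0;
    for (int x = 0; x <= 255; x++) {
        int err = abs(roundtrip(x, shift) - x);
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

With these definitions, max_roundtrip_error(0) (8-bit intermediate) is 1,
while max_roundtrip_error(7) (15-bit intermediate) is 0: the extra bits
absorb the rounding of the first stage.
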

> 
> > 
> > ## Comments / feedback?
> > 
> > Does the above approach seem reasonable? How do people feel about introducing
> > a new API vs. trying to hammer the existing API into the shape I want it to be?
> > 
> > I've attached an example of what <avscale.h> could end up looking like. If
> > there is broad agreement on this design, I will move on to an implementation.
> > <avscale.h>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

