Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: Niklas Haas <ffmpeg@haasn.xyz>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC]
Date: Fri, 2 May 2025 19:51:11 +0200
Message-ID: <20250502195111.GB219301@haasn.xyz>
In-Reply-To: <20250426175603.726924-1-ffmpeg@haasn.xyz>

On Sat, 26 Apr 2025 19:41:04 +0200 Niklas Haas <ffmpeg@haasn.xyz> wrote:
> Hi all,
>
> After extensive amounts of refactoring and iteration on the design and API,
> and the implementation of an x86 SIMD backend, I'm happy to present the
> revised version of my ongoing swscale rewrite. Now with 100% less reliance on
> compiler autovectorization.
>
> As before, I recommend (re)reading the design document to understand the
> motivation, structure and implementation details of this rewrite. At this
> point, I expect the major API and internal organization decisions to remain
> stable.
>
> I will preface with some benchmark figures, on my (new) AMD Ryzen 9 9950X3D:
>
> All formats:
>   - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
>   - multi thread:  Overall speedup=2.607x faster, min=0.112x max=254.738x
>
> "Common" formats: (referenced >100 times in FFmpeg source code)
>   - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
>   - multi thread:  Overall speedup=2.870x faster, min=0.715x max=21.983x

Small update: I noticed that one code path had accidentally been left
disabled. I have also implemented asm for the remaining bit-packed formats.
With both changes applied, the new numbers are:

All formats:
  - single thread: Overall speedup=4.247x faster, min=0.177x max=224.809x
  - multi thread:  Overall speedup=4.000x faster, min=0.256x max=968.725x

"Common" formats:
  - single thread: Overall speedup=3.174x faster, min=0.596x max=12.616x
  - multi thread:  Overall speedup=3.005x faster, min=0.617x max=14.739x

>
> However, the main goal of this rewrite is not to improve performance, but to
> improve the maintainability, extensibility and correctness of the code. Most of
> the slowdowns for "common" formats are due to increased correctness (e.g.
> accurate rounding and dithering), and not the result of a regression per se.
>
> All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
> coverage of the x86 SIMD. Notably, this currently affects bit packed formats
> (e.g. rgb8, rgb4). (I also did not yet incorporate any AVX-512 code, which
> some of the existing routines take advantage of)
>
> While I will continue working on this and expanding coverage to all remaining
> operations, I felt that now is a good point in time to get some code review
> and feedback regardless. I would especially appreciate code review of the x86
> SIMD code inside libswscale/x86/ops_*.asm, as this is my first time writing
> x86 assembly code.
>
>  doc/APIchanges                |   3 +
>  doc/scaler.texi               |   3 +
>  doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
>  libswscale/Makefile           |   9 +
>  libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  libswscale/format.h           |  29 ++-
>  libswscale/graph.c            | 151 ++++++++----
>  libswscale/graph.h            |  37 ++-
>  libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/ops.h              | 263 +++++++++++++++++++++
>  libswscale/ops_backend.c      | 101 ++++++++
>  libswscale/ops_backend.h      | 181 ++++++++++++++
>  libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
>  libswscale/ops_chain.h        | 108 +++++++++
>  libswscale/ops_internal.h     | 103 ++++++++
>  libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
>  libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
>  libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/options.c          |   1 +
>  libswscale/swscale.h          |   7 +
>  libswscale/tests/swscale.c    |  11 +-
>  libswscale/version.h          |   2 +-
>  libswscale/x86/Makefile       |   3 +
>  libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/x86/ops_common.asm | 208 ++++++++++++++++
>  libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
>  libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/checkasm/Makefile       |   8 +-
>  tests/checkasm/checkasm.c     |   4 +-
>  tests/checkasm/checkasm.h     |  26 +-
>  tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  32 files changed, 8206 insertions(+), 73 deletions(-)
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


Thread overview: 32+ messages
2025-04-26 17:41 Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 01/17] tests/swscale: improve colorization of speedup Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 02/17] swscale/graph: expose ff_sws_graph_add_pass Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 03/17] swscale/graph: make noop loop more robust Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 04/17] swscale/graph: move vshift() and shift_img() to shared header Niklas Haas
2025-05-16 15:41   ` Ramiro Polla
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 05/17] swscale/graph: prefer bools to ints Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 06/17] doc: add swscale rewrite design document Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 07/17] swscale: add SWS_EXPERIMENTAL flag Niklas Haas
2025-05-08 11:37   ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 08/17] swscale/ops: introduce new low level framework Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 09/17] swscale/ops_chain: add internal abstraction for kernel linking Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 10/17] swscale/ops_backend: add reference backend basend on C templates Niklas Haas
2025-05-02 15:06   ` Michael Niedermayer
2025-05-08 12:24     ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 11/17] swscale/x86: add SIMD backend Niklas Haas
2025-04-29 13:00   ` Michael Niedermayer
2025-04-30 16:24     ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 12/17] tests/checkasm: increase number of runs in between measurements Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 13/17] tests/checkasm: add checkasm_check_float Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 14/17] tests/checkasm: add checkasm tests for swscale ops Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 15/17] swscale/format: rename legacy format conversion table Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 16/17] swscale/format: add new format decode/encode logic Niklas Haas
2025-05-02 14:10   ` Michael Niedermayer
2025-05-02 14:36     ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 17/17] swscale/graph: allow experimental use of new format handler Niklas Haas
2025-04-26 22:22 ` [FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC] Niklas Haas
2025-05-02 17:51 ` Niklas Haas [this message]
2025-05-16 11:09 ` Niklas Haas
2025-05-16 14:32   ` Ramiro Polla
2025-05-16 14:39     ` Niklas Haas
2025-05-16 15:44       ` Ramiro Polla
