From: Niklas Haas <ffmpeg@haasn.xyz>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC]
Date: Fri, 2 May 2025 19:51:11 +0200
Message-ID: <20250502195111.GB219301@haasn.xyz>
In-Reply-To: <20250426175603.726924-1-ffmpeg@haasn.xyz>

On Sat, 26 Apr 2025 19:41:04 +0200 Niklas Haas <ffmpeg@haasn.xyz> wrote:
> Hi all,
>
> After extensive amounts of refactoring and iteration on the design and API,
> and the implementation of an x86 SIMD backend, I'm happy to present the
> revised version of my ongoing swscale rewrite. Now with 100% less reliance on
> compiler autovectorization.
>
> As before, I recommend (re)reading the design document to understand the
> motivation, structure and implementation details of this rewrite. At this
> point, I expect the major API and internal organization decisions to remain
> stable.
>
> I will preface with some benchmark figures, on my (new) AMD Ryzen 9 9950X3D:
>
> All formats:
> - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
> - multi thread: Overall speedup=2.607x faster, min=0.112x max=254.738x
>
> "Common" formats: (referenced >100 times in FFmpeg source code)
> - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
> - multi thread: Overall speedup=2.870x faster, min=0.715x max=21.983x

Small update: I noticed that one code path had accidentally been left
disabled. I have also implemented asm for the remaining bit-packed formats.
With those two changes, the new numbers are:

All formats:
- single thread: Overall speedup=4.247x faster, min=0.177x max=224.809x
- multi thread: Overall speedup=4.000x faster, min=0.256x max=968.725x

"Common" formats:
- single thread: Overall speedup=3.174x faster, min=0.596x max=12.616x
- multi thread: Overall speedup=3.005x faster, min=0.617x max=14.739x

>
> However, the main goal of this rewrite is not to improve performance, but to
> improve the maintainability, extensibility and correctness of the code.
> Most of the slowdowns for "common" formats are due to increased correctness
> (e.g. accurate rounding and dithering), and not the result of a regression
> per se.
>
> All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
> coverage of the x86 SIMD. Notably, this currently affects bit packed formats
> (e.g. rgb8, rgb4). (I also did not yet incorporate any AVX-512 code, which
> some of the existing routines take advantage of)
>
> While I will continue working on this and expanding coverage to all remaining
> operations, I felt that now is a good point in time to get some code review
> and feedback regardless. I would especially appreciate code review of the x86
> SIMD code inside libswscale/x86/ops_*.asm, as this is my first time writing
> x86 assembly code.
>
> doc/APIchanges                |   3 +
> doc/scaler.texi               |   3 +
> doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
> libswscale/Makefile           |   9 +
> libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> libswscale/format.h           |  29 ++-
> libswscale/graph.c            | 151 ++++++++----
> libswscale/graph.h            |  37 ++-
> libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/ops.h              | 263 +++++++++++++++++++++
> libswscale/ops_backend.c      | 101 ++++++++
> libswscale/ops_backend.h      | 181 ++++++++++++++
> libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
> libswscale/ops_chain.h        | 108 +++++++++
> libswscale/ops_internal.h     | 103 ++++++++
> libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
> libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
> libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/options.c          |   1 +
> libswscale/swscale.h          |   7 +
> libswscale/tests/swscale.c    |  11 +-
> libswscale/version.h          |   2 +-
> libswscale/x86/Makefile       |   3 +
> libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/x86/ops_common.asm | 208 ++++++++++++++++
> libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
> libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> tests/checkasm/Makefile       |   8 +-
> tests/checkasm/checkasm.c     |   4 +-
> tests/checkasm/checkasm.h     |  26 +-
> tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 32 files changed, 8206 insertions(+), 73 deletions(-)
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".