From: Niklas Haas <ffmpeg@haasn.xyz>
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC]
Date: Sat, 26 Apr 2025 19:41:04 +0200
Message-ID: <20250426175603.726924-1-ffmpeg@haasn.xyz>

Hi all,

After extensive amounts of refactoring and iteration on the design and API,
and the implementation of an x86 SIMD backend, I'm happy to present the
revised version of my ongoing swscale rewrite. Now with 100% less reliance on
compiler autovectorization.

As before, I recommend (re)reading the design document to understand the
motivation, structure and implementation details of this rewrite. At this
point, I expect the major API and internal organization decisions to remain
stable.

I will preface with some benchmark figures, on my (new) AMD Ryzen 9 9950X3D:

All formats:
- single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
- multi thread:  Overall speedup=2.607x faster, min=0.112x max=254.738x

"Common" formats: (referenced >100 times in FFmpeg source code)
- single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
- multi thread:  Overall speedup=2.870x faster, min=0.715x max=21.983x

However, the main goal of this rewrite is not to improve performance, but to
improve the maintainability, extensibility and correctness of the code. Most
of the slowdowns for "common" formats are due to increased correctness (e.g.
accurate rounding and dithering), and not the result of a regression per se.

All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
coverage of the x86 SIMD. Notably, this currently affects bit packed formats
(e.g. rgb8, rgb4). (I also did not yet incorporate any AVX-512 code, which
some of the existing routines take advantage of)

While I will continue working on this and expanding coverage to all remaining
operations, I felt that now is a good point in time to get some code review
and feedback regardless.
I would especially appreciate code review of the x86 SIMD code inside
libswscale/x86/ops_*.asm, as this is my first time writing x86 assembly code.

 doc/APIchanges                |   3 +
 doc/scaler.texi               |   3 +
 doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
 libswscale/Makefile           |   9 +
 libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 libswscale/format.h           |  29 ++-
 libswscale/graph.c            | 151 ++++++++----
 libswscale/graph.h            |  37 ++-
 libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/ops.h              | 263 +++++++++++++++++++++
 libswscale/ops_backend.c      | 101 ++++++++
 libswscale/ops_backend.h      | 181 ++++++++++++++
 libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
 libswscale/ops_chain.h        | 108 +++++++++
 libswscale/ops_internal.h     | 103 ++++++++
 libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
 libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
 libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/options.c          |   1 +
 libswscale/swscale.h          |   7 +
 libswscale/tests/swscale.c    |  11 +-
 libswscale/version.h          |   2 +-
 libswscale/x86/Makefile       |   3 +
 libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/x86/ops_common.asm | 208 ++++++++++++++++
 libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
 libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/checkasm/Makefile       |   8 +-
 tests/checkasm/checkasm.c     |   4 +-
 tests/checkasm/checkasm.h     |  26 +-
 tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 32 files changed, 8206 insertions(+), 73 deletions(-)

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

Thread overview: 32+ messages

2025-04-26 17:41 Niklas Haas [this message]
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 01/17] tests/swscale: improve colorization of speedup Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 02/17] swscale/graph: expose ff_sws_graph_add_pass Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 03/17] swscale/graph: make noop loop more robust Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 04/17] swscale/graph: move vshift() and shift_img() to shared header Niklas Haas
2025-05-16 15:41   ` Ramiro Polla
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 05/17] swscale/graph: prefer bools to ints Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 06/17] doc: add swscale rewrite design document Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 07/17] swscale: add SWS_EXPERIMENTAL flag Niklas Haas
2025-05-08 11:37   ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 08/17] swscale/ops: introduce new low level framework Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 09/17] swscale/ops_chain: add internal abstraction for kernel linking Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 10/17] swscale/ops_backend: add reference backend basend on C templates Niklas Haas
2025-05-02 15:06   ` Michael Niedermayer
2025-05-08 12:24   ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 11/17] swscale/x86: add SIMD backend Niklas Haas
2025-04-29 13:00   ` Michael Niedermayer
2025-04-30 16:24   ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 12/17] tests/checkasm: increase number of runs in between measurements Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 13/17] tests/checkasm: add checkasm_check_float Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 14/17] tests/checkasm: add checkasm tests for swscale ops Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 15/17] swscale/format: rename legacy format conversion table Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 16/17] swscale/format: add new format decode/encode logic Niklas Haas
2025-05-02 14:10   ` Michael Niedermayer
2025-05-02 14:36   ` Niklas Haas
2025-04-26 17:41 ` [FFmpeg-devel] [PATCH 17/17] swscale/graph: allow experimental use of new format handler Niklas Haas
2025-04-26 22:22 ` [FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC] Niklas Haas
2025-05-02 17:51   ` Niklas Haas
2025-05-16 11:09   ` Niklas Haas
2025-05-16 14:32   ` Ramiro Polla
2025-05-16 14:39   ` Niklas Haas
2025-05-16 15:44   ` Ramiro Polla