From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
    by master.gitmailbox.com (Postfix) with ESMTPS id 382F94C22F
    for ; Fri, 2 May 2025 17:51:23 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
    by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 36B5968B172;
    Fri, 2 May 2025 20:51:18 +0300 (EEST)
Received: from haasn.dev (haasn.dev [78.46.187.166])
    by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id B3724689F44
    for ; Fri, 2 May 2025 20:51:11 +0300 (EEST)
Received: from haasn.dev (unknown [10.30.1.1])
    by haasn.dev (Postfix) with UTF8SMTP id 7D8BF40717
    for ; Fri, 2 May 2025 19:51:11 +0200 (CEST)
Date: Fri, 2 May 2025 19:51:11 +0200
Message-ID: <20250502195111.GB219301@haasn.xyz>
From: Niklas Haas
To: ffmpeg-devel@ffmpeg.org
In-Reply-To: <20250426175603.726924-1-ffmpeg@haasn.xyz>
References: <20250426175603.726924-1-ffmpeg@haasn.xyz>
MIME-Version: 1.0
Content-Disposition: inline
Subject: Re: [FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC]
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
Archived-At:
List-Archive:
List-Post:

On Sat, 26 Apr 2025 19:41:04 +0200 Niklas Haas wrote:
> Hi all,
>
> After extensive amounts of refactoring and iteration on the design and API,
> and the implementation of an x86 SIMD backend, I'm happy to present the
> revised version of my ongoing swscale rewrite. Now with 100% less reliance on
> compiler autovectorization.
>
> As before, I recommend (re)reading the design document to understand the
> motivation, structure and implementation details of this rewrite.
> At this point, I expect the major API and internal organization decisions to
> remain stable.
>
> I will preface with some benchmark figures, on my (new) AMD Ryzen 9 9950X3D:
>
> All formats:
> - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
> - multi thread: Overall speedup=2.607x faster, min=0.112x max=254.738x
>
> "Common" formats: (referenced >100 times in FFmpeg source code)
> - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
> - multi thread: Overall speedup=2.870x faster, min=0.715x max=21.983x

Small update: I noticed that one code path was accidentally not enabled. I also
implemented asm for the remaining bit-packed formats. After those two changes,
the new numbers are:

All formats:
- single thread: Overall speedup=4.247x faster, min=0.177x max=224.809x
- multi thread: Overall speedup=4.000x faster, min=0.256x max=968.725x

"Common" formats:
- single thread: Overall speedup=3.174x faster, min=0.596x max=12.616x
- multi thread: Overall speedup=3.005x faster, min=0.617x max=14.739x

>
> However, the main goal of this rewrite is not to improve performance, but to
> improve the maintainability, extensibility and correctness of the code. Most of
> the slowdowns for "common" formats are due to increased correctness (e.g.
> accurate rounding and dithering), and not the result of a regression per se.
>
> All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
> coverage of the x86 SIMD. Notably, this currently affects bit packed formats
> (e.g. rgb8, rgb4). (I also did not yet incorporate any AVX-512 code, which
> some of the existing routines take advantage of)
>
> While I will continue working on this and expanding coverage to all remaining
> operations, I felt that now is a good point in time to get some code review
> and feedback regardless. I would especially appreciate code review of the x86
> SIMD code inside libswscale/x86/ops_*.asm, as this is my first time writing
> x86 assembly code.
>
>  doc/APIchanges                |   3 +
>  doc/scaler.texi               |   3 +
>  doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
>  libswscale/Makefile           |   9 +
>  libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  libswscale/format.h           |  29 ++-
>  libswscale/graph.c            | 151 ++++++++----
>  libswscale/graph.h            |  37 ++-
>  libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/ops.h              | 263 +++++++++++++++++++++
>  libswscale/ops_backend.c      | 101 ++++++++
>  libswscale/ops_backend.h      | 181 ++++++++++++++
>  libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
>  libswscale/ops_chain.h        | 108 +++++++++
>  libswscale/ops_internal.h     | 103 ++++++++
>  libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
>  libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
>  libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/options.c          |   1 +
>  libswscale/swscale.h          |   7 +
>  libswscale/tests/swscale.c    |  11 +-
>  libswscale/version.h          |   2 +-
>  libswscale/x86/Makefile       |   3 +
>  libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/x86/ops_common.asm | 208 ++++++++++++++++
>  libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
>  libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/checkasm/Makefile       |   8 +-
>  tests/checkasm/checkasm.c     |   4 +-
>  tests/checkasm/checkasm.h     |  26 +-
>  tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  32 files changed, 8206 insertions(+), 73 deletions(-)
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".