From: Niklas Haas <ffmpeg@haasn.xyz>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] New swscale internal design prototype
Date: Wed, 12 Mar 2025 12:27:02 +0100
Message-ID: <20250312122702.GB6345@haasn.xyz>
In-Reply-To: <a0321e3-acb5-8d72-755c-8e74824e1f9a@martin.st>

On Wed, 12 Mar 2025 09:15:22 +0200 Martin Storsjö <martin@martin.st> wrote:
> On Wed, 12 Mar 2025, Niklas Haas wrote:
>
> > On Sun, 09 Mar 2025 20:45:23 +0100 Niklas Haas <ffmpeg@haasn.xyz> wrote:
> >> On Sun, 09 Mar 2025 18:11:54 +0200 Martin Storsjö <martin@martin.st> wrote:
> >> > On Sat, 8 Mar 2025, Niklas Haas wrote:
> >> >
> >> > > What are the thoughts on the float-first approach?
> >> >
> >> > In general, for modern architectures, relying on floats probably is
> >> > reasonable. (On architectures that aren't of quite as widespread interest,
> >> > it might not be so clear cut though.)
> >> >
> >> > However with the benchmark example you provided a couple of weeks ago, we
> >> > concluded that even on x86 on modern HW, floats were faster than int16
> >> > only in one case: When using Clang, not GCC, and when compiling with
> >> > -mavx2, not without it. In all the other cases, int16 was faster than
> >> > float.
> >>
> >> Hi Martin,
> >>
> >> I should preface that this particular benchmark was a very specific test for
> >> floating point *filtering*, which is considerably more punishing than the
> >> conversion pipeline I have implemented here, and I think it's partly the
> >> fault of compilers generating very unoptimal filtering code.
> >>
> >> I think it would be better to re-assess using the current prototype on actual
> >> hardware.
> >> I threw up a quick NEON test branch: (untested, should hopefully work)
> >> https://github.com/haasn/FFmpeg/commits/swscale3-neon
> >>
> >> # adjust the benchmark iters count as needed based on the HW perf
> >> make libswscale/tests/swscale && libswscale/tests/swscale -unscaled 1 -bench 50
> >>
> >> If this differs significantly from the ~1.8x speedup I measure on x86, I
> >> will be far more concerned about the new approach.
>
> Sorry, I haven't had time to try this out myself yet...

No worries. I think I have gathered enough performance figures myself to
conclude that this approach won't work unmodified. The problem is not so much
the use of floats as the load/store overhead, which in very simple scenarios
is expensive enough to outweigh the benefits.

I think my plan for now is the following:

1. Delete all of the "optimized" variants of the C templates, and keep only
   the general-purpose base case, merely as fallback/reference code.

2. Instead, make the architectural split at a higher level, and allow
   arch-specific implementations to choose their own preferred chunk size,
   or even do something wildly different like runtime code generation or
   custom calling conventions.

3. Merge the new code for now, guarded behind an explicit opt-in flag, so we
   can continue to develop it alongside the existing approach until
   arch-specific optimized variants are available and sufficiently fast in
   _all_ cases.

> >
> > I gave it a try. So, the result of a naive/blind run on a Cortex-X1 using clang
> > version 20.0.0 (from the latest Android NDK v29) is:
> >
> > Overall speedup=1.688x faster, min=0.141x max=45.898x
> >
> > This has quite a lot more significant speed regressions compared to x86 though.
> >
> > In particular, clang/LLVM refuses to vectorize packed reads of 2 or 3 elements,
> > so any sort of operation involving rgb24 or bgr24 suffers horribly:
>
> So, if the performance of this relies on compiler autovectorization,
> what's the plan wrt GCC? We blanket disable autovectorization when
> compiling with GCC - see fd6dbc53855fbfc9a782095d0ffe11dd3a98905f for when
> it was disabled last time. Building and running fate with
> autovectorization in GCC does succeed at least on modern GCC on x86_64,
> but it's of course possible that it still can cause issues in various more
> tricky configurations.

See https://github.com/haasn/FFmpeg/blob/swscale3/libswscale/ops_internal.h#L28

> // Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
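The following is a rough, hypothetical C sketch of points 1 and 2 of the plan above. It is not code from the swscale3 branch. A generic C reference routine is kept as the fallback, and each backend advertises its own preferred chunk size. The row loop handles a short final chunk. The names `SwsOpBackendSketch`, `read_packed3_c`, `select_backend` and `process_row` are invented for this example, and so is the 16-pixel block size. The reference read loop also shows the stride-3 packed (rgb24-style) access pattern that the Cortex-X1 run above reports Clang/LLVM as not vectorizing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend descriptor: every architecture picks its own
 * preferred chunk size rather than sharing one fixed size. */
typedef struct SwsOpBackendSketch {
    const char *name;
    int block_size; /* pixels per call, chosen by the backend (assumed) */
    /* Read n packed 3-byte pixels into three planar float arrays. */
    void (*read_packed3)(const uint8_t *src, float *r, float *g, float *b,
                         int n);
} SwsOpBackendSketch;

/* Generic C reference/fallback: a plain deinterleaving loop. The stride-3
 * loads here are the kind of packed rgb24-style reads reported above as
 * not being vectorized by Clang/LLVM. */
static void read_packed3_c(const uint8_t *src, float *r, float *g, float *b,
                           int n)
{
    for (int i = 0; i < n; i++) {
        r[i] = (float)src[3 * i + 0];
        g[i] = (float)src[3 * i + 1];
        b[i] = (float)src[3 * i + 2];
    }
}

/* The fallback backend; 16 pixels per chunk is an arbitrary assumption. */
static const SwsOpBackendSketch backend_c = { "c", 16, read_packed3_c };

/* Hypothetical selection: use an arch-specific backend when one is
 * provided, otherwise fall back to the C reference. */
static const SwsOpBackendSketch *select_backend(const SwsOpBackendSketch *arch)
{
    return arch ? arch : &backend_c;
}

/* Process one row in chunks of the backend's preferred size; the last
 * chunk may be shorter than block_size. */
static void process_row(const SwsOpBackendSketch *be, const uint8_t *src,
                        float *r, float *g, float *b, int width)
{
    assert(be->block_size > 0);
    for (int x = 0; x < width; x += be->block_size) {
        int n = width - x < be->block_size ? width - x : be->block_size;
        be->read_packed3(src + 3 * x, r + x, g + x, b + x, n);
    }
}
```

When no arch-specific backend is passed, `select_backend(NULL)` returns the C reference. A 37-pixel row is then read in chunks of 16, 16 and 5 pixels. In this sketch, an optimized backend could supply a different block size or its own read routine without touching the fallback. The use of a function-pointer table here is an assumption of the example, not a statement of how the branch does it.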