Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] New swscale internal design prototype
Date: Wed, 12 Mar 2025 09:15:22 +0200 (EET)
Message-ID: <a0321e3-acb5-8d72-755c-8e74824e1f9a@martin.st>
In-Reply-To: <20250312005651.GC800168@haasn.xyz>

On Wed, 12 Mar 2025, Niklas Haas wrote:

> On Sun, 09 Mar 2025 20:45:23 +0100 Niklas Haas <ffmpeg@haasn.xyz> wrote:
>> On Sun, 09 Mar 2025 18:11:54 +0200 Martin Storsjö <martin@martin.st> wrote:
>> > On Sat, 8 Mar 2025, Niklas Haas wrote:
>> > 
>> > > What are the thoughts on the float-first approach?
>> > 
>> > In general, for modern architectures, relying on floats probably is 
>> > reasonable. (On architectures that aren't of quite as widespread interest, 
>> > it might not be so clear cut though.)
>> > 
>> > However with the benchmark example you provided a couple of weeks ago, we 
>> > concluded that even on x86 on modern HW, floats were faster than int16 
>> > only in one case: When using Clang, not GCC, and when compiling with 
>> > -mavx2, not without it. In all the other cases, int16 was faster than 
>> > float.
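To make the float vs. int16 trade-off concrete, here is a minimal C sketch of a single per-pixel operation, expanding limited-range 8-bit luma (16..235) to full range (0..255). It is written once in float and once with 16-bit fixed-point operands. This is not swscale code: the function names and the Q14 coefficient are hypothetical choices for illustration, and real SIMD kernels would process many pixels per instruction.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of swscale. */

/* Float path: expand limited-range luma [16, 235] to full range [0, 255]. */
static uint8_t expand_luma_float(uint8_t in)
{
    float v = (in - 16) * (255.0f / 219.0f);
    if (v < 0.0f)
        v = 0.0f;
    if (v > 255.0f)
        v = 255.0f;
    return (uint8_t)(v + 0.5f); /* round to nearest; v is non-negative here */
}

/* int16 path: same mapping with a Q14 fixed-point coefficient.
 * 255/219 * 2^14 ~= 19077, which still fits in a signed 16-bit lane. */
#define EXPAND_COEFF_Q14 19077

static uint8_t expand_luma_int16(uint8_t in)
{
    int16_t v = (int16_t)(in - 16);
    /* Clamp first so the product below is never negative. */
    if (v < 0)
        v = 0;
    if (v > 219)
        v = 219;
    /* 16x16 -> 32-bit product, then round and shift back to 8 bits. */
    int32_t p = (int32_t)v * EXPAND_COEFF_Q14;
    return (uint8_t)((p + (1 << 13)) >> 14);
}
```

Both variants agree to within one code value across all 256 inputs; the int16 version fits twice as many lanes into a vector register as 32-bit float, at the cost of choosing a fixed-point format for every operation.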
>> 
>> Hi Martin,
>> 
>> I should preface this by noting that this particular benchmark was a very
>> specific test for floating point *filtering*, which is considerably more
>> punishing than the conversion pipeline I have implemented here, and I think
>> it's partly the fault of compilers generating very suboptimal filtering code.
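For comparison, the filtering workload mentioned above looks roughly like the following hypothetical float horizontal filter. The names and data layout are assumptions, loosely modelled on a "start position plus coefficient table" scheme. Each output is a short dot product starting at a data-dependent source offset, and the inner float sum cannot be reordered across vector lanes without relaxed floating-point semantics (e.g. -ffast-math). That is one plausible reason compilers produce weaker code here than for a simple per-pixel conversion.

```c
#include <stddef.h>

/* Hypothetical float horizontal filter; not swscale code.
 * filter_pos[i] is the first source sample used by output i, and
 * filter holds dst_w rows of filter_size coefficients each. */
static void hscale_float(float *dst, size_t dst_w,
                         const float *src,
                         const int *filter_pos,
                         const float *filter,
                         int filter_size)
{
    for (size_t i = 0; i < dst_w; i++) {
        const float *s = src + filter_pos[i];
        const float *f = filter + i * (size_t)filter_size;
        float sum = 0.0f;
        /* Strict FP order: the compiler may not reassociate this sum. */
        for (int j = 0; j < filter_size; j++)
            sum += s[j] * f[j];
        dst[i] = sum;
    }
}
```

For a 2:1 downscale with a 2-tap box filter, filter_pos = {0, 2} and every coefficient is 0.5, so input {1, 2, 3, 4} yields {1.5, 3.5}.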
>> 
>> I think it would be better to re-assess using the current prototype on actual
>> hardware. I threw up a quick NEON test branch: (untested, should hopefully work)
>> https://github.com/haasn/FFmpeg/commits/swscale3-neon
>> 
>> # adjust the benchmark iters count as needed based on the HW perf
>> make libswscale/tests/swscale && libswscale/tests/swscale -unscaled 1 -bench 50
>> 
>> If this differs significantly from the ~1.8x speedup I measure on x86, I
>> will be far more concerned about the new approach.

Sorry, I haven't had time to try this out myself yet...

> I gave it a try. So, the result of a naive/blind run on a Cortex-X1 using clang
> version 20.0.0 (from the latest Android NDK v29) is:
>
> Overall speedup=1.688x faster, min=0.141x max=45.898x
>
> This shows considerably more significant speed regressions than on x86, though.
>
> In particular, clang/LLVM refuses to vectorize packed reads of 2 or 3 elements,
> so any sort of operation involving rgb24 or bgr24 suffers horribly:
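The access pattern in question is a stride-3 interleave. A hypothetical unpacking stage from packed rgb24 to planar float might look like the sketch below (not taken from the prototype branch). On AArch64, a hand-written version would typically use the LD3 instruction (the vld3q_u8 intrinsic) to de-interleave 16 pixels at once; whether Clang's autovectorizer does the same for a given loop can be checked with its -Rpass=loop-vectorize and -Rpass-missed=loop-vectorize remarks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpack step, for illustration only: packed rgb24
 * (R,G,B,R,G,B,...) into three planar float buffers. The stride-3
 * loads are the "packed reads of 3 elements" pattern; a vectorizer
 * must recognise the interleave to use a de-interleaving load. */
static void unpack_rgb24_float(const uint8_t *restrict src,
                               float *restrict r, float *restrict g,
                               float *restrict b, size_t w)
{
    for (size_t x = 0; x < w; x++) {
        r[x] = src[3 * x + 0];
        g[x] = src[3 * x + 1];
        b[x] = src[3 * x + 2];
    }
}
```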

So, if the performance of this relies on compiler autovectorization, 
what's the plan wrt GCC? We blanket disable autovectorization when 
compiling with GCC - see commit fd6dbc53855fbfc9a782095d0ffe11dd3a98905f, 
which most recently disabled it. Building and running fate with 
autovectorization enabled in GCC does succeed, at least with modern GCC on 
x86_64, but it can of course still cause issues in various trickier 
configurations.
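One way to gauge what GCC would do with such loops is to compile a representative kernel and ask for the vectorizer's report. The sketch below is a hypothetical final pipeline stage (float back to 8-bit with rounding and clamping), not code from the prototype. With GCC, compiling with -O2 -ftree-vectorize -fopt-info-vec-optimized lists the loops that were vectorized, -fopt-info-vec-missed explains the ones that were not, and adding -fno-tree-vectorize turns the vectorizer off again. Results differ between GCC versions and targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pipeline stage, for illustration only: convert float
 * samples back to 8-bit with round-to-nearest and clamping. Written as
 * a simple counted loop with restrict pointers so an autovectorizer
 * can handle it. Assumes the input contains no NaNs. */
static void pack_float_to_u8(uint8_t *restrict dst,
                             const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float v = src[i] + 0.5f;
        v = v < 0.0f ? 0.0f : v;
        v = v > 255.0f ? 255.0f : v;
        dst[i] = (uint8_t)v; /* truncation after +0.5 rounds to nearest */
    }
}
```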

// Martin