Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Dmitriy Kovalenko <dmtr.kovalenko@outlook.com>
Subject: Re: [FFmpeg-devel] [PATCH v4 1/2] swscale: rgb_to_yuv neon optimizations
Date: Fri, 30 May 2025 12:07:26 +0300 (EEST)
Message-ID: <544ef49a-b717-2175-1216-a23c63bf4fd8@martin.st>
In-Reply-To: <DBAP193MB0956BC2660ECBF6F17B644E88D61A@DBAP193MB0956.EURP193.PROD.OUTLOOK.COM>

On Fri, 30 May 2025, Dmitriy Kovalenko wrote:

>> If you with "non-performant mobile" mean small in-order cores, most of them can handle repeated accumulation like these even faster, if you sequence these so that all accumulations to one register is sequentially. E.g. first all "smlal \u_dst1\().4s", followed by all "smlal \u_dst2\().4s", followed by \v_dst1, followed by \v_dst2. It's worth benchmarking if you do have access to such cores (e.g. Cortex-A53/A55; perhaps that's also the case on the Cortex-R you mentioned in the commit message).
>
> I mean generally mobile first CPUs. But I just verified even on macbook
> pro interleaving instruction per the component does not enable IRL

What does "does not enable IRL" mean?

> but having a "hot-register" being multiplied several times in
> parallel gives a difference. Here is checkasm results from macbook w/ my
> and interleaved by r/g/b component version

I'm sorry but it is very hard to interpret what you're saying here; what 
is the first and second measurement?

In any case; now with this version of the patchset, which actually does
compile and pass checkasm on Linux, I tested reordering
rgb_to_uv_interleaved_product in the way I suggested, like this:

         smlal           \u_dst1\().4s
         smlal           \u_dst1\().4s
         smlal           \u_dst1\().4s
         smlal2          \u_dst2\().4s
         smlal2          \u_dst2\().4s
         smlal2          \u_dst2\().4s
         smlal           \v_dst1\().4s
         smlal           \v_dst1\().4s
         smlal           \v_dst1\().4s
         smlal2          \v_dst2\().4s
         smlal2          \v_dst2\().4s
         smlal2          \v_dst2\().4s

Such accumulation orders can sometimes give significant speedups on
in-order cores like Cortex-A53 and A55. In this case it didn't make any
difference, so there's no need to investigate it further.
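
For illustration only (this is not code from the patch): a minimal scalar
sketch in C of why such a reordering is safe to benchmark. It models one lane
of a widening multiply-accumulate and runs it in both orders. The function
names and the coefficient values in the usage note are hypothetical. Because
the U and V accumulators are independent, grouping all accumulations to one
register only changes instruction scheduling, not the result.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of a widening multiply-accumulate:
 * 16-bit x 16-bit product added to a 32-bit accumulator. */
static int32_t mla_lane(int32_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * (int32_t)b;
}

/* Interleaved order: alternate between the U and V accumulators
 * for each colour component (r, g, b). */
static void uv_interleaved(const int16_t rgb[3], const int16_t cu[3],
                           const int16_t cv[3], int32_t *u, int32_t *v)
{
    for (int c = 0; c < 3; c++) {
        *u = mla_lane(*u, rgb[c], cu[c]);
        *v = mla_lane(*v, rgb[c], cv[c]);
    }
}

/* Grouped order: finish all accumulations to U, then all to V. */
static void uv_grouped(const int16_t rgb[3], const int16_t cu[3],
                       const int16_t cv[3], int32_t *u, int32_t *v)
{
    for (int c = 0; c < 3; c++)
        *u = mla_lane(*u, rgb[c], cu[c]);
    for (int c = 0; c < 3; c++)
        *v = mla_lane(*v, rgb[c], cv[c]);
}

/* Self-check: both orders must produce identical accumulators. */
static void check_orders_agree(const int16_t rgb[3], const int16_t cu[3],
                               const int16_t cv[3])
{
    int32_t u1 = 0, v1 = 0, u2 = 0, v2 = 0;
    uv_interleaved(rgb, cu, cv, &u1, &v1);
    uv_grouped(rgb, cu, cv, &u2, &v2);
    assert(u1 == u2 && v1 == v2);
}
```

Calling check_orders_agree() with, say, rgb = {200, 100, 50} and made-up
coefficients cu = {-38, -74, 112}, cv = {112, -94, -18} passes the assertion.
Whether one order is faster on a given core is something only a benchmark
such as checkasm can show.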

>> Does this make any practical difference, as we're just storing the 
>> lower 32 bits anyway?
>
> Not really but I found it quite confusing at first because it looks like
> this instruction will imply narrowing, but looking into the w13 / w13 is
> much more clear what is going on.

If it doesn't make any difference, then don't change it. The fewer changes 
in a patch, the easier it is to accept the patch. Especially if you are 
optimizing code, don't include unrelated changes in the same patch. If you 
feel strongly that it should be changed for readability/understandability 
reasons, then factor out that change to a separate patch.

// Martin

