Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Dmitriy Kovalenko <dmtr.kovalenko@outlook.com>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] swscale: rgb_to_yuv neon optimizations
Date: Thu, 5 Jun 2025 15:00:44 +0300 (EEST)
Message-ID: <a9af9ffb-be49-4f35-79e5-edb589a1d19b@martin.st>
In-Reply-To: <DBAP193MB0956F9E8C72D5F6E84578A7E8D60A@DBAP193MB0956.EURP193.PROD.OUTLOOK.COM>

On Sat, 31 May 2025, Dmitriy Kovalenko wrote:

> I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
> subsampled conversion. In this patch stack I'll try to
> improve the performance.
>
> This particular set of changes is a small improvement to all the
> existing functions and macros. The biggest performance gain is
> coming from post loading increment of the pointer and immediate
> ~~prefetching of the memory blocks~~(was moved to the next patch in the stack) and interleaving the multiplication shifting operations of
> different registers for better scheduling.

Why keep the mention of prefetching here, when it is no longer included in
the patch at all? This text is what you are proposing as the final,
immutable commit message describing this change.

I have further inline comments below, please read them all.

> Also changed a bunch of places where cmp + b.le was used instead
> of one instruction cbnz/tbnz and some other small cleanups.
>
> Here are checkasm results on the macbook pro with the latest M4 max
>
> <before>
>
> bgra_to_uv_1080_c:                                     257.5 ( 1.00x)
> bgra_to_uv_1080_neon:                                  211.9 ( 1.22x)
> bgra_to_uv_1920_c:                                     467.1 ( 1.00x)
> bgra_to_uv_1920_neon:                                  379.3 ( 1.23x)
> bgra_to_uv_half_1080_c:                                198.9 ( 1.00x)
> bgra_to_uv_half_1080_neon:                             125.7 ( 1.58x)
> bgra_to_uv_half_1920_c:                                346.3 ( 1.00x)
> bgra_to_uv_half_1920_neon:                             223.7 ( 1.55x)
>
> <after>
>
> bgra_to_uv_1080_c:                                     268.3 ( 1.00x)
> bgra_to_uv_1080_neon:                                  176.0 ( 1.53x)
> bgra_to_uv_1920_c:                                     456.6 ( 1.00x)
> bgra_to_uv_1920_neon:                                  307.7 ( 1.48x)
> bgra_to_uv_half_1080_c:                                193.2 ( 1.00x)
> bgra_to_uv_half_1080_neon:                              96.8 ( 2.00x)
> bgra_to_uv_half_1920_c:                                347.2 ( 1.00x)
> bgra_to_uv_half_1920_neon:                             182.6 ( 1.92x)
>
> With my proprietary test on iOS it gives around a 70% performance
> improvement when converting a bgra 1920x1920 image to yuv420p
>
> On my Linux ARM Cortex-R processor the performance improvement is not
> that visible, but it is still consistently 5-10% faster than the current
> implementation.
> ---
> libswscale/aarch64/input.S | 143 +++++++++++++++++++++++--------------
> 1 file changed, 91 insertions(+), 52 deletions(-)

> @@ -292,7 +330,7 @@ function ff_\fmt_rgb\()ToUV_neon, export=1
>         smaddl          x8, w16, w10, x9        // x8 = ru * r + const_offset
>         smaddl          x8, w17, w11, x8        // x8 += gu * g
>         smaddl          x8, w4, w12, x8         // x8 += bu * b
> -        asr             w8, w8, #9              // x8 >>= 9
> +        asr             x8, x8, #9              // x8 >>= 9
>         strh            w8, [x0], #2            // store to dst_u
>

Here you _still_ have one instance of these unrelated changes left in your 
patch.
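
For readers following along, here is a small sketch of why this hunk reads
as an unrelated change rather than a fix. It assumes the only consumer of
the shifted value is the following strh, as in the quoted hunk. strh stores
the low 16 bits of w8, which are bits 9..24 of the accumulator whether the
shift is done on w8 or on x8. The helper names below are hypothetical and
only model the two instruction forms in C; they are not part of FFmpeg.

```c
#include <stdint.h>

/* Portable model of `asr w8, w8, #n` (32-bit arithmetic shift), 0 < n < 32. */
static uint32_t asr32(uint32_t v, unsigned n)
{
    uint32_t r = v >> n;
    if (v & 0x80000000u)                 /* replicate the sign bit */
        r |= ~(0xFFFFFFFFu >> n);
    return r;
}

/* Portable model of `asr x8, x8, #n` (64-bit arithmetic shift), 0 < n < 64. */
static uint64_t asr64(uint64_t v, unsigned n)
{
    uint64_t r = v >> n;
    if (v & 0x8000000000000000ull)       /* replicate the sign bit */
        r |= ~(0xFFFFFFFFFFFFFFFFull >> n);
    return r;
}

/* Halfword that `strh w8, [x0], #2` would store after the 32-bit shift. */
static uint16_t store_after_asr_w(int64_t acc)
{
    return (uint16_t)asr32((uint32_t)(uint64_t)acc, 9);
}

/* Halfword that `strh w8, [x0], #2` would store after the 64-bit shift. */
static uint16_t store_after_asr_x(int64_t acc)
{
    return (uint16_t)asr64((uint64_t)acc, 9);
}
```

For any accumulator value, store_after_asr_w(acc) and
store_after_asr_x(acc) return the same halfword, so under the assumption
above the w8 to x8 change does not alter what is written to dst_u.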

>         smaddl          x8, w16, w13, x9        // x8 = rv * r + const_offset
> @@ -401,3 +439,4 @@ endfunc
>
> DISABLE_DOTPROD
> #endif
> +
> --

Here you are adding one unrelated empty line at the end of the file. Don't 
include any unrelated changes in your patches.

Before sending a patch, do review it yourself first, checking for any such 
unrelated stray changes.

Other than those details, the rest of the patch looks ok.

// Martin
