From: "Martin Storsjö" <martin@martin.st>
To: Krzysztof Pyrkosz via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Cc: Krzysztof Pyrkosz <ffmpeg@szaka.eu>
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64: dotprod implementation of rgba32_to_Y
Date: Sun, 2 Mar 2025 00:55:55 +0200 (EET)
Message-ID: <eb9b4df7-2eaf-df0-33a1-cf718949be9@martin.st>
In-Reply-To: <20250227224454.6776-2-ffmpeg@szaka.eu>
On Thu, 27 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> ---
> I was curious whether it's possible to implement this function without
> any widening, and it turns out it not only is, but it's quite
> performant at the same time!
>
> The idea is to split the 16-bit coefficients into lower and upper halves,
> invoke udot for the lower half, shift by 8, and follow with udot for the
> upper half. The code is based upon the existing version.
As with the other patches: this explanation and the benchmarks are worth
keeping after the patch is committed, so please move them into the
permanent commit message, above the "---".
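
For readers following along, the arithmetic behind the quoted description can be modelled in scalar C. This is only a sketch of one reading of "udot for the lower half, shift by 8, udot for the upper half", not the code from the patch. Assumptions: RGB2YUV_SHIFT is 15 (which is what the quoted constants 256 and 8 << 16 imply), the final result is the sum shifted right by RGB2YUV_SHIFT - 6, and the names udot_lane, y_reference and y_split are hypothetical.

```c
#include <stdint.h>

/* Assumption: 15, inferred from "mov w9, #256 // 1 << (RGB2YUV_SHIFT - 7)". */
#define RGB2YUV_SHIFT 15

/* Rounding/offset constant, matching the mov/movk pair: 256 + (8 << 16). */
static uint32_t const_offset(void)
{
    return (32u << (RGB2YUV_SHIFT - 1)) + (1u << (RGB2YUV_SHIFT - 7));
}

/* Scalar model of one 32-bit lane of udot: four u8 * u8 products
 * added to a 32-bit accumulator. */
static uint32_t udot_lane(uint32_t acc, const uint8_t px[4], const uint8_t coef[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)px[i] * coef[i];
    return acc;
}

/* Reference: widening multiplies with the full 16-bit coefficients.
 * Assumption: the output is the sum shifted right by RGB2YUV_SHIFT - 6. */
static uint32_t y_reference(const uint8_t px[4], const uint16_t c[3])
{
    uint32_t sum = const_offset();
    for (int i = 0; i < 3; i++)
        sum += (uint32_t)c[i] * px[i];
    return sum >> (RGB2YUV_SHIFT - 6);
}

/* Split version: low bytes first, shift by 8, then high bytes.
 * The fourth coefficient byte is 0, so the alpha byte contributes nothing. */
static uint32_t y_split(const uint8_t px[4], const uint16_t c[3])
{
    const uint8_t lo[4] = { (uint8_t)(c[0] & 0xff), (uint8_t)(c[1] & 0xff),
                            (uint8_t)(c[2] & 0xff), 0 };
    const uint8_t hi[4] = { (uint8_t)(c[0] >> 8), (uint8_t)(c[1] >> 8),
                            (uint8_t)(c[2] >> 8), 0 };
    uint32_t acc = const_offset();
    acc = udot_lane(acc, px, lo);   /* offset + sum(lo * px)            */
    acc >>= 8;                      /* drop the 8 bits hi never touches */
    acc = udot_lane(acc, px, hi);   /* + sum(hi * px)                   */
    return acc >> (RGB2YUV_SHIFT - 6 - 8);
}
```

The two versions agree exactly: adding a multiple of 256 to a value before or after a floor division by 256 gives the same result, and two successive right shifts (by 8, then by 1) equal a single shift by 9.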
> Benchmark on A78:
> bgra_to_y_128_c: 682.0 ( 1.00x)
> bgra_to_y_128_neon: 181.2 ( 3.76x)
> bgra_to_y_128_dotprod: 117.8 ( 5.79x)
> bgra_to_y_1080_c: 5742.5 ( 1.00x)
> bgra_to_y_1080_neon: 1472.5 ( 3.90x)
> bgra_to_y_1080_dotprod: 906.5 ( 6.33x)
> bgra_to_y_1920_c: 10194.0 ( 1.00x)
> bgra_to_y_1920_neon: 2589.8 ( 3.94x)
> bgra_to_y_1920_dotprod: 1573.8 ( 6.48x)
>
> Krzysztof
>
> libswscale/aarch64/input.S | 88 ++++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c | 17 +++++++
> 2 files changed, 105 insertions(+)
>
> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
> index 5cb18711fb..5fe6c3f6f5 100644
> --- a/libswscale/aarch64/input.S
> +++ b/libswscale/aarch64/input.S
> @@ -313,3 +313,91 @@ rgbToUV_neon bgr24, rgb24, element=3
> rgbToUV_neon bgra32, rgba32, element=4
>
> rgbToUV_neon abgr32, argb32, element=4, alpha_first=1
> +
> +#if HAVE_DOTPROD
> +ENABLE_DOTPROD
> +
> +function ff_bgra32ToY_neon_dotprod, export=1
> + cmp w4, #0 // check width > 0
> + ldp w12, w11, [x5] // w12: ry, w11: gy
> + ldr w10, [x5, #8] // w10: by
> + b.gt 4f
> + ret
> +endfunc
> +
> +function ff_rgba32ToY_neon_dotprod, export=1
> + cmp w4, #0 // check width > 0
> + ldp w10, w11, [x5] // w10: ry, w11: gy
> + ldr w12, [x5, #8] // w12: by
> + b.le 3f
> +4:
> + mov w9, #256 // w9 = 1 << (RGB2YUV_SHIFT - 7)
> + movk w9, #8, lsl #16 // w9 += 32 << (RGB2YUV_SHIFT - 1)
> + dup v6.4s, w9 // w9: const_offset
> +
> + cmp w4, #16
> + mov w7, w10
> + bfi w7, w11, 8, 8
> + bfi w7, w12, 16, 8
These bfi instructions are quite esoteric; it'd probably be good to add
some comments to explain what you do here.
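
As a starting point for such comments, here is a scalar C model of what the two bfi sequences in the quoted patch appear to do: pack the low bytes (and, after the lsr, the high bytes) of the three coefficients into one 32-bit word that is then broadcast with dup. This is a sketch, not text from the patch; bfi, pack_low_bytes and pack_high_bytes are hypothetical helper names, and the coefficients are assumed to fit in 16 bits so that the top byte of each packed word is zero.

```c
#include <stdint.h>

/* Model of AArch64 "bfi wd, wn, #lsb, #width": copy the low `width` bits
 * of src into dst starting at bit `lsb`, leaving the other bits of dst. */
static uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t mask = ((width < 32 ? (1u << width) : 0u) - 1u) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* mov w7, w10; bfi w7, w11, 8, 8; bfi w7, w12, 16, 8
 * Result bytes, lowest first: lo(c0), lo(c1), lo(c2), bits 31:24 of c0. */
static uint32_t pack_low_bytes(uint32_t c0, uint32_t c1, uint32_t c2)
{
    uint32_t w = c0;
    w = bfi(w, c1, 8, 8);    /* overwrites the high byte of c0 in bits 15:8 */
    w = bfi(w, c2, 16, 8);
    return w;
}

/* lsr w6, w10, #8; lsr w7, w11, #8; lsr w8, w12, #8;
 * bfi w6, w7, 8, 8; bfi w6, w8, 16, 8
 * Result bytes, lowest first: hi(c0), hi(c1), hi(c2), 0 for 16-bit inputs. */
static uint32_t pack_high_bytes(uint32_t c0, uint32_t c1, uint32_t c2)
{
    uint32_t w = c0 >> 8;
    w = bfi(w, c1 >> 8, 8, 8);
    w = bfi(w, c2 >> 8, 16, 8);
    return w;
}
```

For example, with the arbitrary test values 0x20DE, 0x4087 and 0x0C88, pack_low_bytes gives 0x008887DE and pack_high_bytes gives 0x000C4020. The zero top byte lines up with the alpha byte of each pixel, so alpha is multiplied by zero in the dot product.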
> + dup v0.4s, w7
> +
> + lsr w6, w10, #8
> + lsr w7, w11, #8
> + lsr w8, w12, #8
> +
> + bfi w6, w7, 8, 8
> + bfi w6, w8, 16, 8
> + dup v1.4s, w6
> + b.lt 2f
> +1:
> + ld1 { v16.16b, v17.16b, v18.16b, v19.16b }, [x1], #64
> + sub w4, w4, #16 // width -= 16
> + cmp w4, #16 // width >= 16 ?
The cmp could be moved, e.g. to below the mov.
Other than that, this patch looks really good to me, thanks!
And while swscale is being rewritten elsewhere, adding this function
shouldn't make the transition to a rewrite any harder, so I don't see any
problem with adding this in the meantime.
// Martin