Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Ben Avison <bavison@riscosopen.org>
Subject: Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths
Date: Wed, 30 Mar 2022 17:14:37 +0300 (EEST)
Message-ID: <83954566-3ee5-532e-2ff3-f726d9ab52cc@martin.st> (raw)
In-Reply-To: <20220325185257.513933-9-bavison@riscosopen.org>

On Fri, 25 Mar 2022, Ben Avison wrote:

> checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
>
> idctdsp.add_pixels_clamped_c: 323.0
> idctdsp.add_pixels_clamped_neon: 41.5
> idctdsp.put_pixels_clamped_c: 243.0
> idctdsp.put_pixels_clamped_neon: 30.0
> idctdsp.put_signed_pixels_clamped_c: 225.7
> idctdsp.put_signed_pixels_clamped_neon: 37.7
>
> Signed-off-by: Ben Avison <bavison@riscosopen.org>
> ---
> libavcodec/aarch64/Makefile               |   3 +-
> libavcodec/aarch64/idctdsp_init_aarch64.c |  26 +++--
> libavcodec/aarch64/idctdsp_neon.S         | 130 ++++++++++++++++++++++
> 3 files changed, 150 insertions(+), 9 deletions(-)
> create mode 100644 libavcodec/aarch64/idctdsp_neon.S

Generally LGTM

> +// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128)
> +// On entry:
> +//   x0 -> array of 64x 16-bit coefficients
> +//   x1 -> 8-bit results
> +//   x2 = row stride for results, bytes
> +function ff_put_signed_pixels_clamped_neon, export=1
> +        ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
> +        movi            v4.8b, #128
> +        ld1             {v16.16b, v17.16b, v18.16b, v19.16b}, [x0]
> +        sqxtn           v0.8b, v0.8h
> +        sqxtn           v1.8b, v1.8h
> +        sqxtn           v2.8b, v2.8h
> +        sqxtn           v3.8b, v3.8h
> +        sqxtn           v5.8b, v16.8h
> +        add             v0.8b, v0.8b, v4.8b

Here you could save 4 add instructions with sqxtn2 and adding .16b 
vectors, but I'm not sure if it's wortwhile. (It reduces the checkasm 
numbers by 0.7 for Cortex A72, by 0.3 for A73, but increases the runtime 
by 1.0 on A53.) Stranegely enough, I get much smaller numbers on my A72 
than you got. I get these:

idctdsp.add_pixels_clamped_c: 306.7
idctdsp.add_pixels_clamped_neon: 25.7
idctdsp.put_pixels_clamped_c: 217.2
idctdsp.put_pixels_clamped_neon: 15.2
idctdsp.put_signed_pixels_clamped_c: 216.7
idctdsp.put_signed_pixels_clamped_neon: 19.2

(The _c numbers are of course highly compiler dependent, but the assembly 
numbers should generally match quite closely. And AFAIK they should be 
measured in clock cycles, so CPU frequency shouldn't really play a role 
either.)

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

  reply	other threads:[~2022-03-30 14:14 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 18:58 [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 1/6] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 2/6] avcodec/vc1: Arm 32-bit " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 3/6] avcodec/vc1: Arm 64-bit NEON inverse transform " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 4/6] avcodec/idctdsp: Arm 64-bit NEON block add and clamp " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 5/6] avcodec/blockdsp: Arm 64-bit NEON block clear " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for unescaping bitstream buffer Ben Avison
2022-03-18 19:10   ` Andreas Rheinhardt
2022-03-21 15:51     ` Ben Avison
2022-03-21 20:44       ` Martin Storsjö
2022-03-19 23:06 ` [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations Martin Storsjö
2022-03-19 23:07   ` Martin Storsjö
2022-03-21 17:37   ` Ben Avison
2022-03-21 22:29     ` Martin Storsjö
2022-03-25 18:52 ` [FFmpeg-devel] [PATCH v2 00/10] " Ben Avison
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests Ben Avison
2022-03-25 22:53     ` Martin Storsjö
2022-03-28 18:28       ` Ben Avison
2022-03-29 11:47         ` Martin Storsjö
2022-03-29 12:24     ` Martin Storsjö
2022-03-29 12:43     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 02/10] checkasm: Add vc1dsp inverse transform tests Ben Avison
2022-03-29 12:41     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests Ben Avison
2022-03-29 13:13     ` Martin Storsjö
2022-03-29 19:56       ` Martin Storsjö
2022-03-29 20:22       ` Ben Avison
2022-03-29 20:30         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer Ben Avison
2022-03-29 20:37     ` Martin Storsjö
2022-03-31 13:58       ` Ben Avison
2022-03-31 14:07         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths Ben Avison
2022-03-30 12:35     ` Martin Storsjö
2022-03-31 15:15       ` Ben Avison
2022-03-31 21:21         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit " Ben Avison
2022-03-25 19:27     ` Lynne
2022-03-25 19:49       ` Martin Storsjö
2022-03-25 19:55         ` Lynne
2022-03-30 12:37     ` Martin Storsjö
2022-03-30 13:03     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform " Ben Avison
2022-03-30 13:49     ` Martin Storsjö
2022-03-30 14:01       ` Martin Storsjö
2022-03-31 15:37       ` Ben Avison
2022-03-31 21:32         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp " Ben Avison
2022-03-30 14:14     ` Martin Storsjö [this message]
2022-03-31 16:47       ` Ben Avison
2022-03-31 21:42         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 09/10] avcodec/vc1: Arm 64-bit NEON unescape fast path Ben Avison
2022-03-30 14:35     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 10/10] avcodec/vc1: Arm 32-bit " Ben Avison
2022-03-30 14:35     ` Martin Storsjö

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83954566-3ee5-532e-2ff3-f726d9ab52cc@martin.st \
    --to=martin@martin.st \
    --cc=bavison@riscosopen.org \
    --cc=ffmpeg-devel@ffmpeg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git