Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange
Date: Mon, 10 Jun 2024 14:56:00 +0300 (EEST)
Message-ID: <606a3e3d-c38c-c4a-c6d1-929e2dfa79dc@martin.st> (raw)
In-Reply-To: <20240607140543.130761-4-ramiro.polla@gmail.com>

On Fri, 7 Jun 2024, Ramiro Polla wrote:

> chrRangeFromJpeg_8_c: 28.5
> chrRangeFromJpeg_8_neon: 21.2
> chrRangeFromJpeg_24_c: 81.2
> chrRangeFromJpeg_24_neon: 34.7
> chrRangeFromJpeg_128_c: 425.2
> chrRangeFromJpeg_128_neon: 162.0
> chrRangeFromJpeg_144_c: 480.2
> chrRangeFromJpeg_144_neon: 180.2
> chrRangeFromJpeg_256_c: 838.2
> chrRangeFromJpeg_256_neon: 318.0
> chrRangeFromJpeg_512_c: 1698.2
> chrRangeFromJpeg_512_neon: 630.0
> chrRangeToJpeg_8_c: 56.0
> chrRangeToJpeg_8_neon: 23.5
> chrRangeToJpeg_24_c: 147.7
> chrRangeToJpeg_24_neon: 38.2
> chrRangeToJpeg_128_c: 760.2
> chrRangeToJpeg_128_neon: 182.5
> chrRangeToJpeg_144_c: 857.7
> chrRangeToJpeg_144_neon: 204.5
> chrRangeToJpeg_256_c: 1504.2
> chrRangeToJpeg_256_neon: 358.5
> chrRangeToJpeg_512_c: 3025.7
> chrRangeToJpeg_512_neon: 710.5
> lumRangeFromJpeg_8_c: 24.0
> lumRangeFromJpeg_8_neon: 18.2
> lumRangeFromJpeg_24_c: 64.0
> lumRangeFromJpeg_24_neon: 22.2
> lumRangeFromJpeg_128_c: 289.2
> lumRangeFromJpeg_128_neon: 79.2
> lumRangeFromJpeg_144_c: 334.7
> lumRangeFromJpeg_144_neon: 87.7
> lumRangeFromJpeg_256_c: 579.5
> lumRangeFromJpeg_256_neon: 152.0
> lumRangeFromJpeg_512_c: 1208.0
> lumRangeFromJpeg_512_neon: 299.0
> lumRangeToJpeg_8_c: 30.0
> lumRangeToJpeg_8_neon: 19.0
> lumRangeToJpeg_24_c: 82.2
> lumRangeToJpeg_24_neon: 24.0
> lumRangeToJpeg_128_c: 440.7
> lumRangeToJpeg_128_neon: 90.5
> lumRangeToJpeg_144_c: 502.0
> lumRangeToJpeg_144_neon: 102.2
> lumRangeToJpeg_256_c: 893.7
> lumRangeToJpeg_256_neon: 178.0
> lumRangeToJpeg_512_c: 1793.7
> lumRangeToJpeg_512_neon: 355.0
> ---
> libswscale/aarch64/Makefile             |   1 +
> libswscale/aarch64/range_convert_neon.S | 103 ++++++++++++++++++++++++
> libswscale/aarch64/swscale.c            |  21 +++++
> libswscale/swscale_internal.h           |   1 +
> libswscale/utils.c                      |   4 +-
> 5 files changed, 129 insertions(+), 1 deletion(-)
> create mode 100644 libswscale/aarch64/range_convert_neon.S
>
> diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
> index da1d909561..6923827f82 100644
> --- a/libswscale/aarch64/Makefile
> +++ b/libswscale/aarch64/Makefile
> @@ -4,5 +4,6 @@ OBJS        += aarch64/rgb2rgb.o                \
>
> NEON-OBJS   += aarch64/hscale.o                 \
>                aarch64/output.o                 \
> +               aarch64/range_convert_neon.o     \
>                aarch64/rgb2rgb_neon.o           \
>                aarch64/yuv2rgb_neon.o           \
> diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S
> new file mode 100644
> index 0000000000..5e104971f0
> --- /dev/null
> +++ b/libswscale/aarch64/range_convert_neon.S
> @@ -0,0 +1,103 @@
> +/*
> + * Copyright (c) 2024 Ramiro Polla
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/aarch64/asm.S"
> +
> +.macro lumConvertRange name max mult offset shift

We usually use commas between the macro arguments here. Apparently it 
doesn't make any difference for any of the tools we support, but it would 
be nice for consistency. (When invoking macros, commas between arguments 
are optional for most platforms, but not when targeting Apple platforms, 
so being strict with consistent use of commas is generally good.)

> +const offset_\name, align=4
> +        .word \offset, \offset, \offset, \offset
> +endconst
> +function ff_\name, export=1
> +.if \max != 0
> +        mov             w3, #\max
> +        dup             v24.8h, w3
> +.endif
> +        mov             w3, #\mult
> +        dup             v25.4s, w3
> +        movrel          x3, offset_\name
> +        ld1             {v26.4s}, [x3]

FWIW, I did see that you were recommended this form, over ld1r, based on 
some microarchitectural performance numbers. However in our preexisting 
assembly, manually pre-splatting vectors like this is unusual I would say. 
I don't have a strong opinion on the matter though.

Anyway, the assembly looks reasonable to me.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

  reply	other threads:[~2024-06-10 11:56 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-07 14:05 [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Ramiro Polla
2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange Ramiro Polla
2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 " Ramiro Polla
2024-06-07 17:38   ` Ramiro Polla
2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon " Ramiro Polla
2024-06-10 11:56   ` Martin Storsjö [this message]
2024-06-11 12:33     ` Ramiro Polla
2024-06-07 18:45 ` [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Andreas Rheinhardt
2024-06-07 19:09   ` Ramiro Polla
2024-06-07 19:12     ` Andreas Rheinhardt
2024-06-07 19:47       ` Ramiro Polla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=606a3e3d-c38c-c4a-c6d1-929e2dfa79dc@martin.st \
    --to=martin@martin.st \
    --cc=ffmpeg-devel@ffmpeg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git