* [FFmpeg-devel] [PATCH v4 1/5] avcodec/ac3: Implement float_to_fixed24 for aarch64 NEON
2024-04-06 14:23 [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Geoff Hill
@ 2024-04-06 14:25 ` Geoff Hill
2024-04-06 14:25 ` [FFmpeg-devel] [PATCH v4 2/5] avcodec/ac3: Implement ac3_exponent_min " Geoff Hill
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Geoff Hill @ 2024-04-06 14:25 UTC
To: ffmpeg-devel
Signed-off-by: Geoff Hill <geoff@geoffhill.org>
---
libavcodec/aarch64/Makefile | 2 ++
libavcodec/aarch64/ac3dsp_init_aarch64.c | 36 ++++++++++++++++++++++++
libavcodec/aarch64/ac3dsp_neon.S | 36 ++++++++++++++++++++++++
libavcodec/ac3dsp.c | 4 ++-
libavcodec/ac3dsp.h | 3 +-
tests/checkasm/ac3dsp.c | 1 +
6 files changed, 80 insertions(+), 2 deletions(-)
create mode 100644 libavcodec/aarch64/ac3dsp_init_aarch64.c
create mode 100644 libavcodec/aarch64/ac3dsp_neon.S
diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index beb6a02f5f..95ad4dd202 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -1,4 +1,5 @@
# subsystems
+OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_init_aarch64.o
OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o
OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o
OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o
@@ -35,6 +36,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP) += aarch64/videodsp.o
# subsystems
NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o
+NEON-OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_neon.o
NEON-OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_neon.o
NEON-OBJS-$(CONFIG_H264CHROMA) += aarch64/h264cmc_neon.o
NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \
diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
new file mode 100644
index 0000000000..e3320de0f5
--- /dev/null
+++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
@@ -0,0 +1,36 @@
+/*
+ * Copyright (c) 2024 Geoff Hill <geoff@geoffhill.org>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdint.h>
+
+#include "libavutil/arm/cpu.h"
+#include "libavutil/attributes.h"
+#include "libavcodec/ac3dsp.h"
+#include "config.h"
+
+void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
+
+av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
+{
+ int cpu_flags = av_get_cpu_flags();
+ if (!have_neon(cpu_flags)) return;
+
+ c->float_to_fixed24 = ff_float_to_fixed24_neon;
+}
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
new file mode 100644
index 0000000000..c4d204b51a
--- /dev/null
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -0,0 +1,36 @@
+/*
+ * Copyright (c) 2011 Mans Rullgard <mans@mansr.com>
+ * Copyright (c) 2024 Geoff Hill <geoff@geoffhill.org>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+function ff_float_to_fixed24_neon, export=1
+1: ld1 {v0.4s, v1.4s}, [x1], #32
+ fcvtzs v0.4s, v0.4s, #24
+ ld1 {v2.4s, v3.4s}, [x1], #32
+ fcvtzs v1.4s, v1.4s, #24
+ fcvtzs v2.4s, v2.4s, #24
+ st1 {v0.4s, v1.4s}, [x0], #32
+ fcvtzs v3.4s, v3.4s, #24
+ st1 {v2.4s, v3.4s}, [x0], #32
+ subs w2, w2, #16
+ b.ne 1b
+ ret
+endfunc
diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 8397e03d32..730fa70fff 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -389,7 +389,9 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c)
c->downmix = NULL;
c->downmix_fixed = NULL;
-#if ARCH_ARM
+#if ARCH_AARCH64
+ ff_ac3dsp_init_aarch64(c);
+#elif ARCH_ARM
ff_ac3dsp_init_arm(c);
#elif ARCH_X86
ff_ac3dsp_init_x86(c);
diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index ae33b361a9..b1b2bced8f 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -106,7 +106,8 @@ typedef struct AC3DSPContext {
void (*downmix_fixed)(int32_t **samples, int16_t **matrix, int len);
} AC3DSPContext;
-void ff_ac3dsp_init (AC3DSPContext *c);
+void ff_ac3dsp_init(AC3DSPContext *c);
+void ff_ac3dsp_init_aarch64(AC3DSPContext *c);
void ff_ac3dsp_init_arm(AC3DSPContext *c);
void ff_ac3dsp_init_x86(AC3DSPContext *c);
void ff_ac3dsp_init_mips(AC3DSPContext *c);
diff --git a/tests/checkasm/ac3dsp.c b/tests/checkasm/ac3dsp.c
index 344e1fe5c2..b1064fccb4 100644
--- a/tests/checkasm/ac3dsp.c
+++ b/tests/checkasm/ac3dsp.c
@@ -1,5 +1,6 @@
/*
* Copyright (c) 2023 Institue of Software Chinese Academy of Sciences (ISCAS).
+ * Copyright (c) 2024 Geoff Hill <geoff@geoffhill.org>
*
* This file is part of FFmpeg.
*
--
2.42.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
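For readers following the assembly in patch 1/5, the loop can be modelled in scalar C. This is only a sketch: the names fcvtzs_q24_model and float_to_fixed24_model are made up for illustration and are not part of FFmpeg. It assumes, as the NEON loop does, that len is a nonzero multiple of 16, and it models FCVTZS with #24 fractional bits as "scale by 2^24, truncate toward zero, saturate to the int32_t range, NaN becomes 0".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one FCVTZS lane with 24 fractional bits. */
static int32_t fcvtzs_q24_model(float x)
{
    /* Scaling by a power of two is exact in double precision. */
    double scaled = (double)x * 16777216.0;

    if (x != x)                       /* NaN converts to 0 */
        return 0;
    if (scaled >= 2147483647.0)       /* saturate on positive overflow */
        return INT32_MAX;
    if (scaled <= -2147483648.0)      /* saturate on negative overflow */
        return INT32_MIN;
    return (int32_t)scaled;           /* the C cast truncates toward zero */
}

/* Hypothetical model of the whole routine; the NEON loop handles 16 floats
 * per iteration and exits with b.ne, so len must be a nonzero multiple of 16. */
static void float_to_fixed24_model(int32_t *dst, const float *src, size_t len)
{
    assert(len > 0 && len % 16 == 0);
    for (size_t i = 0; i < len; i++)
        dst[i] = fcvtzs_q24_model(src[i]);
}
```

The model truncates because FCVTZS does; it makes no statement about how the C reference rounds.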
* [FFmpeg-devel] [PATCH v4 2/5] avcodec/ac3: Implement ac3_exponent_min for aarch64 NEON
2024-04-06 14:23 [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Geoff Hill
2024-04-06 14:25 ` [FFmpeg-devel] [PATCH v4 1/5] avcodec/ac3: Implement float_to_fixed24 for aarch64 NEON Geoff Hill
@ 2024-04-06 14:25 ` Geoff Hill
2024-04-06 14:26 ` [FFmpeg-devel] [PATCH v4 3/5] avcodec/ac3: Implement ac3_extract_exponents " Geoff Hill
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Geoff Hill @ 2024-04-06 14:25 UTC
To: ffmpeg-devel
Signed-off-by: Geoff Hill <geoff@geoffhill.org>
---
libavcodec/aarch64/ac3dsp_init_aarch64.c | 2 ++
libavcodec/aarch64/ac3dsp_neon.S | 16 +++++++++
tests/checkasm/ac3dsp.c | 41 ++++++++++++++++++++++++
3 files changed, 59 insertions(+)
diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
index e3320de0f5..8874b41393 100644
--- a/libavcodec/aarch64/ac3dsp_init_aarch64.c
+++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
@@ -25,6 +25,7 @@
#include "libavcodec/ac3dsp.h"
#include "config.h"
+void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
@@ -32,5 +33,6 @@ av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
int cpu_flags = av_get_cpu_flags();
if (!have_neon(cpu_flags)) return;
+ c->ac3_exponent_min = ff_ac3_exponent_min_neon;
c->float_to_fixed24 = ff_float_to_fixed24_neon;
}
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index c4d204b51a..f916c32538 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -21,6 +21,22 @@
#include "libavutil/aarch64/asm.S"
+function ff_ac3_exponent_min_neon, export=1
+ cbz w1, 3f
+1: ld1 {v0.16b}, [x0]
+ mov w3, w1
+ add x4, x0, #256
+2: ld1 {v1.16b}, [x4]
+ umin v0.16b, v0.16b, v1.16b
+ add x4, x4, #256
+ subs w3, w3, #1
+ b.gt 2b
+ st1 {v0.16b}, [x0], #16
+ subs w2, w2, #16
+ b.gt 1b
+3: ret
+endfunc
+
function ff_float_to_fixed24_neon, export=1
1: ld1 {v0.4s, v1.4s}, [x1], #32
fcvtzs v0.4s, v0.4s, #24
diff --git a/tests/checkasm/ac3dsp.c b/tests/checkasm/ac3dsp.c
index b1064fccb4..06f31339f9 100644
--- a/tests/checkasm/ac3dsp.c
+++ b/tests/checkasm/ac3dsp.c
@@ -28,6 +28,14 @@
#include "checkasm.h"
+#define randomize_exp(buf, len) \
+ do { \
+ int i; \
+ for (i = 0; i < len; i++) { \
+ buf[i] = (uint8_t)rnd(); \
+ } \
+ } while (0)
+
#define randomize_float(buf, len) \
do { \
int i; \
@@ -37,6 +45,38 @@
} \
} while (0)
+static void check_ac3_exponent_min(AC3DSPContext *c) {
+#define MAX_COEFS 256
+#define MAX_CTXT 6
+#define EXP_SIZE (MAX_CTXT * MAX_COEFS)
+
+ LOCAL_ALIGNED_16(uint8_t, src, [EXP_SIZE]);
+ LOCAL_ALIGNED_16(uint8_t, v1, [EXP_SIZE]);
+ LOCAL_ALIGNED_16(uint8_t, v2, [EXP_SIZE]);
+ int n;
+
+ declare_func(void, uint8_t *, int, int);
+
+ for (n = 0; n < MAX_CTXT; ++n) {
+ if (check_func(c->ac3_exponent_min, "ac3_exponent_min_reuse%d", n)) {
+ randomize_exp(src, EXP_SIZE);
+
+ memcpy(v1, src, EXP_SIZE);
+ memcpy(v2, src, EXP_SIZE);
+
+ call_ref(v1, n, MAX_COEFS);
+ call_new(v2, n, MAX_COEFS);
+
+ if (memcmp(v1, v2, EXP_SIZE) != 0)
+ fail();
+
+ bench_new(v2, n, MAX_COEFS);
+ }
+ }
+
+ report("ac3_exponent_min");
+}
+
static void check_float_to_fixed24(AC3DSPContext *c) {
#define BUF_SIZE 1024
LOCAL_ALIGNED_32(float, src, [BUF_SIZE]);
@@ -67,5 +107,6 @@ void checkasm_check_ac3dsp(void)
AC3DSPContext c;
ff_ac3dsp_init(&c);
+ check_ac3_exponent_min(&c);
check_float_to_fixed24(&c);
}
--
2.42.0
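As a reading aid for ff_ac3_exponent_min_neon in patch 2/5, here is a scalar sketch of what the two nested loops appear to compute. The function name exponent_min_model is hypothetical. The stride of 256 bytes between blocks is taken from the "add x4, x4, #256" in the assembly; the requirement that nb_coefs be a positive multiple of 16 is an assumption of this sketch, matching the 16 bytes handled per outer iteration.

```c
#include <assert.h>
#include <stdint.h>

#define EXP_BLOCK_STRIDE 256  /* distance between blocks, from the assembly */

/* Hypothetical scalar model: for each coefficient, replace the exponent in
 * block 0 with the minimum over block 0 and the following reuse blocks. */
static void exponent_min_model(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
{
    if (num_reuse_blocks == 0)        /* mirrors the cbz early exit */
        return;
    assert(nb_coefs > 0 && nb_coefs % 16 == 0);

    for (int i = 0; i < nb_coefs; i++) {
        uint8_t min_exp = exp[i];
        for (int blk = 1; blk <= num_reuse_blocks; blk++) {
            uint8_t next = exp[i + blk * EXP_BLOCK_STRIDE];
            if (next < min_exp)       /* unsigned minimum, like umin */
                min_exp = next;
        }
        exp[i] = min_exp;             /* only block 0 is written */
    }
}
```

Only the first block is modified; the later blocks are read but left untouched, which is why the checkasm test compares the full buffer.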
* [FFmpeg-devel] [PATCH v4 3/5] avcodec/ac3: Implement ac3_extract_exponents for aarch64 NEON
2024-04-06 14:23 [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Geoff Hill
2024-04-06 14:25 ` [FFmpeg-devel] [PATCH v4 1/5] avcodec/ac3: Implement float_to_fixed24 for aarch64 NEON Geoff Hill
2024-04-06 14:25 ` [FFmpeg-devel] [PATCH v4 2/5] avcodec/ac3: Implement ac3_exponent_min " Geoff Hill
@ 2024-04-06 14:26 ` Geoff Hill
2024-04-06 14:26 ` [FFmpeg-devel] [PATCH v4 4/5] avcodec/ac3: Implement sum_square_butterfly_int32 " Geoff Hill
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Geoff Hill @ 2024-04-06 14:26 UTC
To: ffmpeg-devel
Signed-off-by: Geoff Hill <geoff@geoffhill.org>
---
libavcodec/aarch64/ac3dsp_init_aarch64.c | 2 ++
libavcodec/aarch64/ac3dsp_neon.S | 14 +++++++++
tests/checkasm/ac3dsp.c | 38 ++++++++++++++++++++++++
3 files changed, 54 insertions(+)
diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
index 8874b41393..1bdc215b51 100644
--- a/libavcodec/aarch64/ac3dsp_init_aarch64.c
+++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
@@ -26,6 +26,7 @@
#include "config.h"
void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
+void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
@@ -34,5 +35,6 @@ av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
if (!have_neon(cpu_flags)) return;
c->ac3_exponent_min = ff_ac3_exponent_min_neon;
+ c->extract_exponents = ff_ac3_extract_exponents_neon;
c->float_to_fixed24 = ff_float_to_fixed24_neon;
}
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index f916c32538..c350c1f173 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -37,6 +37,20 @@ function ff_ac3_exponent_min_neon, export=1
3: ret
endfunc
+function ff_ac3_extract_exponents_neon, export=1
+ movi v1.4s, #8
+1: ld1 {v0.4s}, [x1], #16
+ abs v0.4s, v0.4s
+ clz v0.4s, v0.4s
+ sub v0.4s, v0.4s, v1.4s
+ xtn v0.4h, v0.4s
+ xtn v0.8b, v0.8h
+ st1 {v0.s}[0], [x0], #4
+ subs w2, w2, #4
+ b.gt 1b
+ ret
+endfunc
+
function ff_float_to_fixed24_neon, export=1
1: ld1 {v0.4s, v1.4s}, [x1], #32
fcvtzs v0.4s, v0.4s, #24
diff --git a/tests/checkasm/ac3dsp.c b/tests/checkasm/ac3dsp.c
index 06f31339f9..dc1b169e68 100644
--- a/tests/checkasm/ac3dsp.c
+++ b/tests/checkasm/ac3dsp.c
@@ -19,6 +19,7 @@
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
+#include <stdint.h>
#include <string.h>
#include "libavutil/mem.h"
@@ -36,6 +37,16 @@
} \
} while (0)
+#define randomize_i24(buf, len) \
+ do { \
+ int i; \
+ for (i = 0; i < len; i++) { \
+ int32_t v = (int32_t)rnd(); \
+ int32_t u = (v & 0xFFFFFF); \
+ buf[i] = (v < 0) ? -u : u; \
+ } \
+ } while (0)
+
#define randomize_float(buf, len) \
do { \
int i; \
@@ -77,6 +88,32 @@ static void check_ac3_exponent_min(AC3DSPContext *c) {
report("ac3_exponent_min");
}
+static void check_ac3_extract_exponents(AC3DSPContext *c) {
+#define MAX_EXPS 3072
+ LOCAL_ALIGNED_16(int32_t, src, [MAX_EXPS]);
+ LOCAL_ALIGNED_16(uint8_t, v1, [MAX_EXPS]);
+ LOCAL_ALIGNED_16(uint8_t, v2, [MAX_EXPS]);
+ int n;
+
+ declare_func(void, uint8_t *, int32_t *, int);
+
+ for (n = 512; n <= MAX_EXPS; n += 256) {
+ if (check_func(c->extract_exponents, "ac3_extract_exponents_n%d", n)) {
+ randomize_i24(src, n);
+
+ call_ref(v1, src, n);
+ call_new(v2, src, n);
+
+ if (memcmp(v1, v2, n) != 0)
+ fail();
+
+ bench_new(v1, src, n);
+ }
+ }
+
+ report("ac3_extract_exponents");
+}
+
static void check_float_to_fixed24(AC3DSPContext *c) {
#define BUF_SIZE 1024
LOCAL_ALIGNED_32(float, src, [BUF_SIZE]);
@@ -108,5 +145,6 @@ void checkasm_check_ac3dsp(void)
ff_ac3dsp_init(&c);
check_ac3_exponent_min(&c);
+ check_ac3_extract_exponents(&c);
check_float_to_fixed24(&c);
}
--
2.42.0
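The abs/clz/sub sequence in ff_ac3_extract_exponents_neon from patch 3/5 can be written out in scalar C as below. This is a sketch with made-up names (clz32_model, extract_exponents_model), not FFmpeg code. It assumes that every coefficient has a magnitude below 2^24, as produced by randomize_i24 in the test, so that the leading-zero count is at least 8, and that nb_coefs is a positive multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a 32-bit value; returns 32 for zero,
 * which is also what the NEON CLZ instruction produces. */
static int clz32_model(uint32_t v)
{
    int n = 0;
    if (v == 0)
        return 32;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical scalar model: exponent = clz(|coef|) - 8, so a zero
 * coefficient gives 24 and a full 24-bit magnitude gives 0. */
static void extract_exponents_model(uint8_t *exp, const int32_t *coef,
                                    int nb_coefs)
{
    assert(nb_coefs > 0 && nb_coefs % 4 == 0);
    for (int i = 0; i < nb_coefs; i++) {
        int32_t c = coef[i];
        /* Negate in unsigned arithmetic to avoid signed overflow. */
        uint32_t mag = c < 0 ? 0u - (uint32_t)c : (uint32_t)c;
        exp[i] = (uint8_t)(clz32_model(mag) - 8);
    }
}
```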
* [FFmpeg-devel] [PATCH v4 4/5] avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON
2024-04-06 14:23 [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Geoff Hill
` (2 preceding siblings ...)
2024-04-06 14:26 ` [FFmpeg-devel] [PATCH v4 3/5] avcodec/ac3: Implement ac3_extract_exponents " Geoff Hill
@ 2024-04-06 14:26 ` Geoff Hill
2024-04-06 14:26 ` [FFmpeg-devel] [PATCH v4 5/5] avcodec/ac3: Implement sum_square_butterfly_float " Geoff Hill
2024-04-08 10:47 ` [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Martin Storsjö
5 siblings, 0 replies; 7+ messages in thread
From: Geoff Hill @ 2024-04-06 14:26 UTC
To: ffmpeg-devel
Signed-off-by: Geoff Hill <geoff@geoffhill.org>
---
libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 +++++
libavcodec/aarch64/ac3dsp_neon.S | 23 ++++++++++++++++++++
tests/checkasm/ac3dsp.c | 27 ++++++++++++++++++++++++
3 files changed, 55 insertions(+)
diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
index 1bdc215b51..e95436c651 100644
--- a/libavcodec/aarch64/ac3dsp_init_aarch64.c
+++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
@@ -28,6 +28,10 @@
void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
+void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
+ const int32_t *coef0,
+ const int32_t *coef1,
+ int len);
av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
{
@@ -37,4 +41,5 @@ av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
c->ac3_exponent_min = ff_ac3_exponent_min_neon;
c->extract_exponents = ff_ac3_extract_exponents_neon;
c->float_to_fixed24 = ff_float_to_fixed24_neon;
+ c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_neon;
}
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index c350c1f173..77f9d20275 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -64,3 +64,26 @@ function ff_float_to_fixed24_neon, export=1
b.ne 1b
ret
endfunc
+
+function ff_ac3_sum_square_butterfly_int32_neon, export=1
+ movi v0.2d, #0
+ movi v1.2d, #0
+ movi v2.2d, #0
+ movi v3.2d, #0
+1: ld1 {v4.2s}, [x1], #8
+ ld1 {v5.2s}, [x2], #8
+ add v6.2s, v4.2s, v5.2s
+ sub v7.2s, v4.2s, v5.2s
+ smlal v0.2d, v4.2s, v4.2s
+ smlal v1.2d, v5.2s, v5.2s
+ smlal v2.2d, v6.2s, v6.2s
+ smlal v3.2d, v7.2s, v7.2s
+ subs w3, w3, #2
+ b.gt 1b
+ addp d0, v0.2d
+ addp d1, v1.2d
+ addp d2, v2.2d
+ addp d3, v3.2d
+ st1 {v0.1d-v3.1d}, [x0]
+ ret
+endfunc
diff --git a/tests/checkasm/ac3dsp.c b/tests/checkasm/ac3dsp.c
index dc1b169e68..573a76c764 100644
--- a/tests/checkasm/ac3dsp.c
+++ b/tests/checkasm/ac3dsp.c
@@ -139,6 +139,32 @@ static void check_float_to_fixed24(AC3DSPContext *c) {
report("float_to_fixed24");
}
+static void check_ac3_sum_square_butterfly_int32(AC3DSPContext *c) {
+#define ELEMS 240
+ LOCAL_ALIGNED_16(int32_t, lt, [ELEMS]);
+ LOCAL_ALIGNED_16(int32_t, rt, [ELEMS]);
+ LOCAL_ALIGNED_16(uint64_t, v1, [4]);
+ LOCAL_ALIGNED_16(uint64_t, v2, [4]);
+
+ declare_func(void, int64_t[4], const int32_t *, const int32_t *, int);
+
+ randomize_i24(lt, ELEMS);
+ randomize_i24(rt, ELEMS);
+
+ if (check_func(c->sum_square_butterfly_int32,
+ "ac3_sum_square_butterfly_int32")) {
+ call_ref(v1, lt, rt, ELEMS);
+ call_new(v2, lt, rt, ELEMS);
+
+ if (memcmp(v1, v2, sizeof(int64_t[4])) != 0)
+ fail();
+
+ bench_new(v2, lt, rt, ELEMS);
+ }
+
+ report("ac3_sum_square_butterfly_int32");
+}
+
void checkasm_check_ac3dsp(void)
{
AC3DSPContext c;
@@ -147,4 +173,5 @@ void checkasm_check_ac3dsp(void)
check_ac3_exponent_min(&c);
check_ac3_extract_exponents(&c);
check_float_to_fixed24(&c);
+ check_ac3_sum_square_butterfly_int32(&c);
}
--
2.42.0
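To make the four accumulators in ff_ac3_sum_square_butterfly_int32_neon from patch 4/5 easier to follow, here is a scalar sketch. The name sum_square_butterfly_int32_model is hypothetical. It assumes 24-bit inputs, so that the 32-bit sum and difference cannot overflow, and a positive, even len, matching the two elements handled per loop iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the four sums:
 *   sum[0] = sum of l^2        sum[1] = sum of r^2
 *   sum[2] = sum of (l + r)^2  sum[3] = sum of (l - r)^2
 * Products are widened to 64 bits, like the smlal instruction. */
static void sum_square_butterfly_int32_model(int64_t sum[4],
                                             const int32_t *coef0,
                                             const int32_t *coef1,
                                             int len)
{
    assert(len > 0 && len % 2 == 0);
    sum[0] = sum[1] = sum[2] = sum[3] = 0;

    for (int i = 0; i < len; i++) {
        int32_t l = coef0[i];
        int32_t r = coef1[i];
        int32_t m = l + r;   /* assumes 24-bit inputs: no 32-bit overflow */
        int32_t s = l - r;

        sum[0] += (int64_t)l * l;
        sum[1] += (int64_t)r * r;
        sum[2] += (int64_t)m * m;
        sum[3] += (int64_t)s * s;
    }
}
```

A quick sanity check on any result is the identity sum[2] + sum[3] == 2 * (sum[0] + sum[1]), which holds whenever nothing overflows.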
* [FFmpeg-devel] [PATCH v4 5/5] avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON
2024-04-06 14:23 [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Geoff Hill
` (3 preceding siblings ...)
2024-04-06 14:26 ` [FFmpeg-devel] [PATCH v4 4/5] avcodec/ac3: Implement sum_square_butterfly_int32 " Geoff Hill
@ 2024-04-06 14:26 ` Geoff Hill
2024-04-08 10:47 ` [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Martin Storsjö
5 siblings, 0 replies; 7+ messages in thread
From: Geoff Hill @ 2024-04-06 14:26 UTC
To: ffmpeg-devel
Signed-off-by: Geoff Hill <geoff@geoffhill.org>
---
libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 ++++
libavcodec/aarch64/ac3dsp_neon.S | 30 ++++++++++++++++++++++++
tests/checkasm/ac3dsp.c | 26 ++++++++++++++++++++
3 files changed, 61 insertions(+)
diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
index e95436c651..e367353e11 100644
--- a/libavcodec/aarch64/ac3dsp_init_aarch64.c
+++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
@@ -32,6 +32,10 @@ void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
const int32_t *coef0,
const int32_t *coef1,
int len);
+void ff_ac3_sum_square_butterfly_float_neon(float sum[4],
+ const float *coef0,
+ const float *coef1,
+ int len);
av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
{
@@ -42,4 +46,5 @@ av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
c->extract_exponents = ff_ac3_extract_exponents_neon;
c->float_to_fixed24 = ff_float_to_fixed24_neon;
c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_neon;
+ c->sum_square_butterfly_float = ff_ac3_sum_square_butterfly_float_neon;
}
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index 77f9d20275..20beb6cc50 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -87,3 +87,33 @@ function ff_ac3_sum_square_butterfly_int32_neon, export=1
st1 {v0.1d-v3.1d}, [x0]
ret
endfunc
+
+function ff_ac3_sum_square_butterfly_float_neon, export=1
+ movi v0.4s, #0
+ movi v1.4s, #0
+ movi v2.4s, #0
+ movi v3.4s, #0
+1: ld1 {v30.4s}, [x1], #16
+ ld1 {v31.4s}, [x2], #16
+ fadd v16.4s, v30.4s, v31.4s
+ fsub v17.4s, v30.4s, v31.4s
+ fmla v0.4s, v30.4s, v30.4s
+ fmla v1.4s, v31.4s, v31.4s
+ fmla v2.4s, v16.4s, v16.4s
+ fmla v3.4s, v17.4s, v17.4s
+ subs w3, w3, #4
+ b.gt 1b
+ faddp v0.4s, v0.4s, v0.4s
+ faddp v0.2s, v0.2s, v0.2s
+ st1 {v0.s}[0], [x0], #4
+ faddp v1.4s, v1.4s, v1.4s
+ faddp v1.2s, v1.2s, v1.2s
+ st1 {v1.s}[0], [x0], #4
+ faddp v2.4s, v2.4s, v2.4s
+ faddp v2.2s, v2.2s, v2.2s
+ st1 {v2.s}[0], [x0], #4
+ faddp v3.4s, v3.4s, v3.4s
+ faddp v3.2s, v3.2s, v3.2s
+ st1 {v3.s}[0], [x0]
+ ret
+endfunc
diff --git a/tests/checkasm/ac3dsp.c b/tests/checkasm/ac3dsp.c
index 573a76c764..442e965f3b 100644
--- a/tests/checkasm/ac3dsp.c
+++ b/tests/checkasm/ac3dsp.c
@@ -165,6 +165,31 @@ static void check_ac3_sum_square_butterfly_int32(AC3DSPContext *c) {
report("ac3_sum_square_butterfly_int32");
}
+static void check_ac3_sum_square_butterfly_float(AC3DSPContext *c) {
+ LOCAL_ALIGNED_32(float, lt, [ELEMS]);
+ LOCAL_ALIGNED_32(float, rt, [ELEMS]);
+ LOCAL_ALIGNED_16(float, v1, [4]);
+ LOCAL_ALIGNED_16(float, v2, [4]);
+
+ declare_func(void, float[4], const float *, const float *, int);
+
+ randomize_float(lt, ELEMS);
+ randomize_float(rt, ELEMS);
+
+ if (check_func(c->sum_square_butterfly_float,
+ "ac3_sum_square_butterfly_float")) {
+ call_ref(v1, lt, rt, ELEMS);
+ call_new(v2, lt, rt, ELEMS);
+
+ if (!float_near_ulp_array(v1, v2, 10, 4))
+ fail();
+
+ bench_new(v2, lt, rt, ELEMS);
+ }
+
+ report("ac3_sum_square_butterfly_float");
+}
+
void checkasm_check_ac3dsp(void)
{
AC3DSPContext c;
@@ -174,4 +199,5 @@ void checkasm_check_ac3dsp(void)
check_ac3_extract_exponents(&c);
check_float_to_fixed24(&c);
check_ac3_sum_square_butterfly_int32(&c);
+ check_ac3_sum_square_butterfly_float(&c);
}
--
2.42.0
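The float variant in patch 5/5 keeps four lanes per accumulator and reduces them pairwise at the end. The sketch below mirrors that summation order in scalar C. The name sum_square_butterfly_float_model is hypothetical, and len is assumed to be a positive multiple of 4. One deliberate difference: the assembly uses the fused fmla, while this model uses a separate multiply and add, so the last bits may differ. That is consistent with the test comparing with a ULP tolerance rather than memcmp.

```c
#include <assert.h>

/* Hypothetical scalar model: acc[k][lane] plays the role of lane `lane`
 * of vector register v<k> in the assembly. */
static void sum_square_butterfly_float_model(float sum[4],
                                             const float *coef0,
                                             const float *coef1,
                                             int len)
{
    float acc[4][4] = {{0.0f}};

    assert(len > 0 && len % 4 == 0);
    for (int i = 0; i < len; i += 4) {
        for (int lane = 0; lane < 4; lane++) {
            float l = coef0[i + lane];
            float r = coef1[i + lane];
            float m = l + r;
            float s = l - r;

            /* Unfused multiply-add; the NEON code uses fused fmla. */
            acc[0][lane] += l * l;
            acc[1][lane] += r * r;
            acc[2][lane] += m * m;
            acc[3][lane] += s * s;
        }
    }

    /* Two pairwise additions, as done by the two faddp steps. */
    for (int k = 0; k < 4; k++)
        sum[k] = (acc[k][0] + acc[k][1]) + (acc[k][2] + acc[k][3]);
}
```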
* Re: [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP
2024-04-06 14:23 [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP Geoff Hill
` (4 preceding siblings ...)
2024-04-06 14:26 ` [FFmpeg-devel] [PATCH v4 5/5] avcodec/ac3: Implement sum_square_butterfly_float " Geoff Hill
@ 2024-04-08 10:47 ` Martin Storsjö
5 siblings, 0 replies; 7+ messages in thread
From: Martin Storsjö @ 2024-04-08 10:47 UTC
To: Geoff Hill; +Cc: ffmpeg-devel
On Sat, 6 Apr 2024, Geoff Hill wrote:
> Thanks Martin for your review and testing.
>
> Here's v4 with the following changes:
>
> * Use fmla in sum_square_butterfly_float loop. Faster.
>
> * Removed redundant loop bound zero checks in extract_exponents,
> sum_square_butterfly_int32 and sum_square_butterfly_float.
>
> * Fixed randomize_i24() to also use negative values.
>
> * Carry copyright from arm implementation over to aarch64. I
> did use this version as reference.
>
> * Fix indentation to match existing aarch64 assembly style.
>
> Tested once again on aarch64 and x86.
Thanks, this set looked good, so I pushed it.
I amended the commits a bit, moving the added copyright line in
checkasm/ac3dsp.c from patch 1 to patch 2, where that file actually gets
extended.
Actually, after pushing, I realized another thing that can be done better
in ff_ac3_sum_square_butterfly_float_neon - I'll send a patch for that.
> On AWS Graviton2 (t4g.medium), GCC 12.3:
>
> $ tests/checkasm/checkasm --bench --test=ac3dsp
> ...
> NEON:
> - ac3dsp.ac3_exponent_min [OK]
> - ac3dsp.ac3_extract_exponents [OK]
> - ac3dsp.float_to_fixed24 [OK]
> - ac3dsp.ac3_sum_square_butterfly_int32 [OK]
> - ac3dsp.ac3_sum_square_butterfly_float [OK]
> checkasm: all 20 tests passed
> float_to_fixed24_c: 2460.5
> float_to_fixed24_neon: 561.5
FWIW, it's usually neater to include such numbers in the commit message,
so that they are carried along into the final git history (to show the
benefit we got from the optimization at the time), quoting only those
functions that are added/modified in each patch. I didn't amend the commit
messages to include them this time, but you can keep it in mind for the future.
Anyway, thanks for the patches!
// Martin