Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp
@ 2023-06-15 10:36 Peiting Shen
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min Peiting Shen
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

We optimized six AC3 DSP functions for RISC-V: five with the vector
extension (RVV) and one with the bit-manipulation extension (RVB).
Performance was measured on Spike, the RISC-V ISA simulator, and the
results are attached to each commit.

shenpeiting (6):
  lavc/ac3dsp: RISC-V V ac3_exponent_min
  lavc/ac3dsp: RISC-V V float_to_fixed24
  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float
  lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size
  lavc/ac3dsp: RISC-V B ac3_extract_exponents

 libavcodec/ac3dsp.c            |   2 +
 libavcodec/ac3dsp.h            |   1 +
 libavcodec/riscv/Makefile      |   3 +
 libavcodec/riscv/ac3dsp_init.c |  60 +++++++++
 libavcodec/riscv/ac3dsp_rvb.S  |  42 ++++++
 libavcodec/riscv/ac3dsp_rvv.S  | 225 +++++++++++++++++++++++++++++++++
 6 files changed, 333 insertions(+)
 create mode 100644 libavcodec/riscv/ac3dsp_init.c
 create mode 100644 libavcodec/riscv/ac3dsp_rvb.S
 create mode 100644 libavcodec/riscv/ac3dsp_rvv.S

-- 
2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
@ 2023-06-15 10:36 ` Peiting Shen
  2023-06-15 18:02   ` Rémi Denis-Courmont
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24 Peiting Shen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

Optimize the scalar minimum search using RVV instructions.

Benchmarks on Spike(cycles):
*exp=1280*4;num_reuse_blocks=5;nb_coefs=16
ac3_exponent_min_c: 1993
ac3_exponent_min_rvv: 258
*exp=1280*4;num_reuse_blocks=19;nb_coefs=255
ac3_exponent_min_c: 99010
ac3_exponent_min_rvv: 3843

The speedup grows as the number of reuse blocks and the number of
coefficients increase.

Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
---
 libavcodec/ac3dsp.c            |  2 ++
 libavcodec/ac3dsp.h            |  1 +
 libavcodec/riscv/Makefile      |  2 ++
 libavcodec/riscv/ac3dsp_init.c | 37 +++++++++++++++++++++++++++
 libavcodec/riscv/ac3dsp_rvv.S  | 46 ++++++++++++++++++++++++++++++++++
 5 files changed, 88 insertions(+)
 create mode 100644 libavcodec/riscv/ac3dsp_init.c
 create mode 100644 libavcodec/riscv/ac3dsp_rvv.S

diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 22cb5f242e..302b786b15 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -395,5 +395,7 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c)
     ff_ac3dsp_init_x86(c);
 #elif ARCH_MIPS
     ff_ac3dsp_init_mips(c);
+#elif ARCH_RISCV
+    ff_ac3dsp_init_riscv(c);
 #endif
 }
diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index 33e51e202e..a01bff3d11 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -109,6 +109,7 @@ void ff_ac3dsp_init    (AC3DSPContext *c);
 void ff_ac3dsp_init_arm(AC3DSPContext *c);
 void ff_ac3dsp_init_x86(AC3DSPContext *c);
 void ff_ac3dsp_init_mips(AC3DSPContext *c);
+void ff_ac3dsp_init_riscv(AC3DSPContext *c);
 
 void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
                        int out_ch, int in_ch, int len);
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index ee17a521fd..a627924cac 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -1,5 +1,7 @@
 OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_init.o
 RVV-OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_rvv.o
+OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_init.o
+RVV-OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_rvv.o
 OBJS-$(CONFIG_ALAC_DECODER) += riscv/alacdsp_init.o
 RVV-OBJS-$(CONFIG_ALAC_DECODER) += riscv/alacdsp_rvv.o
 OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_init.o \
diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
new file mode 100644
index 0000000000..bb67d86998
--- /dev/null
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -0,0 +1,37 @@
+/*
+ * Copyright 2023 Beijing ESWIN Computing Technology Co., Ltd.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include <stdint.h>
+
+#include "libavutil/attributes.h"
+#include "libavcodec/ac3dsp.h"
+#include "libavutil/cpu.h"
+#include "config.h"
+
+void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
+
+av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
+{
+    int flags = av_get_cpu_flags();
+#if HAVE_RVV
+    if (flags & AV_CPU_FLAG_RVV_I32)
+        c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
+#endif
+}
+
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
new file mode 100644
index 0000000000..879123f4a7
--- /dev/null
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -0,0 +1,46 @@
+/*
+ * Copyright 2023 Beijing ESWIN Computing Technology Co., Ltd.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+func ff_ac3_exponent_min_rvv, zve32x
+    beq             a1, x0, 3f
+    li              t0, 256
+    addi            a1, a1, 1
+1:
+    mv              t2, a0
+    mv              t3, a1
+    lb              t4, (t2)
+2:
+    vsetvli         t1, t3, e8, m8
+    vlse8.v         v0, (t2), t0
+    vmv.s.x         v8, t4
+    sub             t3, t3, t1
+    vredminu.vs     v8, v0, v8
+    vmv.x.s         t4, v8
+    bnez            t3, 2b
+    vsetivli        t1, 1, e8
+    vse8.v          v8, (a0)
+    addi            a0, a0, 1
+    addi            a2, a2, -1
+    bnez            a2, 1b
+3:
+    ret
+endfunc
-- 
2.17.1
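
For readers following the assembly above, the operation it vectorizes can be sketched in plain C. This is a minimal scalar model, not the FFmpeg fallback verbatim: the name exponent_min_sketch is hypothetical, and the 256-byte stride is taken from the "li t0, 256" in the patch. For each coefficient, it keeps the minimum over the current block and the num_reuse_blocks blocks that follow it.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between the same coefficient in consecutive blocks,
 * matching the "li t0, 256" stride used by the assembly. */
#define BLOCK_STRIDE 256

/* Hypothetical scalar model of the exponent-minimum step. */
static void exponent_min_sketch(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
{
    if (num_reuse_blocks == 0)
        return;

    for (int i = 0; i < nb_coefs; i++) {
        uint8_t min_exp = exp[i];

        /* Scan the same coefficient in each following block. */
        for (int blk = 1; blk <= num_reuse_blocks; blk++) {
            uint8_t next = exp[i + (size_t)blk * BLOCK_STRIDE];
            if (next < min_exp)
                min_exp = next;
        }
        exp[i] = min_exp; /* only the first block is overwritten */
    }
}
```

The inner loop is the part the patch replaces with a strided load followed by an unsigned minimum reduction.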


* [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min Peiting Shen
@ 2023-06-15 10:36 ` Peiting Shen
  2023-06-15 18:06   ` Rémi Denis-Courmont
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32 Peiting Shen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

Replace the scalar float-to-fixed-point conversion with vector instructions.

Benchmarks on Spike(cycles):
len=16
float_to_fixed24_c: 315
float_to_fixed24_rvv: 27
len=160
float_to_fixed24_c: 2871
float_to_fixed24_rvv: 67

Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
---
 libavcodec/riscv/ac3dsp_init.c |  5 ++++-
 libavcodec/riscv/ac3dsp_rvv.S  | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index bb67d86998..a4e75a7541 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -25,13 +25,16 @@
 #include "config.h"
 
 void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
+void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
 
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
     int flags = av_get_cpu_flags();
 #if HAVE_RVV
-    if (flags & AV_CPU_FLAG_RVV_I32)
+    if (flags & AV_CPU_FLAG_RVV_I32) {
         c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
+        c->float_to_fixed24 = ff_float_to_fixed24_rvv;
+    }
 #endif
 }
 
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index 879123f4a7..d98e72c12c 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -44,3 +44,22 @@ func ff_ac3_exponent_min_rvv, zve32x
 3:
     ret
 endfunc
+
+
+func ff_float_to_fixed24_rvv, zve32x
+    addi            t1, x0, 1
+    slli            t1, t1, 24
+    fcvt.s.w        f1, t1
+1:
+    vsetvli         t0, a2, e32, m8
+    vle32.v         v0, (a1)
+    vfmul.vf        v0, v0, f1
+    vfcvt.x.f.v     v16, v0
+    vse32.v         v16, (a0)
+    sub             a2, a2, t0
+    slli            t0, t0, 2
+    add             a1, a1, t0
+    add             a0, a0, t0
+    bgtz            a2, 1b
+    ret
+endfunc
-- 
2.17.1
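
The conversion performed by the loop above can be modeled in scalar C as follows. This is a hedged sketch under stated assumptions: the name float_to_fixed24_sketch is hypothetical, and ties are rounded away from zero for simplicity, whereas vfcvt.x.f.v follows the dynamic rounding mode. The scale factor 2^24 corresponds to the "1 << 24" constant built at the top of the assembly.

```c
#include <stdint.h>

/* Hypothetical scalar model: scale each float by 2^24 and round to
 * the nearest integer. The product is computed in double so that the
 * +/-0.5 rounding step is exact for 24-bit results. */
static void float_to_fixed24_sketch(int32_t *dst, const float *src,
                                    unsigned int len)
{
    const double scale = 16777216.0; /* 1 << 24 */

    for (unsigned int i = 0; i < len; i++) {
        double x = (double)src[i] * scale;
        /* Simplification: ties round away from zero. */
        dst[i] = (int32_t)(x < 0.0 ? x - 0.5 : x + 0.5);
    }
}
```

Inputs are assumed to lie in a range where the scaled value fits in int32_t.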


* [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min Peiting Shen
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24 Peiting Shen
@ 2023-06-15 10:36 ` Peiting Shen
  2023-06-15 19:25   ` Rémi Denis-Courmont
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 4/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float Peiting Shen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

Optimize the scalar int32 sum-of-squares computation using RVV instructions.

Benchmarks on Spike(cycles):
len=128
ac3_sum_square_butterfly_int32_c: 8497
ac3_sum_square_butterfly_int32_rvv: 258
len=1280
ac3_sum_square_butterfly_int32_c: 84529
ac3_sum_square_butterfly_int32_rvv: 2274

Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
---
 libavcodec/riscv/ac3dsp_init.c |  8 +++++
 libavcodec/riscv/ac3dsp_rvv.S  | 53 ++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index a4e75a7541..4fd4abe83e 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -26,6 +26,10 @@
 
 void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
+void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
+                                            const int32_t *coef0,
+                                            const int32_t *coef1,
+                                            int len);
 
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
@@ -35,6 +39,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
         c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
         c->float_to_fixed24 = ff_float_to_fixed24_rvv;
     }
+#if (__riscv_xlen >= 64)
+    if (flags & AV_CPU_FLAG_RVV_I64)
+        c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
+#endif
 #endif
 }
 
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index d98e72c12c..4e0d238f85 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -63,3 +63,56 @@ func ff_float_to_fixed24_rvv, zve32x
     bgtz            a2, 1b
     ret
 endfunc
+
+
+func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
+    vsetvli         t0, a3, e32, m2
+    vle32.v         v0, (a1)
+    vle32.v         v2, (a2)
+    vadd.vv         v4, v0, v2
+    vsub.vv         v6, v0, v2
+    vwmul.vv        v8, v0, v0
+    vwmul.vv        v12, v2, v2
+    vwmul.vv        v16, v4, v4
+    vwmul.vv        v20, v6, v6
+    sub             a3, a3, t0
+    slli            t0, t0, 2
+    add             a1, a1, t0
+    add             a2, a2, t0
+    beq             a3, x0, 2f
+1:
+    vsetvli         t0, a3, e32, m2
+    vle32.v         v0, (a1)
+    vle32.v         v2, (a2)
+    vadd.vv         v4, v0, v2
+    vsub.vv         v6, v0, v2
+    vwmacc.vv       v8, v0, v0
+    vwmacc.vv       v12, v2, v2
+    vwmacc.vv       v16, v4, v4
+    vwmacc.vv       v20, v6, v6
+    sub             a3, a3, t0
+    slli            t0, t0, 2
+    add             a1, a1, t0
+    add             a2, a2, t0
+    bnez            a3, 1b
+2:
+    vsetvli         t0, x0, e64, m4
+    vmv.s.x         v24, x0
+    vmv.s.x         v25, x0
+    vmv.s.x         v26, x0
+    vmv.s.x         v27, x0
+    vredsum.vs      v24, v8, v24
+    vredsum.vs      v25, v12, v25
+    vredsum.vs      v26, v16, v26
+    vredsum.vs      v27, v20, v27
+    vsetivli        t0, 1, e64, m1
+    vse64.v         v24, (a0)
+    addi            a0, a0, 8
+    vse64.v         v25, (a0)
+    addi            a0, a0, 8
+    vse64.v         v26, (a0)
+    addi            a0, a0, 8
+    vse64.v         v27, (a0)
+    addi            a0, a0, 8
+    ret
+endfunc
-- 
2.17.1
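
As a reading aid, the four sums produced by the assembly above can be written out in scalar C. This is a sketch, not the project's reference code: the function name is hypothetical, and the sum and difference are formed in 64 bits here, while the assembly forms them in 32 bits before the widening multiply. The two agree as long as the coefficients are small enough that the 32-bit addition does not overflow, which is an assumption of this example.

```c
#include <stdint.h>

/* Hypothetical scalar model of the int32 butterfly sum of squares.
 * sum[0]: left^2, sum[1]: right^2, sum[2]: (left+right)^2,
 * sum[3]: (left-right)^2, in the order the assembly stores them. */
static void sum_square_butterfly_int32_sketch(int64_t sum[4],
                                              const int32_t *coef0,
                                              const int32_t *coef1,
                                              int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;

    for (int i = 0; i < len; i++) {
        int64_t lt = coef0[i];
        int64_t rt = coef1[i];
        int64_t md = lt + rt; /* mid channel */
        int64_t sd = lt - rt; /* side channel */

        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}
```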



* [FFmpeg-devel] [PATCH 4/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
                   ` (2 preceding siblings ...)
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32 Peiting Shen
@ 2023-06-15 10:36 ` Peiting Shen
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 5/6] lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size Peiting Shen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

Optimize the scalar float sum-of-squares computation using RVV instructions.

Benchmarks on Spike(cycles):
len=128
ac3_sum_square_butterfly_float_c: 7986
ac3_sum_square_butterfly_float_rvv: 146
len=1280
ac3_sum_square_butterfly_float_c: 79410
ac3_sum_square_butterfly_float_rvv: 1154

Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
---
 libavcodec/riscv/ac3dsp_init.c |  6 ++++
 libavcodec/riscv/ac3dsp_rvv.S  | 54 ++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index 4fd4abe83e..d3aa20623a 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -30,6 +30,10 @@ void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
                                             const int32_t *coef0,
                                             const int32_t *coef1,
                                             int len);
+void ff_ac3_sum_square_butterfly_float_rvv(float sum[4],
+                                            const float *coef0,
+                                            const float *coef1,
+                                            int len);
 
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
@@ -39,6 +43,8 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
         c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
         c->float_to_fixed24 = ff_float_to_fixed24_rvv;
     }
+    if (flags & AV_CPU_FLAG_RVV_F32)
+        c->sum_square_butterfly_float = ff_ac3_sum_square_butterfly_float_rvv;
 #if (__riscv_xlen >= 64)
     if (flags & AV_CPU_FLAG_RVV_I64)
         c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index 4e0d238f85..05a4d44938 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -116,3 +116,57 @@ func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
     addi            a0, a0, 8
     ret
 endfunc
+
+
+func ff_ac3_sum_square_butterfly_float_rvv, zve32f
+    #Round Up
+    li              t1, 0x61
+    fscsr           t1
+    vsetvli         t0, a3, e32, m4
+    vle32.v         v0, (a1)
+    vle32.v         v4, (a2)
+    vfadd.vv        v8, v0, v4
+    vfsub.vv        v12, v0, v4
+    vfmul.vv        v16, v0, v0
+    vfmul.vv        v20, v4, v4
+    vfmul.vv        v24, v8, v8
+    vfmul.vv        v28, v12, v12
+    sub             a3, a3, t0
+    slli            t0, t0, 2
+    add             a1, a1, t0
+    add             a2, a2, t0
+    beq             a3, x0, 2f
+1:
+    vsetvli         t0, a3, e32, m4
+    vle32.v         v0, (a1)
+    vle32.v         v4, (a2)
+    vfadd.vv        v8, v0, v4
+    vfsub.vv        v12, v0, v4
+    vfmacc.vv       v16, v0, v0
+    vfmacc.vv       v20, v4, v4
+    vfmacc.vv       v24, v8, v8
+    vfmacc.vv       v28, v12, v12
+    sub             a3, a3, t0
+    slli            t0, t0, 2
+    add             a1, a1, t0
+    add             a2, a2, t0
+    bnez            a3, 1b
+2:
+    vsetvli         t0, x0, e32, m4
+    fcvt.s.w        f0, x0
+    vfmv.v.f        v0, f0
+    vfredsum.vs     v0, v16, v0
+    vfredsum.vs     v1, v20, v1
+    vfredsum.vs     v2, v24, v2
+    vfredsum.vs     v3, v28, v3
+    vsetivli        t0, 1, e32, m1
+    vse32.v         v0, (a0)
+    addi            a0, a0, 4
+    vse32.v         v1, (a0)
+    addi            a0, a0, 4
+    vse32.v         v2, (a0)
+    addi            a0, a0, 4
+    vse32.v         v3, (a0)
+    addi            a0, a0, 4
+    ret
+endfunc
-- 
2.17.1
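
One property of the float version is worth spelling out: the vector loop above accumulates per-lane partial sums with vfmacc.vv and only combines them in the final reduction, so the additions happen in a different order than in a sequential scalar loop. The sketch below models that order in C. The lane count LANES and the function name are hypothetical; real hardware derives the lane count from VLEN, SEW and LMUL.

```c
/* Hypothetical lane count used only for illustration. */
#define LANES 4

/* Scalar model of lane-wise accumulation followed by a reduction.
 * Element i of the input lands in lane i % LANES, as it would when
 * the data is processed in strips of LANES elements. */
static void sum_square_butterfly_float_lanes(float sum[4],
                                             const float *coef0,
                                             const float *coef1,
                                             int len)
{
    float acc[4][LANES] = { { 0 } };

    for (int i = 0; i < len; i++) {
        int lane = i % LANES;
        float lt = coef0[i];
        float rt = coef1[i];
        float md = lt + rt;
        float sd = lt - rt;

        acc[0][lane] += lt * lt;
        acc[1][lane] += rt * rt;
        acc[2][lane] += md * md;
        acc[3][lane] += sd * sd;
    }

    /* Final reduction: combine the per-lane partial sums. */
    for (int k = 0; k < 4; k++) {
        float total = 0.0f;
        for (int lane = 0; lane < LANES; lane++)
            total += acc[k][lane];
        sum[k] = total;
    }
}
```

Because float addition is not associative, results from this ordering may differ in the last bits from a strictly sequential sum, so a comparison against the C version would likely need a tolerance rather than exact equality.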


* [FFmpeg-devel] [PATCH 5/6] lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
                   ` (3 preceding siblings ...)
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 4/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float Peiting Shen
@ 2023-06-15 10:36 ` Peiting Shen
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents Peiting Shen
  2023-06-15 13:57 ` [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Lynne
  6 siblings, 0 replies; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

Use the RVV strided segment load vlsseg<nf>e<eew>.v to operate on matrix columns.

Benchmarks on Spike(cycles):
ac3_compute_mantissa_size_c: 2338
ac3_compute_mantissa_size_rvv: 55

Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
---
 libavcodec/riscv/ac3dsp_init.c |  3 ++
 libavcodec/riscv/ac3dsp_rvv.S  | 53 ++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index d3aa20623a..4769213ebc 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -35,6 +35,8 @@ void ff_ac3_sum_square_butterfly_float_rvv(float sum[4],
                                             const float *coef1,
                                             int len);
 
+void ff_ac3_compute_mantissa_size_rvv(uint16_t mant_cnt[6][16]);
+
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
     int flags = av_get_cpu_flags();
@@ -42,6 +44,7 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
     if (flags & AV_CPU_FLAG_RVV_I32) {
         c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
         c->float_to_fixed24 = ff_float_to_fixed24_rvv;
+        c->compute_mantissa_size = ff_ac3_compute_mantissa_size_rvv;
     }
     if (flags & AV_CPU_FLAG_RVV_F32)
         c->sum_square_butterfly_float = ff_ac3_sum_square_butterfly_float_rvv;
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index 05a4d44938..cedd3d7d05 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -170,3 +170,56 @@ func ff_ac3_sum_square_butterfly_float_rvv, zve32f
     addi            a0, a0, 4
     ret
 endfunc
+
+
+func ff_ac3_compute_mantissa_size_rvv, zve32x
+    li               t1, 32
+    li               t2, 3
+    vsetivli         t0, 6, e16
+    vlsseg5e16.v     v0, (a0), t1
+    # (column[i][1] / 3)
+    vdivu.vx         v1, v1, t2
+    li               t3, 5
+    vwmul.vx         v22, v1, t3
+    # (column[i][2] / 3)
+    vdivu.vx         v2, v2, t2
+    vwmacc.vx        v22, t2, v3
+    vsra.vi          v4, v4, 1
+    vadd.vv          v4, v4, v2
+    li               t2, 7
+    vwmacc.vx        v22, t2, v4
+
+    addi             a0, a0, 10
+    vlsseg8e16.v     v5, (a0), t1
+    li               t3, 4
+    vwmacc.vx        v22, t3, v5
+    li               t3, 5
+    vwmacc.vx        v22, t3, v6
+    li               t3, 6
+    vwmacc.vx        v22, t3, v7
+    li               t3, 7
+    vwmacc.vx        v22, t3, v8
+    li               t3, 8
+    vwmacc.vx        v22, t3, v9
+    li               t3, 9
+    vwmacc.vx        v22, t3, v10
+    li               t3, 10
+    vwmacc.vx        v22, t3, v11
+    li               t3, 11
+    vwmacc.vx        v22, t3, v12
+
+    addi             a0, a0, 16
+    vlsseg3e16.v     v5, (a0), t1
+    li               t3, 12
+    vwmacc.vx        v22, t3, v5
+    li               t3, 14
+    vwmacc.vx        v22, t3, v6
+    li               t3, 16
+    vwmacc.vx        v22, t3, v7
+
+    vsetivli         t0, 6, e32, m2
+    vmv.s.x          v30, x0
+    vredsum.vs       v30, v22, v30
+    vmv.x.s          a0, v30
+    ret
+endfunc
-- 
2.17.1
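
The column weights used by the assembly above are easier to follow in scalar form. The sketch below is a reconstruction from the constants in the patch, not a copy of the C fallback: the table bap_bits_sketch and the function name are hypothetical, with the per-column weights read off the "li t3, N" values (4 to 11 for columns 5 to 12, then 12, 14 and 16 for columns 13 to 15).

```c
#include <stdint.h>

/* Bits per mantissa for each column (bap). Columns 1, 2 and 4 are
 * grouped and handled separately, so their entries are 0 here. */
static const uint8_t bap_bits_sketch[16] = {
    0, 0, 0, 3, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16
};

/* Hypothetical scalar model of the mantissa size computation. */
static int compute_mantissa_size_sketch(uint16_t mant_cnt[6][16])
{
    int bits = 0;

    for (int blk = 0; blk < 6; blk++) {
        /* column 1: three mantissas share 5 bits */
        bits += (mant_cnt[blk][1] / 3) * 5;
        /* column 2: three share 7 bits; column 4: two share 7 bits */
        bits += ((mant_cnt[blk][2] / 3) + (mant_cnt[blk][4] >> 1)) * 7;
        /* column 3: 3 bits per mantissa */
        bits += mant_cnt[blk][3] * 3;
        /* columns 5..15: table lookup */
        for (int bap = 5; bap < 16; bap++)
            bits += mant_cnt[blk][bap] * bap_bits_sketch[bap];
    }
    return bits;
}
```

The vector version processes all six rows at once, which is why it loads the matrix column by column with a 32-byte row stride.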


* [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
                   ` (4 preceding siblings ...)
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 5/6] lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size Peiting Shen
@ 2023-06-15 10:36 ` Peiting Shen
  2023-06-15 19:18   ` Rémi Denis-Courmont
  2023-06-15 13:57 ` [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Lynne
  6 siblings, 1 reply; 14+ messages in thread
From: Peiting Shen @ 2023-06-15 10:36 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

From: Shen Peiting <shenpeiting@eswincomputing.com>

Use the RVB instruction clz to count leading zeros instead of calling av_log2.

Benchmarks on Spike(cycles):
ac3_extract_exponents_c: 8226
ac3_extract_exponents_rvb: 1167

Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
---
 libavcodec/riscv/Makefile      |  3 ++-
 libavcodec/riscv/ac3dsp_init.c |  3 +++
 libavcodec/riscv/ac3dsp_rvb.S  | 42 ++++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/ac3dsp_rvb.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index a627924cac..3d0c196cb9 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -1,7 +1,8 @@
 OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_init.o
 RVV-OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_rvv.o
 OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_init.o
-RVV-OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_rvv.o
+RVV-OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_rvv.o \
+                             riscv/ac3dsp_rvb.o
 OBJS-$(CONFIG_ALAC_DECODER) += riscv/alacdsp_init.o
 RVV-OBJS-$(CONFIG_ALAC_DECODER) += riscv/alacdsp_rvv.o
 OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_init.o \
diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index 4769213ebc..75cd3c7e11 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -26,6 +26,7 @@
 
 void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
+void ff_ac3_extract_exponents_rvb(uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
                                             const int32_t *coef0,
                                             const int32_t *coef1,
@@ -40,6 +41,8 @@ void ff_ac3_compute_mantissa_size_rvv(uint16_t mant_cnt[6][16]);
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
     int flags = av_get_cpu_flags();
+    if (flags & AV_CPU_FLAG_RVB_BASIC)
+        c->extract_exponents = ff_ac3_extract_exponents_rvb;
 #if HAVE_RVV
     if (flags & AV_CPU_FLAG_RVV_I32) {
         c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
diff --git a/libavcodec/riscv/ac3dsp_rvb.S b/libavcodec/riscv/ac3dsp_rvb.S
new file mode 100644
index 0000000000..3bf24c7392
--- /dev/null
+++ b/libavcodec/riscv/ac3dsp_rvb.S
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2023 Beijing ESWIN Computing Technology Co., Ltd.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "libavutil/riscv/asm.S"
+
+func ff_ac3_extract_exponents_rvb, zbb
+    li               t1, __riscv_xlen - 24
+1:
+    lw               t0, (a1)
+    bgez             t0, 2f
+    neg              t0, t0
+
+2:
+    clz              t4, t0
+    sub              t4, t4, t1
+    sb               t4,(a0)
+    addi             a2, a2, -1
+    addi             a1, a1, 4
+    addi             a0, a0, 1
+
+    bgtz             a2, 1b
+
+    ret
+endfunc
\ No newline at end of file
-- 
2.17.1
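
The arithmetic behind the clz approach can be checked with a small C model. For a nonzero 32-bit value v, clz(v) = 31 - log2(v), so clz(v) - 8 equals 23 - log2(v), and for v = 0 a clz that returns 32 yields 24. The sketch below uses a 32-bit width, so the constant is 32 - 24 = 8 instead of the "__riscv_xlen - 24" in the assembly. The helper clz32 and the function name are hypothetical, and coefficients are assumed to fit in 24 bits plus sign.

```c
#include <stdint.h>

/* Portable leading-zero count; returns 32 for an input of 0. */
static int clz32(uint32_t v)
{
    int n = 0;

    if (v == 0)
        return 32;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical scalar model of the exponent extraction. */
static void extract_exponents_sketch(uint8_t *exp, const int32_t *coef,
                                     int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        /* Absolute value via unsigned negation, which avoids
         * undefined behavior for INT32_MIN. */
        uint32_t v = coef[i] < 0 ? 0u - (uint32_t)coef[i]
                                 : (uint32_t)coef[i];

        exp[i] = (uint8_t)(clz32(v) - (32 - 24));
    }
}
```

With this identity no separate branch is needed for a zero coefficient, which is one fewer special case than a log2-based formulation.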


* Re: [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp
  2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
                   ` (5 preceding siblings ...)
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents Peiting Shen
@ 2023-06-15 13:57 ` Lynne
  2023-06-15 19:10   ` Rémi Denis-Courmont
  6 siblings, 1 reply; 14+ messages in thread
From: Lynne @ 2023-06-15 13:57 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Jun 15, 2023, 12:37 by shenpeiting@eswincomputing.com:

> From: Shen Peiting <shenpeiting@eswincomputing.com>
>
> We optimized the six interfaces of AC3 init by RVV, the optimized 
> performance was tested on the RISC-V ISA simulator--Spike, and the 
> results were attached to each commit.
>
> shenpeiting (6):
>  lavc/ac3dsp: RISC-V V ac3_exponent_min
>  lavc/ac3dsp: RISC-V V float_to_fixed24
>  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
>  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float
>  lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size
>  lavc/ac3dsp: RISC-V B ac3_extract_exponents
>
>  libavcodec/ac3dsp.c            |   2 +
>  libavcodec/ac3dsp.h            |   1 +
>  libavcodec/riscv/Makefile      |   3 +
>  libavcodec/riscv/ac3dsp_init.c |  60 +++++++++
>  libavcodec/riscv/ac3dsp_rvb.S  |  42 ++++++
>  libavcodec/riscv/ac3dsp_rvv.S  | 225 +++++++++++++++++++++++++++++++++
>  6 files changed, 333 insertions(+)
>  create mode 100644 libavcodec/riscv/ac3dsp_init.c
>  create mode 100644 libavcodec/riscv/ac3dsp_rvb.S
>  create mode 100644 libavcodec/riscv/ac3dsp_rvv.S
>

Could you implement checkasm for this? It shouldn't
be more than a hundred lines, and there are examples,
tests/checkasm/aacpsdsp.c being the most similar.
Since CPUs with the needed extensions aren't released,
we're not doing any FATE runs, and so if the results don't
match the C version, we'll end up with broken code once
they do exist. And no one wants to debug someone else's
assembly.

Those results look far too optimistic, and I'm guessing
it's because they're using a theoretical huge vector size
limit. Could you re-test with something more realistic,
like 256-bit vectors, using checkasm --bench?
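
For illustration only, here is a minimal, self-contained sketch of the comparison such a test would make: run a reference and a candidate on the same random input and require byte-identical output. It does not use the real checkasm macros. The names exponent_min_ref, exponent_min_blockwise and exponent_min_matches are hypothetical, the reference loop is written from the behaviour described in patch 1/6 (minimum over num_reuse_blocks + 1 entries spaced 256 bytes apart), and the "candidate" is just a second C loop standing in for the assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXP_STRIDE 256  /* byte distance between blocks, as in the patch */
#define MAX_BLOCKS 6    /* assumption: block 0 plus up to 5 reuse blocks */

typedef void (*exponent_min_fn)(uint8_t *exp, int num_reuse_blocks,
                                int nb_coefs);

/* Reference: for each coefficient, keep the minimum over the blocks. */
void exponent_min_ref(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
{
    if (!num_reuse_blocks)
        return;
    for (int i = 0; i < nb_coefs; i++) {
        uint8_t min_exp = exp[i];
        for (int blk = 1; blk <= num_reuse_blocks; blk++) {
            uint8_t next = exp[i + blk * EXP_STRIDE];
            if (next < min_exp)
                min_exp = next;
        }
        exp[i] = min_exp;
    }
}

/* Stand-in for an optimized version: same result, block-major order. */
void exponent_min_blockwise(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
{
    for (int blk = 1; blk <= num_reuse_blocks; blk++)
        for (int i = 0; i < nb_coefs; i++)
            if (exp[i + blk * EXP_STRIDE] < exp[i])
                exp[i] = exp[i + blk * EXP_STRIDE];
}

/* Returns 1 if the candidate matches the reference on random input. */
int exponent_min_matches(exponent_min_fn candidate, int num_reuse_blocks,
                         int nb_coefs, unsigned seed)
{
    static uint8_t buf_ref[MAX_BLOCKS * EXP_STRIDE];
    static uint8_t buf_new[MAX_BLOCKS * EXP_STRIDE];

    assert(num_reuse_blocks >= 0 && num_reuse_blocks < MAX_BLOCKS);
    assert(nb_coefs >= 0 && nb_coefs <= EXP_STRIDE);

    srand(seed);
    for (size_t i = 0; i < sizeof(buf_ref); i++)
        buf_ref[i] = (uint8_t)(rand() % 25);  /* exponents in 0..24 */
    memcpy(buf_new, buf_ref, sizeof(buf_ref));

    exponent_min_ref(buf_ref, num_reuse_blocks, nb_coefs);
    candidate(buf_new, num_reuse_blocks, nb_coefs);
    return memcmp(buf_ref, buf_new, sizeof(buf_ref)) == 0;
}
```

Comparing the whole buffer, not just the first block, also catches a candidate that writes outside the entries it is supposed to update.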

* Re: [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min Peiting Shen
@ 2023-06-15 18:02   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 14+ messages in thread
From: Rémi Denis-Courmont @ 2023-06-15 18:02 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

Nihao

On Thursday, 15 June 2023, 13:36:40 EEST, Peiting Shen wrote:
> From: Shen Peiting <shenpeiting@eswincomputing.com>
> 
> Find scalar minimum optimized by using RVV instructions
> 
> Benchmarks on Spike(cycles):
> *exp=1280*4;num_reuse_blocks=5;nb_coefs=16
> ac3_exponent_min_c: 1993
> ac3_exponent_min_rvv: 258
> *exp=1280*4;num_reuse_blocks=19;nb_coefs=255
> ac3_exponent_min_c: 99010
> ac3_exponent_min_rvv: 3843
> 
> The optimization performance is more obvious with the increase of number of
> reuse blocks and number of coefs.
> 
> Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
> Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
> Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
> Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
> ---
>  libavcodec/ac3dsp.c            |  2 ++
>  libavcodec/ac3dsp.h            |  1 +
>  libavcodec/riscv/Makefile      |  2 ++
>  libavcodec/riscv/ac3dsp_init.c | 37 +++++++++++++++++++++++++++
>  libavcodec/riscv/ac3dsp_rvv.S  | 46 ++++++++++++++++++++++++++++++++++
>  5 files changed, 88 insertions(+)
>  create mode 100644 libavcodec/riscv/ac3dsp_init.c
>  create mode 100644 libavcodec/riscv/ac3dsp_rvv.S
> 
> diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
> index 22cb5f242e..302b786b15 100644
> --- a/libavcodec/ac3dsp.c
> +++ b/libavcodec/ac3dsp.c
> @@ -395,5 +395,7 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c)
>      ff_ac3dsp_init_x86(c);
>  #elif ARCH_MIPS
>      ff_ac3dsp_init_mips(c);
> +#elif ARCH_RISCV
> +    ff_ac3dsp_init_riscv(c);
>  #endif
>  }
> diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
> index 33e51e202e..a01bff3d11 100644
> --- a/libavcodec/ac3dsp.h
> +++ b/libavcodec/ac3dsp.h
> @@ -109,6 +109,7 @@ void ff_ac3dsp_init    (AC3DSPContext *c);
>  void ff_ac3dsp_init_arm(AC3DSPContext *c);
>  void ff_ac3dsp_init_x86(AC3DSPContext *c);
>  void ff_ac3dsp_init_mips(AC3DSPContext *c);
> +void ff_ac3dsp_init_riscv(AC3DSPContext *c);
> 
>  void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
>                         int out_ch, int in_ch, int len);
> diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
> index ee17a521fd..a627924cac 100644
> --- a/libavcodec/riscv/Makefile
> +++ b/libavcodec/riscv/Makefile
> @@ -1,5 +1,7 @@
>  OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_init.o
>  RVV-OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_rvv.o
> +OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_init.o
> +RVV-OBJS-$(CONFIG_AC3DSP) += riscv/ac3dsp_rvv.o
>  OBJS-$(CONFIG_ALAC_DECODER) += riscv/alacdsp_init.o
>  RVV-OBJS-$(CONFIG_ALAC_DECODER) += riscv/alacdsp_rvv.o
>  OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_init.o \
> diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> new file mode 100644
> index 0000000000..bb67d86998
> --- /dev/null
> +++ b/libavcodec/riscv/ac3dsp_init.c
> @@ -0,0 +1,37 @@
> +/*
> + * Copyright 2023 Beijing ESWIN Computing Technology Co., Ltd.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +#include <stdint.h>
> +
> +#include "libavutil/attributes.h"
> +#include "libavcodec/ac3dsp.h"
> +#include "libavutil/cpu.h"
> +#include "config.h"
> +
> +void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> +
> +av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> +{
> +    int flags = av_get_cpu_flags();
> +#if HAVE_RVV
> +    if (flags & AV_CPU_FLAG_RVV_I32)
> +        c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
> +#endif
> +}
> +
> diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> new file mode 100644
> index 0000000000..879123f4a7
> --- /dev/null
> +++ b/libavcodec/riscv/ac3dsp_rvv.S
> @@ -0,0 +1,46 @@
> +/*
> + * Copyright 2023 Beijing ESWIN Computing Technology Co., Ltd.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/riscv/asm.S"
> +
> +func ff_ac3_exponent_min_rvv, zve32x
> +    beq             a1, x0, 3f

Conventionally, we use ABI names for GP and FP registers, like almost everybody 
else and their moms in the RISC-V world. So that would be `zero`.

But in this case, you should use the `beqz` alias anyway.

> +    li              t0, 256
> +    addi            a1, a1, 1
> +1:
> +    mv              t2, a0

AFAICT, t2 is always the same as a0, and thus this is unnecessary.

> +    mv              t3, a1
> +    lb              t4, (t2)
> +2:
> +    vsetvli         t1, t3, e8, m8
> +    vlse8.v         v0, (t2), t0
> +    vmv.s.x         v8, t4
> +    sub             t3, t3, t1
> +    vredminu.vs     v8, v0, v8
> +    vmv.x.s         t4, v8
> +    bnez            t3, 2b
> +    vsetivli        t1, 1, e8

You're not using the output, so use zero as the destination.

But you don't even need to reset the vector configuration here. Just use 
masking to store the one element (you could also transfer to scalar and store, 
but that's probably slower than masking).

> +    vse8.v          v8, (a0)
> +    addi            a0, a0, 1
> +    addi            a2, a2, -1

This will stall on an in-order CPU. Please avoid immediately consecutive 
interdependent instructions.

> +    bnez            a2, 1b
> +3:
> +    ret
> +endfunc


-- 
Rémi Denis-Courmont
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24 Peiting Shen
@ 2023-06-15 18:06   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 14+ messages in thread
From: Rémi Denis-Courmont @ 2023-06-15 18:06 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

On Thursday, 15 June 2023, 13:36:41 EEST, Peiting Shen wrote:
> From: Shen Peiting <shenpeiting@eswincomputing.com>
> 
> Vector instructions replace the scalar operations for float-to-fixed conversion
> 
> Benchmarks on Spike(cycles):
> len=16
> float_to_fixed24_c: 315
> float_to_fixed24_rvv: 27
> len=160
> float_to_fixed24_c: 2871
> float_to_fixed24_rvv: 67
> 
> Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
> Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
> Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
> Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
> ---
>  libavcodec/riscv/ac3dsp_init.c |  5 ++++-
>  libavcodec/riscv/ac3dsp_rvv.S  | 19 +++++++++++++++++++
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> index bb67d86998..a4e75a7541 100644
> --- a/libavcodec/riscv/ac3dsp_init.c
> +++ b/libavcodec/riscv/ac3dsp_init.c
> @@ -25,13 +25,16 @@
>  #include "config.h"
> 
>  void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> +void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> 
>  av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
>  {
>      int flags = av_get_cpu_flags();
>  #if HAVE_RVV
> -    if (flags & AV_CPU_FLAG_RVV_I32)
> +    if (flags & AV_CPU_FLAG_RVV_I32) {
>          c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
> +        c->float_to_fixed24 = ff_float_to_fixed24_rvv;
> +    }
>  #endif
>  }
> 
> diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> index 879123f4a7..d98e72c12c 100644
> --- a/libavcodec/riscv/ac3dsp_rvv.S
> +++ b/libavcodec/riscv/ac3dsp_rvv.S
> @@ -44,3 +44,22 @@ func ff_ac3_exponent_min_rvv, zve32x
>  3:
>      ret
>  endfunc
> +
> +
> +func ff_float_to_fixed24_rvv, zve32x
> +    addi            t1, x0, 1

That's `li t1, 1` please.

> +    slli            t1, t1, 24
> +    fcvt.s.w        f1, t1

Please use ABI names for FPRs, e.g. `ft0`. Nobody wants to have to remember 
which ones are callee-saved and which ones aren't.

> +1:
> +    vsetvli         t0, a2, e32, m8
> +    vle32.v         v0, (a1)
> +    vfmul.vf        v0, v0, f1
> +    vfcvt.x.f.v     v16, v0
> +    vse32.v         v16, (a0)
> +    sub             a2, a2, t0
> +    slli            t0, t0, 2
> +    add             a1, a1, t0
> +    add             a0, a0, t0

Use sh2add to save one in three instructions here.

And please interleave scalar and vector instructions so that in-order CPUs can 
potentially multi-issue.

> +    bgtz            a2, 1b
> +    ret
> +endfunc
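
To make the sh2add remark concrete, here is a small C model of the two ways to advance the pointers by vl 32-bit elements. It is only a sketch: sh2add() models the Zba instruction (rd = rs2 + (rs1 << 2)), and advance_slli_add / advance_sh2add are hypothetical names, not FFmpeg functions. Note that using sh2add would presumably require the function to depend on Zba in addition to the vector extension.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Zba instruction "sh2add rd, rs1, rs2": rd = rs2 + (rs1 << 2). */
uintptr_t sh2add(uintptr_t rs1, uintptr_t rs2)
{
    return rs2 + (rs1 << 2);
}

/* As in the patch: slli t0, t0, 2 ; add a1, a1, t0 ; add a0, a0, t0 */
void advance_slli_add(uintptr_t *a0, uintptr_t *a1, uintptr_t t0)
{
    assert(a0 && a1);
    t0 <<= 2;   /* element count -> byte count */
    *a1 += t0;
    *a0 += t0;
}

/* Suggested: sh2add a1, t0, a1 ; sh2add a0, t0, a0 (two instructions) */
void advance_sh2add(uintptr_t *a0, uintptr_t *a1, uintptr_t t0)
{
    assert(a0 && a1);
    *a1 = sh2add(t0, *a1);
    *a0 = sh2add(t0, *a0);
}
```

Both helpers leave a0 and a1 advanced by 4 * t0 bytes; the second form also leaves t0 untouched, which makes it easier to reorder against the vector instructions.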

-- 
Реми Дёни-Курмон
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp
  2023-06-15 13:57 ` [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Lynne
@ 2023-06-15 19:10   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 14+ messages in thread
From: Rémi Denis-Courmont @ 2023-06-15 19:10 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Thursday, 15 June 2023, 16:57:18 EEST, Lynne wrote:
> Jun 15, 2023, 12:37 by shenpeiting@eswincomputing.com:
> > From: Shen Peiting <shenpeiting@eswincomputing.com>
> > 
> > We optimized the six interfaces of AC3 init by RVV, the optimized
> > performance was tested on the RISC-V ISA simulator--Spike, and the
> > results were attached to each commit.
> > 
> > shenpeiting (6):
> >  lavc/ac3dsp: RISC-V V ac3_exponent_min
> >  lavc/ac3dsp: RISC-V V float_to_fixed24
> >  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
> >  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float
> >  lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size
> >  lavc/ac3dsp: RISC-V B ac3_extract_exponents
> >  
> >  libavcodec/ac3dsp.c            |   2 +
> >  libavcodec/ac3dsp.h            |   1 +
> >  libavcodec/riscv/Makefile      |   3 +
> >  libavcodec/riscv/ac3dsp_init.c |  60 +++++++++
> >  libavcodec/riscv/ac3dsp_rvb.S  |  42 ++++++
> >  libavcodec/riscv/ac3dsp_rvv.S  | 225 +++++++++++++++++++++++++++++++++
> >  6 files changed, 333 insertions(+)
> >  create mode 100644 libavcodec/riscv/ac3dsp_init.c
> >  create mode 100644 libavcodec/riscv/ac3dsp_rvb.S
> >  create mode 100644 libavcodec/riscv/ac3dsp_rvv.S
> 
> Could you implement checkasm for this? It shouldn't
> be more than a hundred lines, and there are examples,
> tests/checkasm/aacpsdsp.c being the most similar.
> Since CPUs with the needed extensions aren't released,
> we're not doing any FATE runs,

Well... I accept hardware donations (with regular USB-C power supply and 
passive cooling) to back what would be the third generation of RISC-V FATE 
instances.

Until R-V-V 1.0 hardware production substitutes silicon for unobtainium, I 
also accept Lichee Pi4A or equivalent hardware bundles, which would be able to 
run most (but definitely not all) of FFmpeg's RVV functions with a sizable 
amount of kludging.

> and so if the results don't
> match the C version, we'll end up with broken code once
> they do exist. And no one wants to debug someone else's
> assembly.
> 
> Those results look far too optimistic, and I'm guessing
> it's because they're using a theoretical huge vector size
> limit. Could you re-test with something more realistic,
> like 256-bit vectors, using checkasm --bench?

It could also be that Spike counts every instruction as one cycle, regardless of 
the group multiplier, not (just) the vector size.
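
A rough way to see how much both factors matter, as a sketch under simplifying assumptions: integer LMUL only, and vsetvli always granting min(AVL, VLMAX), which the specification permits but does not strictly require. The helper names vlmax_elems and strip_count are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* VLMAX = VLEN / SEW * LMUL, for integer LMUL (1, 2, 4 or 8). */
size_t vlmax_elems(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
    return vlen_bits / sew_bits * lmul;
}

/* Minimum number of passes through a strip-mined loop over len elements. */
size_t strip_count(size_t len, size_t vlmax)
{
    assert(vlmax != 0);
    return (len + vlmax - 1) / vlmax;
}
```

For the float_to_fixed24 loop (e32, m8) with len=160, this gives 5 passes at VLEN=128 and 3 passes at VLEN=256. A model that charges one cycle per instruction scales only with the number of passes, whereas real hardware would likely spend several cycles on each m8 instruction.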

-- 
Rémi Denis-Courmont
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents Peiting Shen
@ 2023-06-15 19:18   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 14+ messages in thread
From: Rémi Denis-Courmont @ 2023-06-15 19:18 UTC (permalink / raw)
  To: ffmpeg-devel

On Thursday, 15 June 2023, 13:36:45 EEST, Peiting Shen wrote:
> From: Shen Peiting <shenpeiting@eswincomputing.com>
> 
> Use RVB instruction clz to calculate the number of leading zeros of MSB
> instead of av_log2.
> 
> Benchmarks on Spike(cycles):
> ac3_extract_exponents_c: 8226
> ac3_extract_exponents_rvb: 1167

FWIW, RV-Zbb can be benchmarked on real hardware.

I would have done it already if only there was a checkasm case for this.
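
For reference, a sketch of the equivalence such a test would need to verify. It assumes (from my reading of the C version) that the exponent of a nonzero 24-bit coefficient is 23 - floor(log2(|coef|)) and that zero maps to 24. With a 32-bit leading-zero count, floor(log2(v)) = 31 - clz(v), so the exponent becomes clz(v) - 8. clz32, log2_floor, exponent_via_log2 and exponent_via_clz are illustrative names; clz32 is a portable loop standing in for the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of leading zeros in a 32-bit word (32 for x == 0). */
int clz32(uint32_t x)
{
    int n = 0;
    if (!x)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* floor(log2(x)) for x > 0. */
int log2_floor(uint32_t x)
{
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* |coef| without signed-overflow issues. */
uint32_t abs_u32(int32_t coef)
{
    return coef < 0 ? 0u - (uint32_t)coef : (uint32_t)coef;
}

uint8_t exponent_via_log2(int32_t coef)
{
    uint32_t v = abs_u32(coef);
    assert(v < (1u << 24));  /* assumption: 24-bit fixed-point input */
    return (uint8_t)(v ? 23 - log2_floor(v) : 24);
}

uint8_t exponent_via_clz(int32_t coef)
{
    uint32_t v = abs_u32(coef);
    assert(v < (1u << 24));  /* assumption: 24-bit fixed-point input */
    return (uint8_t)(v ? clz32(v) - 8 : 24);
}
```

The zero case still needs its own branch (or a select) in either form, since clz(0) - 8 would give 24 only by coincidence of the 32-bit width.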

-- 
Rémi Denis-Courmont
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
  2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32 Peiting Shen
@ 2023-06-15 19:25   ` Rémi Denis-Courmont
  2023-06-16 10:15     ` 沈佩婷
  0 siblings, 1 reply; 14+ messages in thread
From: Rémi Denis-Courmont @ 2023-06-15 19:25 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Shen Peiting

On Thursday, 15 June 2023, 13:36:42 EEST, Peiting Shen wrote:
> From: Shen Peiting <shenpeiting@eswincomputing.com>
> 
> Scalar calculating int32 sum_square optimized by using RVV instructions
> 
> Benchmarks on Spike(cycles):
> len=128
> ac3_sum_square_butterfly_int32_c: 8497
> ac3_sum_square_butterfly_int32_rvv: 258
> len=1280
> ac3_sum_square_butterfly_int32_c: 84529
> ac3_sum_square_butterfly_int32_rvv: 2274
> 
> Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
> Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
> Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
> Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
> ---
>  libavcodec/riscv/ac3dsp_init.c |  8 +++++
>  libavcodec/riscv/ac3dsp_rvv.S  | 53 ++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> index a4e75a7541..4fd4abe83e 100644
> --- a/libavcodec/riscv/ac3dsp_init.c
> +++ b/libavcodec/riscv/ac3dsp_init.c
> @@ -26,6 +26,10 @@
> 
>  void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
>  void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> +void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
> +                                            const int32_t *coef0,
> +                                            const int32_t *coef1,
> +                                            int len);
> 
>  av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
>  {
> @@ -35,6 +39,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
>          c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
>          c->float_to_fixed24 = ff_float_to_fixed24_rvv;
>      }
> +#if (__riscv_xlen >= 64)
> +    if (flags & AV_CPU_FLAG_RVV_I64)
> +        c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
> +#endif
>  #endif
>  }
> 
> diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> index d98e72c12c..4e0d238f85 100644
> --- a/libavcodec/riscv/ac3dsp_rvv.S
> +++ b/libavcodec/riscv/ac3dsp_rvv.S
> @@ -63,3 +63,56 @@ func ff_float_to_fixed24_rvv, zve32x
>      bgtz            a2, 1b
>      ret
>  endfunc
> +
> +
> +func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
> +    vsetvli         t0, a3, e32, m2
> +    vle32.v         v0, (a1)
> +    vle32.v         v2, (a2)
> +    vadd.vv         v4, v0, v2
> +    vsub.vv         v6, v0, v2
> +    vwmul.vv        v8, v0, v0
> +    vwmul.vv        v12, v2, v2
> +    vwmul.vv        v16, v4, v4
> +    vwmul.vv        v20, v6, v6
> +    sub             a3, a3, t0
> +    slli            t0, t0, 2
> +    add             a1, a1, t0
> +    add             a2, a2, t0
> +    beq             a3, x0, 2f
> +1:
> +    vsetvli         t0, a3, e32, m2
> +    vle32.v         v0, (a1)
> +    vle32.v         v2, (a2)
> +    vadd.vv         v4, v0, v2
> +    vsub.vv         v6, v0, v2
> +    vwmacc.vv       v8, v0, v0
> +    vwmacc.vv       v12, v2, v2
> +    vwmacc.vv       v16, v4, v4
> +    vwmacc.vv       v20, v6, v6
> +    sub             a3, a3, t0
> +    slli            t0, t0, 2
> +    add             a1, a1, t0
> +    add             a2, a2, t0
> +    bnez            a3, 1b
> +2:
> +    vsetvli         t0, x0, e64, m4
> +    vmv.s.x         v24, x0
> +    vmv.s.x         v25, x0
> +    vmv.s.x         v26, x0
> +    vmv.s.x         v27, x0
> +    vredsum.vs      v24, v8, v24
> +    vredsum.vs      v25, v12, v25
> +    vredsum.vs      v26, v16, v26
> +    vredsum.vs      v27, v20, v27

As far as I can tell, this is a reserved encoding (cf. RVV 1.0 §3.4.2), and I 
believe that QEMU throws an illegal instruction exception in this case. (I would 
check, but there is no checkasm test case for this function.) Does this actually 
work on your simulator? Because if so, then your simulator is probably 
broken/buggy.

> +    vsetivli        t0, 1, e64, m1
> +    vse64.v         v24, (a0)
> +    addi            a0, a0, 8
> +    vse64.v         v25, (a0)
> +    addi            a0, a0, 8
> +    vse64.v         v26, (a0)
> +    addi            a0, a0, 8
> +    vse64.v         v27, (a0)
> +    addi            a0, a0, 8
> +    ret
> +endfunc
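
For anyone writing the missing test, this is roughly the scalar behaviour the assembly would be compared against, reconstructed from what the vector code computes (sums of squares of left, right, mid and side). It is a sketch, not the FFmpeg source; sum_square_butterfly_int32_ref is a hypothetical name, and it assumes the coefficients are small enough (24-bit fixed point) that lt + rt and lt - rt do not overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* sum[0] = sum of lt^2, sum[1] = sum of rt^2,
 * sum[2] = sum of (lt + rt)^2, sum[3] = sum of (lt - rt)^2. */
void sum_square_butterfly_int32_ref(int64_t sum[4], const int32_t *coef0,
                                    const int32_t *coef1, int len)
{
    assert(len >= 0);
    sum[0] = sum[1] = sum[2] = sum[3] = 0;
    for (int i = 0; i < len; i++) {
        int32_t lt = coef0[i];
        int32_t rt = coef1[i];
        int32_t md = lt + rt;  /* assumes no 32-bit overflow */
        int32_t sd = lt - rt;  /* assumes no 32-bit overflow */
        sum[0] += (int64_t)lt * lt;
        sum[1] += (int64_t)rt * rt;
        sum[2] += (int64_t)md * md;
        sum[3] += (int64_t)sd * sd;
    }
}
```

A test would fill coef0 and coef1 with random 24-bit values, call both versions, and compare all four 64-bit sums.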


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
  2023-06-15 19:25   ` Rémi Denis-Courmont
@ 2023-06-16 10:15     ` 沈佩婷
  0 siblings, 0 replies; 14+ messages in thread
From: 沈佩婷 @ 2023-06-16 10:15 UTC (permalink / raw)
  To: FFmpeg development discussions and patches


Hei,

> -----Original Message-----
> From: "Rémi Denis-Courmont" <remi@remlab.net>
> Sent: 2023-06-16 03:25:07 (Friday)
> To: ffmpeg-devel@ffmpeg.org
> Cc: "Shen Peiting" <shenpeiting@eswincomputing.com>
> Subject: Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
> 
> On Thursday, 15 June 2023, 13:36:42 EEST, Peiting Shen wrote:
> > From: Shen Peiting <shenpeiting@eswincomputing.com>
> > 
> > Scalar calculating int32 sum_square optimized by using RVV instructions
> > 
> > Benchmarks on Spike(cycles):
> > len=128
> > ac3_sum_square_butterfly_int32_c: 8497
> > ac3_sum_square_butterfly_int32_rvv: 258
> > len=1280
> > ac3_sum_square_butterfly_int32_c: 84529
> > ac3_sum_square_butterfly_int32_rvv: 2274
> > 
> > Co-Authored by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
> > Co-Authored by: Huang Xing <huangxing1@eswincomputing.com>
> > Co-Authored by: Zeng Fanchen <zengfanchen@eswincomputing.com>
> > Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
> > ---
> >  libavcodec/riscv/ac3dsp_init.c |  8 +++++
> >  libavcodec/riscv/ac3dsp_rvv.S  | 53 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 61 insertions(+)
> > 
> > diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> > index a4e75a7541..4fd4abe83e 100644
> > --- a/libavcodec/riscv/ac3dsp_init.c
> > +++ b/libavcodec/riscv/ac3dsp_init.c
> > @@ -26,6 +26,10 @@
> > 
> >  void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> >  void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> > +void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
> > +                                            const int32_t *coef0,
> > +                                            const int32_t *coef1,
> > +                                            int len);
> > 
> >  av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> >  {
> > @@ -35,6 +39,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> >          c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
> >          c->float_to_fixed24 = ff_float_to_fixed24_rvv;
> >      }
> > +#if (__riscv_xlen >= 64)
> > +    if (flags & AV_CPU_FLAG_RVV_I64)
> > +        c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
> > +#endif
> >  #endif
> >  }
> > 
> > diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> > index d98e72c12c..4e0d238f85 100644
> > --- a/libavcodec/riscv/ac3dsp_rvv.S
> > +++ b/libavcodec/riscv/ac3dsp_rvv.S
> > @@ -63,3 +63,56 @@ func ff_float_to_fixed24_rvv, zve32x
> >      bgtz            a2, 1b
> >      ret
> >  endfunc
> > +
> > +
> > +func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
> > +    vsetvli         t0, a3, e32, m2
> > +    vle32.v         v0, (a1)
> > +    vle32.v         v2, (a2)
> > +    vadd.vv         v4, v0, v2
> > +    vsub.vv         v6, v0, v2
> > +    vwmul.vv        v8, v0, v0
> > +    vwmul.vv        v12, v2, v2
> > +    vwmul.vv        v16, v4, v4
> > +    vwmul.vv        v20, v6, v6
> > +    sub             a3, a3, t0
> > +    slli            t0, t0, 2
> > +    add             a1, a1, t0
> > +    add             a2, a2, t0
> > +    beq             a3, x0, 2f
> > +1:
> > +    vsetvli         t0, a3, e32, m2
> > +    vle32.v         v0, (a1)
> > +    vle32.v         v2, (a2)
> > +    vadd.vv         v4, v0, v2
> > +    vsub.vv         v6, v0, v2
> > +    vwmacc.vv       v8, v0, v0
> > +    vwmacc.vv       v12, v2, v2
> > +    vwmacc.vv       v16, v4, v4
> > +    vwmacc.vv       v20, v6, v6
> > +    sub             a3, a3, t0
> > +    slli            t0, t0, 2
> > +    add             a1, a1, t0
> > +    add             a2, a2, t0
> > +    bnez            a3, 1b
> > +2:
> > +    vsetvli         t0, x0, e64, m4
> > +    vmv.s.x         v24, x0
> > +    vmv.s.x         v25, x0
> > +    vmv.s.x         v26, x0
> > +    vmv.s.x         v27, x0
> > +    vredsum.vs      v24, v8, v24
> > +    vredsum.vs      v25, v12, v25
> > +    vredsum.vs      v26, v16, v26
> > +    vredsum.vs      v27, v20, v27
> 
> As far as I can tell, this is a reserved encoding (cf. RVV 1.0 §3.4.2), and I 
> believe that QEMU throws an illegal instruction exception in this case. (I would 
> check, but there is no checkasm test case for this function.) Does this actually 
> work on your simulator? Because if so, then your simulator is probably 
> broken/buggy.
> 
RVV 1.0 §14 
Vector reduction operations take a vector register group of elements and a scalar held in 
element 0 of a vector register, and perform a reduction using some binary operator, to produce
a scalar result in element 0 of a vector register. The scalar input and output operands 
are held in element 0 of a single vector register, not a vector register group, so any vector
register can be the scalar source or destination of a vector reduction regardless of LMUL setting.

RVV 1.0 §16.1. Integer Scalar Move Instructions
The integer scalar read/write instructions transfer a single value between a scalar x register and
element 0 of a vector register. The instructions ignore LMUL and vector register groups.

According to the above, I think this encoding is legal.
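
Read literally, the two quoted paragraphs amount to the following check, sketched here in C. This models only the register-numbering rule (a register group must start at a multiple of LMUL, while the scalar source and destination of a reduction are single registers); it is not a full legality check, and group_aligned and reduction_operands_ok are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* A register group for integer LMUL must start at a multiple of LMUL. */
bool group_aligned(unsigned reg, unsigned lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return reg < 32 && reg % lmul == 0;
}

/* vredsum.vs vd, vs2, vs1: only vs2 is a register group; vd and vs1 hold
 * the scalar in element 0 of a single register, whatever LMUL is. */
bool reduction_operands_ok(unsigned vd, unsigned vs2, unsigned vs1,
                           unsigned lmul)
{
    return vd < 32 && vs1 < 32 && group_aligned(vs2, lmul);
}
```

Under this rule alone, `vredsum.vs v25, v12, v25` with LMUL=4 passes, because only v12 has to be aligned. Other constraints in the specification are not modelled here.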

Actually, we have passed all the FATE tests on QEMU 6.0.0, compiled with riscv-unknown-linux-gnu-gcc 13.0.1 and configured as follows:
./configure --enable-cross-compile --cross-prefix=riscv64-unknown-linux-gnu- --arch=riscv \
--extra-cflags="-march=rv64imafdcbv -mabi=lp64d --static -I/home/user/code/iconv/iconv-riscv/include" \
--prefix=ffshare --extra-libs="-static -liconv" --extra-ldflags="-L/home/user/code/iconv/iconv-riscv/lib" \
--target-os=linux --target-exec="qemu-riscv64 -cpu rv64,x-v=true,x-b=true,x-zpn=true,x-zbpbo=true,x-zpsfoperand=true,x-arith=true" \
--enable-gpl --enable-memory-poisoning

We will fix the non-standard code pointed out in the review emails and add the checkasm tests in patch v2.
> > +    vsetivli        t0, 1, e64, m1
> > +    vse64.v         v24, (a0)
> > +    addi            a0, a0, 8
> > +    vse64.v         v25, (a0)
> > +    addi            a0, a0, 8
> > +    vse64.v         v26, (a0)
> > +    addi            a0, a0, 8
> > +    vse64.v         v27, (a0)
> > +    addi            a0, a0, 8
> > +    ret
> > +endfunc
> 
> 
> -- 
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
> 
> 
> 

end of thread, other threads:[~2023-06-16 10:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-15 10:36 [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Peiting Shen
2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min Peiting Shen
2023-06-15 18:02   ` Rémi Denis-Courmont
2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24 Peiting Shen
2023-06-15 18:06   ` Rémi Denis-Courmont
2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32 Peiting Shen
2023-06-15 19:25   ` Rémi Denis-Courmont
2023-06-16 10:15     ` 沈佩婷
2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 4/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float Peiting Shen
2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 5/6] lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size Peiting Shen
2023-06-15 10:36 ` [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents Peiting Shen
2023-06-15 19:18   ` Rémi Denis-Courmont
2023-06-15 13:57 ` [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp Lynne
2023-06-15 19:10   ` Rémi Denis-Courmont
