Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
* [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP
@ 2022-09-04 13:54 Rémi Denis-Courmont
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension remi
                   ` (10 more replies)
  0 siblings, 11 replies; 13+ messages in thread
From: Rémi Denis-Courmont @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:

  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)

are waiting thorough bashing at your express convenience up to:

  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)

Changes since v1:

- Removed stray define.
- Fixed mismatch between byte and element size in mul-scalar.
- Added fmul, fac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
- Added float butterfly and dot product.

All operations are unrolled to the maximum group size (8), with the
exception of overlap/add. The later seems to require a minimum of 6
vectors (maybe 5 by extremely careful ordering), so the group size is
only 4.

The pointer arithmetic could be slightly optimised with SH2ADD and
SH3ADD instructions from the Zvba extension. This would require more
conditional code, or requiring support for Zvba for probably neglible
performance gains though.

----------------------------------------------------------------
Rémi Denis-Courmont (10):
      riscv: add CPU flags for the RISC-V Vector extension
      riscv: initial common header for assembler macros
      riscv: float vector-scalar multiplication with RVV
      riscv: float vector-vector multiplication with RVV
      riscv: float vector multiply-accumulate with RVV
      riscv: float vector multiplication-addition with RVV
      riscv: float vector sum-and-difference with RVV
      riscv: float reversed vector multiplication with RVV
      riscv: float vector windowed overlap/add with RVV
      riscv: float vector dot product with RVV

 libavutil/cpu.c                  |  14 +++
 libavutil/cpu.h                  |   6 +
 libavutil/cpu_internal.h         |   1 +
 libavutil/float_dsp.c            |   2 +
 libavutil/float_dsp.h            |   1 +
 libavutil/riscv/Makefile         |   3 +
 libavutil/riscv/asm.S            |  33 +++++
 libavutil/riscv/cpu.c            |  57 +++++++++
 libavutil/riscv/float_dsp_init.c |  67 ++++++++++
 libavutil/riscv/float_dsp_rvv.S  | 255 +++++++++++++++++++++++++++++++++++++++
 10 files changed, 439 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/asm.S
 create mode 100644 libavutil/riscv/cpu.c
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S

-- 
レミ・デニ-クールモン
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
@ 2022-09-04 13:54 ` remi
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 02/10] riscv: initial common header for assembler macros remi
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

RVV defines a total of 12 different extensions: V, Zvl32b, Zvl64b,
Zvl128b, Zvl256b, Zvl512b, Zvl1024b, Zve32x, Zve32f, Zve64x, Zve64f and
Zve64d.

At this stage, we don't expose the vector length extensions Zvl*, as
the vector length is most commonly determined at run-time depending on
the element size and the effective multipler. There are anyways no
other run-time mechanisms defiend to determine the actual maximum
vector length than to invoke VSETVL.

Zve64f is equivalent to Zve32f plus Zve64x, so it is exposed as a
convenience flag, but not tracked internally. Likewise V is the
equivalent of Zve64d plus Zvl128b.

Technically, Zve32f and Zve64x are both implied by Zve64d and both
imply Zve32x, leaving only 5 possibilities (including no vector
support), but we keep 4 separate bits for easy run-time checks as on
other instruction set architectures.
---
 libavutil/cpu.c          | 14 ++++++++++
 libavutil/cpu.h          |  6 +++++
 libavutil/cpu_internal.h |  1 +
 libavutil/riscv/Makefile |  1 +
 libavutil/riscv/cpu.c    | 57 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 79 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/cpu.c

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 0035e927a5..83bf513cf2 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -62,6 +62,8 @@ static int get_cpu_flags(void)
     return ff_get_cpu_flags_arm();
 #elif ARCH_PPC
     return ff_get_cpu_flags_ppc();
+#elif ARCH_RISCV
+    return ff_get_cpu_flags_riscv();
 #elif ARCH_X86
     return ff_get_cpu_flags_x86();
 #elif ARCH_LOONGARCH
@@ -178,6 +180,18 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
 #elif ARCH_LOONGARCH
         { "lsx",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_LSX      },    .unit = "flags" },
         { "lasx",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_LASX     },    .unit = "flags" },
+#elif ARCH_RISCV
+#define AV_CPU_FLAG_ZVE32F_M (AV_CPU_FLAG_ZVE32F | AV_CPU_FLAG_ZVE32X)
+#define AV_CPU_FLAG_ZVE64X_M (AV_CPU_FLAG_ZVE64X | AV_CPU_FLAG_ZVE32X)
+#define AV_CPU_FLAG_ZVE64D_M (AV_CPU_FLAG_ZVE64D | AV_CPU_FLAG_ZVE64F_M)
+#define AV_CPU_FLAG_ZVE64F_M (AV_CPU_FLAG_ZVE32F_M | AV_CPU_FLAG_ZVE64X_M)
+#define AV_CPU_FLAG_VECTORS  AV_CPU_FLAG_ZVE64D_M
+        { "vectors",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_VECTORS  },    .unit = "flags" },
+        { "zve32x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE32X   },    .unit = "flags" },
+        { "zve32f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE32F_M },    .unit = "flags" },
+        { "zve64x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64X_M },    .unit = "flags" },
+        { "zve64f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64F_M },    .unit = "flags" },
+        { "zve64d",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64D_M },    .unit = "flags" },
 #endif
         { NULL },
     };
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index 9711e574c5..44836e50d6 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -78,6 +78,12 @@
 #define AV_CPU_FLAG_LSX          (1 << 0)
 #define AV_CPU_FLAG_LASX         (1 << 1)
 
+// RISC-V Vector extension
+#define AV_CPU_FLAG_ZVE32X       (1 << 0) /* 8-, 16-, 32-bit integers */
+#define AV_CPU_FLAG_ZVE32F       (1 << 1) /* single precision scalars */
+#define AV_CPU_FLAG_ZVE64X       (1 << 2) /* 64-bit integers */
+#define AV_CPU_FLAG_ZVE64D       (1 << 3) /* double precision scalars */
+
 /**
  * Return the flags which specify extensions supported by the CPU.
  * The returned value is affected by av_force_cpu_flags() if that was used
diff --git a/libavutil/cpu_internal.h b/libavutil/cpu_internal.h
index 650d47fc96..634f28bac4 100644
--- a/libavutil/cpu_internal.h
+++ b/libavutil/cpu_internal.h
@@ -48,6 +48,7 @@ int ff_get_cpu_flags_mips(void);
 int ff_get_cpu_flags_aarch64(void);
 int ff_get_cpu_flags_arm(void);
 int ff_get_cpu_flags_ppc(void);
+int ff_get_cpu_flags_riscv(void);
 int ff_get_cpu_flags_x86(void);
 int ff_get_cpu_flags_loongarch(void);
 
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
new file mode 100644
index 0000000000..1f818043dc
--- /dev/null
+++ b/libavutil/riscv/Makefile
@@ -0,0 +1 @@
+OBJS += riscv/cpu.o
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
new file mode 100644
index 0000000000..9e4cce5e8b
--- /dev/null
+++ b/libavutil/riscv/cpu.c
@@ -0,0 +1,57 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/cpu.h"
+#include "libavutil/cpu_internal.h"
+#include "config.h"
+
+#if HAVE_GETAUXVAL
+#include <sys/auxv.h>
+#endif
+
+#define HWCAP_RV(letter) (1ul << ((letter) - 'A'))
+
+int ff_get_cpu_flags_riscv(void)
+{
+    int ret = 0;
+
+    /* If RV-V is enabled statically at compile-time, check the details. */
+#ifdef __riscv_vectors
+    ret |= AV_CPU_FLAG_ZVE32X;
+#if __riscv_v_elen >= 64
+    ret |= AV_CPU_FLAG_ZVE64X;
+#endif
+#if __riscv_v_elen_fp >= 32
+    ret |= AV_CPU_FLAG_ZVE32F;
+#if __riscv_v_elen_fp >= 64
+    ret |= AV_CPU_FLAG_ZVE64F;
+#endif
+#endif
+#endif
+
+#if HAVE_GETAUXVAL
+    const unsigned long hwcap = getauxval(AT_HWCAP);
+
+    /* The V extension implies all subsets */
+    if (hwcap & HWCAP_RV('V'))
+        ret |= AV_CPU_FLAG_ZVE32X | AV_CPU_FLAG_ZVE64X
+             | AV_CPU_FLAG_ZVE32F | AV_CPU_FLAG_ZVE64D;
+#endif
+
+    return ret;
+}
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 02/10] riscv: initial common header for assembler macros
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension remi
@ 2022-09-04 13:54 ` remi
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 03/10] riscv: float vector-scalar multiplication with RVV remi
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/asm.S | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 libavutil/riscv/asm.S

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
new file mode 100644
index 0000000000..31001b8bdb
--- /dev/null
+++ b/libavutil/riscv/asm.S
@@ -0,0 +1,33 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+        .macro func sym
+                .text
+                .align 2
+
+                .global \sym
+                .type   \sym, %function
+                \sym:
+
+                .macro endfunc
+                        .size   \sym, . - \sym
+                        .purgem endfunc
+                .endm
+        .endm
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 03/10] riscv: float vector-scalar multiplication with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension remi
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 02/10] riscv: initial common header for assembler macros remi
@ 2022-09-04 13:54 ` remi
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 04/10] riscv: float vector-vector " remi
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
---
 libavutil/float_dsp.c            |  2 ++
 libavutil/float_dsp.h            |  1 +
 libavutil/riscv/Makefile         |  4 ++-
 libavutil/riscv/float_dsp_init.c | 41 +++++++++++++++++++++
 libavutil/riscv/float_dsp_rvv.S  | 62 ++++++++++++++++++++++++++++++++
 5 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index 8676c8b0f8..742dd679d2 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -156,6 +156,8 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
     ff_float_dsp_init_arm(fdsp);
 #elif ARCH_PPC
     ff_float_dsp_init_ppc(fdsp, bit_exact);
+#elif ARCH_RISCV
+    ff_float_dsp_init_riscv(fdsp);
 #elif ARCH_X86
     ff_float_dsp_init_x86(fdsp);
 #elif ARCH_MIPS
diff --git a/libavutil/float_dsp.h b/libavutil/float_dsp.h
index 9c664592bd..7cad9fc622 100644
--- a/libavutil/float_dsp.h
+++ b/libavutil/float_dsp.h
@@ -205,6 +205,7 @@ float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len);
 void ff_float_dsp_init_aarch64(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_arm(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_ppc(AVFloatDSPContext *fdsp, int strict);
+void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_mips(AVFloatDSPContext *fdsp);
 
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
index 1f818043dc..6bf8243e8d 100644
--- a/libavutil/riscv/Makefile
+++ b/libavutil/riscv/Makefile
@@ -1 +1,3 @@
-OBJS += riscv/cpu.o
+OBJS += riscv/cpu.o \
+        riscv/float_dsp_init.o \
+        riscv/float_dsp_rvv.o
diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
new file mode 100644
index 0000000000..279412c036
--- /dev/null
+++ b/libavutil/riscv/float_dsp_init.c
@@ -0,0 +1,41 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/float_dsp.h"
+
+void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
+                                int len);
+
+void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
+                                int len);
+
+av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
+{
+    int flags = av_get_cpu_flags();
+
+    if (flags & AV_CPU_FLAG_ZVE32F) {
+        fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+
+        if (flags & AV_CPU_FLAG_ZVE64D)
+            fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+    }
+}
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
new file mode 100644
index 0000000000..98d06c6d07
--- /dev/null
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -0,0 +1,62 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "asm.S"
+
+        .option  arch, +v
+
+// (a0) = (a1) * fa0 [0..a2-1]
+func ff_vector_fmul_scalar_rvv
+#if defined (__riscv_float_abi_soft)
+        fmv.w.x  fa0, a2
+        mv       a2, a3
+#endif
+
+1:      vsetvli  t0, a2, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a1)
+        add      a1, a1, t1
+        vfmul.vf v16, v16, fa0
+        sub      a2, a2, t0
+        vse32.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a2, 1b
+
+        ret
+endfunc
+
+// (a0) = (a1) * fa0 [0..a2-1]
+func ff_vector_dmul_scalar_rvv
+#if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
+        fmv.d.x  fa0, a2
+        mv       a2, a3
+#endif
+
+1:      vsetvli  t0, a2, e64, m8, ta, ma
+        slli     t1, t0, 3
+        vle64.v  v16, (a1)
+        add      a1, a1, t1
+        vfmul.vf v16, v16, fa0
+        sub      a2, a2, t0
+        vse64.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a2, 1b
+
+        ret
+endfunc
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 04/10] riscv: float vector-vector multiplication with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (2 preceding siblings ...)
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 03/10] riscv: float vector-scalar multiplication with RVV remi
@ 2022-09-04 13:54 ` remi
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 05/10] riscv: float vector multiply-accumulate " remi
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  9 ++++++++-
 libavutil/riscv/float_dsp_rvv.S  | 34 ++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 279412c036..4135284c76 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -22,9 +22,13 @@
 #include "libavutil/cpu.h"
 #include "libavutil/float_dsp.h"
 
+void ff_vector_fmul_rvv(float *dst, const float *src0, const float *src1,
+                         int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 
+void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
+                         int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
                                 int len);
 
@@ -33,9 +37,12 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
     int flags = av_get_cpu_flags();
 
     if (flags & AV_CPU_FLAG_ZVE32F) {
+        fdsp->vector_fmul = ff_vector_fmul_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 
-        if (flags & AV_CPU_FLAG_ZVE64D)
+        if (flags & AV_CPU_FLAG_ZVE64D) {
+            fdsp->vector_dmul = ff_vector_dmul_rvv;
             fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+        }
     }
 }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 98d06c6d07..15c875f9d2 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -21,6 +21,23 @@
 
         .option  arch, +v
 
+// (a0) = (a1) * (a2) [0..a3-1]
+func ff_vector_fmul_rvv
+1:      vsetvli  t0, a3, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a1)
+        add      a1, a1, t1
+        vle32.v  v24, (a2)
+        add      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse32.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_fmul_scalar_rvv
 #if defined (__riscv_float_abi_soft)
@@ -41,6 +58,23 @@ func ff_vector_fmul_scalar_rvv
         ret
 endfunc
 
+// (a0) = (a1) * (a2) [0..a3-1]
+func ff_vector_dmul_rvv
+1:      vsetvli  t0, a3, e64, m8, ta, ma
+        slli     t1, t0, 3
+        vle64.v  v16, (a1)
+        add      a1, a1, t1
+        vle64.v  v24, (a2)
+        add      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse64.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_dmul_scalar_rvv
 #if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 05/10] riscv: float vector multiply-accumulate with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (3 preceding siblings ...)
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 04/10] riscv: float vector-vector " remi
@ 2022-09-04 13:54 ` remi
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 06/10] riscv: float vector multiplication-addition " remi
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  6 +++++
 libavutil/riscv/float_dsp_rvv.S  | 42 ++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 4135284c76..a1bb112ec7 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -24,11 +24,15 @@
 
 void ff_vector_fmul_rvv(float *dst, const float *src0, const float *src1,
                          int len);
+void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
+                                int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
+void ff_vector_dmac_scalar_rvv(double *dst, const double *src, double mul,
+                                int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
                                 int len);
 
@@ -38,10 +42,12 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 
     if (flags & AV_CPU_FLAG_ZVE32F) {
         fdsp->vector_fmul = ff_vector_fmul_rvv;
+        fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
+            fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_rvv;
             fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
         }
     }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 15c875f9d2..8adfa6085c 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -38,6 +38,27 @@ func ff_vector_fmul_rvv
         ret
 endfunc
 
+// (a0) += (a1) * fa0 [0..a2-1]
+func ff_vector_fmac_scalar_rvv
+#if defined (__riscv_float_abi_soft)
+        fmv.w.x   fa0, a2
+        mv        a2, a3
+#endif
+
+1:      vsetvli   t0, a2, e32, m8, ta, ma
+        slli      t1, t0, 2
+        vle32.v   v24, (a1)
+        add       a1, a1, t1
+        vle32.v   v16, (a0)
+        vfmacc.vf v16, fa0, v24
+        sub       a2, a2, t0
+        vse32.v   v16, (a0)
+        add       a0, a0, t1
+        bnez      a2, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_fmul_scalar_rvv
 #if defined (__riscv_float_abi_soft)
@@ -75,6 +96,27 @@ func ff_vector_dmul_rvv
         ret
 endfunc
 
+// (a0) += (a1) * fa0 [0..a2-1]
+func ff_vector_dmac_scalar_rvv
+#if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
+        fmv.d.x   fa0, a2
+        mv        a2, a3
+#endif
+
+1:      vsetvli   t0, a2, e64, m8, ta, ma
+        slli      t1, t0, 3
+        vle64.v   v24, (a1)
+        add       a1, a1, t1
+        vle64.v   v16, (a0)
+        vfmacc.vf v16, fa0, v24
+        sub       a2, a2, t0
+        vse64.v   v16, (a0)
+        add       a0, a0, t1
+        bnez      a2, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_dmul_scalar_rvv
 #if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 06/10] riscv: float vector multiplication-addition with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (4 preceding siblings ...)
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 05/10] riscv: float vector multiply-accumulate " remi
@ 2022-09-04 13:54 ` remi
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 07/10] riscv: float vector sum-and-difference " remi
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:54 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index a1bb112ec7..8539fe9ac5 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -28,6 +28,8 @@ void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
+void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
+                             const float *src2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
@@ -44,6 +46,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmul = ff_vector_fmul_rvv;
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+        fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 8adfa6085c..27190c21ff 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -79,6 +79,25 @@ func ff_vector_fmul_scalar_rvv
         ret
 endfunc
 
+// (a0) = (a1) * (a2) + (a3) [0..a4-1]
+func ff_vector_fmul_add_rvv
+1:      vsetvli   t0, a4, e32, m8, ta, ma
+        slli      t1, t0, 2
+        vle32.v   v8, (a1)
+        add       a1, a1, t1
+        vle32.v   v16, (a2)
+        add       a2, a2, t1
+        vle32.v   v24, (a3)
+        add       a3, a3, t1
+        vfmadd.vv v8, v16, v24
+        sub       a4, a4, t0
+        vse32.v   v8, (a0)
+        add       a0, a0, t1
+        bnez      a4, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv
 1:      vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 07/10] riscv: float vector sum-and-difference with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (5 preceding siblings ...)
  2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 06/10] riscv: float vector multiplication-addition " remi
@ 2022-09-04 13:55 ` remi
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 08/10] riscv: float reversed vector multiplication " remi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:55 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  2 ++
 libavutil/riscv/float_dsp_rvv.S  | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 8539fe9ac5..2165394585 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -30,6 +30,7 @@ void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
                              const float *src2, int len);
+void ff_butterflies_float_rvv(float *v1, float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
@@ -47,6 +48,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
+        fdsp->butterflies_float = ff_butterflies_float_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 27190c21ff..61beb868b0 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -98,6 +98,24 @@ func ff_vector_fmul_add_rvv
         ret
 endfunc
 
+// (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
+func ff_butterflies_float_rvv
+1:      vsetvli  t0, a2, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a0)
+        vle32.v  v24, (a1)
+        vfadd.vv v0, v16, v24
+        vfsub.vv v8, v16, v24
+        sub      a2, a2, t0
+        vse32.v  v0, (a0)
+        add      a0, a0, t1
+        vse32.v  v8, (a1)
+        add      a1, a1, t1
+        bnez     a2, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv
 1:      vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 08/10] riscv: float reversed vector multiplication with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (6 preceding siblings ...)
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 07/10] riscv: float vector sum-and-difference " remi
@ 2022-09-04 13:55 ` remi
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 09/10] riscv: float vector windowed overlap/add " remi
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:55 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 2165394585..1183460181 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -30,6 +30,8 @@ void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
                              const float *src2, int len);
+void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
+                                 const float *src1, int len);
 void ff_butterflies_float_rvv(float *v1, float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
@@ -48,6 +50,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
+        fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
         fdsp->butterflies_float = ff_butterflies_float_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 61beb868b0..e3738ef7c5 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -98,6 +98,28 @@ func ff_vector_fmul_add_rvv
         ret
 endfunc
 
+// (a0) = (a1) * reverse(a2) [0..a3-1]
+func ff_vector_fmul_reverse_rvv
+        add      t3, a3, -1
+        li       t2, -4 // byte stride
+        slli     t3, t3, 2
+        add      a2, a2, t3
+
+1:      vsetvli  t0, a3, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a1)
+        add      a1, a1, t1
+        vlse32.v v24, (a2), t2
+        sub      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse32.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
 func ff_butterflies_float_rvv
 1:      vsetvli  t0, a2, e32, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 09/10] riscv: float vector windowed overlap/add with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (7 preceding siblings ...)
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 08/10] riscv: float reversed vector multiplication " remi
@ 2022-09-04 13:55 ` remi
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 10/10] riscv: float vector dot product " remi
  2022-09-04 17:48 ` [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Lynne
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:55 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 35 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 1183460181..887706d899 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -28,6 +28,8 @@ void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
+void ff_vector_fmul_window_rvv(float *dst, const float *src0,
+                                const float *src1, const float *win, int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
                              const float *src2, int len);
 void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
@@ -49,6 +51,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmul = ff_vector_fmul_rvv;
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+        fdsp->vector_fmul_window = ff_vector_fmul_window_rvv;
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
         fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
         fdsp->butterflies_float = ff_butterflies_float_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index e3738ef7c5..7e7d48374f 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -79,6 +79,41 @@ func ff_vector_fmul_scalar_rvv
         ret
 endfunc
 
+func ff_vector_fmul_window_rvv
+        // a0: dst, a1: src0, a2: src1, a3: window, a4: length
+        addi       t0, a4, -1
+        add        t1, t0, a4
+        slli       t0, t0, 2
+        slli       t1, t1, 2
+        add        a2, a2, t0
+        add        t0, a0, t1
+        add        t3, a3, t1
+        li         t1, -4 // byte stride
+
+1:      vsetvli    t2, a4, e32, m4, ta, ma
+        slli       t4, t2, 2
+        vle32.v    v16, (a1)
+        add        a1, a1, t4
+        vlse32.v   v20, (a2), t1
+        sub        a2, a2, t4
+        vle32.v    v24, (a3)
+        add        a3, a3, t4
+        vlse32.v   v28, (t3), t1
+        sub        t3, t3, t4
+        vfmul.vv   v0, v16, v28
+        sub        a4, a4, t2
+        vfmul.vv   v8, v16, v24
+        vfnmsac.vv v0, v20, v24
+        vfmacc.vv  v8, v20, v28
+        vse32.v    v0, (a0)
+        add        a0, a0, t4
+        vsse32.v   v8, (t0), t1
+        sub        t0, t0, t4
+        bnez       a4, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) + (a3) [0..a4-1]
 func ff_vector_fmul_add_rvv
 1:      vsetvli   t0, a4, e32, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [FFmpeg-devel] [PATCH 10/10] riscv: float vector dot product with RVV
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (8 preceding siblings ...)
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 09/10] riscv: float vector windowed overlap/add " remi
@ 2022-09-04 13:55 ` remi
  2022-09-04 17:48 ` [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Lynne
  10 siblings, 0 replies; 13+ messages in thread
From: remi @ 2022-09-04 13:55 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  2 ++
 libavutil/riscv/float_dsp_rvv.S  | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 887706d899..7c2fc10e99 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -35,6 +35,7 @@ void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
 void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
                                  const float *src1, int len);
 void ff_butterflies_float_rvv(float *v1, float *v2, int len);
+float ff_scalarproduct_float_rvv(const float *v1, const float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
@@ -55,6 +56,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
         fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
         fdsp->butterflies_float = ff_butterflies_float_rvv;
+        fdsp->scalarproduct_float = ff_scalarproduct_float_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 7e7d48374f..7616abb9f6 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -173,6 +173,29 @@ func ff_butterflies_float_rvv
         ret
 endfunc
 
+// a0 = (a0).(a1) [0..a2-1]
+func ff_scalarproduct_float_rvv
+        vsetvli      zero, zero, e32, m8, ta, ma
+        vmv.s.x      v8, zero
+
+1:      vsetvli      t0, a2, e32, m8, ta, ma
+        slli         t1, t0, 2
+        vle32.v      v16, (a0)
+        add          a0, a0, t1
+        vle32.v      v24, (a1)
+        add          a1, a1, t1
+        vfmul.vv     v16, v16, v24
+        sub          a2, a2, t0
+        vfredusum.vs v8, v16, v8
+        bnez         a2, 1b
+
+        vfmv.f.s fa0, v8
+#if defined (__riscv_float_abi_soft)
+        fmv.x.w  a0, fa0
+#endif
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv
 1:      vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP
  2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
                   ` (9 preceding siblings ...)
  2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 10/10] riscv: float vector dot product " remi
@ 2022-09-04 17:48 ` Lynne
  2022-09-04 19:01   ` Rémi Denis-Courmont
  10 siblings, 1 reply; 13+ messages in thread
From: Lynne @ 2022-09-04 17:48 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Sep 4, 2022, 15:54 by remi@remlab.net:

> The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:
>
>  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)
>
> are waiting thorough bashing at your express convenience up to:
>
>  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)
>
> Changes since v1:
>
> - Removed stray define.
> - Fixed mismatch between byte and element size in mul-scalar.
> - Added fmul, fac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
> - Added float butterfly and dot product.
>
> All operations are unrolled to the maximum group size (8), with the
> exception of overlap/add. The later seems to require a minimum of 6
> vectors (maybe 5 by extremely careful ordering), so the group size is
> only 4.
>
> The pointer arithmetic could be slightly optimised with SH2ADD and
> SH3ADD instructions from the Zvba extension. This would require more
> conditional code, or requiring support for Zvba for probably neglible
> performance gains though.
>

Did you test on real hardware or a VM?
If the former, what does checkasm --bench report?

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP
  2022-09-04 17:48 ` [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Lynne
@ 2022-09-04 19:01   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 13+ messages in thread
From: Rémi Denis-Courmont @ 2022-09-04 19:01 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Le sunnuntaina 4. syyskuuta 2022, 20.48.26 EEST Lynne a écrit :
> > The pointer arithmetic could be slightly optimised with SH2ADD and
> > SH3ADD instructions from the Zvba extension. This would require more
> > conditional code, or requiring support for Zvba for probably neglible
> > performance gains though.
> 
> Did you test on real hardware or a VM?

I don't think we will see real and conforming RV64GV hardware this year, 
unless you count FPGAs. I hope we can get affordable stuff in 2023H1.

This is running on simulator. But since the code is already (AFAICT) using the 
largest possible grouping, there won't be that much room for further 
optimisations, other maybe than to add prefetching hints. In that sense, RVV 
is really nice in how it makes unrolling almost unnecessary/effortless.

The code is also already laid out to leverage multiple issue if available. 
RISC-V does not have post-index addressing modes, so the interleaving a fair 
amount of pointer arithmetic is unavoidable.

> If the former, what does checkasm --bench report?

...

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-09-04 19:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-04 13:54 [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Rémi Denis-Courmont
2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension remi
2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 02/10] riscv: initial common header for assembler macros remi
2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 03/10] riscv: float vector-scalar multiplication with RVV remi
2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 04/10] riscv: float vector-vector " remi
2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 05/10] riscv: float vector multiply-accumulate " remi
2022-09-04 13:54 ` [FFmpeg-devel] [PATCH 06/10] riscv: float vector multiplication-addition " remi
2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 07/10] riscv: float vector sum-and-difference " remi
2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 08/10] riscv: float reversed vector multiplication " remi
2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 09/10] riscv: float vector windowed overlap/add " remi
2022-09-04 13:55 ` [FFmpeg-devel] [PATCH 10/10] riscv: float vector dot product " remi
2022-09-04 17:48 ` [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP Lynne
2022-09-04 19:01   ` Rémi Denis-Courmont

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git