[FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed

* [FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter remi
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 doc/optimization.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/optimization.txt b/doc/optimization.txt
index 974e2f9af2..3ed29fe38c 100644
--- a/doc/optimization.txt
+++ b/doc/optimization.txt
@@ -267,6 +267,11 @@ CELL/SPU:
 http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/30B3520C93F437AB87257060006FFE5E/$file/Language_Extensions_for_CBEA_2.4.pdf
 http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/9F820A5FFA3ECE8C8725716A0062585F/$file/CBE_Handbook_v1.1_24APR2007_pub.pdf
 
+RISC-V-specific:
+----------------
+The RISC-V Instruction Set Manual, Volume 1, Unprivileged ISA:
+https://riscv.org/technical/specifications/
+
 GCC asm links:
 --------------
 official doc but quite ugly
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 03/18] configure/riscv: detect fast CLZ remi
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.

In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
---
 libavutil/riscv/timer.h | 53 +++++++++++++++++++++++++++++++++++++++++
 libavutil/timer.h       |  2 ++
 2 files changed, 55 insertions(+)
 create mode 100644 libavutil/riscv/timer.h

diff --git a/libavutil/riscv/timer.h b/libavutil/riscv/timer.h
new file mode 100644
index 0000000000..a34157a566
--- /dev/null
+++ b/libavutil/riscv/timer.h
@@ -0,0 +1,53 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_RISCV_TIMER_H
+#define AVUTIL_RISCV_TIMER_H
+
+#include "config.h"
+
+#if HAVE_INLINE_ASM
+#include <stdint.h>
+
+static inline uint64_t rdcycle64(void)
+{
+#if (__riscv_xlen >= 64)
+    uintptr_t cycles;
+
+    __asm__ volatile ("rdcycle %0" : "=r"(cycles));
+
+#else
+    uint64_t cycles;
+    uint32_t hi, lo, check;
+
+    __asm__ volatile (
+        "1: rdcycleh %0\n"
+        "   rdcycle  %1\n"
+        "   rdcycleh %2\n"
+        "   bne %0, %2, 1b\n" : "=r" (hi), "=r" (lo), "=r" (check));
+
+    cycles = (((uint64_t)hi) << 32) | lo;
+
+#endif
+    return cycles;
+}
+
+#define AV_READ_TIME rdcycle64
+
+#endif
+#endif /* AVUTIL_RISCV_TIMER_H */
diff --git a/libavutil/timer.h b/libavutil/timer.h
index 48e576739f..d3db5a27ef 100644
--- a/libavutil/timer.h
+++ b/libavutil/timer.h
@@ -57,6 +57,8 @@
 #   include "arm/timer.h"
 #elif ARCH_PPC
 #   include "ppc/timer.h"
+#elif ARCH_RISCV
+#   include "riscv/timer.h"
 #elif ARCH_X86
 #   include "x86/timer.h"
 #endif
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 03/18] configure/riscv: detect fast CLZ
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 04/18] lavu/riscv: byte-swap operations remi
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

RISC-V defines the CLZ instruction as part of the ratified Zbb subset
of the (not yet ratified) bit mapulation extension (B). We can detect
it from the __riscv_zbb predefined constant. At least GCC 12 already
supports this correctly.

Note that the macro will be non-zero if supported, zero if enabled
in the compiler flags (e.g. -march=rv64gzbb) but not known to the
compiler, and undefined otherwise.
---
 configure | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/configure b/configure
index 9e51abd0d3..b7dc1d8656 100755
--- a/configure
+++ b/configure
@@ -5334,6 +5334,12 @@ elif enabled ppc; then
         ;;
     esac
 
+elif enabled riscv; then
+
+    if test_cpp_condition stddef.h "__riscv_zbb"; then
+        enable fast_clz
+    fi
+
 elif enabled sparc; then
 
     case $cpu in
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 04/18] lavu/riscv: byte-swap operations
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (2 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 03/18] configure/riscv: detect fast CLZ remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 05/18] lavu/riscv: add <intmath.h> optimisations remi
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

If the target supports the Basic bit-manipulation (Zbb) extension, then
the REV8 instruction is available to reverse byte order.

Note that this instruction only exists at the "XLEN" register size,
so we need to right shift the result down to the data width.

If Zbb is not supported, then this patchset does nothing. Support for
run-time detection is left for the future. Currently, there are no
bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to
do this.
---
 libavutil/bswap.h       |  2 ++
 libavutil/riscv/bswap.h | 74 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)
 create mode 100644 libavutil/riscv/bswap.h

diff --git a/libavutil/bswap.h b/libavutil/bswap.h
index 91cb79538d..4840ab433f 100644
--- a/libavutil/bswap.h
+++ b/libavutil/bswap.h
@@ -40,6 +40,8 @@
 #   include "arm/bswap.h"
 #elif ARCH_AVR32
 #   include "avr32/bswap.h"
+#elif ARCH_RISCV
+#   include "riscv/bswap.h"
 #elif ARCH_SH4
 #   include "sh4/bswap.h"
 #elif ARCH_X86
diff --git a/libavutil/riscv/bswap.h b/libavutil/riscv/bswap.h
new file mode 100644
index 0000000000..de1429c0f7
--- /dev/null
+++ b/libavutil/riscv/bswap.h
@@ -0,0 +1,74 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_RISCV_BSWAP_H
+#define AVUTIL_RISCV_BSWAP_H
+
+#include <stdint.h>
+#include "config.h"
+#include "libavutil/attributes.h"
+
+#if defined (__riscv_zbb) && (__riscv_zbb > 0) && HAVE_INLINE_ASM
+
+static av_always_inline av_const uintptr_t av_bswap_xlen(uintptr_t x)
+{
+    uintptr_t y;
+
+    __asm__("rev8 %0, %1" : "=r" (y) : "r" (x));
+    return y;
+}
+
+#define av_bswap16 av_bswap16
+
+static av_always_inline av_const uint_fast16_t av_bswap16(uint_fast16_t x)
+{
+    return av_bswap_xlen(x) >> (__riscv_xlen - 16);
+}
+
+#if (__riscv_xlen == 32)
+#define av_bswap32 av_bswap_xlen
+#define av_bswap64 av_bswap64
+
+static av_always_inline av_const uint64_t av_bswap64(uint64_t x)
+{
+    return (((uint64_t)av_bswap32(x)) << 32) | av_bswap32(x >> 32);
+}
+
+#else
+#define av_bswap32 av_bswap32
+
+static av_always_inline av_const uint_fast32_t av_bswap32(uint_fast32_t x)
+{
+    return av_bswap_xlen(x) >> (__riscv_xlen - 32);
+}
+
+#if (__riscv_xlen == 64)
+#define av_bswap64 av_bswap_xlen
+
+#else
+#define av_bswap64 av_bswap64
+
+static av_always_inline av_const uint_fast64_t av_bswap64(uint_fast64_t x)
+{
+    return av_bswap_xlen(x) >> (__riscv_xlen - 64);
+}
+
+#endif /* __riscv_xlen > 64 */
+#endif /* __riscv_xlen > 32 */
+#endif /* __riscv_zbb */
+#endif /* AVUTIL_RISCV_BSWAP_H */
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 05/18] lavu/riscv: add <intmath.h> optimisations
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (3 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 04/18] lavu/riscv: byte-swap operations remi
@ 2022-09-12 15:53 ` remi
  2022-09-14  9:28   ` Rémi Denis-Courmont
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 06/18] configure: probe RISC-V Vector extension remi
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

This provides some micro-optimisations for signed integer clipping, and
support for bit weight with the Zbb extension.
---
 libavutil/intmath.h       |   5 +-
 libavutil/riscv/intmath.h | 103 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 libavutil/riscv/intmath.h

diff --git a/libavutil/intmath.h b/libavutil/intmath.h
index 9573109e9d..c54d23b7bf 100644
--- a/libavutil/intmath.h
+++ b/libavutil/intmath.h
@@ -28,8 +28,9 @@
 
 #if ARCH_ARM
 #   include "arm/intmath.h"
-#endif
-#if ARCH_X86
+#elif ARCH_RISCV
+#   include "riscv/intmath.h"
+#elif ARCH_X86
 #   include "x86/intmath.h"
 #endif
 
diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h
new file mode 100644
index 0000000000..78f7ba930a
--- /dev/null
+++ b/libavutil/riscv/intmath.h
@@ -0,0 +1,103 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_RISCV_INTMATH_H
+#define AVUTIL_RISCV_INTMATH_H
+
+#include <stdint.h>
+
+#include "config.h"
+#include "libavutil/attributes.h"
+
+/*
+ * The compiler is forced to sign-extend the result anyhow, so it is faster to
+ * compute it explicitly and use it.
+ */
+#define av_clip_int8 av_clip_int8_rvi
+static av_always_inline av_const int8_t av_clip_int8_rvi(int a)
+{
+    union { uint8_t u; int8_t s; } u = { .u = a };
+
+    if (a != u.s)
+        a = ((a >> 31) ^ 0x7F);
+    return a;
+}
+
+#define av_clip_int16 av_clip_int16_rvi
+static av_always_inline av_const int16_t av_clip_int16_rvi(int a)
+{
+    union { uint8_t u; int8_t s; } u = { .u = a };
+
+    if (a != u.s)
+        a = ((a >> 31) ^ 0x7F);
+    return a;
+}
+
+#define av_clipl_int32 av_clipl_int32_rvi
+static av_always_inline av_const int32_t av_clipl_int32_rvi(int64_t a)
+{
+    union { uint32_t u; int32_t s; } u = { .u = a };
+
+    if (a != u.s)
+        a = ((a >> 63) ^ 0x7FFFFFFF);
+    return a;
+}
+
+#define av_clip_intp2 av_clip_intp2_rvi
+static av_always_inline av_const int av_clip_intp2_rvi(int a, int p)
+{
+    const int shift = 32 - p;
+    int b = (a << shift) >> shift;
+
+    if (a != b)
+        b = (a >> 31) ^ ((1 << p) - 1);
+    return b;
+}
+
+#if defined (__riscv_zbb) && (__riscv_zbb > 0) && HAVE_INLINE_ASM
+
+#define av_popcount av_popcount_rvb
+static av_always_inline av_const int av_popcount_rvb(uint32_t x)
+{
+    int ret;
+
+#if (__riscv_xlen >= 64)
+    __asm__ ("cpopw %0, %1\n" : "=r" (ret) : "r" (x));
+#else
+    __asm__ ("cpop %0, %1\n" : "=r" (ret) : "r" (x));
+#endif
+    return ret;
+}
+
+#if (__riscv_xlen >= 64)
+#define av_popcount64 av_popcount64_rvb
+static av_always_inline av_const int av_popcount64_rvb(uint64_t x)
+{
+    int ret;
+
+#if (__riscv_xlen >= 128)
+    __asm__ ("cpopd %0, %1\n" : "=r" (ret) : "r" (x));
+#else
+    __asm__ ("cpop %0, %1\n" : "=r" (ret) : "r" (x));
+#endif
+    return ret;
+}
+#endif /* __riscv_xlen >= 64 */
+#endif /* __riscv_zbb */
+
+#endif /* AVUTIL_RISCV_INTMATH_H */
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 06/18] configure: probe RISC-V Vector extension
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (4 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 05/18] lavu/riscv: add <intmath.h> optimisations remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 07/18] lavu/riscv: initial common header for assembler macros remi
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 Makefile         |  2 +-
 configure        | 15 +++++++++++++++
 ffbuild/arch.mak |  2 ++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 61f79e27ae..1fb742f390 100644
--- a/Makefile
+++ b/Makefile
@@ -91,7 +91,7 @@ ffbuild/.config: $(CONFIGURABLE_COMPONENTS)
 SUBDIR_VARS := CLEANFILES FFLIBS HOSTPROGS TESTPROGS TOOLS               \
                HEADERS ARCH_HEADERS BUILT_HEADERS SKIPHEADERS            \
                ARMV5TE-OBJS ARMV6-OBJS ARMV8-OBJS VFP-OBJS NEON-OBJS     \
-               ALTIVEC-OBJS VSX-OBJS MMX-OBJS X86ASM-OBJS                \
+               ALTIVEC-OBJS VSX-OBJS RVV-OBJS MMX-OBJS X86ASM-OBJS       \
                MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSP-OBJS MSA-OBJS         \
                MMI-OBJS LSX-OBJS LASX-OBJS OBJS SLIBOBJS SHLIBOBJS       \
                STLIBOBJS HOSTOBJS TESTOBJS
diff --git a/configure b/configure
index b7dc1d8656..c5f20cc323 100755
--- a/configure
+++ b/configure
@@ -462,6 +462,7 @@ Optimization options (experts only):
   --disable-mmi            disable Loongson MMI optimizations
   --disable-lsx            disable Loongson LSX optimizations
   --disable-lasx           disable Loongson LASX optimizations
+  --disable-rvv            disable RISC-V Vector optimizations
   --disable-fast-unaligned consider unaligned accesses slow
 
 Developer options (useful when working on FFmpeg itself):
@@ -2126,6 +2127,10 @@ ARCH_EXT_LIST_PPC="
     vsx
 "
 
+ARCH_EXT_LIST_RISCV="
+    rvv
+"
+
 ARCH_EXT_LIST_X86="
     $ARCH_EXT_LIST_X86_SIMD
     cpunop
@@ -2135,6 +2140,7 @@ ARCH_EXT_LIST_X86="
 ARCH_EXT_LIST="
     $ARCH_EXT_LIST_ARM
     $ARCH_EXT_LIST_PPC
+    $ARCH_EXT_LIST_RISCV
     $ARCH_EXT_LIST_X86
     $ARCH_EXT_LIST_MIPS
     $ARCH_EXT_LIST_LOONGSON
@@ -2642,6 +2648,8 @@ ppc4xx_deps="ppc"
 vsx_deps="altivec"
 power8_deps="vsx"
 
+rvv_deps="riscv"
+
 loongson2_deps="mips"
 loongson3_deps="mips"
 mmi_deps_any="loongson2 loongson3"
@@ -6110,6 +6118,10 @@ elif enabled ppc; then
         check_cpp_condition power8 "altivec.h" "defined(_ARCH_PWR8)"
     fi
 
+elif enabled riscv; then
+
+    enabled rvv && check_inline_asm rvv '".option arch, +v\nvsetivli zero, 0, e8, m1, ta, ma"'
+
 elif enabled x86; then
 
     check_builtin rdtsc    intrin.h   "__rdtsc()"
@@ -7596,6 +7608,9 @@ if enabled loongarch; then
     echo "LSX enabled               ${lsx-no}"
     echo "LASX enabled              ${lasx-no}"
 fi
+if enabled riscv; then
+    echo "RISC-V Vector enabled     ${riscv-no}"
+fi
 echo "debug symbols             ${debug-no}"
 echo "strip symbols             ${stripping-no}"
 echo "optimize for size         ${small-no}"
diff --git a/ffbuild/arch.mak b/ffbuild/arch.mak
index 997e31e85e..39d76ee152 100644
--- a/ffbuild/arch.mak
+++ b/ffbuild/arch.mak
@@ -15,5 +15,7 @@ OBJS-$(HAVE_LASX)      += $(LASX-OBJS)       $(LASX-OBJS-yes)
 OBJS-$(HAVE_ALTIVEC) += $(ALTIVEC-OBJS) $(ALTIVEC-OBJS-yes)
 OBJS-$(HAVE_VSX)     += $(VSX-OBJS) $(VSX-OBJS-yes)
 
+OBJS-$(HAVE_RVV)     += $(RVV-OBJS)     $(RVV-OBJS-yes)
+
 OBJS-$(HAVE_MMX)     += $(MMX-OBJS)     $(MMX-OBJS-yes)
 OBJS-$(HAVE_X86ASM)  += $(X86ASM-OBJS)  $(X86ASM-OBJS-yes)
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 07/18] lavu/riscv: initial common header for assembler macros
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (5 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 06/18] configure: probe RISC-V Vector extension remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 08/18] lavu/riscv: add CPU flags for the RISC-V Vector extension remi
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/asm.S | 74 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 libavutil/riscv/asm.S

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
new file mode 100644
index 0000000000..7623c161cf
--- /dev/null
+++ b/libavutil/riscv/asm.S
@@ -0,0 +1,74 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#if defined (__riscv_float_abi_soft)
+#define NOHWF
+#define NOHWD
+#define HWF   #
+#define HWD   #
+#elif defined (__riscv_float_abi_single)
+#define NOHWF #
+#define NOHWD
+#define HWF
+#define HWD   #
+#else
+#define NOHWF #
+#define NOHWD #
+#define HWF
+#define HWD
+#endif
+
+        .macro func sym, ext=
+            .text
+            .align 2
+
+            .option push
+            .ifnb \ext
+            .option arch, +\ext
+            .endif
+
+            .global \sym
+            .hidden \sym
+            .type   \sym, %function
+            \sym:
+
+            .macro endfunc
+                .size   \sym, . - \sym
+                .option pop
+                .previous
+                .purgem endfunc
+            .endm
+        .endm
+
+        .macro const sym, align=3, relocate=0
+            .if \relocate
+                .pushsection .data.rel.ro
+            .else
+                .pushsection .rodata
+            .endif
+            .align \align
+            \sym:
+
+            .macro endconst
+                .size  \sym, . - \sym
+                .popsection
+                .purgem endconst
+            .endm
+        .endm
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 08/18] lavu/riscv: add CPU flags for the RISC-V Vector extension
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (6 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 07/18] lavu/riscv: initial common header for assembler macros remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 09/18] checkasm: register the RISC-V V subsets remi
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

RVV defines a total of 12 different extensions, including:

- 5 different instruction subsets:
  - Zve32x: 8-, 16- and 32-bit integers,
  - Zve32f: Zve32x plus single precision floats,
  - Zve64x: Zve32x plus 64-bit integers,
  - Zve64f: Zve32f plus Zve64x,
  - Zve64d: Zve64f plus double precision floats.

- 6 different vector lengths:
  - Zvl32b (embedded only),
  - Zvl64b (embedded only),
  - Zvl128b,
  - Zvl256b,
  - Zvl512b,
  - Zvl1024b,

- and the V extension proper: equivalent to Zve64f and Zvl128b.

In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each type sets: up-to-32-bit ints (ZVE32X), floats (ZV32F),
64-bit ints (ZV64X) and doubles (ZVE64D).

Whence the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
---
 libavutil/cpu.c          | 15 +++++++++++
 libavutil/cpu.h          |  6 +++++
 libavutil/cpu_internal.h |  1 +
 libavutil/riscv/Makefile |  1 +
 libavutil/riscv/cpu.c    | 57 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 80 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/cpu.c

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 0035e927a5..89d2fb6f56 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -62,6 +62,8 @@ static int get_cpu_flags(void)
     return ff_get_cpu_flags_arm();
 #elif ARCH_PPC
     return ff_get_cpu_flags_ppc();
+#elif ARCH_RISCV
+    return ff_get_cpu_flags_riscv();
 #elif ARCH_X86
     return ff_get_cpu_flags_x86();
 #elif ARCH_LOONGARCH
@@ -178,6 +180,19 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
 #elif ARCH_LOONGARCH
         { "lsx",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_LSX      },    .unit = "flags" },
         { "lasx",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_LASX     },    .unit = "flags" },
+#elif ARCH_RISCV
+#define AV_CPU_FLAG_ZVE32X_M (AV_CPU_FLAG_ZVE32X)
+#define AV_CPU_FLAG_ZVE32F_M (AV_CPU_FLAG_ZVE32X_M | AV_CPU_FLAG_ZVE32F)
+#define AV_CPU_FLAG_ZVE64X_M (AV_CPU_FLAG_ZVE32X_M | AV_CPU_FLAG_ZVE64X)
+#define AV_CPU_FLAG_ZVE64F_M (AV_CPU_FLAG_ZVE32F_M | AV_CPU_FLAG_ZVE64X)
+#define AV_CPU_FLAG_ZVE64D_M (AV_CPU_FLAG_ZVE64F_M | AV_CPU_FLAG_ZVE64D)
+#define AV_CPU_FLAG_VECTORS  AV_CPU_FLAG_ZVE64D_M
+        { "vectors",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_VECTORS  },    .unit = "flags" },
+        { "zve32x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE32X   },    .unit = "flags" },
+        { "zve32f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE32F_M },    .unit = "flags" },
+        { "zve64x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64X_M },    .unit = "flags" },
+        { "zve64f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64F_M },    .unit = "flags" },
+        { "zve64d",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64D_M },    .unit = "flags" },
 #endif
         { NULL },
     };
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index 9711e574c5..44836e50d6 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -78,6 +78,12 @@
 #define AV_CPU_FLAG_LSX          (1 << 0)
 #define AV_CPU_FLAG_LASX         (1 << 1)
 
+// RISC-V Vector extension
+#define AV_CPU_FLAG_ZVE32X       (1 << 0) /* 8-, 16-, 32-bit integers */
+#define AV_CPU_FLAG_ZVE32F       (1 << 1) /* single precision scalars */
+#define AV_CPU_FLAG_ZVE64X       (1 << 2) /* 64-bit integers */
+#define AV_CPU_FLAG_ZVE64D       (1 << 3) /* double precision scalars */
+
 /**
  * Return the flags which specify extensions supported by the CPU.
  * The returned value is affected by av_force_cpu_flags() if that was used
diff --git a/libavutil/cpu_internal.h b/libavutil/cpu_internal.h
index 650d47fc96..634f28bac4 100644
--- a/libavutil/cpu_internal.h
+++ b/libavutil/cpu_internal.h
@@ -48,6 +48,7 @@ int ff_get_cpu_flags_mips(void);
 int ff_get_cpu_flags_aarch64(void);
 int ff_get_cpu_flags_arm(void);
 int ff_get_cpu_flags_ppc(void);
+int ff_get_cpu_flags_riscv(void);
 int ff_get_cpu_flags_x86(void);
 int ff_get_cpu_flags_loongarch(void);
 
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
new file mode 100644
index 0000000000..1f818043dc
--- /dev/null
+++ b/libavutil/riscv/Makefile
@@ -0,0 +1 @@
+OBJS += riscv/cpu.o
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
new file mode 100644
index 0000000000..9e4cce5e8b
--- /dev/null
+++ b/libavutil/riscv/cpu.c
@@ -0,0 +1,57 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/cpu.h"
+#include "libavutil/cpu_internal.h"
+#include "config.h"
+
+#if HAVE_GETAUXVAL
+#include <sys/auxv.h>
+#endif
+
+#define HWCAP_RV(letter) (1ul << ((letter) - 'A'))
+
+int ff_get_cpu_flags_riscv(void)
+{
+    int ret = 0;
+
+    /* If RV-V is enabled statically at compile-time, check the details. */
+#ifdef __riscv_vectors
+    ret |= AV_CPU_FLAG_ZVE32X;
+#if __riscv_v_elen >= 64
+    ret |= AV_CPU_FLAG_ZVE64X;
+#endif
+#if __riscv_v_elen_fp >= 32
+    ret |= AV_CPU_FLAG_ZVE32F;
+#if __riscv_v_elen_fp >= 64
+    ret |= AV_CPU_FLAG_ZVE64F;
+#endif
+#endif
+#endif
+
+#if HAVE_GETAUXVAL
+    const unsigned long hwcap = getauxval(AT_HWCAP);
+
+    /* The V extension implies all subsets */
+    if (hwcap & HWCAP_RV('V'))
+        ret |= AV_CPU_FLAG_ZVE32X | AV_CPU_FLAG_ZVE64X
+             | AV_CPU_FLAG_ZVE32F | AV_CPU_FLAG_ZVE64D;
+#endif
+
+    return ret;
+}
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 09/18] checkasm: register the RISC-V V subsets
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (7 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 08/18] lavu/riscv: add CPU flags for the RISC-V Vector extension remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 10/18] lavu/riscv: float vector-scalar multiplication with RVV remi
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 tests/checkasm/checkasm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index e56fd3850e..a5d0503811 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -226,6 +226,11 @@ static const struct {
     { "ALTIVEC",  "altivec",  AV_CPU_FLAG_ALTIVEC },
     { "VSX",      "vsx",      AV_CPU_FLAG_VSX },
     { "POWER8",   "power8",   AV_CPU_FLAG_POWER8 },
+#elif ARCH_RISCV
+    { "Zve32x",   "zve32x",   AV_CPU_FLAG_ZVE32X },
+    { "Zve32f",   "zve32f",   AV_CPU_FLAG_ZVE32F },
+    { "Zve64x",   "zve64x",   AV_CPU_FLAG_ZVE64X },
+    { "Zve64d",   "zve64d",   AV_CPU_FLAG_ZVE64D },
 #elif ARCH_MIPS
     { "MMI",      "mmi",      AV_CPU_FLAG_MMI },
     { "MSA",      "msa",      AV_CPU_FLAG_MSA },
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 10/18] lavu/riscv: float vector-scalar multiplication with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (8 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 09/18] checkasm: register the RISC-V V subsets remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 11/18] lavu/riscv: float vector-vector " remi
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
---
 libavutil/float_dsp.c            |  2 ++
 libavutil/float_dsp.h            |  1 +
 libavutil/riscv/Makefile         |  4 ++-
 libavutil/riscv/float_dsp_init.c | 44 +++++++++++++++++++++++++
 libavutil/riscv/float_dsp_rvv.S  | 56 ++++++++++++++++++++++++++++++++
 5 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index 8676c8b0f8..742dd679d2 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -156,6 +156,8 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
     ff_float_dsp_init_arm(fdsp);
 #elif ARCH_PPC
     ff_float_dsp_init_ppc(fdsp, bit_exact);
+#elif ARCH_RISCV
+    ff_float_dsp_init_riscv(fdsp);
 #elif ARCH_X86
     ff_float_dsp_init_x86(fdsp);
 #elif ARCH_MIPS
diff --git a/libavutil/float_dsp.h b/libavutil/float_dsp.h
index 9c664592bd..7cad9fc622 100644
--- a/libavutil/float_dsp.h
+++ b/libavutil/float_dsp.h
@@ -205,6 +205,7 @@ float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len);
 void ff_float_dsp_init_aarch64(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_arm(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_ppc(AVFloatDSPContext *fdsp, int strict);
+void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_mips(AVFloatDSPContext *fdsp);
 
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
index 1f818043dc..89a8d0d990 100644
--- a/libavutil/riscv/Makefile
+++ b/libavutil/riscv/Makefile
@@ -1 +1,3 @@
-OBJS += riscv/cpu.o
+OBJS +=     riscv/float_dsp_init.o \
+            riscv/cpu.o
+RVV-OBJS += riscv/float_dsp_rvv.o
diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
new file mode 100644
index 0000000000..f1d3d52877
--- /dev/null
+++ b/libavutil/riscv/float_dsp_init.c
@@ -0,0 +1,44 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "config.h"
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/float_dsp.h"
+
+void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
+                                int len);
+
+void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
+                                int len);
+
+av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
+{
+#if HAVE_RVV
+    int flags = av_get_cpu_flags();
+
+    if (flags & AV_CPU_FLAG_ZVE32F) {
+        fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+
+        if (flags & AV_CPU_FLAG_ZVE64D)
+            fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+    }
+#endif
+}
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
new file mode 100644
index 0000000000..365e00190c
--- /dev/null
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -0,0 +1,56 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "asm.S"
+
+// (a0) = (a1) * fa0 [0..a2-1]
+func ff_vector_fmul_scalar_rvv, zve32f
+NOHWF   fmv.w.x  fa0, a2
+NOHWF   mv       a2, a3
+
+1:      vsetvli  t0, a2, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a1)
+        add      a1, a1, t1
+        vfmul.vf v16, v16, fa0
+        sub      a2, a2, t0
+        vse32.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a2, 1b
+
+        ret
+endfunc
+
+// (a0) = (a1) * fa0 [0..a2-1]
+func ff_vector_dmul_scalar_rvv, zve64d
+NOHWD   fmv.d.x  fa0, a2
+NOHWD   mv       a2, a3
+
+1:      vsetvli  t0, a2, e64, m8, ta, ma
+        slli     t1, t0, 3
+        vle64.v  v16, (a1)
+        add      a1, a1, t1
+        vfmul.vf v16, v16, fa0
+        sub      a2, a2, t0
+        vse64.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a2, 1b
+
+        ret
+endfunc
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 11/18] lavu/riscv: float vector-vector multiplication with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (9 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 10/18] lavu/riscv: float vector-scalar multiplication with RVV remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 12/18] lavu/riscv: float vector multiply-accumulate " remi
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  9 ++++++++-
 libavutil/riscv/float_dsp_rvv.S  | 34 ++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index f1d3d52877..903da4eeda 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -23,9 +23,13 @@
 #include "libavutil/cpu.h"
 #include "libavutil/float_dsp.h"
 
+void ff_vector_fmul_rvv(float *dst, const float *src0, const float *src1,
+                         int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 
+void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
+                         int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
                                 int len);
 
@@ -35,10 +39,13 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
     int flags = av_get_cpu_flags();
 
     if (flags & AV_CPU_FLAG_ZVE32F) {
+        fdsp->vector_fmul = ff_vector_fmul_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 
-        if (flags & AV_CPU_FLAG_ZVE64D)
+        if (flags & AV_CPU_FLAG_ZVE64D) {
+            fdsp->vector_dmul = ff_vector_dmul_rvv;
             fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+        }
     }
 #endif
 }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 365e00190c..65c3a77b01 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -19,6 +19,23 @@
 #include "config.h"
 #include "asm.S"
 
+// (a0) = (a1) * (a2) [0..a3-1]
+func ff_vector_fmul_rvv, zve32f
+1:      vsetvli  t0, a3, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a1)
+        add      a1, a1, t1
+        vle32.v  v24, (a2)
+        add      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse32.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_fmul_scalar_rvv, zve32f
 NOHWF   fmv.w.x  fa0, a2
@@ -37,6 +54,23 @@ NOHWF   mv       a2, a3
         ret
 endfunc
 
+// (a0) = (a1) * (a2) [0..a3-1]
+func ff_vector_dmul_rvv, zve64d
+1:      vsetvli  t0, a3, e64, m8, ta, ma
+        slli     t1, t0, 3
+        vle64.v  v16, (a1)
+        add      a1, a1, t1
+        vle64.v  v24, (a2)
+        add      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse64.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_dmul_scalar_rvv, zve64d
 NOHWD   fmv.d.x  fa0, a2
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 12/18] lavu/riscv: float vector multiply-accumulate with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (10 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 11/18] lavu/riscv: float vector-vector " remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 13/18] lavu/riscv: float vector multiplication-addition " remi
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  6 +++++
 libavutil/riscv/float_dsp_rvv.S  | 38 ++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 903da4eeda..1381eadab6 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -25,11 +25,15 @@
 
 void ff_vector_fmul_rvv(float *dst, const float *src0, const float *src1,
                          int len);
+void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
+                                int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
+void ff_vector_dmac_scalar_rvv(double *dst, const double *src, double mul,
+                                int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
                                 int len);
 
@@ -40,10 +44,12 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 
     if (flags & AV_CPU_FLAG_ZVE32F) {
         fdsp->vector_fmul = ff_vector_fmul_rvv;
+        fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
+            fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_rvv;
             fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
         }
     }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 65c3a77b01..5a7d92abd6 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -36,6 +36,25 @@ func ff_vector_fmul_rvv, zve32f
         ret
 endfunc
 
+// (a0) += (a1) * fa0 [0..a2-1]
+func ff_vector_fmac_scalar_rvv, zve32f
+NOHWF   fmv.w.x   fa0, a2
+NOHWF   mv        a2, a3
+
+1:      vsetvli   t0, a2, e32, m8, ta, ma
+        slli      t1, t0, 2
+        vle32.v   v24, (a1)
+        add       a1, a1, t1
+        vle32.v   v16, (a0)
+        vfmacc.vf v16, fa0, v24
+        sub       a2, a2, t0
+        vse32.v   v16, (a0)
+        add       a0, a0, t1
+        bnez      a2, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_fmul_scalar_rvv, zve32f
 NOHWF   fmv.w.x  fa0, a2
@@ -71,6 +90,25 @@ func ff_vector_dmul_rvv, zve64d
         ret
 endfunc
 
+// (a0) += (a1) * fa0 [0..a2-1]
+func ff_vector_dmac_scalar_rvv, zve64d
+NOHWD   fmv.d.x   fa0, a2
+NOHWD   mv        a2, a3
+
+1:      vsetvli   t0, a2, e64, m8, ta, ma
+        slli      t1, t0, 3
+        vle64.v   v24, (a1)
+        add       a1, a1, t1
+        vle64.v   v16, (a0)
+        vfmacc.vf v16, fa0, v24
+        sub       a2, a2, t0
+        vse64.v   v16, (a0)
+        add       a0, a0, t1
+        bnez      a2, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_dmul_scalar_rvv, zve64d
 NOHWD   fmv.d.x  fa0, a2
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 13/18] lavu/riscv: float vector multiplication-addition with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (11 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 12/18] lavu/riscv: float vector multiply-accumulate " remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 14/18] lavu/riscv: float vector sum-and-difference " remi
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 1381eadab6..9bc1976d04 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -29,6 +29,8 @@ void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
+void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
+                             const float *src2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
@@ -46,6 +48,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmul = ff_vector_fmul_rvv;
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+        fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 5a7d92abd6..efbf12179f 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -73,6 +73,25 @@ NOHWF   mv       a2, a3
         ret
 endfunc
 
+// (a0) = (a1) * (a2) + (a3) [0..a4-1]
+func ff_vector_fmul_add_rvv, zve32f
+1:      vsetvli   t0, a4, e32, m8, ta, ma
+        slli      t1, t0, 2
+        vle32.v   v8, (a1)
+        add       a1, a1, t1
+        vle32.v   v16, (a2)
+        add       a2, a2, t1
+        vle32.v   v24, (a3)
+        add       a3, a3, t1
+        vfmadd.vv v8, v16, v24
+        sub       a4, a4, t0
+        vse32.v   v8, (a0)
+        add       a0, a0, t1
+        bnez      a4, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv, zve64d
 1:      vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 14/18] lavu/riscv: float vector sum-and-difference with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (12 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 13/18] lavu/riscv: float vector multiplication-addition " remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 15/18] lavu/riscv: float reversed vector multiplication " remi
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  2 ++
 libavutil/riscv/float_dsp_rvv.S  | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 9bc1976d04..c2b72c3b25 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -31,6 +31,7 @@ void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
                              const float *src2, int len);
+void ff_butterflies_float_rvv(float *v1, float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
@@ -49,6 +50,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
+        fdsp->butterflies_float = ff_butterflies_float_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index efbf12179f..1c3b08b94f 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -92,6 +92,24 @@ func ff_vector_fmul_add_rvv, zve32f
         ret
 endfunc
 
+// (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
+func ff_butterflies_float_rvv, zve32f
+1:      vsetvli  t0, a2, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a0)
+        vle32.v  v24, (a1)
+        vfadd.vv v0, v16, v24
+        vfsub.vv v8, v16, v24
+        sub      a2, a2, t0
+        vse32.v  v0, (a0)
+        add      a0, a0, t1
+        vse32.v  v8, (a1)
+        add      a1, a1, t1
+        bnez     a2, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv, zve64d
 1:      vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 15/18] lavu/riscv: float reversed vector multiplication with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (13 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 14/18] lavu/riscv: float vector sum-and-difference " remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 16/18] lavu/riscv: float vector windowed overlap/add " remi
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index c2b72c3b25..ae089d2fdb 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -31,6 +31,8 @@ void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
                              const float *src2, int len);
+void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
+                                 const float *src1, int len);
 void ff_butterflies_float_rvv(float *v1, float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
@@ -50,6 +52,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
+        fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
         fdsp->butterflies_float = ff_butterflies_float_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 1c3b08b94f..b376392294 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -92,6 +92,28 @@ func ff_vector_fmul_add_rvv, zve32f
         ret
 endfunc
 
+// (a0) = (a1) * reverse(a2) [0..a3-1]
+func ff_vector_fmul_reverse_rvv, zve32f
+        add      t3, a3, -1
+        li       t2, -4 // byte stride
+        slli     t3, t3, 2
+        add      a2, a2, t3
+
+1:      vsetvli  t0, a3, e32, m8, ta, ma
+        slli     t1, t0, 2
+        vle32.v  v16, (a1)
+        add      a1, a1, t1
+        vlse32.v v24, (a2), t2
+        sub      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse32.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
 func ff_butterflies_float_rvv, zve32f
 1:      vsetvli  t0, a2, e32, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 16/18] lavu/riscv: float vector windowed overlap/add with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (14 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 15/18] lavu/riscv: float reversed vector multiplication " remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 17/18] lavu/riscv: float vector dot product " remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 18/18] lavu/riscv: fixed vector sum-and-difference " remi
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 35 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index ae089d2fdb..cf8c995d7c 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -29,6 +29,8 @@ void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
+void ff_vector_fmul_window_rvv(float *dst, const float *src0,
+                                const float *src1, const float *win, int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
                              const float *src2, int len);
 void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
@@ -51,6 +53,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmul = ff_vector_fmul_rvv;
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+        fdsp->vector_fmul_window = ff_vector_fmul_window_rvv;
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
         fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
         fdsp->butterflies_float = ff_butterflies_float_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index b376392294..65daaa2d27 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -73,6 +73,41 @@ NOHWF   mv       a2, a3
         ret
 endfunc
 
+func ff_vector_fmul_window_rvv, zve32f
+        // a0: dst, a1: src0, a2: src1, a3: window, a4: length
+        addi       t0, a4, -1
+        add        t1, t0, a4
+        slli       t0, t0, 2
+        slli       t1, t1, 2
+        add        a2, a2, t0
+        add        t0, a0, t1
+        add        t3, a3, t1
+        li         t1, -4 // byte stride
+
+1:      vsetvli    t2, a4, e32, m4, ta, ma
+        slli       t4, t2, 2
+        vle32.v    v16, (a1)
+        add        a1, a1, t4
+        vlse32.v   v20, (a2), t1
+        sub        a2, a2, t4
+        vle32.v    v24, (a3)
+        add        a3, a3, t4
+        vlse32.v   v28, (t3), t1
+        sub        t3, t3, t4
+        vfmul.vv   v0, v16, v28
+        sub        a4, a4, t2
+        vfmul.vv   v8, v16, v24
+        vfnmsac.vv v0, v20, v24
+        vfmacc.vv  v8, v20, v28
+        vse32.v    v0, (a0)
+        add        a0, a0, t4
+        vsse32.v   v8, (t0), t1
+        sub        t0, t0, t4
+        bnez       a4, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) + (a3) [0..a4-1]
 func ff_vector_fmul_add_rvv, zve32f
 1:      vsetvli   t0, a4, e32, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 17/18] lavu/riscv: float vector dot product with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (15 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 16/18] lavu/riscv: float vector windowed overlap/add " remi
@ 2022-09-12 15:53 ` remi
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 18/18] lavu/riscv: fixed vector sum-and-difference " remi
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  2 ++
 libavutil/riscv/float_dsp_rvv.S  | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index cf8c995d7c..055cdc7520 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -36,6 +36,7 @@ void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
 void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
                                  const float *src1, int len);
 void ff_butterflies_float_rvv(float *v1, float *v2, int len);
+float ff_scalarproduct_float_rvv(const float *v1, const float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
                          int len);
@@ -57,6 +58,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
         fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
         fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
         fdsp->butterflies_float = ff_butterflies_float_rvv;
+        fdsp->scalarproduct_float = ff_scalarproduct_float_rvv;
 
         if (flags & AV_CPU_FLAG_ZVE64D) {
             fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 65daaa2d27..81bd0e510a 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -167,6 +167,27 @@ func ff_butterflies_float_rvv, zve32f
         ret
 endfunc
 
+// a0 = (a0).(a1) [0..a2-1]
+func ff_scalarproduct_float_rvv, zve32f
+        vsetvli      zero, zero, e32, m8, ta, ma
+        vmv.s.x      v8, zero
+
+1:      vsetvli      t0, a2, e32, m8, ta, ma
+        slli         t1, t0, 2
+        vle32.v      v16, (a0)
+        add          a0, a0, t1
+        vle32.v      v24, (a1)
+        add          a1, a1, t1
+        vfmul.vv     v16, v16, v24
+        sub          a2, a2, t0
+        vfredusum.vs v8, v16, v8
+        bnez         a2, 1b
+
+        vfmv.f.s fa0, v8
+NOHWF   fmv.x.w  a0, fa0
+        ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv, zve64d
 1:      vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 18/18] lavu/riscv: fixed vector sum-and-difference with RVV
       [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
                   ` (16 preceding siblings ...)
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 17/18] lavu/riscv: float vector dot product " remi
@ 2022-09-12 15:53 ` remi
  17 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-12 15:53 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/fixed_dsp.c            |  4 +++-
 libavutil/fixed_dsp.h            |  1 +
 libavutil/riscv/Makefile         |  4 +++-
 libavutil/riscv/fixed_dsp_init.c | 36 ++++++++++++++++++++++++++++++
 libavutil/riscv/fixed_dsp_rvv.S  | 38 ++++++++++++++++++++++++++++++++
 5 files changed, 81 insertions(+), 2 deletions(-)
 create mode 100644 libavutil/riscv/fixed_dsp_init.c
 create mode 100644 libavutil/riscv/fixed_dsp_rvv.S

diff --git a/libavutil/fixed_dsp.c b/libavutil/fixed_dsp.c
index 154f3bc2d3..bc847949dc 100644
--- a/libavutil/fixed_dsp.c
+++ b/libavutil/fixed_dsp.c
@@ -162,7 +162,9 @@ AVFixedDSPContext * avpriv_alloc_fixed_dsp(int bit_exact)
     fdsp->butterflies_fixed = butterflies_fixed_c;
     fdsp->scalarproduct_fixed = scalarproduct_fixed_c;
 
-#if ARCH_X86
+#if ARCH_RISCV
+    ff_fixed_dsp_init_riscv(fdsp);
+#elif ARCH_X86
     ff_fixed_dsp_init_x86(fdsp);
 #endif
 
diff --git a/libavutil/fixed_dsp.h b/libavutil/fixed_dsp.h
index fec806ff2d..1217d3a53b 100644
--- a/libavutil/fixed_dsp.h
+++ b/libavutil/fixed_dsp.h
@@ -161,6 +161,7 @@ typedef struct AVFixedDSPContext {
  */
 AVFixedDSPContext * avpriv_alloc_fixed_dsp(int strict);
 
+void ff_fixed_dsp_init_riscv(AVFixedDSPContext *fdsp);
 void ff_fixed_dsp_init_x86(AVFixedDSPContext *fdsp);
 
 /**
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
index 89a8d0d990..1597154ba5 100644
--- a/libavutil/riscv/Makefile
+++ b/libavutil/riscv/Makefile
@@ -1,3 +1,5 @@
 OBJS +=     riscv/float_dsp_init.o \
+            riscv/fixed_dsp_init.o \
             riscv/cpu.o
-RVV-OBJS += riscv/float_dsp_rvv.o
+RVV-OBJS += riscv/float_dsp_rvv.o \
+            riscv/fixed_dsp_rvv.o
diff --git a/libavutil/riscv/fixed_dsp_init.c b/libavutil/riscv/fixed_dsp_init.c
new file mode 100644
index 0000000000..fc143fb419
--- /dev/null
+++ b/libavutil/riscv/fixed_dsp_init.c
@@ -0,0 +1,36 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "config.h"
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/fixed_dsp.h"
+
+void ff_butterflies_fixed_rvv(int *v1, int *v2, int len);
+
+av_cold void ff_fixed_dsp_init_riscv(AVFixedDSPContext *fdsp)
+{
+#if HAVE_RVV
+    int flags = av_get_cpu_flags();
+
+    if (flags & AV_CPU_FLAG_ZVE32X)
+        fdsp->butterflies_fixed = ff_butterflies_fixed_rvv;
+#endif
+}
diff --git a/libavutil/riscv/fixed_dsp_rvv.S b/libavutil/riscv/fixed_dsp_rvv.S
new file mode 100644
index 0000000000..beb1b949f7
--- /dev/null
+++ b/libavutil/riscv/fixed_dsp_rvv.S
@@ -0,0 +1,38 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "asm.S"
+
+// (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
+func ff_butterflies_fixed_rvv, zve32x
+1:      vsetvli t0, a2, e32, m8, ta, ma
+        slli    t1, t0, 2
+        vle32.v v16, (a0)
+        vle32.v v24, (a1)
+        vadd.vv v0, v16, v24
+        vsub.vv v8, v16, v24
+        sub     a2, a2, t0
+        vse32.v v0, (a0)
+        add     a0, a0, t1
+        vse32.v v8, (a1)
+        add     a1, a1, t1
+        bnez    a2, 1b
+
+        ret
+endfunc
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [FFmpeg-devel] [PATCH 05/18] lavu/riscv: add <intmath.h> optimisations
  2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 05/18] lavu/riscv: add <intmath.h> optimisations remi
@ 2022-09-14  9:28   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 20+ messages in thread
From: Rémi Denis-Courmont @ 2022-09-14  9:28 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Hmm. It looks like I accidentally dropped a fix-up while rebasing/squashing and bow there's 8-bit instead of 16-bit clipping :-(

Will send the trivial fix in a few hours.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter
@ 2022-09-09 15:48 remi
  0 siblings, 0 replies; 20+ messages in thread
From: remi @ 2022-09-09 15:48 UTC (permalink / raw)
  To: ffmpeg-devel

From: Rémi Denis-Courmont <remi@remlab.net>

This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.

In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
---
 libavutil/riscv/timer.h | 53 +++++++++++++++++++++++++++++++++++++++++
 libavutil/timer.h       |  2 ++
 2 files changed, 55 insertions(+)
 create mode 100644 libavutil/riscv/timer.h

diff --git a/libavutil/riscv/timer.h b/libavutil/riscv/timer.h
new file mode 100644
index 0000000000..a34157a566
--- /dev/null
+++ b/libavutil/riscv/timer.h
@@ -0,0 +1,53 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_RISCV_TIMER_H
+#define AVUTIL_RISCV_TIMER_H
+
+#include "config.h"
+
+#if HAVE_INLINE_ASM
+#include <stdint.h>
+
+static inline uint64_t rdcycle64(void)
+{
+#if (__riscv_xlen >= 64)
+    uintptr_t cycles;
+
+    __asm__ volatile ("rdcycle %0" : "=r"(cycles));
+
+#else
+    uint64_t cycles;
+    uint32_t hi, lo, check;
+
+    __asm__ volatile (
+        "1: rdcycleh %0\n"
+        "   rdcycle  %1\n"
+        "   rdcycleh %2\n"
+        "   bne %0, %2, 1b\n" : "=r" (hi), "=r" (lo), "=r" (check));
+
+    cycles = (((uint64_t)hi) << 32) | lo;
+
+#endif
+    return cycles;
+}
+
+#define AV_READ_TIME rdcycle64
+
+#endif
+#endif /* AVUTIL_RISCV_TIMER_H */
diff --git a/libavutil/timer.h b/libavutil/timer.h
index 48e576739f..d3db5a27ef 100644
--- a/libavutil/timer.h
+++ b/libavutil/timer.h
@@ -57,6 +57,8 @@
 #   include "arm/timer.h"
 #elif ARCH_PPC
 #   include "ppc/timer.h"
+#elif ARCH_RISCV
+#   include "riscv/timer.h"
 #elif ARCH_X86
 #   include "x86/timer.h"
 #endif
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-09-14  9:28 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <2652141.mvXUDI8C0e@basile.remlab.net>
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 03/18] configure/riscv: detect fast CLZ remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 04/18] lavu/riscv: byte-swap operations remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 05/18] lavu/riscv: add <intmath.h> optimisations remi
2022-09-14  9:28   ` Rémi Denis-Courmont
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 06/18] configure: probe RISC-V Vector extension remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 07/18] lavu/riscv: initial common header for assembler macros remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 08/18] lavu/riscv: add CPU flags for the RISC-V Vector extension remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 09/18] checkasm: register the RISC-V V subsets remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 10/18] lavu/riscv: float vector-scalar multiplication with RVV remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 11/18] lavu/riscv: float vector-vector " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 12/18] lavu/riscv: float vector multiply-accumulate " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 13/18] lavu/riscv: float vector multiplication-addition " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 14/18] lavu/riscv: float vector sum-and-difference " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 15/18] lavu/riscv: float reversed vector multiplication " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 16/18] lavu/riscv: float vector windowed overlap/add " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 17/18] lavu/riscv: float vector dot product " remi
2022-09-12 15:53 ` [FFmpeg-devel] [PATCH 18/18] lavu/riscv: fixed vector sum-and-difference " remi
2022-09-09 15:48 [FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter remi

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git