* [FFmpeg-devel] [PATCH 1/4] riscv: probe for Zbb extension at load time
@ 2024-06-08 11:37 Rémi Denis-Courmont
From: Rémi Denis-Courmont @ 2024-06-08 11:37 UTC
To: ffmpeg-devel
For historical reasons, most RISC-V Linux distributions target an RV64GC
baseline that excludes the bit-manipulation ISA extensions, most notably:
- Zba: address generation extension, and
- Zbb: basic bit-manipulation extension.
Most CPUs on which it makes sense to run FFmpeg support Zba and Zbb
(including the current FATE runner), so it is worth optimising for them.
In fact, a large share of the existing assembler optimisations relies on
Zba and/or Zbb.
Since we cannot patch code in shared libraries at run time, the next best
thing is to carry a flag initialised at load time and check it as needed.
An isolated use costs three extra instructions, e.g.:
1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
LBU rd, %pcrel_lo(1b)(rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
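In C, this boils down to testing a flag that was set once at load time and branching to a fallback when it is clear. The following is a minimal, portable sketch, not the patch's code: the flag `have_zbb` is hypothetical, and `__builtin_bswap32()` stands in for the Zbb fast path, which the actual patches implement with inline `rev8` assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag; in the patch this role is played by
 * ff_rv_zbb_supported, set by a load-time constructor. It is
 * initialised statically here so that the sketch is self-contained. */
static bool have_zbb = true;

/* Baseline fallback: plain shifts and masks, no bit-manipulation extension. */
static uint32_t bswap32_fallback(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

static inline uint32_t bswap32_dispatch(uint32_t x)
{
    /* On RISC-V, this flag test compiles to roughly the
     * AUIPC/LBU/BEQZ sequence shown above. */
    if (__builtin_expect(have_zbb, 1))
        return __builtin_bswap32(x); /* stand-in for the Zbb REV8 path */
    return bswap32_fallback(x);
}
```

Both paths must return the same value. The flag only selects which instruction sequence computes it.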
The C compiler will typically load the flag ahead of time to reduce
latency, and can also keep it in a register if Zbb is used multiple times
in a single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the code degrades into an extra GOT look-up, needed to
support symbol interposition:
1: AUIPC rd, %got_pcrel_hi(ff_rv_zbb_supported)
LD rd, %pcrel_lo(1b)(rd)
LBU rd, (rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
This patch adds code to provision the flag in the libraries that use bit
manipulation helpers from libavutil: byte swapping, population count (bit
weight) and counting leading or trailing zeroes.
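Below is a minimal sketch of that provisioning, assuming GCC or Clang on an ELF target. The flag has hidden visibility, so other code in the same library can reach it without a GOT look-up. A constructor fills it in when the library is loaded. The probe function `probe_zbb_hypothetical()` is a placeholder. The patch itself calls `av_get_cpu_flags()` and tests `AV_CPU_FLAG_RVB_BASIC`.

```c
#include <stdbool.h>

/* Placeholder probe (hypothetical): the patch uses
 * av_get_cpu_flags() & AV_CPU_FLAG_RVB_BASIC instead. */
static bool probe_zbb_hypothetical(void)
{
    return true;
}

/* Hidden visibility: the symbol is not exported from the shared object,
 * so it cannot be interposed and can be addressed PC-relatively. */
__attribute__((visibility("hidden"))) bool zbb_supported = false;

/* Runs once when the executable or shared library is loaded. */
__attribute__((constructor))
static void init_zbb_supported(void)
{
    zbb_supported = probe_zbb_hypothetical();
}
```

Because the flag is written only once, before any user of the library can call into it, readers can treat it as effectively constant afterwards.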
---
libavcodec/riscv/Makefile | 2 ++
libavcodec/riscv/cpu_common.c | 1 +
libavdevice/riscv/Makefile | 1 +
libavdevice/riscv/cpu_common.c | 1 +
libavfilter/riscv/Makefile | 2 ++
libavfilter/riscv/cpu_common.c | 1 +
libavformat/riscv/Makefile | 1 +
libavformat/riscv/cpu_common.c | 1 +
libavutil/riscv/Makefile | 3 ++-
libavutil/riscv/cpu.h | 14 ++++++++++++++
libavutil/riscv/cpu_common.c | 33 +++++++++++++++++++++++++++++++++
libswscale/riscv/Makefile | 2 ++
libswscale/riscv/cpu_common.c | 1 +
tests/ref/fate/source | 5 +++++
14 files changed, 67 insertions(+), 1 deletion(-)
create mode 100644 libavcodec/riscv/cpu_common.c
create mode 100644 libavdevice/riscv/Makefile
create mode 100644 libavdevice/riscv/cpu_common.c
create mode 100644 libavfilter/riscv/cpu_common.c
create mode 100644 libavformat/riscv/Makefile
create mode 100644 libavformat/riscv/cpu_common.c
create mode 100644 libavutil/riscv/cpu_common.c
create mode 100644 libswscale/riscv/cpu_common.c
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 590655f829..c180223141 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -77,3 +77,5 @@ RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
riscv/vp9_mc_rvv.o
OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
+
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavcodec/riscv/cpu_common.c b/libavcodec/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavcodec/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavdevice/riscv/Makefile b/libavdevice/riscv/Makefile
new file mode 100644
index 0000000000..52857aacba
--- /dev/null
+++ b/libavdevice/riscv/Makefile
@@ -0,0 +1 @@
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavdevice/riscv/cpu_common.c b/libavdevice/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavdevice/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavfilter/riscv/Makefile b/libavfilter/riscv/Makefile
index 277dde2aed..14a4470d96 100644
--- a/libavfilter/riscv/Makefile
+++ b/libavfilter/riscv/Makefile
@@ -1,2 +1,4 @@
OBJS-$(CONFIG_AFIR_FILTER) += riscv/af_afir_init.o
RVV-OBJS-$(CONFIG_AFIR_FILTER) += riscv/af_afir_rvv.o
+
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavfilter/riscv/cpu_common.c b/libavfilter/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavfilter/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavformat/riscv/Makefile b/libavformat/riscv/Makefile
new file mode 100644
index 0000000000..52857aacba
--- /dev/null
+++ b/libavformat/riscv/Makefile
@@ -0,0 +1 @@
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavformat/riscv/cpu_common.c b/libavformat/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavformat/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
index 7e9a51194b..5db4c432d9 100644
--- a/libavutil/riscv/Makefile
+++ b/libavutil/riscv/Makefile
@@ -1,7 +1,8 @@
OBJS += riscv/float_dsp_init.o \
riscv/fixed_dsp_init.o \
riscv/lls_init.o \
- riscv/cpu.o
+ riscv/cpu.o \
+ riscv/cpu_common.o
RVV-OBJS += riscv/float_dsp_rvv.o \
riscv/fixed_dsp_rvv.o \
riscv/lls_rvv.o
diff --git a/libavutil/riscv/cpu.h b/libavutil/riscv/cpu.h
index af1440f626..bb8e08aa14 100644
--- a/libavutil/riscv/cpu.h
+++ b/libavutil/riscv/cpu.h
@@ -24,8 +24,22 @@
#include "config.h"
#include <stdbool.h>
#include <stddef.h>
+#include "libavutil/attributes_internal.h"
#include "libavutil/cpu.h"
+#ifndef __riscv_zbb
+extern attribute_visibility_hidden bool ff_rv_zbb_supported;
+#endif
+
+static inline av_const bool ff_rv_zbb_support(void)
+{
+#ifndef __riscv_zbb
+ return ff_rv_zbb_supported;
+#else
+ return true;
+#endif
+}
+
#if HAVE_RVV
/**
* Returns the vector size in bytes (always a power of two and at least 4).
diff --git a/libavutil/riscv/cpu_common.c b/libavutil/riscv/cpu_common.c
new file mode 100644
index 0000000000..3ecf95809b
--- /dev/null
+++ b/libavutil/riscv/cpu_common.c
@@ -0,0 +1,33 @@
+/*
+ * Copyright © 2024 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/cpu.h"
+
+#ifndef __riscv_zbb
+bool ff_rv_zbb_supported = false;
+
+#ifdef __ELF__
+__attribute__((constructor))
+static void probe_zbb(void)
+{
+ ff_rv_zbb_supported = (av_get_cpu_flags() & AV_CPU_FLAG_RVB_BASIC) != 0;
+}
+#endif
+#endif
diff --git a/libswscale/riscv/Makefile b/libswscale/riscv/Makefile
index 48afaf62aa..ea324bdc5f 100644
--- a/libswscale/riscv/Makefile
+++ b/libswscale/riscv/Makefile
@@ -1,3 +1,5 @@
OBJS += riscv/rgb2rgb.o
RV-OBJS += riscv/rgb2rgb_rvb.o
RVV-OBJS += riscv/rgb2rgb_rvv.o
+
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libswscale/riscv/cpu_common.c b/libswscale/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libswscale/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/tests/ref/fate/source b/tests/ref/fate/source
index a3beb35093..0abeff8036 100644
--- a/tests/ref/fate/source
+++ b/tests/ref/fate/source
@@ -3,17 +3,22 @@ libavcodec/file_open.c
libavcodec/interplayacm.c
libavcodec/log2_tab.c
libavcodec/reverse.c
+libavcodec/riscv/cpu_common.c
libavdevice/file_open.c
libavdevice/reverse.c
+libavdevice/riscv/cpu_common.c
libavfilter/file_open.c
libavfilter/log2_tab.c
+libavfilter/riscv/cpu_common.c
libavformat/bitstream.c
libavformat/file_open.c
libavformat/golomb_tab.c
libavformat/log2_tab.c
libavformat/rangecoder_dec.c
+libavformat/riscv/cpu_common.c
libswresample/log2_tab.c
libswscale/log2_tab.c
+libswscale/riscv/cpu_common.c
tools/uncoded_frame.c
tools/yuvcmp.c
Headers without standard inclusion guards:
--
2.45.1
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 2/4] lavu/riscv: use Zbb REV8 at run-time
@ 2024-06-08 11:37 ` Rémi Denis-Courmont
From: Rémi Denis-Courmont @ 2024-06-08 11:37 UTC
To: ffmpeg-devel
This adds run-time support for using Zbb REV8 for 32- and 64-bit byte
swaps. The result is about five times slower than when targeting Zbb
statically, but still roughly 2.8 (32-bit) and 4.4 (64-bit) times faster
than the default bespoke C code or a call to the GCC run-time functions.
For 16-bit swaps, however, the run-time check unsurprisingly makes things
worse (five times slower than the baseline), so this sticks to the
baseline. In fact, even using REV8 statically does not seem to be
beneficial in that case.
           Zbb static    Zbb dynamic   I baseline
bswap16    0.668184765   3.340764069    0.668029012
bswap32    0.668174014   3.340763319    9.353855435
bswap64    0.668221765   3.340496313   14.698672283
(seconds for 1 billion iterations on a SiFive-U74 core)
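REV8 always reverses all XLEN/8 bytes of a register. That is why `av_bswap32_rv()` in this patch shifts its result right by `__riscv_xlen - 32`. The portable sketch below emulates this on a 64-bit register. It uses `__builtin_bswap64()` as a stand-in for REV8, which is an assumption of the sketch. It also shows why the sign-extended upper half of a 32-bit value held in an RV64 register does not affect the result.

```c
#include <stdint.h>

/* Emulates RV64 REV8, which reverses all eight bytes of a register. */
static uint64_t rev8_emulated(uint64_t reg)
{
    return __builtin_bswap64(reg);
}

/* 32-bit byte swap built on a full-width byte reversal. */
static uint32_t bswap32_via_rev8(uint32_t x)
{
    const unsigned xlen = 64; /* assumption: RV64 */
    /* RV64 keeps 32-bit values sign-extended in 64-bit registers. */
    uint64_t reg = (uint64_t)(int64_t)(int32_t)x;
    /* The reversed low four bytes land in the upper half; the shift
     * brings them back down and discards the former upper bytes. */
    return (uint32_t)(rev8_emulated(reg) >> (xlen - 32));
}
```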
---
libavutil/riscv/bswap.h | 44 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/libavutil/riscv/bswap.h b/libavutil/riscv/bswap.h
index ce75de974e..886893e241 100644
--- a/libavutil/riscv/bswap.h
+++ b/libavutil/riscv/bswap.h
@@ -22,11 +22,51 @@
#include <stdint.h>
#include "config.h"
#include "libavutil/attributes.h"
+#include "libavutil/riscv/cpu.h"
#if defined (__GNUC__) || defined (__clang__)
#define av_bswap16 __builtin_bswap16
-#define av_bswap32 __builtin_bswap32
-#define av_bswap64 __builtin_bswap64
+
+static av_always_inline av_const uint32_t av_bswap32_rv(uint32_t x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), 1)) {
+ uintptr_t y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+ "rev8 %0, %1\n"
+ ".option pop" : "=r" (y) : "r" (x));
+ return y >> (__riscv_xlen - 32);
+ }
+#endif
+ return __builtin_bswap32(x);
+}
+#define av_bswap32 av_bswap32_rv
+
+#if __riscv_xlen >= 64
+static av_always_inline av_const uint64_t av_bswap64_rv(uint64_t x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), 1)) {
+ uintptr_t y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+ "rev8 %0, %1\n"
+ ".option pop" : "=r" (y) : "r" (x));
+ return y >> (__riscv_xlen - 64);
+ }
+#endif
+ return __builtin_bswap64(x);
+}
+#define av_bswap64 av_bswap64_rv
+#endif
+
#endif
#endif /* AVUTIL_RISCV_BSWAP_H */
--
2.45.1
* [FFmpeg-devel] [PATCH 3/4] lavu/riscv: use Zbb CPOP/CPOPW at run-time
@ 2024-06-08 11:37 ` Rémi Denis-Courmont
From: Rémi Denis-Courmont @ 2024-06-08 11:37 UTC
To: ffmpeg-devel
           Zbb static    Zbb dynamic   I baseline
popcount   1.336129286   3.469067758   20.146362909
popcountl  1.336322291   3.340292968   20.224829821
(seconds for 1 billion iterations on a SiFive-U74 core)
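The new `av_parity_rv()` in this patch reuses the population count, because parity is simply its lowest bit. On RV64, CPOPW counts only the low 32 bits of the register, so the sign-extended upper half cannot skew the result. Below is a portable sketch of that reasoning. `__builtin_popcountll()` stands in for CPOP, and the 32-bit truncation is made explicit. Both are assumptions of the sketch, not the patch's code.

```c
#include <stdint.h>

/* Emulates RV64 CPOPW: population count of the low 32 bits only. */
static int cpopw_emulated(uint64_t reg)
{
    return __builtin_popcountll(reg & 0xFFFFFFFFu);
}

/* 32-bit popcount of a value held sign-extended in an RV64 register. */
static int popcount32_sketch(unsigned int x)
{
    uint64_t reg = (uint64_t)(int64_t)(int32_t)x;
    return cpopw_emulated(reg);
}

/* Parity is the least significant bit of the population count. */
static int parity32_sketch(unsigned int x)
{
    return popcount32_sketch(x) & 1;
}
```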
---
libavutil/riscv/intmath.h | 73 ++++++++++++++++++++++++++++++++++++---
1 file changed, 69 insertions(+), 4 deletions(-)
diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h
index ae9ee7775b..1f0afbc81d 100644
--- a/libavutil/riscv/intmath.h
+++ b/libavutil/riscv/intmath.h
@@ -1,4 +1,6 @@
/*
+ * Copyright © 2022-2024 Rémi Denis-Courmont.
+ *
* This file is part of FFmpeg.
*
* FFmpeg is free software; you can redistribute it and/or
@@ -23,6 +25,7 @@
#include "config.h"
#include "libavutil/attributes.h"
+#include "libavutil/riscv/cpu.h"
/*
* The compiler is forced to sign-extend the result anyhow, so it is faster to
@@ -70,12 +73,74 @@ static av_always_inline av_const int av_clip_intp2_rvi(int a, int p)
}
#if defined (__GNUC__) || defined (__clang__)
-#define av_popcount __builtin_popcount
-#if (__riscv_xlen >= 64)
-#define av_popcount64 __builtin_popcountl
+static inline av_const int av_popcount_rv(unsigned int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+ "cpopw %0, %1\n"
#else
-#define av_popcount64 __builtin_popcountll
+ "cpop %0, %1\n"
+#endif
+ ".option pop" : "=r" (y) : "r" (x));
+ if (y > 32)
+ __builtin_unreachable();
+ return y;
+ }
+#endif
+ return __builtin_popcount(x);
+}
+#define av_popcount av_popcount_rv
+
+static inline av_const int av_popcount64_rv(uint64_t x)
+{
+#if HAVE_RV && !defined(__riscv_zbb) && __riscv_xlen >= 64
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+ "cpop %0, %1\n"
+ ".option pop" : "=r" (y) : "r" (x));
+ if (y > 64)
+ __builtin_unreachable();
+ return y;
+ }
#endif
+ return __builtin_popcountll(x);
+}
+#define av_popcount64 av_popcount64_rv
+
+static inline av_const int av_parity_rv(unsigned int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+ "cpopw %0, %1\n"
+#else
+ "cpop %0, %1\n"
+#endif
+ ".option pop" : "=r" (y) : "r" (x));
+ return y & 1;
+ }
+#endif
+ return __builtin_parity(x);
+}
+#define av_parity av_parity_rv
#endif
#endif /* AVUTIL_RISCV_INTMATH_H */
--
2.45.1
* [FFmpeg-devel] [PATCH 4/4] lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time
@ 2024-06-08 11:37 ` Rémi Denis-Courmont
From: Rémi Denis-Courmont @ 2024-06-08 11:37 UTC
To: ffmpeg-devel
           Zbb static    Zbb dynamic   I baseline
clz        0.668032642   1.336072283   19.552376803
clzl       0.668092643   1.336181786   26.110855571
ctz        1.336208533   3.340209702   26.054869008
ctzl       1.336247784   3.340362457   26.055266290
(seconds for 1 billion iterations on a SiFive-U74 core)
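`ff_log2_rv()` at the end of this patch derives the integer base-2 logarithm from a count of leading zeros. For a non-zero 32-bit `x`, floor(log2(x)) = 31 - clz(x). OR-ing in 1 keeps the argument non-zero, because `__builtin_clz(0)` is undefined. This maps 0 to 0 without a branch, and it never changes the result for non-zero inputs. Below is a portable sketch, assuming a 32-bit `unsigned int` and using `__builtin_clz()` in place of the run-time CLZW path.

```c
/* floor(log2(x)) for 32-bit x; x == 0 yields 0, the same as x == 1. */
static int log2_via_clz(unsigned int x)
{
    /* x | 1 only touches bit 0, so the highest set bit is unchanged
     * for any non-zero x, and __builtin_clz() never sees zero. */
    return 31 - __builtin_clz(x | 1);
}
```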
---
libavutil/riscv/intmath.h | 101 ++++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h
index 1f0afbc81d..3e7ab864c5 100644
--- a/libavutil/riscv/intmath.h
+++ b/libavutil/riscv/intmath.h
@@ -73,6 +73,107 @@ static av_always_inline av_const int av_clip_intp2_rvi(int a, int p)
}
#if defined (__GNUC__) || defined (__clang__)
+static inline av_const int ff_ctz_rv(int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+ "ctzw %0, %1\n"
+#else
+ "ctz %0, %1\n"
+#endif
+ ".option pop" : "=r" (y) : "r" (x));
+ if (y > 32)
+ __builtin_unreachable();
+ return y;
+ }
+#endif
+ return __builtin_ctz(x);
+}
+#define ff_ctz ff_ctz_rv
+
+static inline av_const int ff_ctzll_rv(long long x)
+{
+#if HAVE_RV && !defined(__riscv_zbb) && __riscv_xlen == 64
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+ "ctz %0, %1\n"
+ ".option pop" : "=r" (y) : "r" (x));
+ if (y > 64)
+ __builtin_unreachable();
+ return y;
+ }
+#endif
+ return __builtin_ctzll(x);
+}
+#define ff_ctzll ff_ctzll_rv
+
+static inline av_const int ff_clz_rv(int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+ "clzw %0, %1\n"
+#else
+ "clz %0, %1\n"
+#endif
+ ".option pop" : "=r" (y) : "r" (x));
+ if (y > 32)
+ __builtin_unreachable();
+ return y;
+ }
+#endif
+ return __builtin_clz(x);
+}
+#define ff_clz ff_clz_rv
+
+#if __riscv_xlen == 64
+static inline av_const int ff_clzll_rv(long long x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+ if (!__builtin_constant_p(x) &&
+ __builtin_expect(ff_rv_zbb_support(), true)) {
+ int y;
+
+ __asm__ (
+ ".option push\n"
+ ".option arch, +zbb\n"
+ "clz %0, %1\n"
+ ".option pop" : "=r" (y) : "r" (x));
+ if (y > 64)
+ __builtin_unreachable();
+ return y;
+ }
+#endif
+ return __builtin_clzll(x);
+}
+#define ff_clzll ff_clzll_rv
+#endif
+
+static inline av_const int ff_log2_rv(unsigned int x)
+{
+ return 31 - ff_clz_rv(x | 1);
+}
+#define ff_log2 ff_log2_rv
+#define ff_log2_16bit ff_log2_rv
+
static inline av_const int av_popcount_rv(unsigned int x)
{
#if HAVE_RV && !defined(__riscv_zbb)
--
2.45.1
* Re: [FFmpeg-devel] [PATCH 4/4] lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time
@ 2024-06-08 18:01 ` Lynne via ffmpeg-devel
From: Lynne via ffmpeg-devel @ 2024-06-08 18:01 UTC
To: ffmpeg-devel; +Cc: Lynne
On 08/06/2024 13:37, Rémi Denis-Courmont wrote:
> [...]
> static inline av_const int av_popcount_rv(unsigned int x)
> {
> #if HAVE_RV && !defined(__riscv_zbb)
Could you add a ./configure flag or a check for enabling non-dynamic Zbb?
* Re: [FFmpeg-devel] [PATCH 4/4] lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time
@ 2024-06-08 18:17 ` Rémi Denis-Courmont
From: Rémi Denis-Courmont @ 2024-06-08 18:17 UTC
To: ffmpeg-devel
On Saturday 8 June 2024, 21:01:08 EEST, Lynne via ffmpeg-devel wrote:
> > #if HAVE_RV && !defined(__riscv_zbb)
>
> Could you add a ./configure flag or a check for enabling non-dynamic Zbb?
That is determined by the compiler's target architecture and/or CPU. Adding
a configure flag wouldn't work, since the compiler would still not know that
it can emit Zbb instructions for the corresponding built-ins.
And before you ask: to my knowledge, GCC exposes no ad-hoc `-m` flag for
Zbb. It has to be included in -march or -mcpu instead, e.g.:
CFLAGS="-march=rv64gc_zba_zbb"
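A quick way to check what the compiler was told is to test the predefined macro that the `#if HAVE_RV && !defined(__riscv_zbb)` guards in these patches rely on. The sketch below assumes GCC or Clang, which define `__riscv_zbb` when -march or -mcpu includes Zbb. The helper name `zbb_enabled_at_compile_time()` is hypothetical.

```c
#include <stdbool.h>

/* GCC and Clang define __riscv_zbb when -march (or -mcpu) includes Zbb.
 * Only then may the compiler emit Zbb instructions for built-ins such as
 * __builtin_ctz(), and the run-time checks in these patches compile out. */
static bool zbb_enabled_at_compile_time(void)
{
#if defined(__riscv) && defined(__riscv_zbb)
    return true;
#else
    return false;
#endif
}
```

For example, building with the -march value above should make this return true on RISC-V. With a plain rv64gc target it returns false, and the load-time probe decides instead.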
--
雷米‧德尼-库尔蒙
http://www.remlab.net/