* [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimization for qpel_h in HEVC.
@ 2026-01-22 4:23 zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 2/6] libavcodec/riscv: add RVV optimization for qpel_v " zhanheng.yang--- via ffmpeg-devel
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: zhanheng.yang--- via ffmpeg-devel @ 2026-01-22 4:23 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhanheng Yang
From: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
Bench on A210 C908 core (VLEN 128).
put_hevc_qpel_h4_8_c: 275.4 ( 1.00x)
put_hevc_qpel_h4_8_rvv_i32: 142.9 ( 1.93x)
put_hevc_qpel_h6_8_c: 595.2 ( 1.00x)
put_hevc_qpel_h6_8_rvv_i32: 209.7 ( 2.84x)
put_hevc_qpel_h8_8_c: 1044.0 ( 1.00x)
put_hevc_qpel_h8_8_rvv_i32: 287.2 ( 3.63x)
put_hevc_qpel_h12_8_c: 2371.0 ( 1.00x)
put_hevc_qpel_h12_8_rvv_i32: 419.5 ( 5.65x)
put_hevc_qpel_h16_8_c: 4187.2 ( 1.00x)
put_hevc_qpel_h16_8_rvv_i32: 530.8 ( 7.89x)
put_hevc_qpel_h24_8_c: 9276.4 ( 1.00x)
put_hevc_qpel_h24_8_rvv_i32: 1509.6 ( 6.15x)
put_hevc_qpel_h32_8_c: 16417.8 ( 1.00x)
put_hevc_qpel_h32_8_rvv_i32: 1984.3 ( 8.27x)
put_hevc_qpel_h48_8_c: 36812.8 ( 1.00x)
put_hevc_qpel_h48_8_rvv_i32: 4390.6 ( 8.38x)
put_hevc_qpel_h64_8_c: 65296.8 ( 1.00x)
put_hevc_qpel_h64_8_rvv_i32: 7745.0 ( 8.43x)
put_hevc_qpel_uni_h4_8_c: 374.8 ( 1.00x)
put_hevc_qpel_uni_h4_8_rvv_i32: 162.9 ( 2.30x)
put_hevc_qpel_uni_h6_8_c: 818.6 ( 1.00x)
put_hevc_qpel_uni_h6_8_rvv_i32: 236.3 ( 3.46x)
put_hevc_qpel_uni_h8_8_c: 1504.3 ( 1.00x)
put_hevc_qpel_uni_h8_8_rvv_i32: 309.3 ( 4.86x)
put_hevc_qpel_uni_h12_8_c: 3239.2 ( 1.00x)
put_hevc_qpel_uni_h12_8_rvv_i32: 448.0 ( 7.23x)
put_hevc_qpel_uni_h16_8_c: 5702.9 ( 1.00x)
put_hevc_qpel_uni_h16_8_rvv_i32: 589.3 ( 9.68x)
put_hevc_qpel_uni_h24_8_c: 12741.4 ( 1.00x)
put_hevc_qpel_uni_h24_8_rvv_i32: 1650.3 ( 7.72x)
put_hevc_qpel_uni_h32_8_c: 22531.3 ( 1.00x)
put_hevc_qpel_uni_h32_8_rvv_i32: 2189.1 (10.29x)
put_hevc_qpel_uni_h48_8_c: 50647.0 ( 1.00x)
put_hevc_qpel_uni_h48_8_rvv_i32: 4817.0 (10.51x)
put_hevc_qpel_uni_h64_8_c: 89742.9 ( 1.00x)
put_hevc_qpel_uni_h64_8_rvv_i32: 8497.9 (10.56x)
put_hevc_qpel_uni_hv4_8_c: 920.4 ( 1.00x)
put_hevc_qpel_uni_hv4_8_rvv_i32: 532.1 ( 1.73x)
put_hevc_qpel_uni_hv6_8_c: 1753.0 ( 1.00x)
put_hevc_qpel_uni_hv6_8_rvv_i32: 691.0 ( 2.54x)
put_hevc_qpel_uni_hv8_8_c: 2872.7 ( 1.00x)
put_hevc_qpel_uni_hv8_8_rvv_i32: 836.9 ( 3.43x)
put_hevc_qpel_uni_hv12_8_c: 5828.4 ( 1.00x)
put_hevc_qpel_uni_hv12_8_rvv_i32: 1141.2 ( 5.11x)
put_hevc_qpel_uni_hv16_8_c: 9906.7 ( 1.00x)
put_hevc_qpel_uni_hv16_8_rvv_i32: 1452.5 ( 6.82x)
put_hevc_qpel_uni_hv24_8_c: 20871.3 ( 1.00x)
put_hevc_qpel_uni_hv24_8_rvv_i32: 4094.0 ( 5.10x)
put_hevc_qpel_uni_hv32_8_c: 36123.3 ( 1.00x)
put_hevc_qpel_uni_hv32_8_rvv_i32: 5310.5 ( 6.80x)
put_hevc_qpel_uni_hv48_8_c: 79016.0 ( 1.00x)
put_hevc_qpel_uni_hv48_8_rvv_i32: 11591.2 ( 6.82x)
put_hevc_qpel_uni_hv64_8_c: 138779.8 ( 1.00x)
put_hevc_qpel_uni_hv64_8_rvv_i32: 20321.1 ( 6.83x)
put_hevc_qpel_uni_w_h4_8_c: 412.1 ( 1.00x)
put_hevc_qpel_uni_w_h4_8_rvv_i32: 237.3 ( 1.74x)
put_hevc_qpel_uni_w_h6_8_c: 895.9 ( 1.00x)
put_hevc_qpel_uni_w_h6_8_rvv_i32: 345.6 ( 2.59x)
put_hevc_qpel_uni_w_h8_8_c: 1625.4 ( 1.00x)
put_hevc_qpel_uni_w_h8_8_rvv_i32: 452.4 ( 3.59x)
put_hevc_qpel_uni_w_h12_8_c: 3541.2 ( 1.00x)
put_hevc_qpel_uni_w_h12_8_rvv_i32: 663.6 ( 5.34x)
put_hevc_qpel_uni_w_h16_8_c: 6290.3 ( 1.00x)
put_hevc_qpel_uni_w_h16_8_rvv_i32: 875.7 ( 7.18x)
put_hevc_qpel_uni_w_h24_8_c: 13994.9 ( 1.00x)
put_hevc_qpel_uni_w_h24_8_rvv_i32: 2475.0 ( 5.65x)
put_hevc_qpel_uni_w_h32_8_c: 24852.3 ( 1.00x)
put_hevc_qpel_uni_w_h32_8_rvv_i32: 3291.2 ( 7.55x)
put_hevc_qpel_uni_w_h48_8_c: 55595.5 ( 1.00x)
put_hevc_qpel_uni_w_h48_8_rvv_i32: 7297.4 ( 7.62x)
put_hevc_qpel_uni_w_h64_8_c: 98628.2 ( 1.00x)
put_hevc_qpel_uni_w_h64_8_rvv_i32: 12883.2 ( 7.66x)
put_hevc_qpel_bi_h4_8_c: 392.6 ( 1.00x)
put_hevc_qpel_bi_h4_8_rvv_i32: 186.1 ( 2.11x)
put_hevc_qpel_bi_h6_8_c: 842.3 ( 1.00x)
put_hevc_qpel_bi_h6_8_rvv_i32: 267.8 ( 3.15x)
put_hevc_qpel_bi_h8_8_c: 1546.4 ( 1.00x)
put_hevc_qpel_bi_h8_8_rvv_i32: 353.7 ( 4.37x)
put_hevc_qpel_bi_h12_8_c: 3317.2 ( 1.00x)
put_hevc_qpel_bi_h12_8_rvv_i32: 515.1 ( 6.44x)
put_hevc_qpel_bi_h16_8_c: 5848.3 ( 1.00x)
put_hevc_qpel_bi_h16_8_rvv_i32: 680.9 ( 8.59x)
put_hevc_qpel_bi_h24_8_c: 13032.6 ( 1.00x)
put_hevc_qpel_bi_h24_8_rvv_i32: 1880.8 ( 6.93x)
put_hevc_qpel_bi_h32_8_c: 23021.1 ( 1.00x)
put_hevc_qpel_bi_h32_8_rvv_i32: 2498.5 ( 9.21x)
put_hevc_qpel_bi_h48_8_c: 51655.9 ( 1.00x)
put_hevc_qpel_bi_h48_8_rvv_i32: 5486.3 ( 9.42x)
put_hevc_qpel_bi_h64_8_c: 91738.7 ( 1.00x)
put_hevc_qpel_bi_h64_8_rvv_i32: 9735.0 ( 9.42x)
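For reference, the horizontal filter that the filter_h macro vectorizes corresponds to this scalar C model (illustrative only, not part of the patch; the coefficient table matches qpel_filters below):

```c
#include <assert.h>
#include <stdint.h>

/* HEVC luma qpel coefficients, matching the qpel_filters table in the
 * patch; row 0 is the unused full-pel row. */
static const int8_t qpel_filters[4][8] = {
    {  0, 0,   0,  0,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* One output sample of the horizontal filter: an 8-tap FIR over
 * src[x-3..x+4], returned unclipped -- for 8-bit input the "put"
 * variant stores exactly this 16-bit intermediate. */
static int qpel_h_sample(const uint8_t *src, int x, int frac)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += qpel_filters[frac][i] * src[x - 3 + i];
    return sum;
}
```

Each non-zero coefficient row sums to 64, so a flat source region comes out scaled by 64 in the intermediate domain.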
Signed-off-by: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
---
libavcodec/riscv/Makefile | 3 +-
libavcodec/riscv/h26x/h2656dsp.h | 12 ++
libavcodec/riscv/h26x/hevcqpel_rvv.S | 309 +++++++++++++++++++++++++++
libavcodec/riscv/hevcdsp_init.c | 55 +++--
4 files changed, 364 insertions(+), 15 deletions(-)
create mode 100644 libavcodec/riscv/h26x/hevcqpel_rvv.S
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 2c53334923..414790ae0c 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -36,7 +36,8 @@ RVV-OBJS-$(CONFIG_H264DSP) += riscv/h264addpx_rvv.o riscv/h264dsp_rvv.o \
OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_init.o
RVV-OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_rvv.o
OBJS-$(CONFIG_HEVC_DECODER) += riscv/hevcdsp_init.o
-RVV-OBJS-$(CONFIG_HEVC_DECODER) += riscv/h26x/h2656_inter_rvv.o
+RVV-OBJS-$(CONFIG_HEVC_DECODER) += riscv/h26x/h2656_inter_rvv.o \
+ riscv/h26x/hevcqpel_rvv.o
OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_init.o
RVV-OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_rvv.o
OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index 6d2ac55556..028b9ffbfd 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -1,5 +1,6 @@
/*
* Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
*
* This file is part of FFmpeg.
*
@@ -24,4 +25,15 @@
void ff_h2656_put_pixels_8_rvv_256(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width);
void ff_h2656_put_pixels_8_rvv_128(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_h_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_w_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+ const uint8_t *_src, ptrdiff_t _srcstride,
+ int height, int denom, int wx, int ox,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_bi_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+ mx, intptr_t my, int width);
#endif
diff --git a/libavcodec/riscv/h26x/hevcqpel_rvv.S b/libavcodec/riscv/h26x/hevcqpel_rvv.S
new file mode 100644
index 0000000000..52d7acac33
--- /dev/null
+++ b/libavcodec/riscv/h26x/hevcqpel_rvv.S
@@ -0,0 +1,309 @@
+ /*
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+.data
+.align 2
+qpel_filters:
+ .byte 0, 0, 0, 0, 0, 0, 0, 0
+ .byte -1, 4, -10, 58, 17, -5, 1, 0
+ .byte -1, 4, -11, 40, 40, -11, 4, -1
+ .byte 0, 1, -5, 17, 58, -10, 4, -1
+
+.text
+#include "libavutil/riscv/asm.S"
+#define HEVC_MAX_PB_SIZE 64
+
+.macro lx rd, addr
+#if (__riscv_xlen == 32)
+ lw \rd, \addr
+#elif (__riscv_xlen == 64)
+ ld \rd, \addr
+#else
+ lq \rd, \addr
+#endif
+.endm
+
+.macro sx rd, addr
+#if (__riscv_xlen == 32)
+ sw \rd, \addr
+#elif (__riscv_xlen == 64)
+ sd \rd, \addr
+#else
+ sq \rd, \addr
+#endif
+.endm
+
+/* clobbers t0, t1 */
+.macro load_filter m
+ la t0, qpel_filters
+ slli t1, \m, 3
+ add t0, t0, t1
+ lb s1, 0(t0)
+ lb s2, 1(t0)
+ lb s3, 2(t0)
+ lb s4, 3(t0)
+ lb s5, 4(t0)
+ lb s6, 5(t0)
+ lb s7, 6(t0)
+ lb s8, 7(t0)
+.endm
+
+/* output is unclipped; clobbers t4 */
+.macro filter_h vdst, vsrc0, vsrc1, vsrc2, vsrc3, vsrc4, vsrc5, vsrc6, vsrc7, src
+ addi t4, \src, -3
+ vle8.v \vsrc0, (t4)
+ addi t4, \src, -2
+ vmv.v.x \vsrc3, s1
+ vwmulsu.vv \vdst, \vsrc3, \vsrc0
+ vle8.v \vsrc1, (t4)
+ addi t4, \src, -1
+ vle8.v \vsrc2, (t4)
+ vle8.v \vsrc3, (\src)
+ addi t4, \src, 1
+ vle8.v \vsrc4, (t4)
+ addi t4, \src, 2
+ vle8.v \vsrc5, (t4)
+ addi t4, \src, 3
+ vle8.v \vsrc6, (t4)
+ addi t4, \src, 4
+ vle8.v \vsrc7, (t4)
+
+ vwmaccsu.vx \vdst, s2, \vsrc1
+ vwmaccsu.vx \vdst, s3, \vsrc2
+ vwmaccsu.vx \vdst, s4, \vsrc3
+ vwmaccsu.vx \vdst, s5, \vsrc4
+ vwmaccsu.vx \vdst, s6, \vsrc5
+ vwmaccsu.vx \vdst, s7, \vsrc6
+ vwmaccsu.vx \vdst, s8, \vsrc7
+.endm
+
+.macro hevc_qpel_h lmul, lmul2, lmul4
+func ff_hevc_put_qpel_h_8_\lmul\()_rvv, zve32x
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a4
+ mv t3, a6
+ li t1, 0 # offset
+
+1:
+ vsetvli t6, t3, e8, \lmul, ta, ma
+ add t2, a1, t1
+ filter_h v0, v16, v18, v20, v22, v24, v26, v28, v30, t2
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ slli t2, t1, 1
+ add t2, a0, t2
+ vse16.v v0, (t2)
+ sub t3, t3, t6
+ add t1, t1, t6
+ bgt t3, zero, 1b
+ addi a3, a3, -1
+ mv t3, a6
+ add a1, a1, a2
+ addi a0, a0, 2*HEVC_MAX_PB_SIZE
+ li t1, 0
+ bgt a3, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_qpel_uni_h_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a5
+ mv t3, a7
+ li t1, 0 # offset
+
+1:
+ vsetvli t6, t3, e8, \lmul, ta, ma
+ add t2, a2, t1
+ filter_h v0, v16, v18, v20, v22, v24, v26, v28, v30, t2
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 6
+ add t2, a0, t1
+ vse8.v v0, (t2)
+ sub t3, t3, t6
+ add t1, t1, t6
+ bgt t3, zero, 1b
+ addi a4, a4, -1
+ mv t3, a7
+ add a2, a2, a3
+ add a0, a0, a1
+ li t1, 0
+ bgt a4, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_qpel_uni_w_h_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lx t2, 0(sp) # mx
+ addi a5, a5, 6 # shift
+#if (__riscv_xlen == 32)
+ lw t3, 8(sp) # width
+#elif (__riscv_xlen == 64)
+ lw t3, 16(sp)
+#endif
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter t2
+ li t2, 0 # offset
+
+1:
+ vsetvli t6, t3, e8, \lmul, ta, ma
+ add t1, a2, t2
+ filter_h v8, v16, v18, v20, v22, v24, v26, v28, v30, t1
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vwmul.vx v0, v8, a6
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vssra.vx v0, v0, a5
+ vsadd.vx v0, v0, a7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclip.wi v0, v0, 0
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ add t1, a0, t2
+ vse8.v v0, (t1)
+ sub t3, t3, t6
+ add t2, t2, t6
+ bgt t3, zero, 1b
+ addi a4, a4, -1
+#if (__riscv_xlen == 32)
+ lw t3, 72(sp)
+#elif (__riscv_xlen == 64)
+ ld t3, 80(sp)
+#endif
+ add a2, a2, a3
+ add a0, a0, a1
+ li t2, 0
+ bgt a4, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_qpel_bi_h_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lw t3, 0(sp) # width
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a6
+ li t1, 0 # offset
+
+1:
+ vsetvli t6, t3, e16, \lmul2, ta, ma
+ slli t2, t1, 1
+ add t2, a4, t2
+ vle16.v v12, (t2)
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ add t2, a2, t1
+ filter_h v0, v16, v18, v20, v22, v24, v26, v28, v30, t2
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vsadd.vv v0, v0, v12
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 7
+ add t2, a0, t1
+ vse8.v v0, (t2)
+ sub t3, t3, t6
+ add t1, t1, t6
+ bgt t3, zero, 1b
+ addi a5, a5, -1
+ lw t3, 64(sp)
+ add a2, a2, a3
+ add a0, a0, a1
+ addi a4, a4, 2*HEVC_MAX_PB_SIZE
+ li t1, 0
+ bgt a5, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+.endm
+
+hevc_qpel_h m1, m2, m4
\ No newline at end of file
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index 70bc8ebea7..59333740de 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -1,5 +1,6 @@
/*
* Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
*
* This file is part of FFmpeg.
*
@@ -34,30 +35,56 @@
member[7][v][h] = ff_h2656_put_pixels_##8_##ext; \
member[9][v][h] = ff_h2656_put_pixels_##8_##ext;
+#define RVV_FNASSIGN_PEL(member, v, h, fn) \
+ member[1][v][h] = fn; \
+ member[2][v][h] = fn; \
+ member[3][v][h] = fn; \
+ member[4][v][h] = fn; \
+ member[5][v][h] = fn; \
+ member[6][v][h] = fn; \
+ member[7][v][h] = fn; \
+ member[8][v][h] = fn; \
+ member[9][v][h] = fn;
+
void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
{
#if HAVE_RVV
const int flags = av_get_cpu_flags();
int vlenb;
- if (!(flags & AV_CPU_FLAG_RVV_I32) || !(flags & AV_CPU_FLAG_RVB))
- return;
+ if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB)) {
+ vlenb = ff_get_rv_vlenb();
+ if (vlenb >= 32) {
+ switch (bit_depth) {
+ case 8:
+ RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_256);
+ RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_256);
- vlenb = ff_get_rv_vlenb();
- if (vlenb >= 32) {
- switch (bit_depth) {
- case 8:
- RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_256);
- RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_256);
- break;
- default:
- break;
+ break;
+ default:
+ break;
+ }
+ } else if (vlenb >= 16) {
+ switch (bit_depth) {
+ case 8:
+ RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_128);
+ RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_128);
+
+ break;
+ default:
+ break;
+ }
}
- } else if (vlenb >= 16) {
+ }
+
+ if ((flags & AV_CPU_FLAG_RVV_I32)) {
switch (bit_depth) {
case 8:
- RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_128);
- RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_128);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel, 0, 1, ff_hevc_put_qpel_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni, 0, 1, ff_hevc_put_qpel_uni_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 0, 1, ff_hevc_put_qpel_uni_w_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 0, 1, ff_hevc_put_qpel_bi_h_8_m1_rvv);
+
break;
default:
break;
--
2.25.1
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
* [FFmpeg-devel] [PATCH 2/6] libavcodec/riscv: add RVV optimization for qpel_v in HEVC.
2026-01-22 4:23 [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimization for qpel_h in HEVC zhanheng.yang--- via ffmpeg-devel
@ 2026-01-22 4:23 ` zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 3/6] libavcodec/riscv: add RVV optimization for epel_h " zhanheng.yang--- via ffmpeg-devel
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: zhanheng.yang--- via ffmpeg-devel @ 2026-01-22 4:23 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhanheng Yang
From: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
Bench on A210 C908 core (VLEN 128).
put_hevc_qpel_v4_8_c: 265.0 ( 1.00x)
put_hevc_qpel_v4_8_rvv_i32: 117.0 ( 2.26x)
put_hevc_qpel_v6_8_c: 568.8 ( 1.00x)
put_hevc_qpel_v6_8_rvv_i32: 162.3 ( 3.50x)
put_hevc_qpel_v8_8_c: 986.9 ( 1.00x)
put_hevc_qpel_v8_8_rvv_i32: 200.9 ( 4.91x)
put_hevc_qpel_v12_8_c: 2236.1 ( 1.00x)
put_hevc_qpel_v12_8_rvv_i32: 294.8 ( 7.58x)
put_hevc_qpel_v16_8_c: 3958.8 ( 1.00x)
put_hevc_qpel_v16_8_rvv_i32: 387.0 (10.23x)
put_hevc_qpel_v24_8_c: 8707.6 ( 1.00x)
put_hevc_qpel_v24_8_rvv_i32: 1096.5 ( 7.94x)
put_hevc_qpel_v32_8_c: 15392.3 ( 1.00x)
put_hevc_qpel_v32_8_rvv_i32: 1442.4 (10.67x)
put_hevc_qpel_v48_8_c: 34569.2 ( 1.00x)
put_hevc_qpel_v48_8_rvv_i32: 3197.1 (10.81x)
put_hevc_qpel_v64_8_c: 61109.7 ( 1.00x)
put_hevc_qpel_v64_8_rvv_i32: 5642.4 (10.83x)
put_hevc_qpel_uni_v4_8_c: 354.9 ( 1.00x)
put_hevc_qpel_uni_v4_8_rvv_i32: 131.3 ( 2.70x)
put_hevc_qpel_uni_v6_8_c: 769.3 ( 1.00x)
put_hevc_qpel_uni_v6_8_rvv_i32: 180.8 ( 4.25x)
put_hevc_qpel_uni_v8_8_c: 1399.3 ( 1.00x)
put_hevc_qpel_uni_v8_8_rvv_i32: 223.6 ( 6.26x)
put_hevc_qpel_uni_v12_8_c: 3031.4 ( 1.00x)
put_hevc_qpel_uni_v12_8_rvv_i32: 323.2 ( 9.38x)
put_hevc_qpel_uni_v16_8_c: 5334.2 ( 1.00x)
put_hevc_qpel_uni_v16_8_rvv_i32: 417.9 (12.76x)
put_hevc_qpel_uni_v24_8_c: 11908.4 ( 1.00x)
put_hevc_qpel_uni_v24_8_rvv_i32: 1212.2 ( 9.82x)
put_hevc_qpel_uni_v32_8_c: 21030.6 ( 1.00x)
put_hevc_qpel_uni_v32_8_rvv_i32: 1579.5 (13.31x)
put_hevc_qpel_uni_v48_8_c: 47025.7 ( 1.00x)
put_hevc_qpel_uni_v48_8_rvv_i32: 3500.2 (13.43x)
put_hevc_qpel_uni_v64_8_c: 83487.0 ( 1.00x)
put_hevc_qpel_uni_v64_8_rvv_i32: 6188.4 (13.49x)
put_hevc_qpel_uni_w_v4_8_c: 396.3 ( 1.00x)
put_hevc_qpel_uni_w_v4_8_rvv_i32: 200.9 ( 1.97x)
put_hevc_qpel_uni_w_v6_8_c: 851.4 ( 1.00x)
put_hevc_qpel_uni_w_v6_8_rvv_i32: 282.1 ( 3.02x)
put_hevc_qpel_uni_w_v8_8_c: 1544.0 ( 1.00x)
put_hevc_qpel_uni_w_v8_8_rvv_i32: 356.5 ( 4.33x)
put_hevc_qpel_uni_w_v12_8_c: 3329.0 ( 1.00x)
put_hevc_qpel_uni_w_v12_8_rvv_i32: 519.6 ( 6.41x)
put_hevc_qpel_uni_w_v16_8_c: 5857.9 ( 1.00x)
put_hevc_qpel_uni_w_v16_8_rvv_i32: 679.6 ( 8.62x)
put_hevc_qpel_uni_w_v24_8_c: 13050.5 ( 1.00x)
put_hevc_qpel_uni_w_v24_8_rvv_i32: 1965.5 ( 6.64x)
put_hevc_qpel_uni_w_v32_8_c: 23219.4 ( 1.00x)
put_hevc_qpel_uni_w_v32_8_rvv_i32: 2601.6 ( 8.93x)
put_hevc_qpel_uni_w_v48_8_c: 51925.3 ( 1.00x)
put_hevc_qpel_uni_w_v48_8_rvv_i32: 5786.7 ( 8.97x)
put_hevc_qpel_uni_w_v64_8_c: 92075.5 ( 1.00x)
put_hevc_qpel_uni_w_v64_8_rvv_i32: 10269.8 ( 8.97x)
put_hevc_qpel_bi_v4_8_c: 376.4 ( 1.00x)
put_hevc_qpel_bi_v4_8_rvv_i32: 150.2 ( 2.51x)
put_hevc_qpel_bi_v6_8_c: 808.3 ( 1.00x)
put_hevc_qpel_bi_v6_8_rvv_i32: 207.1 ( 3.90x)
put_hevc_qpel_bi_v8_8_c: 1490.1 ( 1.00x)
put_hevc_qpel_bi_v8_8_rvv_i32: 257.2 ( 5.79x)
put_hevc_qpel_bi_v12_8_c: 3220.3 ( 1.00x)
put_hevc_qpel_bi_v12_8_rvv_i32: 375.2 ( 8.58x)
put_hevc_qpel_bi_v16_8_c: 5657.5 ( 1.00x)
put_hevc_qpel_bi_v16_8_rvv_i32: 482.5 (11.72x)
put_hevc_qpel_bi_v24_8_c: 12495.4 ( 1.00x)
put_hevc_qpel_bi_v24_8_rvv_i32: 1383.8 ( 9.03x)
put_hevc_qpel_bi_v32_8_c: 22191.6 ( 1.00x)
put_hevc_qpel_bi_v32_8_rvv_i32: 1822.0 (12.18x)
put_hevc_qpel_bi_v48_8_c: 49654.0 ( 1.00x)
put_hevc_qpel_bi_v48_8_rvv_i32: 4046.8 (12.27x)
put_hevc_qpel_bi_v64_8_c: 88287.8 ( 1.00x)
put_hevc_qpel_bi_v64_8_rvv_i32: 7196.6 (12.27x)
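The vertical kernels keep all eight source rows live in vector registers and shift them down one slot per output row, so only one new row is loaded per iteration. A scalar C model of that reuse pattern (illustrative only; the frac = 1 coefficient row is assumed for the example):

```c
#include <assert.h>
#include <stdint.h>

/* frac = 1 row of the HEVC qpel coefficient table */
static const int8_t qpel_filt1[8] = { -1, 4, -10, 58, 17, -5, 1, 0 };

/* Scalar model of the qpel_v inner loop for one column: prime a window
 * with rows y-3..y+3, then per output row load only row y+4 and rotate
 * the window -- the rotation is what the vmv.v.v chain in filter_v does
 * with whole vector registers. src must have 3 valid rows above it. */
static void qpel_v_col(const uint8_t *src, int stride, int height, int *dst)
{
    int win[8];
    for (int i = 0; i < 7; i++)
        win[i] = src[(i - 3) * stride];
    for (int y = 0; y < height; y++) {
        win[7] = src[(y + 4) * stride];  /* the single new load per row */
        int sum = 0;
        for (int i = 0; i < 8; i++)
            sum += qpel_filt1[i] * win[i];
        dst[y] = sum;                    /* unclipped 16-bit intermediate */
        for (int i = 0; i < 7; i++)      /* slide the window down a row */
            win[i] = win[i + 1];
    }
}
```

This is why the vertical variants bench faster than the horizontal ones: seven of the eight loads per row are replaced by register moves.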
Signed-off-by: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
---
libavcodec/riscv/h26x/h2656dsp.h | 11 +
libavcodec/riscv/h26x/hevcqpel_rvv.S | 315 ++++++++++++++++++++++++++-
libavcodec/riscv/hevcdsp_init.c | 5 +
3 files changed, 330 insertions(+), 1 deletion(-)
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index 028b9ffbfd..2dabc16aee 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -36,4 +36,15 @@ void ff_hevc_put_qpel_uni_w_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
void ff_hevc_put_qpel_bi_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
mx, intptr_t my, int width);
+void ff_hevc_put_qpel_v_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_w_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+ const uint8_t *_src, ptrdiff_t _srcstride,
+ int height, int denom, int wx, int ox,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_bi_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+ mx, intptr_t my, int width);
#endif
diff --git a/libavcodec/riscv/h26x/hevcqpel_rvv.S b/libavcodec/riscv/h26x/hevcqpel_rvv.S
index 52d7acac33..8fd3c47bcc 100644
--- a/libavcodec/riscv/h26x/hevcqpel_rvv.S
+++ b/libavcodec/riscv/h26x/hevcqpel_rvv.S
@@ -306,4 +306,317 @@ func ff_hevc_put_qpel_bi_h_8_\lmul\()_rvv, zve32x
endfunc
.endm
-hevc_qpel_h m1, m2, m4
\ No newline at end of file
+hevc_qpel_h m1, m2, m4
+
+/* output is unclipped; clobbers v4 */
+.macro filter_v vdst, vsrc0, vsrc1, vsrc2, vsrc3, vsrc4, vsrc5, vsrc6, vsrc7
+ vmv.v.x v4, s1
+ vwmulsu.vv \vdst, v4, \vsrc0
+ vwmaccsu.vx \vdst, s2, \vsrc1
+ vmv.v.v \vsrc0, \vsrc1
+ vwmaccsu.vx \vdst, s3, \vsrc2
+ vmv.v.v \vsrc1, \vsrc2
+ vwmaccsu.vx \vdst, s4, \vsrc3
+ vmv.v.v \vsrc2, \vsrc3
+ vwmaccsu.vx \vdst, s5, \vsrc4
+ vmv.v.v \vsrc3, \vsrc4
+ vwmaccsu.vx \vdst, s6, \vsrc5
+ vmv.v.v \vsrc4, \vsrc5
+ vwmaccsu.vx \vdst, s7, \vsrc6
+ vmv.v.v \vsrc5, \vsrc6
+ vwmaccsu.vx \vdst, s8, \vsrc7
+ vmv.v.v \vsrc6, \vsrc7
+.endm
+
+.macro hevc_qpel_v lmul, lmul2, lmul4
+func ff_hevc_put_qpel_v_8_\lmul\()_rvv, zve32x
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a5
+ slli t1, a2, 1
+ add t1, t1, a2
+ sub a1, a1, t1 # src - 3 * src_stride
+ li t1, 0 # offset
+ mv t4, a3
+
+1:
+ add t2, a1, t1
+ slli t3, t1, 1
+ add t3, a0, t3
+
+ vsetvli t5, a6, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a2
+ vle8.v v18, (t2)
+ add t2, t2, a2
+ vle8.v v20, (t2)
+ add t2, t2, a2
+ vle8.v v22, (t2)
+ add t2, t2, a2
+ vle8.v v24, (t2)
+ add t2, t2, a2
+ vle8.v v26, (t2)
+ add t2, t2, a2
+ vle8.v v28, (t2)
+ add t2, t2, a2
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v30, (t2)
+ add t2, t2, a2
+ filter_v v0, v16, v18, v20, v22, v24, v26, v28, v30
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vse16.v v0, (t3)
+ addi t3, t3, 2*HEVC_MAX_PB_SIZE
+ addi a3, a3, -1
+ bgt a3, zero, 2b
+ add t1, t1, t5
+ sub a6, a6, t5
+ mv a3, t4
+ bgt a6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_qpel_uni_v_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a6
+ slli t1, a3, 1
+ add t1, t1, a3
+ sub a2, a2, t1 # src - 3 * src_stride
+ li t1, 0 # offset
+ mv t4, a4
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t5, a7, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a3
+ vle8.v v18, (t2)
+ add t2, t2, a3
+ vle8.v v20, (t2)
+ add t2, t2, a3
+ vle8.v v22, (t2)
+ add t2, t2, a3
+ vle8.v v24, (t2)
+ add t2, t2, a3
+ vle8.v v26, (t2)
+ add t2, t2, a3
+ vle8.v v28, (t2)
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v30, (t2)
+ add t2, t2, a3
+ filter_v v0, v16, v18, v20, v22, v24, v26, v28, v30
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 6
+ vse8.v v0, (t3)
+ add t3, t3, a1
+ addi a4, a4, -1
+ bgt a4, zero, 2b
+ add t1, t1, t5
+ sub a7, a7, t5
+ mv a4, t4
+ bgt a7, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_qpel_uni_w_v_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+#if (__riscv_xlen == 32)
+ lw t1, 4(sp) # my
+ lw t6, 8(sp) # width
+#elif (__riscv_xlen == 64)
+ ld t1, 8(sp)
+ lw t6, 16(sp)
+#endif
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter t1
+ addi a5, a5, 6 # shift
+ slli t1, a3, 1
+ add t1, t1, a3
+ sub a2, a2, t1 # src - 3 * src_stride
+ li t1, 0 # offset
+ mv t4, a4
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t5, t6, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a3
+ vle8.v v18, (t2)
+ add t2, t2, a3
+ vle8.v v20, (t2)
+ add t2, t2, a3
+ vle8.v v22, (t2)
+ add t2, t2, a3
+ vle8.v v24, (t2)
+ add t2, t2, a3
+ vle8.v v26, (t2)
+ add t2, t2, a3
+ vle8.v v28, (t2)
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v30, (t2)
+ add t2, t2, a3
+ filter_v v0, v16, v18, v20, v22, v24, v26, v28, v30
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vwmul.vx v8, v0, a6
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vssra.vx v0, v8, a5
+ vsadd.vx v0, v0, a7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclip.wi v0, v0, 0
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ add t3, t3, a1
+ addi a4, a4, -1
+ bgt a4, zero, 2b
+ add t1, t1, t5
+ sub t6, t6, t5
+ mv a4, t4
+ bgt t6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_qpel_bi_v_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lw t6, 0(sp) # width
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a7
+ slli t1, a3, 1
+ add t1, t1, a3
+ sub a2, a2, t1 # src - 3 * src_stride
+ li t1, 0 # offset
+ mv t4, a5
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+ slli t0, t1, 1
+ add t0, a4, t0
+
+ vsetvli t5, t6, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a3
+ vle8.v v18, (t2)
+ add t2, t2, a3
+ vle8.v v20, (t2)
+ add t2, t2, a3
+ vle8.v v22, (t2)
+ add t2, t2, a3
+ vle8.v v24, (t2)
+ add t2, t2, a3
+ vle8.v v26, (t2)
+ add t2, t2, a3
+ vle8.v v28, (t2)
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v30, (t2)
+ add t2, t2, a3
+ filter_v v0, v16, v18, v20, v22, v24, v26, v28, v30
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vle16.v v8, (t0)
+ addi t0, t0, 2*HEVC_MAX_PB_SIZE
+ vsadd.vv v0, v0, v8
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 7
+ vse8.v v0, (t3)
+ add t3, t3, a1
+ addi a5, a5, -1
+ bgt a5, zero, 2b
+ add t1, t1, t5
+ sub t6, t6, t5
+ mv a5, t4
+ bgt t6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+.endm
+
+hevc_qpel_v m1, m2, m4
\ No newline at end of file
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index 59333740de..480cfd2968 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -84,6 +84,10 @@ void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni, 0, 1, ff_hevc_put_qpel_uni_h_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 0, 1, ff_hevc_put_qpel_uni_w_h_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 0, 1, ff_hevc_put_qpel_bi_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel, 1, 0, ff_hevc_put_qpel_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni, 1, 0, ff_hevc_put_qpel_uni_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 1, 0, ff_hevc_put_qpel_uni_w_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 1, 0, ff_hevc_put_qpel_bi_v_8_m1_rvv);
break;
default:
--
2.25.1
* [FFmpeg-devel] [PATCH 3/6] libavcodec/riscv: add RVV optimization for epel_h in HEVC.
2026-01-22 4:23 [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimization for qpel_h in HEVC zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 2/6] libavcodec/riscv: add RVV optimization for qpel_v " zhanheng.yang--- via ffmpeg-devel
@ 2026-01-22 4:23 ` zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 4/6] libavcodec/riscv: add RVV optimization for epel_v " zhanheng.yang--- via ffmpeg-devel
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: zhanheng.yang--- via ffmpeg-devel @ 2026-01-22 4:23 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhanheng Yang
From: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
Bench on A210 C908 core (VLEN 128).
put_hevc_epel_h4_8_c: 146.2 ( 1.00x)
put_hevc_epel_h4_8_rvv_i32: 81.8 ( 1.79x)
put_hevc_epel_h6_8_c: 305.4 ( 1.00x)
put_hevc_epel_h6_8_rvv_i32: 115.5 ( 2.65x)
put_hevc_epel_h8_8_c: 532.7 ( 1.00x)
put_hevc_epel_h8_8_rvv_i32: 156.7 ( 3.40x)
put_hevc_epel_h12_8_c: 1233.8 ( 1.00x)
put_hevc_epel_h12_8_rvv_i32: 225.7 ( 5.47x)
put_hevc_epel_h16_8_c: 2223.8 ( 1.00x)
put_hevc_epel_h16_8_rvv_i32: 296.2 ( 7.51x)
put_hevc_epel_h24_8_c: 4739.4 ( 1.00x)
put_hevc_epel_h24_8_rvv_i32: 800.7 ( 5.92x)
put_hevc_epel_h32_8_c: 8344.4 ( 1.00x)
put_hevc_epel_h32_8_rvv_i32: 1066.0 ( 7.83x)
put_hevc_epel_h48_8_c: 18595.3 ( 1.00x)
put_hevc_epel_h48_8_rvv_i32: 2324.3 ( 8.00x)
put_hevc_epel_h64_8_c: 32911.2 ( 1.00x)
put_hevc_epel_h64_8_rvv_i32: 4079.8 ( 8.07x)
put_hevc_epel_uni_h4_8_c: 225.1 ( 1.00x)
put_hevc_epel_uni_h4_8_rvv_i32: 99.0 ( 2.27x)
put_hevc_epel_uni_h6_8_c: 500.0 ( 1.00x)
put_hevc_epel_uni_h6_8_rvv_i32: 138.1 ( 3.62x)
put_hevc_epel_uni_h8_8_c: 895.6 ( 1.00x)
put_hevc_epel_uni_h8_8_rvv_i32: 186.3 ( 4.81x)
put_hevc_epel_uni_h12_8_c: 1925.0 ( 1.00x)
put_hevc_epel_uni_h12_8_rvv_i32: 264.4 ( 7.28x)
put_hevc_epel_uni_h16_8_c: 3372.3 ( 1.00x)
put_hevc_epel_uni_h16_8_rvv_i32: 342.7 ( 9.84x)
put_hevc_epel_uni_h24_8_c: 7501.4 ( 1.00x)
put_hevc_epel_uni_h24_8_rvv_i32: 935.6 ( 8.02x)
put_hevc_epel_uni_h32_8_c: 13232.0 ( 1.00x)
put_hevc_epel_uni_h32_8_rvv_i32: 1240.0 (10.67x)
put_hevc_epel_uni_h48_8_c: 29608.1 ( 1.00x)
put_hevc_epel_uni_h48_8_rvv_i32: 2710.5 (10.92x)
put_hevc_epel_uni_h64_8_c: 52452.8 ( 1.00x)
put_hevc_epel_uni_h64_8_rvv_i32: 4775.5 (10.98x)
put_hevc_epel_uni_w_h4_8_c: 298.5 ( 1.00x)
put_hevc_epel_uni_w_h4_8_rvv_i32: 176.6 ( 1.69x)
put_hevc_epel_uni_w_h6_8_c: 645.3 ( 1.00x)
put_hevc_epel_uni_w_h6_8_rvv_i32: 254.9 ( 2.53x)
put_hevc_epel_uni_w_h8_8_c: 1187.0 ( 1.00x)
put_hevc_epel_uni_w_h8_8_rvv_i32: 335.3 ( 3.54x)
put_hevc_epel_uni_w_h12_8_c: 2535.6 ( 1.00x)
put_hevc_epel_uni_w_h12_8_rvv_i32: 487.8 ( 5.20x)
put_hevc_epel_uni_w_h16_8_c: 4491.0 ( 1.00x)
put_hevc_epel_uni_w_h16_8_rvv_i32: 641.8 ( 7.00x)
put_hevc_epel_uni_w_h24_8_c: 9974.7 ( 1.00x)
put_hevc_epel_uni_w_h24_8_rvv_i32: 1791.4 ( 5.57x)
put_hevc_epel_uni_w_h32_8_c: 17646.1 ( 1.00x)
put_hevc_epel_uni_w_h32_8_rvv_i32: 2379.0 ( 7.42x)
put_hevc_epel_uni_w_h48_8_c: 39569.2 ( 1.00x)
put_hevc_epel_uni_w_h48_8_rvv_i32: 5226.0 ( 7.57x)
put_hevc_epel_uni_w_h64_8_c: 70274.5 ( 1.00x)
put_hevc_epel_uni_w_h64_8_rvv_i32: 9214.3 ( 7.63x)
put_hevc_epel_bi_h4_8_c: 234.5 ( 1.00x)
put_hevc_epel_bi_h4_8_rvv_i32: 128.3 ( 1.83x)
put_hevc_epel_bi_h6_8_c: 505.0 ( 1.00x)
put_hevc_epel_bi_h6_8_rvv_i32: 177.1 ( 2.85x)
put_hevc_epel_bi_h8_8_c: 958.2 ( 1.00x)
put_hevc_epel_bi_h8_8_rvv_i32: 235.2 ( 4.07x)
put_hevc_epel_bi_h12_8_c: 2001.0 ( 1.00x)
put_hevc_epel_bi_h12_8_rvv_i32: 338.5 ( 5.91x)
put_hevc_epel_bi_h16_8_c: 3510.2 ( 1.00x)
put_hevc_epel_bi_h16_8_rvv_i32: 446.5 ( 7.86x)
put_hevc_epel_bi_h24_8_c: 7803.2 ( 1.00x)
put_hevc_epel_bi_h24_8_rvv_i32: 1189.6 ( 6.56x)
put_hevc_epel_bi_h32_8_c: 13764.5 ( 1.00x)
put_hevc_epel_bi_h32_8_rvv_i32: 1579.3 ( 8.72x)
put_hevc_epel_bi_h48_8_c: 30827.4 ( 1.00x)
put_hevc_epel_bi_h48_8_rvv_i32: 3422.3 ( 9.01x)
put_hevc_epel_bi_h64_8_c: 54715.6 ( 1.00x)
put_hevc_epel_bi_h64_8_rvv_i32: 6059.8 ( 9.03x)
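For reference, the operation each of these functions vectorizes is a plain 4-tap convolution per row, with an unclipped 16-bit intermediate. A scalar sketch follows (illustrative names, not the FFmpeg C reference; the coefficient table matches the one stored in the .S file):

```c
#include <stdint.h>

/* The 4-tap HEVC chroma ("epel") filters, indexed by the fractional
 * position mx = 0..7; all rows sum to 64. */
static const int8_t epel_filters[8][4] = {
    {  0,  0,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 },
    { -6, 46, 28, -4 }, { -4, 36, 36, -4 }, { -4, 28, 46, -6 },
    { -2, 16, 54, -4 }, { -2, 10, 58, -2 },
};

/* One row of put_hevc_epel_h: 8-bit input, unclipped 16-bit output,
 * no rounding or shift (the intermediate keeps 6 bits of headroom). */
static void epel_h_row(int16_t *dst, const uint8_t *src, int mx, int width)
{
    const int8_t *f = epel_filters[mx];
    for (int x = 0; x < width; x++)
        dst[x] = f[0] * src[x - 1] + f[1] * src[x]
               + f[2] * src[x + 1] + f[3] * src[x + 2];
}
```

The uni/uni_w/bi variants then round, weight, or average this intermediate back down to 8 bits, which is what the extra vnclip/vsadd steps in the assembly implement.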
Signed-off-by: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
---
libavcodec/riscv/Makefile | 3 +-
libavcodec/riscv/h26x/h2656dsp.h | 12 ++
libavcodec/riscv/h26x/hevcepel_rvv.S | 265 +++++++++++++++++++++++++++
libavcodec/riscv/hevcdsp_init.c | 4 +
4 files changed, 283 insertions(+), 1 deletion(-)
create mode 100644 libavcodec/riscv/h26x/hevcepel_rvv.S
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 414790ae0c..bf65e827e7 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -37,7 +37,8 @@ OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_init.o
RVV-OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_rvv.o
OBJS-$(CONFIG_HEVC_DECODER) += riscv/hevcdsp_init.o
RVV-OBJS-$(CONFIG_HEVC_DECODER) += riscv/h26x/h2656_inter_rvv.o \
- riscv/h26x/hevcqpel_rvv.o
+ riscv/h26x/hevcqpel_rvv.o \
+ riscv/h26x/hevcepel_rvv.o
OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_init.o
RVV-OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_rvv.o
OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index 2dabc16aee..fa2f5a88e3 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -47,4 +47,16 @@ void ff_hevc_put_qpel_uni_w_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
void ff_hevc_put_qpel_bi_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
mx, intptr_t my, int width);
+
+void ff_hevc_put_epel_h_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_uni_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_uni_w_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+ const uint8_t *_src, ptrdiff_t _srcstride,
+ int height, int denom, int wx, int ox,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_bi_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+ mx, intptr_t my, int width);
#endif
diff --git a/libavcodec/riscv/h26x/hevcepel_rvv.S b/libavcodec/riscv/h26x/hevcepel_rvv.S
new file mode 100644
index 0000000000..81044846f7
--- /dev/null
+++ b/libavcodec/riscv/h26x/hevcepel_rvv.S
@@ -0,0 +1,265 @@
+ /*
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+.data
+.align 2
+qpel_filters:
+ .byte 0, 0, 0, 0
+ .byte -2, 58, 10, -2
+ .byte -4, 54, 16, -2
+ .byte -6, 46, 28, -4
+ .byte -4, 36, 36, -4
+ .byte -4, 28, 46, -6
+ .byte -2, 16, 54, -4
+ .byte -2, 10, 58, -2
+
+.text
+#include "libavutil/riscv/asm.S"
+#define HEVC_MAX_PB_SIZE 64
+
+.macro lx rd, addr
+#if (__riscv_xlen == 32)
+ lw \rd, \addr
+#elif (__riscv_xlen == 64)
+ ld \rd, \addr
+#else
+ lq \rd, \addr
+#endif
+.endm
+
+.macro sx rd, addr
+#if (__riscv_xlen == 32)
+ sw \rd, \addr
+#elif (__riscv_xlen == 64)
+ sd \rd, \addr
+#else
+ sq \rd, \addr
+#endif
+.endm
+
+/* clobbers t0, t1 */
+.macro load_filter m
+ la t0, qpel_filters
+ slli t1, \m, 2
+ add t0, t0, t1
+ lb s1, 0(t0)
+ lb s2, 1(t0)
+ lb s3, 2(t0)
+ lb s4, 3(t0)
+.endm
+
+/* output is unclipped; clobbers t4 */
+.macro filter_h vdst, vsrc0, vsrc1, vsrc2, vsrc3, src
+ addi t4, \src, -1
+ vle8.v \vsrc0, (t4)
+ vmv.v.x \vsrc3, s1
+ vwmulsu.vv \vdst, \vsrc3, \vsrc0
+ vle8.v \vsrc1, (\src)
+ addi t4, \src, 1
+ vle8.v \vsrc2, (t4)
+ addi t4, \src, 2
+ vle8.v \vsrc3, (t4)
+
+ vwmaccsu.vx \vdst, s2, \vsrc1
+ vwmaccsu.vx \vdst, s3, \vsrc2
+ vwmaccsu.vx \vdst, s4, \vsrc3
+.endm
+
+
+.macro hevc_epel_h lmul, lmul2, lmul4
+func ff_hevc_put_epel_h_8_\lmul\()_rvv, zve32x
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter a4
+ mv t3, a6
+ li t1, 0 # offset
+
+1:
+ vsetvli t6, t3, e8, \lmul, ta, ma
+ add t2, a1, t1
+ filter_h v0, v16, v18, v20, v22, t2
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ slli t2, t1, 1
+ add t2, a0, t2
+ vse16.v v0, (t2)
+ sub t3, t3, t6
+ add t1, t1, t6
+ bgt t3, zero, 1b
+ addi a3, a3, -1
+ mv t3, a6
+ add a1, a1, a2
+ addi a0, a0, 2*HEVC_MAX_PB_SIZE
+ li t1, 0
+ bgt a3, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+
+func ff_hevc_put_epel_uni_h_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter a5
+ mv t3, a7
+ li t1, 0 # offset
+
+1:
+ vsetvli t6, t3, e8, \lmul, ta, ma
+ add t2, a2, t1
+ filter_h v0, v16, v18, v20, v22, t2
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 6
+ add t2, a0, t1
+ vse8.v v0, (t2)
+ sub t3, t3, t6
+ add t1, t1, t6
+ bgt t3, zero, 1b
+ addi a4, a4, -1
+ mv t3, a7
+ add a2, a2, a3
+ add a0, a0, a1
+ li t1, 0
+ bgt a4, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+
+func ff_hevc_put_epel_uni_w_h_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lx t2, 0(sp) # mx
+ addi a5, a5, 6 # shift
+#if (__riscv_xlen == 32)
+ lw t3, 8(sp) # width
+#elif (__riscv_xlen == 64)
+ lw t3, 16(sp)
+#endif
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter t2
+ li t2, 0 # offset
+
+1:
+ vsetvli t6, t3, e8, \lmul, ta, ma
+ add t1, a2, t2
+ filter_h v8, v16, v18, v20, v22, t1
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vwmul.vx v0, v8, a6
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vssra.vx v0, v0, a5
+ vsadd.vx v0, v0, a7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclip.wi v0, v0, 0
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ add t1, a0, t2
+ vse8.v v0, (t1)
+ sub t3, t3, t6
+ add t2, t2, t6
+ bgt t3, zero, 1b
+ addi a4, a4, -1
+#if (__riscv_xlen == 32)
+ lw t3, 40(sp)
+#elif (__riscv_xlen == 64)
+ ld t3, 48(sp)
+#endif
+ add a2, a2, a3
+ add a0, a0, a1
+ li t2, 0
+ bgt a4, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+
+func ff_hevc_put_epel_bi_h_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lw t3, 0(sp) # width
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter a6
+ li t1, 0 # offset
+
+1:
+ vsetvli t6, t3, e16, \lmul2, ta, ma
+ slli t2, t1, 1
+ add t2, a4, t2
+ vle16.v v12, (t2)
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ add t2, a2, t1
+ filter_h v0, v16, v18, v20, v22, t2
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vsadd.vv v0, v0, v12
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 7
+ add t2, a0, t1
+ vse8.v v0, (t2)
+ sub t3, t3, t6
+ add t1, t1, t6
+ bgt t3, zero, 1b
+ addi a5, a5, -1
+ lw t3, 32(sp)
+ add a2, a2, a3
+ add a0, a0, a1
+ addi a4, a4, 2*HEVC_MAX_PB_SIZE
+ li t1, 0
+ bgt a5, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+.endm
+
+hevc_epel_h m1, m2, m4
\ No newline at end of file
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index 480cfd2968..8608fdbd19 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -90,6 +90,10 @@ void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 1, 0, ff_hevc_put_qpel_uni_w_v_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 1, 0, ff_hevc_put_qpel_bi_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel, 0, 1, ff_hevc_put_epel_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_uni, 0, 1, ff_hevc_put_epel_uni_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_uni_w, 0, 1, ff_hevc_put_epel_uni_w_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_bi, 0, 1, ff_hevc_put_epel_bi_h_8_m1_rvv);
break;
default:
break;
--
2.25.1
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
^ permalink raw reply [flat|nested] 6+ messages in thread
* [FFmpeg-devel] [PATCH 4/6] libavcodec/riscv: add RVV optimized for epel_v in HEVC.
2026-01-22 4:23 [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimized for qpel_h in HEVC zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 2/6] libavcodec/riscv: add RVV optimized for qpel_v " zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 3/6] libavcodec/riscv: add RVV optimized for epel_h " zhanheng.yang--- via ffmpeg-devel
@ 2026-01-22 4:23 ` zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 5/6] libavcodec/riscv: add RVV optimized for qpel_hv " zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 6/6] libavcodec/riscv: add RVV optimized for epel_hv " zhanheng.yang--- via ffmpeg-devel
4 siblings, 0 replies; 6+ messages in thread
From: zhanheng.yang--- via ffmpeg-devel @ 2026-01-22 4:23 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhanheng Yang
From: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
Benchmarked on an A210 C908 core (VLEN 128):
put_hevc_epel_v4_8_c: 157.8 ( 1.00x)
put_hevc_epel_v4_8_rvv_i32: 73.2 ( 2.16x)
put_hevc_epel_v6_8_c: 314.6 ( 1.00x)
put_hevc_epel_v6_8_rvv_i32: 101.2 ( 3.11x)
put_hevc_epel_v8_8_c: 545.5 ( 1.00x)
put_hevc_epel_v8_8_rvv_i32: 124.4 ( 4.39x)
put_hevc_epel_v12_8_c: 1240.8 ( 1.00x)
put_hevc_epel_v12_8_rvv_i32: 183.6 ( 6.76x)
put_hevc_epel_v16_8_c: 2170.7 ( 1.00x)
put_hevc_epel_v16_8_rvv_i32: 235.1 ( 9.23x)
put_hevc_epel_v24_8_c: 4743.5 ( 1.00x)
put_hevc_epel_v24_8_rvv_i32: 677.5 ( 7.00x)
put_hevc_epel_v32_8_c: 8353.4 ( 1.00x)
put_hevc_epel_v32_8_rvv_i32: 892.1 ( 9.36x)
put_hevc_epel_v48_8_c: 18608.1 ( 1.00x)
put_hevc_epel_v48_8_rvv_i32: 1956.1 ( 9.51x)
put_hevc_epel_v64_8_c: 32934.3 ( 1.00x)
put_hevc_epel_v64_8_rvv_i32: 3454.1 ( 9.53x)
put_hevc_epel_uni_v4_8_c: 237.5 ( 1.00x)
put_hevc_epel_uni_v4_8_rvv_i32: 87.5 ( 2.72x)
put_hevc_epel_uni_v6_8_c: 509.5 ( 1.00x)
put_hevc_epel_uni_v6_8_rvv_i32: 119.6 ( 4.26x)
put_hevc_epel_uni_v8_8_c: 982.8 ( 1.00x)
put_hevc_epel_uni_v8_8_rvv_i32: 147.1 ( 6.68x)
put_hevc_epel_uni_v12_8_c: 2027.7 ( 1.00x)
put_hevc_epel_uni_v12_8_rvv_i32: 211.0 ( 9.61x)
put_hevc_epel_uni_v16_8_c: 3525.4 ( 1.00x)
put_hevc_epel_uni_v16_8_rvv_i32: 278.8 (12.64x)
put_hevc_epel_uni_v24_8_c: 7804.3 ( 1.00x)
put_hevc_epel_uni_v24_8_rvv_i32: 778.9 (10.02x)
put_hevc_epel_uni_v32_8_c: 13807.3 ( 1.00x)
put_hevc_epel_uni_v32_8_rvv_i32: 1028.7 (13.42x)
put_hevc_epel_uni_v48_8_c: 30934.9 ( 1.00x)
put_hevc_epel_uni_v48_8_rvv_i32: 2265.1 (13.66x)
put_hevc_epel_uni_v64_8_c: 54705.5 ( 1.00x)
put_hevc_epel_uni_v64_8_rvv_i32: 4003.7 (13.66x)
put_hevc_epel_uni_w_v4_8_c: 313.8 ( 1.00x)
put_hevc_epel_uni_w_v4_8_rvv_i32: 156.6 ( 2.00x)
put_hevc_epel_uni_w_v6_8_c: 674.3 ( 1.00x)
put_hevc_epel_uni_w_v6_8_rvv_i32: 222.8 ( 3.03x)
put_hevc_epel_uni_w_v8_8_c: 1253.3 ( 1.00x)
put_hevc_epel_uni_w_v8_8_rvv_i32: 279.4 ( 4.49x)
put_hevc_epel_uni_w_v12_8_c: 2619.4 ( 1.00x)
put_hevc_epel_uni_w_v12_8_rvv_i32: 410.2 ( 6.39x)
put_hevc_epel_uni_w_v16_8_c: 4614.2 ( 1.00x)
put_hevc_epel_uni_w_v16_8_rvv_i32: 535.8 ( 8.61x)
put_hevc_epel_uni_w_v24_8_c: 10290.6 ( 1.00x)
put_hevc_epel_uni_w_v24_8_rvv_i32: 1550.6 ( 6.64x)
put_hevc_epel_uni_w_v32_8_c: 18169.4 ( 1.00x)
put_hevc_epel_uni_w_v32_8_rvv_i32: 2047.2 ( 8.88x)
put_hevc_epel_uni_w_v48_8_c: 40704.3 ( 1.00x)
put_hevc_epel_uni_w_v48_8_rvv_i32: 4552.4 ( 8.94x)
put_hevc_epel_uni_w_v64_8_c: 72197.1 ( 1.00x)
put_hevc_epel_uni_w_v64_8_rvv_i32: 8069.4 ( 8.95x)
put_hevc_epel_bi_v4_8_c: 262.7 ( 1.00x)
put_hevc_epel_bi_v4_8_rvv_i32: 105.9 ( 2.48x)
put_hevc_epel_bi_v6_8_c: 553.0 ( 1.00x)
put_hevc_epel_bi_v6_8_rvv_i32: 145.4 ( 3.80x)
put_hevc_epel_bi_v8_8_c: 1045.5 ( 1.00x)
put_hevc_epel_bi_v8_8_rvv_i32: 180.3 ( 5.80x)
put_hevc_epel_bi_v12_8_c: 2172.7 ( 1.00x)
put_hevc_epel_bi_v12_8_rvv_i32: 264.2 ( 8.22x)
put_hevc_epel_bi_v16_8_c: 3791.6 ( 1.00x)
put_hevc_epel_bi_v16_8_rvv_i32: 336.5 (11.27x)
put_hevc_epel_bi_v24_8_c: 8424.1 ( 1.00x)
put_hevc_epel_bi_v24_8_rvv_i32: 967.2 ( 8.71x)
put_hevc_epel_bi_v32_8_c: 14910.8 ( 1.00x)
put_hevc_epel_bi_v32_8_rvv_i32: 1270.7 (11.73x)
put_hevc_epel_bi_v48_8_c: 33326.5 ( 1.00x)
put_hevc_epel_bi_v48_8_rvv_i32: 2804.7 (11.88x)
put_hevc_epel_bi_v64_8_c: 59177.9 ( 1.00x)
put_hevc_epel_bi_v64_8_rvv_i32: 5022.3 (11.78x)
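The vertical pass avoids reloading source rows: the assembly keeps three rows live in vector registers and rotates them with vmv.v.v while stepping down the column, loading only one new row per output line. A scalar sketch of that sliding window (illustrative names, not the FFmpeg C reference):

```c
#include <stdint.h>

/* One column of a 4-tap vertical epel pass, unclipped 16-bit output.
 * s0..s2 hold the three previously loaded rows; each iteration loads
 * one new row and rotates the window, mirroring the vmv.v.v chain. */
static void epel_v_col(int16_t *dst, const uint8_t *src, int stride,
                       const int8_t *f, int height)
{
    int s0 = src[-stride], s1 = src[0], s2 = src[stride];
    for (int y = 0; y < height; y++) {
        int s3 = src[(y + 2) * stride];
        dst[y] = f[0] * s0 + f[1] * s1 + f[2] * s2 + f[3] * s3;
        s0 = s1; s1 = s2; s2 = s3;   /* rotate the window */
    }
}
```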
Signed-off-by: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
---
libavcodec/riscv/h26x/h2656dsp.h | 11 ++
libavcodec/riscv/h26x/hevcepel_rvv.S | 235 ++++++++++++++++++++++++++-
libavcodec/riscv/hevcdsp_init.c | 4 +
3 files changed, 249 insertions(+), 1 deletion(-)
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index fa2f5a88e3..085ed4cf14 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -59,4 +59,15 @@ void ff_hevc_put_epel_uni_w_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
void ff_hevc_put_epel_bi_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
mx, intptr_t my, int width);
+void ff_hevc_put_epel_v_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_uni_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_uni_w_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+ const uint8_t *_src, ptrdiff_t _srcstride,
+ int height, int denom, int wx, int ox,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_bi_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+ mx, intptr_t my, int width);
#endif
diff --git a/libavcodec/riscv/h26x/hevcepel_rvv.S b/libavcodec/riscv/h26x/hevcepel_rvv.S
index 81044846f7..caca0b88ab 100644
--- a/libavcodec/riscv/h26x/hevcepel_rvv.S
+++ b/libavcodec/riscv/h26x/hevcepel_rvv.S
@@ -262,4 +262,237 @@ func ff_hevc_put_epel_bi_h_8_\lmul\()_rvv, zve32x
endfunc
.endm
-hevc_epel_h m1, m2, m4
\ No newline at end of file
+hevc_epel_h m1, m2, m4
+
+/* output is unclipped; clobbers v4 */
+.macro filter_v vdst, vsrc0, vsrc1, vsrc2, vsrc3
+ vmv.v.x v4, s1
+ vwmulsu.vv \vdst, v4, \vsrc0
+ vwmaccsu.vx \vdst, s2, \vsrc1
+ vmv.v.v \vsrc0, \vsrc1
+ vwmaccsu.vx \vdst, s3, \vsrc2
+ vmv.v.v \vsrc1, \vsrc2
+ vwmaccsu.vx \vdst, s4, \vsrc3
+ vmv.v.v \vsrc2, \vsrc3
+.endm
+
+.macro hevc_epel_v lmul, lmul2, lmul4
+func ff_hevc_put_epel_v_8_\lmul\()_rvv, zve32x
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter a5
+ sub a1, a1, a2 # src - src_stride
+ li t1, 0 # offset
+ mv t4, a3
+
+1:
+ add t2, a1, t1
+ slli t3, t1, 1
+ add t3, a0, t3
+
+ vsetvli t5, a6, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a2
+ vle8.v v18, (t2)
+ add t2, t2, a2
+ vle8.v v20, (t2)
+ add t2, t2, a2
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v22, (t2)
+ add t2, t2, a2
+ filter_v v0, v16, v18, v20, v22
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vse16.v v0, (t3)
+ addi t3, t3, 2*HEVC_MAX_PB_SIZE
+ addi a3, a3, -1
+ bgt a3, zero, 2b
+ add t1, t1, t5
+ sub a6, a6, t5
+ mv a3, t4
+ bgt a6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+
+func ff_hevc_put_epel_uni_v_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter a6
+ sub a2, a2, a3 # src - src_stride
+ li t1, 0 # offset
+ mv t4, a4
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t5, a7, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a3
+ vle8.v v18, (t2)
+ add t2, t2, a3
+ vle8.v v20, (t2)
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v22, (t2)
+ add t2, t2, a3
+ filter_v v0, v16, v18, v20, v22
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 6
+ vse8.v v0, (t3)
+ add t3, t3, a1
+ addi a4, a4, -1
+ bgt a4, zero, 2b
+ add t1, t1, t5
+ sub a7, a7, t5
+ mv a4, t4
+ bgt a7, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+
+func ff_hevc_put_epel_uni_w_v_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+#if (__riscv_xlen == 32)
+ lw t1, 4(sp) # my
+ lw t6, 8(sp) # width
+#elif (__riscv_xlen == 64)
+ ld t1, 8(sp)
+ lw t6, 16(sp)
+#endif
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter t1
+ addi a5, a5, 6 # shift
+ sub a2, a2, a3 # src - src_stride
+ li t1, 0 # offset
+ mv t4, a4
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t5, t6, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a3
+ vle8.v v18, (t2)
+ add t2, t2, a3
+ vle8.v v20, (t2)
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v22, (t2)
+ add t2, t2, a3
+ filter_v v0, v16, v18, v20, v22
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vwmul.vx v8, v0, a6
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vssra.vx v0, v8, a5
+ vsadd.vx v0, v0, a7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclip.wi v0, v0, 0
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ add t3, t3, a1
+ addi a4, a4, -1
+ bgt a4, zero, 2b
+ add t1, t1, t5
+ sub t6, t6, t5
+ mv a4, t4
+ bgt t6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+
+func ff_hevc_put_epel_bi_v_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lw t6, 0(sp) # width
+ addi sp, sp, -32
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ load_filter a7
+ sub a2, a2, a3 # src - src_stride
+ li t1, 0 # offset
+ mv t4, a5
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+ slli t0, t1, 1
+ add t0, a4, t0
+
+ vsetvli t5, t6, e8, \lmul, ta, ma
+ vle8.v v16, (t2)
+ add t2, t2, a3
+ vle8.v v18, (t2)
+ add t2, t2, a3
+ vle8.v v20, (t2)
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vle8.v v22, (t2)
+ add t2, t2, a3
+ filter_v v0, v16, v18, v20, v22
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vle16.v v8, (t0)
+ addi t0, t0, 2*HEVC_MAX_PB_SIZE
+ vsadd.vv v0, v0, v8
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 7
+ vse8.v v0, (t3)
+ add t3, t3, a1
+ addi a5, a5, -1
+ bgt a5, zero, 2b
+ add t1, t1, t5
+ sub t6, t6, t5
+ mv a5, t4
+ bgt t6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ addi sp, sp, 32
+ ret
+endfunc
+.endm
+
+hevc_epel_v m1, m2, m4
\ No newline at end of file
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index 8608fdbd19..c7874996a8 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -94,6 +94,10 @@ void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
RVV_FNASSIGN_PEL(c->put_hevc_epel_uni, 0, 1, ff_hevc_put_epel_uni_h_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_epel_uni_w, 0, 1, ff_hevc_put_epel_uni_w_h_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_epel_bi, 0, 1, ff_hevc_put_epel_bi_h_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel, 1, 0, ff_hevc_put_epel_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_uni, 1, 0, ff_hevc_put_epel_uni_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_uni_w, 1, 0, ff_hevc_put_epel_uni_w_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_bi, 1, 0, ff_hevc_put_epel_bi_v_8_m1_rvv);
break;
default:
break;
--
2.25.1
* [FFmpeg-devel] [PATCH 5/6] libavcodec/riscv: add RVV optimized for qpel_hv in HEVC.
2026-01-22 4:23 [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimized for qpel_h in HEVC zhanheng.yang--- via ffmpeg-devel
` (2 preceding siblings ...)
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 4/6] libavcodec/riscv: add RVV optimized for epel_v " zhanheng.yang--- via ffmpeg-devel
@ 2026-01-22 4:23 ` zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 6/6] libavcodec/riscv: add RVV optimized for epel_hv " zhanheng.yang--- via ffmpeg-devel
4 siblings, 0 replies; 6+ messages in thread
From: zhanheng.yang--- via ffmpeg-devel @ 2026-01-22 4:23 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhanheng Yang
From: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
Benchmarked on an A210 C908 core (VLEN 128):
put_hevc_qpel_hv4_8_c: 865.6 ( 1.00x)
put_hevc_qpel_hv4_8_rvv_i32: 501.8 ( 1.72x)
put_hevc_qpel_hv6_8_c: 1602.9 ( 1.00x)
put_hevc_qpel_hv6_8_rvv_i32: 635.4 ( 2.52x)
put_hevc_qpel_hv8_8_c: 2571.2 ( 1.00x)
put_hevc_qpel_hv8_8_rvv_i32: 774.1 ( 3.32x)
put_hevc_qpel_hv12_8_c: 5366.3 ( 1.00x)
put_hevc_qpel_hv12_8_rvv_i32: 1049.3 ( 5.11x)
put_hevc_qpel_hv16_8_c: 8959.2 ( 1.00x)
put_hevc_qpel_hv16_8_rvv_i32: 1328.1 ( 6.75x)
put_hevc_qpel_hv24_8_c: 18969.7 ( 1.00x)
put_hevc_qpel_hv24_8_rvv_i32: 3712.5 ( 5.11x)
put_hevc_qpel_hv32_8_c: 32674.3 ( 1.00x)
put_hevc_qpel_hv32_8_rvv_i32: 4806.7 ( 6.80x)
put_hevc_qpel_hv48_8_c: 71309.9 ( 1.00x)
put_hevc_qpel_hv48_8_rvv_i32: 10465.8 ( 6.81x)
put_hevc_qpel_hv64_8_c: 124846.0 ( 1.00x)
put_hevc_qpel_hv64_8_rvv_i32: 18306.5 ( 6.82x)
put_hevc_qpel_uni_hv4_8_c: 920.4 ( 1.00x)
put_hevc_qpel_uni_hv4_8_rvv_i32: 532.1 ( 1.73x)
put_hevc_qpel_uni_hv6_8_c: 1753.0 ( 1.00x)
put_hevc_qpel_uni_hv6_8_rvv_i32: 691.0 ( 2.54x)
put_hevc_qpel_uni_hv8_8_c: 2872.7 ( 1.00x)
put_hevc_qpel_uni_hv8_8_rvv_i32: 836.9 ( 3.43x)
put_hevc_qpel_uni_hv12_8_c: 5828.4 ( 1.00x)
put_hevc_qpel_uni_hv12_8_rvv_i32: 1141.2 ( 5.11x)
put_hevc_qpel_uni_hv16_8_c: 9906.7 ( 1.00x)
put_hevc_qpel_uni_hv16_8_rvv_i32: 1452.5 ( 6.82x)
put_hevc_qpel_uni_hv24_8_c: 20871.3 ( 1.00x)
put_hevc_qpel_uni_hv24_8_rvv_i32: 4094.0 ( 5.10x)
put_hevc_qpel_uni_hv32_8_c: 36123.3 ( 1.00x)
put_hevc_qpel_uni_hv32_8_rvv_i32: 5310.5 ( 6.80x)
put_hevc_qpel_uni_hv48_8_c: 79016.0 ( 1.00x)
put_hevc_qpel_uni_hv48_8_rvv_i32: 11591.2 ( 6.82x)
put_hevc_qpel_uni_hv64_8_c: 138779.8 ( 1.00x)
put_hevc_qpel_uni_hv64_8_rvv_i32: 20321.1 ( 6.83x)
put_hevc_qpel_uni_w_hv4_8_c: 988.8 ( 1.00x)
put_hevc_qpel_uni_w_hv4_8_rvv_i32: 580.3 ( 1.70x)
put_hevc_qpel_uni_w_hv6_8_c: 1871.5 ( 1.00x)
put_hevc_qpel_uni_w_hv6_8_rvv_i32: 751.7 ( 2.49x)
put_hevc_qpel_uni_w_hv8_8_c: 3089.8 ( 1.00x)
put_hevc_qpel_uni_w_hv8_8_rvv_i32: 923.7 ( 3.35x)
put_hevc_qpel_uni_w_hv12_8_c: 6384.8 ( 1.00x)
put_hevc_qpel_uni_w_hv12_8_rvv_i32: 1266.7 ( 5.04x)
put_hevc_qpel_uni_w_hv16_8_c: 10844.7 ( 1.00x)
put_hevc_qpel_uni_w_hv16_8_rvv_i32: 1612.2 ( 6.73x)
put_hevc_qpel_uni_w_hv24_8_c: 23060.9 ( 1.00x)
put_hevc_qpel_uni_w_hv24_8_rvv_i32: 4560.2 ( 5.06x)
put_hevc_qpel_uni_w_hv32_8_c: 39977.0 ( 1.00x)
put_hevc_qpel_uni_w_hv32_8_rvv_i32: 5927.0 ( 6.74x)
put_hevc_qpel_uni_w_hv48_8_c: 87560.3 ( 1.00x)
put_hevc_qpel_uni_w_hv48_8_rvv_i32: 12978.3 ( 6.75x)
put_hevc_qpel_uni_w_hv64_8_c: 153980.5 ( 1.00x)
put_hevc_qpel_uni_w_hv64_8_rvv_i32: 22823.0 ( 6.75x)
put_hevc_qpel_bi_hv4_8_c: 938.5 ( 1.00x)
put_hevc_qpel_bi_hv4_8_rvv_i32: 541.4 ( 1.73x)
put_hevc_qpel_bi_hv6_8_c: 1760.1 ( 1.00x)
put_hevc_qpel_bi_hv6_8_rvv_i32: 695.9 ( 2.53x)
put_hevc_qpel_bi_hv8_8_c: 2924.3 ( 1.00x)
put_hevc_qpel_bi_hv8_8_rvv_i32: 849.3 ( 3.44x)
put_hevc_qpel_bi_hv12_8_c: 5992.7 ( 1.00x)
put_hevc_qpel_bi_hv12_8_rvv_i32: 1157.5 ( 5.18x)
put_hevc_qpel_bi_hv16_8_c: 10065.4 ( 1.00x)
put_hevc_qpel_bi_hv16_8_rvv_i32: 1473.6 ( 6.83x)
put_hevc_qpel_bi_hv24_8_c: 21450.2 ( 1.00x)
put_hevc_qpel_bi_hv24_8_rvv_i32: 4151.3 ( 5.17x)
put_hevc_qpel_bi_hv32_8_c: 37107.8 ( 1.00x)
put_hevc_qpel_bi_hv32_8_rvv_i32: 5386.4 ( 6.89x)
put_hevc_qpel_bi_hv48_8_c: 81401.7 ( 1.00x)
put_hevc_qpel_bi_hv48_8_rvv_i32: 11761.7 ( 6.92x)
put_hevc_qpel_bi_hv64_8_c: 143503.3 ( 1.00x)
put_hevc_qpel_bi_hv64_8_rvv_i32: 20700.3 ( 6.93x)
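The hv path is separable: an unclipped 8-tap horizontal pass into a 16-bit intermediate, then the 8-tap vertical pass in 32-bit arithmetic shifted right by 6, as filter_h followed by filter_v_s and the final narrowing do in the assembly. A scalar sketch for one output sample at 8-bit depth (illustrative names, not the FFmpeg C reference; the coefficients shown are the HEVC half-sample luma filter):

```c
#include <stdint.h>

/* HEVC half-sample 8-tap luma filter; coefficients sum to 64. */
static const int8_t qpel_half[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

/* One output sample of the separable hv path, using the same filter
 * horizontally and vertically for brevity. */
static int qpel_hv_sample(const uint8_t *src, int stride, const int8_t *f)
{
    int16_t tmp[8];
    for (int i = 0; i < 8; i++) {              /* rows -3 .. +4 */
        const uint8_t *s = src + (i - 3) * stride;
        int sum = 0;
        for (int j = 0; j < 8; j++)            /* taps -3 .. +4 */
            sum += f[j] * s[j - 3];
        tmp[i] = (int16_t)sum;                 /* unclipped intermediate */
    }
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i] * tmp[i];
    return sum >> 6;                           /* 32-bit accumulate, >> 6 */
}
```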
Signed-off-by: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
---
libavcodec/riscv/h26x/h2656dsp.h | 11 +
libavcodec/riscv/h26x/hevcqpel_rvv.S | 386 ++++++++++++++++++++++++++-
libavcodec/riscv/hevcdsp_init.c | 4 +
3 files changed, 400 insertions(+), 1 deletion(-)
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index 085ed4cf14..7e320bd795 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -47,6 +47,17 @@ void ff_hevc_put_qpel_uni_w_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
void ff_hevc_put_qpel_bi_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
mx, intptr_t my, int width);
+void ff_hevc_put_qpel_hv_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_hv_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_w_hv_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+ const uint8_t *_src, ptrdiff_t _srcstride,
+ int height, int denom, int wx, int ox,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_bi_hv_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+ mx, intptr_t my, int width);
void ff_hevc_put_epel_h_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
intptr_t mx, intptr_t my, int width);
diff --git a/libavcodec/riscv/h26x/hevcqpel_rvv.S b/libavcodec/riscv/h26x/hevcqpel_rvv.S
index 8fd3c47bcc..ed7fa8fe00 100644
--- a/libavcodec/riscv/h26x/hevcqpel_rvv.S
+++ b/libavcodec/riscv/h26x/hevcqpel_rvv.S
@@ -619,4 +619,388 @@ func ff_hevc_put_qpel_bi_v_8_\lmul\()_rvv, zve32x
endfunc
.endm
-hevc_qpel_v m1, m2, m4
\ No newline at end of file
+hevc_qpel_v m1, m2, m4
+
+/* clobbers t4 */
+.macro filter_v_s vdst, vsrc0, vsrc1, vsrc2, vsrc3, vsrc4, vsrc5, vsrc6, vsrc7, vf
+ vwmul.vx \vdst, \vsrc0, s0
+ vwmacc.vx \vdst, s9, \vsrc1
+ vmv.v.v \vsrc0, \vsrc1
+ vwmacc.vx \vdst, s10, \vsrc2
+ vmv.v.v \vsrc1, \vsrc2
+ vwmacc.vx \vdst, s11, \vsrc3
+ vmv.v.v \vsrc2, \vsrc3
+ lb t4, 4(\vf)
+ vwmacc.vx \vdst, t4, \vsrc4
+ lb t4, 5(\vf)
+ vmv.v.v \vsrc3, \vsrc4
+ vwmacc.vx \vdst, t4, \vsrc5
+ lb t4, 6(\vf)
+ vmv.v.v \vsrc4, \vsrc5
+ vwmacc.vx \vdst, t4, \vsrc6
+ lb t4, 7(\vf)
+ vmv.v.v \vsrc5, \vsrc6
+ vwmacc.vx \vdst, t4, \vsrc7
+ vmv.v.v \vsrc6, \vsrc7
+.endm
+
+/* loads the first four coefficients and returns the filter address in \m;
+ * clobbers t0, t1 (not enough spare registers for all eight coefficients) */
+.macro load_filter2 m
+ la t0, qpel_filters
+ slli t1, \m, 3
+ add t0, t0, t1
+ lb s0, 0(t0)
+ lb s9, 1(t0)
+ lb s10, 2(t0)
+ lb s11, 3(t0)
+ mv \m, t0
+.endm
+
+.macro hevc_qpel_hv lmul, lmul2, lmul4
+func ff_hevc_put_qpel_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 2
+ addi sp, sp, -96
+ sx s0, 0(sp)
+ sx s1, 8(sp)
+ sx s2, 16(sp)
+ sx s3, 24(sp)
+ sx s4, 32(sp)
+ sx s5, 40(sp)
+ sx s6, 48(sp)
+ sx s7, 56(sp)
+ sx s8, 64(sp)
+ sx s9, 72(sp)
+ sx s10, 80(sp)
+ sx s11, 88(sp)
+ load_filter a4
+ load_filter2 a5
+ slli t1, a2, 1
+ add t1, t1, a2
+ sub a1, a1, t1 # src - 3 * src_stride
+ mv t0, a3
+ li t1, 0 # offset
+
+1:
+ add t2, a1, t1
+ slli t3, t1, 1
+ add t3, a0, t3
+
+ vsetvli t6, a6, e8, \lmul, ta, ma
+ filter_h v4, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+ filter_h v6, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+ filter_h v8, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+ filter_h v10, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+ filter_h v12, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+ filter_h v14, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+ filter_h v16, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v18, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a2
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ filter_v_s v0, v4, v6, v8, v10, v12, v14, v16, v18, a5
+ vnclip.wi v0, v0, 6
+ vse16.v v0, (t3)
+ addi a3, a3, -1
+ addi t3, t3, 2*HEVC_MAX_PB_SIZE
+ bgt a3, zero, 2b
+ mv a3, t0
+ add t1, t1, t6
+ sub a6, a6, t6
+ bgt a6, zero, 1b
+
+ lx s0, 0(sp)
+ lx s1, 8(sp)
+ lx s2, 16(sp)
+ lx s3, 24(sp)
+ lx s4, 32(sp)
+ lx s5, 40(sp)
+ lx s6, 48(sp)
+ lx s7, 56(sp)
+ lx s8, 64(sp)
+ lx s9, 72(sp)
+ lx s10, 80(sp)
+ lx s11, 88(sp)
+ addi sp, sp, 96
+ ret
+endfunc
+
+func ff_hevc_put_qpel_uni_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ addi sp, sp, -96
+ sx s0, 0(sp)
+ sx s1, 8(sp)
+ sx s2, 16(sp)
+ sx s3, 24(sp)
+ sx s4, 32(sp)
+ sx s5, 40(sp)
+ sx s6, 48(sp)
+ sx s7, 56(sp)
+ sx s8, 64(sp)
+ sx s9, 72(sp)
+ sx s10, 80(sp)
+ sx s11, 88(sp)
+ load_filter a5
+ load_filter2 a6
+ slli t1, a3, 1
+ add t1, t1, a3
+ sub a2, a2, t1 # src - 3 * src_stride
+ mv t0, a4
+ li t1, 0 # offset
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t6, a7, e8, \lmul, ta, ma
+ filter_h v4, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v6, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v8, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v10, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v12, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v14, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v16, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v18, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ filter_v_s v0, v4, v6, v8, v10, v12, v14, v16, v18, a6
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vsra.vi v0, v0, 6
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclipu.wi v0, v0, 6
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ addi a4, a4, -1
+ add t3, t3, a1
+ bgt a4, zero, 2b
+ mv a4, t0
+ add t1, t1, t6
+ sub a7, a7, t6
+ bgt a7, zero, 1b
+
+ lx s0, 0(sp)
+ lx s1, 8(sp)
+ lx s2, 16(sp)
+ lx s3, 24(sp)
+ lx s4, 32(sp)
+ lx s5, 40(sp)
+ lx s6, 48(sp)
+ lx s7, 56(sp)
+ lx s8, 64(sp)
+ lx s9, 72(sp)
+ lx s10, 80(sp)
+ lx s11, 88(sp)
+ addi sp, sp, 96
+ ret
+endfunc
+
+func ff_hevc_put_qpel_uni_w_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lx t2, 0(sp) # mx
+#if (__riscv_xlen == 32)
+ lw t4, 4(sp) # my
+ lw t5, 8(sp) # width
+#elif (__riscv_xlen == 64)
+ ld t4, 8(sp)
+ lw t5, 16(sp)
+#endif
+ addi a5, a5, 6 # shift
+ addi sp, sp, -104
+ sx s0, 0(sp)
+ sx s1, 8(sp)
+ sx s2, 16(sp)
+ sx s3, 24(sp)
+ sx s4, 32(sp)
+ sx s5, 40(sp)
+ sx s6, 48(sp)
+ sx s7, 56(sp)
+ sx s8, 64(sp)
+ sx s9, 72(sp)
+ sx s10, 80(sp)
+ sx s11, 88(sp)
+ sx ra, 96(sp)
+ mv ra, t4
+ load_filter t2
+ load_filter2 ra
+ slli t1, a3, 1
+ add t1, t1, a3
+ sub a2, a2, t1 # src - 3 * src_stride
+ mv t0, a4
+ li t1, 0 # offset
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t6, t5, e8, \lmul, ta, ma
+ filter_h v4, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v6, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v8, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v10, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v12, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v14, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v16, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v18, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ filter_v_s v0, v4, v6, v8, v10, v12, v14, v16, v18, ra
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vsra.vi v0, v0, 6
+ vmul.vx v0, v0, a6
+ vssra.vx v0, v0, a5
+ vsadd.vx v0, v0, a7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclip.wi v0, v0, 0
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ addi a4, a4, -1
+ add t3, t3, a1
+ bgt a4, zero, 2b
+ mv a4, t0
+ add t1, t1, t6
+ sub t5, t5, t6
+ bgt t5, zero, 1b
+
+ lx s0, 0(sp)
+ lx s1, 8(sp)
+ lx s2, 16(sp)
+ lx s3, 24(sp)
+ lx s4, 32(sp)
+ lx s5, 40(sp)
+ lx s6, 48(sp)
+ lx s7, 56(sp)
+ lx s8, 64(sp)
+ lx s9, 72(sp)
+ lx s10, 80(sp)
+ lx s11, 88(sp)
+ lx ra, 96(sp)
+ addi sp, sp, 104
+ ret
+endfunc
+
+func ff_hevc_put_qpel_bi_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lw t3, 0(sp) # width
+ addi sp, sp, -96
+ sx s0, 0(sp)
+ sx s1, 8(sp)
+ sx s2, 16(sp)
+ sx s3, 24(sp)
+ sx s4, 32(sp)
+ sx s5, 40(sp)
+ sx s6, 48(sp)
+ sx s7, 56(sp)
+ sx s8, 64(sp)
+ sx s9, 72(sp)
+ sx s10, 80(sp)
+ sx s11, 88(sp)
+ load_filter a6
+ load_filter2 a7
+ mv a6, t3
+ slli t1, a3, 1
+ add t1, t1, a3
+ sub a2, a2, t1 # src - 3 * src_stride
+ mv t0, a5
+ li t1, 0 # offset
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+ slli t5, t1, 1
+ add t5, a4, t5
+
+ vsetvli t6, a6, e8, \lmul, ta, ma
+ filter_h v4, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v6, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v8, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v10, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v12, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v14, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+ filter_h v16, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v18, v24, v25, v26, v27, v28, v29, v30, v31, t2
+ add t2, t2, a3
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vle16.v v24, (t5)
+ addi t5, t5, 2*HEVC_MAX_PB_SIZE
+ filter_v_s v0, v4, v6, v8, v10, v12, v14, v16, v18, a7
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vsra.vi v0, v0, 6
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vwadd.wv v0, v0, v24
+ vnclip.wi v0, v0, 7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ addi a5, a5, -1
+ add t3, t3, a1
+ bgt a5, zero, 2b
+ mv a5, t0
+ add t1, t1, t6
+ sub a6, a6, t6
+ bgt a6, zero, 1b
+
+ lx s0, 0(sp)
+ lx s1, 8(sp)
+ lx s2, 16(sp)
+ lx s3, 24(sp)
+ lx s4, 32(sp)
+ lx s5, 40(sp)
+ lx s6, 48(sp)
+ lx s7, 56(sp)
+ lx s8, 64(sp)
+ lx s9, 72(sp)
+ lx s10, 80(sp)
+ lx s11, 88(sp)
+ addi sp, sp, 96
+ ret
+endfunc
+.endm
+
+hevc_qpel_hv m1, m2, m4
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index c7874996a8..53c800626f 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -89,6 +89,10 @@ void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni, 1, 0, ff_hevc_put_qpel_uni_v_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 1, 0, ff_hevc_put_qpel_uni_w_v_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 1, 0, ff_hevc_put_qpel_bi_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel, 1, 1, ff_hevc_put_qpel_hv_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni, 1, 1, ff_hevc_put_qpel_uni_hv_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 1, 1, ff_hevc_put_qpel_uni_w_hv_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 1, 1, ff_hevc_put_qpel_bi_hv_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_epel, 0, 1, ff_hevc_put_epel_h_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_epel_uni, 0, 1, ff_hevc_put_epel_uni_h_8_m1_rvv);
--
2.25.1
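As a cross-check of the weighted-uni output stage in ff_hevc_put_qpel_uni_w_hv_8 above (vsra.vi 6, vmul.vx wx, vssra.vx shift, vsadd.vx ox, clamp), here is a hedged scalar C model; the helper name and parameters are illustrative and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the uni_w output stage (assumption: vxrm = 0,
 * i.e. round-to-nearest-up, as set by csrwi vxrm, 0 above).
 * hv    - 32-bit result of the separable 8-tap HV filter
 * wx    - weight; shift = denom + 6 (the addi a5, a5, 6); ox - offset */
static uint8_t uni_w_pixel(int32_t hv, int wx, int shift, int ox)
{
    int32_t v = (hv >> 6) * wx;            /* vsra.vi v0, v0, 6; vmul.vx */
    v = (v + (1 << (shift - 1))) >> shift; /* vssra.vx with RNU rounding */
    v += ox;                               /* vsadd.vx                   */
    if (v < 0)   v = 0;                    /* vmax.vx v0, v0, zero       */
    if (v > 255) v = 255;                  /* final vnclipu.wi to 8 bits */
    return (uint8_t)v;
}
```

For a flat block of value p the HV sum is p << 12 (both 8-tap filters sum to 64), so with wx = 1, denom = 0 and ox = 0 the model returns p, which is the identity behaviour checkasm exercises.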
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
* [FFmpeg-devel] [PATCH 6/6] libavcodec/riscv: add RVV optimized for epel_hv in HEVC.
2026-01-22 4:23 [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimized for qpel_h in HEVC zhanheng.yang--- via ffmpeg-devel
` (3 preceding siblings ...)
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 5/6] libavcodec/riscv: add RVV optimized for qpel_hv " zhanheng.yang--- via ffmpeg-devel
@ 2026-01-22 4:23 ` zhanheng.yang--- via ffmpeg-devel
4 siblings, 0 replies; 6+ messages in thread
From: zhanheng.yang--- via ffmpeg-devel @ 2026-01-22 4:23 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhanheng Yang
From: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
Benchmarked on an A210 C908 core (VLEN 128).
put_hevc_epel_hv4_8_c: 390.0 ( 1.00x)
put_hevc_epel_hv4_8_rvv_i32: 213.0 ( 1.83x)
put_hevc_epel_hv6_8_c: 749.8 ( 1.00x)
put_hevc_epel_hv6_8_rvv_i32: 290.8 ( 2.58x)
put_hevc_epel_hv8_8_c: 1215.5 ( 1.00x)
put_hevc_epel_hv8_8_rvv_i32: 360.7 ( 3.37x)
put_hevc_epel_hv12_8_c: 2602.5 ( 1.00x)
put_hevc_epel_hv12_8_rvv_i32: 515.4 ( 5.05x)
put_hevc_epel_hv16_8_c: 4417.0 ( 1.00x)
put_hevc_epel_hv16_8_rvv_i32: 661.8 ( 6.67x)
put_hevc_epel_hv24_8_c: 9524.8 ( 1.00x)
put_hevc_epel_hv24_8_rvv_i32: 1909.2 ( 4.99x)
put_hevc_epel_hv32_8_c: 16589.1 ( 1.00x)
put_hevc_epel_hv32_8_rvv_i32: 2508.0 ( 6.61x)
put_hevc_epel_hv48_8_c: 37145.4 ( 1.00x)
put_hevc_epel_hv48_8_rvv_i32: 5526.8 ( 6.72x)
put_hevc_epel_hv64_8_c: 65015.9 ( 1.00x)
put_hevc_epel_hv64_8_rvv_i32: 9751.9 ( 6.67x)
put_hevc_epel_uni_hv4_8_c: 434.8 ( 1.00x)
put_hevc_epel_uni_hv4_8_rvv_i32: 238.8 ( 1.82x)
put_hevc_epel_uni_hv6_8_c: 856.8 ( 1.00x)
put_hevc_epel_uni_hv6_8_rvv_i32: 329.6 ( 2.60x)
put_hevc_epel_uni_hv8_8_c: 1474.2 ( 1.00x)
put_hevc_epel_uni_hv8_8_rvv_i32: 412.9 ( 3.57x)
put_hevc_epel_uni_hv12_8_c: 2995.9 ( 1.00x)
put_hevc_epel_uni_hv12_8_rvv_i32: 593.9 ( 5.04x)
put_hevc_epel_uni_hv16_8_c: 5128.2 ( 1.00x)
put_hevc_epel_uni_hv16_8_rvv_i32: 770.6 ( 6.66x)
put_hevc_epel_uni_hv24_8_c: 11159.5 ( 1.00x)
put_hevc_epel_uni_hv24_8_rvv_i32: 2223.1 ( 5.02x)
put_hevc_epel_uni_hv32_8_c: 19462.3 ( 1.00x)
put_hevc_epel_uni_hv32_8_rvv_i32: 2925.1 ( 6.65x)
put_hevc_epel_uni_hv48_8_c: 43480.5 ( 1.00x)
put_hevc_epel_uni_hv48_8_rvv_i32: 6476.7 ( 6.71x)
put_hevc_epel_uni_hv64_8_c: 76411.2 ( 1.00x)
put_hevc_epel_uni_hv64_8_rvv_i32: 11456.7 ( 6.67x)
put_hevc_epel_uni_w_hv4_8_c: 557.8 ( 1.00x)
put_hevc_epel_uni_w_hv4_8_rvv_i32: 287.9 ( 1.94x)
put_hevc_epel_uni_w_hv6_8_c: 1068.0 ( 1.00x)
put_hevc_epel_uni_w_hv6_8_rvv_i32: 399.4 ( 2.67x)
put_hevc_epel_uni_w_hv8_8_c: 1835.2 ( 1.00x)
put_hevc_epel_uni_w_hv8_8_rvv_i32: 507.3 ( 3.62x)
put_hevc_epel_uni_w_hv12_8_c: 3758.9 ( 1.00x)
put_hevc_epel_uni_w_hv12_8_rvv_i32: 729.2 ( 5.15x)
put_hevc_epel_uni_w_hv16_8_c: 6524.5 ( 1.00x)
put_hevc_epel_uni_w_hv16_8_rvv_i32: 954.7 ( 6.83x)
put_hevc_epel_uni_w_hv24_8_c: 14094.2 ( 1.00x)
put_hevc_epel_uni_w_hv24_8_rvv_i32: 2764.9 ( 5.10x)
put_hevc_epel_uni_w_hv32_8_c: 24887.0 ( 1.00x)
put_hevc_epel_uni_w_hv32_8_rvv_i32: 3640.5 ( 6.84x)
put_hevc_epel_uni_w_hv48_8_c: 55341.0 ( 1.00x)
put_hevc_epel_uni_w_hv48_8_rvv_i32: 8083.8 ( 6.85x)
put_hevc_epel_uni_w_hv64_8_c: 97377.8 ( 1.00x)
put_hevc_epel_uni_w_hv64_8_rvv_i32: 14322.9 ( 6.80x)
put_hevc_epel_bi_hv4_8_c: 472.2 ( 1.00x)
put_hevc_epel_bi_hv4_8_rvv_i32: 250.0 ( 1.89x)
put_hevc_epel_bi_hv6_8_c: 903.1 ( 1.00x)
put_hevc_epel_bi_hv6_8_rvv_i32: 341.3 ( 2.65x)
put_hevc_epel_bi_hv8_8_c: 1583.5 ( 1.00x)
put_hevc_epel_bi_hv8_8_rvv_i32: 433.1 ( 3.66x)
put_hevc_epel_bi_hv12_8_c: 3205.8 ( 1.00x)
put_hevc_epel_bi_hv12_8_rvv_i32: 615.0 ( 5.21x)
put_hevc_epel_bi_hv16_8_c: 5504.1 ( 1.00x)
put_hevc_epel_bi_hv16_8_rvv_i32: 800.3 ( 6.88x)
put_hevc_epel_bi_hv24_8_c: 11897.2 ( 1.00x)
put_hevc_epel_bi_hv24_8_rvv_i32: 2309.9 ( 5.15x)
put_hevc_epel_bi_hv32_8_c: 20823.8 ( 1.00x)
put_hevc_epel_bi_hv32_8_rvv_i32: 3031.2 ( 6.87x)
put_hevc_epel_bi_hv48_8_c: 46854.5 ( 1.00x)
put_hevc_epel_bi_hv48_8_rvv_i32: 6713.2 ( 6.98x)
put_hevc_epel_bi_hv64_8_c: 82399.2 ( 1.00x)
put_hevc_epel_bi_hv64_8_rvv_i32: 11901.4 ( 6.92x)
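For context, the separable 4-tap HV interpolation that the new functions vectorize can be sketched as a scalar C reference (a hedged model: the function name, buffer margins and taps here are illustrative; the real taps come from the epel filter tables indexed by mx/my):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the epel "put" HV path: 4-tap horizontal filter on
 * 8-bit samples into a 16-bit buffer (one extra row above, two below),
 * then a 4-tap vertical filter over that buffer followed by >> 6.
 * With vxrm = 2 (round-down) the vnclip.wi v0, v0, 6 in the vector
 * code is exactly this arithmetic shift. */
static void epel_hv_ref(int16_t *dst, const uint8_t *src, int stride,
                        int w, int h, const int8_t *fh, const int8_t *fv)
{
    int16_t tmp[(64 + 3) * 64];

    for (int y = 0; y < h + 3; y++)          /* horizontal pass */
        for (int x = 0; x < w; x++) {
            const uint8_t *s = src + (y - 1) * stride + x;
            tmp[y * w + x] = fh[0] * s[-1] + fh[1] * s[0]
                           + fh[2] * s[1]  + fh[3] * s[2];
        }
    for (int y = 0; y < h; y++)              /* vertical pass, then >>6 */
        for (int x = 0; x < w; x++) {
            int v = fv[0] * tmp[(y + 0) * w + x] + fv[1] * tmp[(y + 1) * w + x]
                  + fv[2] * tmp[(y + 2) * w + x] + fv[3] * tmp[(y + 3) * w + x];
            dst[y * w + x] = (int16_t)(v >> 6);
        }
}
```

Since each filter sums to 64, a flat input of value p produces the usual 14-bit intermediate 64 * p.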
Signed-off-by: Zhanheng Yang <zhanheng.yang@linux.alibaba.com>
---
libavcodec/riscv/h26x/h2656dsp.h | 11 +
libavcodec/riscv/h26x/hevcepel_rvv.S | 325 +++++++++++++++++++++++++--
libavcodec/riscv/hevcdsp_init.c | 4 +
3 files changed, 325 insertions(+), 15 deletions(-)
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index 7e320bd795..b8a116bdf7 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -81,4 +81,15 @@ void ff_hevc_put_epel_uni_w_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
void ff_hevc_put_epel_bi_v_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
mx, intptr_t my, int width);
+void ff_hevc_put_epel_hv_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_uni_hv_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_uni_w_hv_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+ const uint8_t *_src, ptrdiff_t _srcstride,
+ int height, int denom, int wx, int ox,
+ intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_epel_bi_hv_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+ ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+ mx, intptr_t my, int width);
#endif
diff --git a/libavcodec/riscv/h26x/hevcepel_rvv.S b/libavcodec/riscv/h26x/hevcepel_rvv.S
index caca0b88ab..7a4a3f3318 100644
--- a/libavcodec/riscv/h26x/hevcepel_rvv.S
+++ b/libavcodec/riscv/h26x/hevcepel_rvv.S
@@ -285,8 +285,8 @@ func ff_hevc_put_epel_v_8_\lmul\()_rvv, zve32x
sx s4, 24(sp)
load_filter a5
sub a1, a1, a2 # src - src_stride
- li t1, 0 # offset
- mv t4, a3
+ li t1, 0 # offset
+ mv t4, a3
1:
add t2, a1, t1
@@ -310,7 +310,7 @@ func ff_hevc_put_epel_v_8_\lmul\()_rvv, zve32x
vse16.v v0, (t3)
add t3, t3, 2*HEVC_MAX_PB_SIZE
addi a3, a3, -1
- bgt a3, zero, 2b
+ bgt a3, zero, 2b
add t1, t1, t5
sub a6, a6, t5
mv a3, t4
@@ -325,7 +325,7 @@ func ff_hevc_put_epel_v_8_\lmul\()_rvv, zve32x
endfunc
func ff_hevc_put_epel_uni_v_8_\lmul\()_rvv, zve32x
- csrwi vxrm, 0
+ csrwi vxrm, 0
addi sp, sp, -32
sx s1, 0(sp)
sx s2, 8(sp)
@@ -333,8 +333,8 @@ func ff_hevc_put_epel_uni_v_8_\lmul\()_rvv, zve32x
sx s4, 24(sp)
load_filter a6
sub a2, a2, a3 # src - src_stride
- li t1, 0 # offset
- mv t4, a4
+ li t1, 0 # offset
+ mv t4, a4
1:
add t2, a2, t1
@@ -360,7 +360,7 @@ func ff_hevc_put_epel_uni_v_8_\lmul\()_rvv, zve32x
vse8.v v0, (t3)
add t3, t3, a1
addi a4, a4, -1
- bgt a4, zero, 2b
+ bgt a4, zero, 2b
add t1, t1, t5
sub a7, a7, t5
mv a4, t4
@@ -375,7 +375,7 @@ func ff_hevc_put_epel_uni_v_8_\lmul\()_rvv, zve32x
endfunc
func ff_hevc_put_epel_uni_w_v_8_\lmul\()_rvv, zve32x
- csrwi vxrm, 0
+ csrwi vxrm, 0
#if (__riscv_xlen == 32)
lw t1, 4(sp) # my
lw t6, 8(sp) # width
@@ -391,8 +391,8 @@ func ff_hevc_put_epel_uni_w_v_8_\lmul\()_rvv, zve32x
load_filter t1
addi a5, a5, 6 # shift
sub a2, a2, a3 # src - src_stride
- li t1, 0 # offset
- mv t4, a4
+ li t1, 0 # offset
+ mv t4, a4
1:
add t2, a2, t1
@@ -424,7 +424,7 @@ func ff_hevc_put_epel_uni_w_v_8_\lmul\()_rvv, zve32x
vse8.v v0, (t3)
add t3, t3, a1
addi a4, a4, -1
- bgt a4, zero, 2b
+ bgt a4, zero, 2b
add t1, t1, t5
sub t6, t6, t5
mv a4, t4
@@ -439,7 +439,7 @@ func ff_hevc_put_epel_uni_w_v_8_\lmul\()_rvv, zve32x
endfunc
func ff_hevc_put_epel_bi_v_8_\lmul\()_rvv, zve32x
- csrwi vxrm, 0
+ csrwi vxrm, 0
lw t6, 0(sp) # width
addi sp, sp, -32
sx s1, 0(sp)
@@ -448,8 +448,8 @@ func ff_hevc_put_epel_bi_v_8_\lmul\()_rvv, zve32x
sx s4, 24(sp)
load_filter a7
sub a2, a2, a3 # src - src_stride
- li t1, 0 # offset
- mv t4, a5
+ li t1, 0 # offset
+ mv t4, a5
1:
add t2, a2, t1
@@ -495,4 +495,299 @@ func ff_hevc_put_epel_bi_v_8_\lmul\()_rvv, zve32x
endfunc
.endm
-hevc_epel_v m1, m2, m4
\ No newline at end of file
+hevc_epel_v m1, m2, m4
+
+.macro filter_v_s vdst, vsrc0, vsrc1, vsrc2, vsrc3
+ vwmul.vx \vdst, \vsrc0, s5
+ vwmacc.vx \vdst, s6, \vsrc1
+ vmv.v.v \vsrc0, \vsrc1
+ vwmacc.vx \vdst, s7, \vsrc2
+ vmv.v.v \vsrc1, \vsrc2
+ vwmacc.vx \vdst, s8, \vsrc3
+ vmv.v.v \vsrc2, \vsrc3
+.endm
+
+/* clobbers t0, t1 */
+.macro load_filter2 m
+ la t0, epel_filters
+ slli t1, \m, 2
+ add t0, t0, t1
+ lb s5, 0(t0)
+ lb s6, 1(t0)
+ lb s7, 2(t0)
+ lb s8, 3(t0)
+.endm
+
+.macro hevc_epel_hv lmul, lmul2, lmul4
+func ff_hevc_put_epel_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 2
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a4
+ load_filter2 a5
+ sub a1, a1, a2 # src - src_stride
+ mv t0, a3
+ li t1, 0 # offset
+
+1:
+ add t2, a1, t1
+ slli t3, t1, 1
+ add t3, a0, t3
+
+ vsetvli t6, a6, e8, \lmul, ta, ma
+ filter_h v4, v24, v26, v28, v30, t2
+ add t2, t2, a2
+ filter_h v8, v24, v26, v28, v30, t2
+ add t2, t2, a2
+ filter_h v12, v24, v26, v28, v30, t2
+ add t2, t2, a2
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v16, v24, v26, v28, v30, t2
+ add t2, t2, a2
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ filter_v_s v0, v4, v8, v12, v16
+ vnclip.wi v0, v0, 6
+ vse16.v v0, (t3)
+ addi a3, a3, -1
+ addi t3, t3, 2*HEVC_MAX_PB_SIZE
+ bgt a3, zero, 2b
+ mv a3, t0
+ add t1, t1, t6
+ sub a6, a6, t6
+ bgt a6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_epel_uni_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a5
+ load_filter2 a6
+ sub a2, a2, a3 # src - src_stride
+ mv t0, a4
+ li t1, 0 # offset
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t6, a7, e8, \lmul, ta, ma
+ filter_h v4, v24, v26, v28, v30, t2
+ add t2, t2, a3
+ filter_h v8, v24, v26, v28, v30, t2
+ add t2, t2, a3
+ filter_h v12, v24, v26, v28, v30, t2
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v16, v24, v26, v28, v30, t2
+ add t2, t2, a3
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ filter_v_s v0, v4, v8, v12, v16
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vsra.vi v0, v0, 6
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclipu.wi v0, v0, 6
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ addi a4, a4, -1
+ add t3, t3, a1
+ bgt a4, zero, 2b
+ mv a4, t0
+ add t1, t1, t6
+ sub a7, a7, t6
+ bgt a7, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_epel_uni_w_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lx t2, 0(sp) # mx
+#if (__riscv_xlen == 32)
+ lw t4, 4(sp) # my
+ lw t5, 8(sp) # width
+#elif (__riscv_xlen == 64)
+ ld t4, 8(sp)
+ lw t5, 16(sp)
+#endif
+ addi a5, a5, 6 # shift
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter t2
+ load_filter2 t4
+ sub a2, a2, a3 # src - src_stride
+ mv t0, a4
+ li t1, 0 # offset
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+
+ vsetvli t6, t5, e8, \lmul, ta, ma
+ filter_h v4, v24, v26, v28, v30, t2
+ add t2, t2, a3
+ filter_h v8, v24, v26, v28, v30, t2
+ add t2, t2, a3
+ filter_h v12, v24, v26, v28, v30, t2
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v16, v24, v26, v28, v30, t2
+ add t2, t2, a3
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ filter_v_s v0, v4, v8, v12, v16
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vsra.vi v0, v0, 6
+ vmul.vx v0, v0, a6
+ vssra.vx v0, v0, a5
+ vsadd.vx v0, v0, a7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vnclip.wi v0, v0, 0
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ addi a4, a4, -1
+ add t3, t3, a1
+ bgt a4, zero, 2b
+ mv a4, t0
+ add t1, t1, t6
+ sub t5, t5, t6
+ bgt t5, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+
+func ff_hevc_put_epel_bi_hv_8_\lmul\()_rvv, zve32x
+ csrwi vxrm, 0
+ lw t3, 0(sp) # width
+ addi sp, sp, -64
+ sx s1, 0(sp)
+ sx s2, 8(sp)
+ sx s3, 16(sp)
+ sx s4, 24(sp)
+ sx s5, 32(sp)
+ sx s6, 40(sp)
+ sx s7, 48(sp)
+ sx s8, 56(sp)
+ load_filter a6
+ load_filter2 a7
+ mv a6, t3
+ sub a2, a2, a3 # src - src_stride
+ mv t0, a5
+ li t1, 0 # offset
+
+1:
+ add t2, a2, t1
+ add t3, a0, t1
+ slli t5, t1, 1
+ add t5, a4, t5
+
+ vsetvli t6, a6, e8, \lmul, ta, ma
+ filter_h v4, v24, v26, v28, v30, t2
+ add t2, t2, a3
+ filter_h v8, v24, v26, v28, v30, t2
+ add t2, t2, a3
+ filter_h v12, v24, v26, v28, v30, t2
+ add t2, t2, a3
+
+2:
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ filter_h v16, v24, v26, v28, v30, t2
+ add t2, t2, a3
+
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vle16.v v24, (t5)
+ addi t5, t5, 2*HEVC_MAX_PB_SIZE
+ filter_v_s v0, v4, v8, v12, v16
+ vsetvli zero, zero, e32, \lmul4, ta, ma
+ vsra.vi v0, v0, 6
+ vsetvli zero, zero, e16, \lmul2, ta, ma
+ vwadd.wv v0, v0, v24
+ vnclip.wi v0, v0, 7
+ vmax.vx v0, v0, zero
+ vsetvli zero, zero, e8, \lmul, ta, ma
+ vnclipu.wi v0, v0, 0
+ vse8.v v0, (t3)
+ addi a5, a5, -1
+ add t3, t3, a1
+ bgt a5, zero, 2b
+ mv a5, t0
+ add t1, t1, t6
+ sub a6, a6, t6
+ bgt a6, zero, 1b
+
+ lx s1, 0(sp)
+ lx s2, 8(sp)
+ lx s3, 16(sp)
+ lx s4, 24(sp)
+ lx s5, 32(sp)
+ lx s6, 40(sp)
+ lx s7, 48(sp)
+ lx s8, 56(sp)
+ addi sp, sp, 64
+ ret
+endfunc
+.endm
+
+hevc_epel_hv m1, m2, m4
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index 53c800626f..1df7eb654a 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -102,6 +102,10 @@ void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
RVV_FNASSIGN_PEL(c->put_hevc_epel_uni, 1, 0, ff_hevc_put_epel_uni_v_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_epel_uni_w, 1, 0, ff_hevc_put_epel_uni_w_v_8_m1_rvv);
RVV_FNASSIGN_PEL(c->put_hevc_epel_bi, 1, 0, ff_hevc_put_epel_bi_v_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel, 1, 1, ff_hevc_put_epel_hv_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_uni, 1, 1, ff_hevc_put_epel_uni_hv_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_uni_w, 1, 1, ff_hevc_put_epel_uni_w_hv_8_m1_rvv);
+ RVV_FNASSIGN_PEL(c->put_hevc_epel_bi, 1, 1, ff_hevc_put_epel_bi_hv_8_m1_rvv);
break;
default:
break;
--
2.25.1
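The bi variants added in this series share one output stage: the HV intermediate is shifted down by 6, added to the 16-bit src2 prediction, rounded and narrowed by vnclip.wi with shift 7, and clamped to 8 bits. A hedged scalar model (helper name illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the bi output stage (assumption: vxrm = 0, so the
 * vnclip.wi v0, v0, 7 rounds to nearest, i.e. adds 1 << 6 before the
 * shift). hv is the 32-bit separable-filter result, src2 the 14-bit
 * co-located prediction from the int16_t buffer. */
static uint8_t bi_pixel(int32_t hv, int16_t src2)
{
    int32_t v = (hv >> 6) + src2;   /* vsra.vi 6; vwadd.wv with src2 */
    v = (v + 64) >> 7;              /* vnclip.wi v0, v0, 7 under RNU */
    if (v < 0)   v = 0;             /* vmax.vx v0, v0, zero          */
    if (v > 255) v = 255;           /* vnclipu.wi to 8 bits          */
    return (uint8_t)v;
}
```

When both predictions encode the same flat value p (hv = p << 12, src2 = p << 6), the average comes back out as p.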
Thread overview: 6+ messages
2026-01-22 4:23 [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimized for qpel_h in HEVC zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 2/6] libavcodec/riscv: add RVV optimized for qpel_v " zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 3/6] libavcodec/riscv: add RVV optimized for epel_h " zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 4/6] libavcodec/riscv: add RVV optimized for epel_v " zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 5/6] libavcodec/riscv: add RVV optimized for qpel_hv " zhanheng.yang--- via ffmpeg-devel
2026-01-22 4:23 ` [FFmpeg-devel] [PATCH 6/6] libavcodec/riscv: add RVV optimized for epel_hv " zhanheng.yang--- via ffmpeg-devel