From: flow gg <hlefthleft@gmail.com> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Subject: Re: [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v Date: Tue, 23 Jul 2024 16:56:40 +0800 Message-ID: <CAEa-L+vCycGn9=qmEgUrQLSxxmSsi5j6_78PqqaNcwYUoDNfrw@mail.gmail.com> (raw) In-Reply-To: <6069879.RfdxbOI5XL@basile.remlab.net> > TBH it is very hard to review this due to the large extents of code > conditionals. This should avoidable at least partly. You can name macros for > each filter and then expand those macros instead of using if's. Do you mean that before the addition of .equ ff_vp9_subpel_filters_xxx, epel_filter had too many if statements? Now the filter has only two if statements. Anyway, I have updated it and reduced one more if. > Besides in my experience, it is more readable to leave the loads/stores to the > outer function or macros and factor only the calculations, whenever you need > to apply the same maths vertically and/or horizontally. This also sometimes > enables actually using shared code, e.g., the H.263 loop filter or the VC-1 > ITX. There is an issue here because of insufficient vector registers, so vector registers need to be reused. If we use the H.263 method, it would require two more jumps. Additionally, scalar registers are also insufficient. so need more stack. I want to implement this as a macro for lengths of 4, 8, 16, 32, and 64 first. In a subsequent patch, I will break down 4, 8, and 16 into one macro, and 32 or 64 into another macro. This way, code can be better shared and some other adjustments like vlseg ... > Lastly this seems to both add new optimisations *and* add specialisations for > 256-bit vectors, which really should be separate patches, but maybe I just > don't understand the code. In any case, that would not really match with the > patch description. I think the purpose of this patch is to implement 128b+256b RVV, so adding the corresponding 128+256 functions in vp9dsp.h could also be part of this patch? Rémi Denis-Courmont <remi@remlab.net> 于2024年7月13日周六 17:02写道: > Le lauantaina 15. kesäkuuta 2024, 14.50.33 EEST uk7b@foxmail.com a écrit : > > From: sunyuechi <sunyuechi@iscas.ac.cn> > > OK, so I realise that this review is very late, but... > > TBH it is very hard to review this due to the large extents of code > conditionals. This should avoidable at least partly. You can name macros > for > each filter and then expand those macros instead of using if's. > > Besides in my experience, it is more readable to leave the loads/stores to > the > outer function or macros and factor only the calculations, whenever you > need > to apply the same maths vertically and/or horizontally. This also > sometimes > enables actually using shared code, e.g., the H.263 loop filter or the > VC-1 > ITX. > > Lastly this seems to both add new optimisations *and* add specialisations > for > 256-bit vectors, which really should be separate patches, but maybe I just > don't understand the code. In any case, that would not really match with > the > patch description. > > > > C908 X60 > > vp9_avg_8tap_smooth_4h_8bpp_c : 12.7 11.2 > > vp9_avg_8tap_smooth_4h_8bpp_rvv_i32 : 4.7 4.2 > > vp9_avg_8tap_smooth_4v_8bpp_c : 29.7 12.5 > > vp9_avg_8tap_smooth_4v_8bpp_rvv_i32 : 4.7 4.2 > > vp9_avg_8tap_smooth_8h_8bpp_c : 48.7 42.2 > > vp9_avg_8tap_smooth_8h_8bpp_rvv_i32 : 9.5 8.5 > > vp9_avg_8tap_smooth_8v_8bpp_c : 49.7 45.5 > > vp9_avg_8tap_smooth_8v_8bpp_rvv_i32 : 9.5 8.5 > > vp9_avg_8tap_smooth_16h_8bpp_c : 192.0 166.5 > > vp9_avg_8tap_smooth_16h_8bpp_rvv_i32 : 21.7 19.5 > > vp9_avg_8tap_smooth_16v_8bpp_c : 191.2 175.2 > > vp9_avg_8tap_smooth_16v_8bpp_rvv_i32 : 21.2 19.0 > > vp9_avg_8tap_smooth_32h_8bpp_c : 780.2 663.2 > > vp9_avg_8tap_smooth_32h_8bpp_rvv_i32 : 68.2 60.5 > > vp9_avg_8tap_smooth_32v_8bpp_c : 770.0 685.7 > > vp9_avg_8tap_smooth_32v_8bpp_rvv_i32 : 67.0 59.5 > > vp9_avg_8tap_smooth_64h_8bpp_c : 3116.2 2648.2 > > vp9_avg_8tap_smooth_64h_8bpp_rvv_i32 : 270.7 120.7 > > vp9_avg_8tap_smooth_64v_8bpp_c : 3058.5 2731.7 > > vp9_avg_8tap_smooth_64v_8bpp_rvv_i32 : 266.5 119.0 > > vp9_put_8tap_smooth_4h_8bpp_c : 11.0 9.7 > > vp9_put_8tap_smooth_4h_8bpp_rvv_i32 : 4.2 3.7 > > vp9_put_8tap_smooth_4v_8bpp_c : 11.7 10.5 > > vp9_put_8tap_smooth_4v_8bpp_rvv_i32 : 4.0 3.7 > > vp9_put_8tap_smooth_8h_8bpp_c : 42.0 37.5 > > vp9_put_8tap_smooth_8h_8bpp_rvv_i32 : 8.5 7.7 > > vp9_put_8tap_smooth_8v_8bpp_c : 43.5 38.5 > > vp9_put_8tap_smooth_8v_8bpp_rvv_i32 : 8.7 7.7 > > vp9_put_8tap_smooth_16h_8bpp_c : 181.7 147.2 > > vp9_put_8tap_smooth_16h_8bpp_rvv_i32 : 20.0 18.0 > > vp9_put_8tap_smooth_16v_8bpp_c : 168.5 149.7 > > vp9_put_8tap_smooth_16v_8bpp_rvv_i32 : 19.7 17.5 > > vp9_put_8tap_smooth_32h_8bpp_c : 675.0 586.5 > > vp9_put_8tap_smooth_32h_8bpp_rvv_i32 : 65.2 58.0 > > vp9_put_8tap_smooth_32v_8bpp_c : 664.7 591.2 > > vp9_put_8tap_smooth_32v_8bpp_rvv_i32 : 64.0 57.0 > > vp9_put_8tap_smooth_64h_8bpp_c : 2696.2 2339.0 > > vp9_put_8tap_smooth_64h_8bpp_rvv_i32 : 259.7 115.7 > > vp9_put_8tap_smooth_64v_8bpp_c : 2691.0 2348.5 > > vp9_put_8tap_smooth_64v_8bpp_rvv_i32 : 255.5 114.0 > > --- > > libavcodec/riscv/vp9_mc_rvv.S | 200 +++++++++++++++++++++++++++++++++ > > libavcodec/riscv/vp9dsp.h | 72 ++++++++---- > > libavcodec/riscv/vp9dsp_init.c | 38 ++++++- > > 3 files changed, 285 insertions(+), 25 deletions(-) > > > > diff --git a/libavcodec/riscv/vp9_mc_rvv.S > b/libavcodec/riscv/vp9_mc_rvv.S > > index 5241562531..5e81301aa5 100644 > > --- a/libavcodec/riscv/vp9_mc_rvv.S > > +++ b/libavcodec/riscv/vp9_mc_rvv.S > > @@ -36,6 +36,18 @@ > > .endif > > .endm > > > > +.macro vsetvlstatic16 len > > +.ifc \len,4 > > + vsetvli zero, zero, e16, mf2, ta, ma > > +.elseif \len == 8 > > + vsetvli zero, zero, e16, m1, ta, ma > > +.elseif \len == 16 > > + vsetvli zero, zero, e16, m2, ta, ma > > +.else > > + vsetvli zero, zero, e16, m4, ta, ma > > +.endif > > +.endm > > + > > .macro copy_avg len > > func ff_vp9_avg\len\()_rvv, zve32x > > csrwi vxrm, 0 > > @@ -181,8 +193,196 @@ func ff_\op\()_vp9_bilin_64hv_rvv, zve32x > > endfunc > > .endm > > > > +.equ ff_vp9_subpel_filters_smooth, ff_vp9_subpel_filters > > +.equ ff_vp9_subpel_filters_regular, ff_vp9_subpel_filters + 16*8*2 > > +.equ ff_vp9_subpel_filters_sharp, ff_vp9_subpel_filters + 16*8*2*2 > > + > > +.macro epel_filter name, type, regtype > > + lla \regtype\()2, ff_vp9_subpel_filters_\name > > + > > +.ifc \type,v > > + slli \regtype\()0, a6, 4 > > +.else > > + slli \regtype\()0, a5, 4 > > +.endif > > + add \regtype\()0, \regtype\()0, \regtype\()2 > > + > > + lh \regtype\()1, 2(\regtype\()0) > > + lh \regtype\()2, 4(\regtype\()0) > > + lh \regtype\()3, 6(\regtype\()0) > > + lh \regtype\()4, 8(\regtype\()0) > > + lh \regtype\()5, 10(\regtype\()0) > > + lh \regtype\()6, 12(\regtype\()0) > > + > > +.ifc \regtype,t > > + lh a7, 14(\regtype\()0) > > +.else > > + lh s7, 14(\regtype\()0) > > +.endif > > + lh \regtype\()0, 0(\regtype\()0) > > +.endm > > + > > +.macro epel_load dst, len, op, name, type, from_mem, regtype > > +.ifc \from_mem, 1 > > + vle8.v v22, (a2) > > +.ifc \type,v > > + add a5, a3, a2 > > + sub a2, a2, a3 > > + vle8.v v24, (a5) > > + vle8.v v20, (a2) > > + sh1add a2, a3, a5 > > + add a5, a5, a3 > > + vle8.v v26, (a5) > > + vle8.v v28, (a2) > > + add a2, a2, a3 > > + vle8.v v30, (a2) > > +.else > > + addi a5, a2, 1 > > + addi a2, a2, -1 > > + vle8.v v24, (a5) > > + vle8.v v20, (a2) > > + addi a5, a5, 2 > > + addi a2, a2, 3 > > + vle8.v v28, (a5) > > + vle8.v v26, (a2) > > + addi a2, a5, 1 > > + vle8.v v30, (a2) > > +.endif > > + > > +.ifc \name,smooth > > + vwmulu.vx v16, v24, \regtype\()4 > > + vwmaccu.vx v16, \regtype\()2, v20 > > + vwmaccu.vx v16, \regtype\()5, v26 > > + vwmaccsu.vx v16, \regtype\()6, v28 > > +.else > > + vwmulu.vx v16, v28, \regtype\()6 > > + vwmaccsu.vx v16, \regtype\()2, v20 > > + vwmaccsu.vx v16, \regtype\()5, v26 > > +.endif > > + > > +.ifc \regtype,t > > + vwmaccsu.vx v16, a7, v30 > > +.else > > + vwmaccsu.vx v16, s7, v30 > > +.endif > > + > > +.ifc \type,v > > + sh1add a5, a3, a3 > > + sub a2, a2, a5 > > + sub a2, a2, a5 > > + sub a5, a2, a3 > > + vle8.v v28, (a2) > > + vle8.v v26, (a5) > > + sh1add a2, a3, a2 > > +.else > > + addi a5, a2, -7 > > + addi a2, a2, -6 > > + vle8.v v26, (a5) > > + vle8.v v28, (a2) > > + addi a2, a2, 2 > > +.endif > > + > > +.ifc \name,smooth > > + vwmaccsu.vx v16, \regtype\()1, v28 > > +.else > > + vwmaccu.vx v16, \regtype\()1, v28 > > + vwmulu.vx v28, v24, \regtype\()4 > > +.endif > > + vwmaccsu.vx v16, \regtype\()0, v26 > > + vwmulu.vx v20, v22, \regtype\()3 > > +.else > > +.ifc \name,smooth > > + vwmulu.vx v16, v8, \regtype\()4 > > + vwmaccu.vx v16, \regtype\()2, v4 > > + vwmaccu.vx v16, \regtype\()5, v10 > > + vwmaccsu.vx v16, \regtype\()6, v12 > > + vwmaccsu.vx v16, \regtype\()1, v2 > > +.else > > + vwmulu.vx v16, v2, \regtype\()1 > > + vwmaccu.vx v16, \regtype\()6, v12 > > + vwmaccsu.vx v16, \regtype\()5, v10 > > + vwmaccsu.vx v16, \regtype\()2, v4 > > + vwmulu.vx v28, v8, \regtype\()4 > > +.endif > > + vwmaccsu.vx v16, \regtype\()0, v0 > > + vwmulu.vx v20, v6, \regtype\()3 > > + > > +.ifc \regtype,t > > + vwmaccsu.vx v16, a7, v14 > > +.else > > + vwmaccsu.vx v16, s7, v14 > > +.endif > > + > > +.endif > > + li a5, 64 > > + vwadd.wx v16, v16, a5 > > + vsetvlstatic16 \len > > + > > +.ifc \name,smooth > > + vwadd.vv v24, v16, v20 > > +.else > > + vwadd.vv v24, v16, v28 > > + vwadd.wv v24, v24, v20 > > +.endif > > + vnsra.wi v24, v24, 7 > > + vmax.vx v24, v24, zero > > + vsetvlstatic8 \len, zero, 32, m2 > > + > > + vnclipu.wi \dst, v24, 0 > > +.ifc \op,avg > > + vle8.v v24, (a0) > > + vaaddu.vv \dst, \dst, v24 > > +.endif > > + > > +.endm > > + > > +.macro epel_load_inc dst, len, op, name, type, from_mem, regtype > > + epel_load \dst, \len, \op, \name, \type, \from_mem, > \regtype > > + add a2, a2, a3 > > +.endm > > + > > +.macro epel len, op, name, type, vlen > > +func ff_\op\()_vp9_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x > > + epel_filter \name, \type, t > > +.if \vlen < 256 > > + vsetvlstatic8 \len, a5, 32, m2 > > +.else > > + vsetvlstatic8 \len, a5, 64, m2 > > +.endif > > +.ifc \op,avg > > + csrwi vxrm, 0 > > +.endif > > + > > +1: > > + addi a4, a4, -1 > > + epel_load v30, \len, \op, \name, \type, 1, t > > + vse8.v v30, (a0) > > +.if \len == 64 && \vlen < 256 > > + addi a0, a0, 32 > > + addi a2, a2, 32 > > + epel_load v30, \len, \op, \name, \type, 1, t > > + vse8.v v30, (a0) > > + addi a0, a0, -32 > > + addi a2, a2, -32 > > +.endif > > + add a2, a2, a3 > > + add a0, a0, a1 > > + bnez a4, 1b > > + > > + ret > > +endfunc > > +.endm > > + > > .irp len, 64, 32, 16, 8, 4 > > copy_avg \len > > + .irp op, put, avg > > + .irp name, regular, sharp, smooth > > + .irp type, h, v > > + epel \len, \op, \name, \type, 128 > > + epel \len, \op, \name, \type, 256 > > + .endr > > + .endr > > + .endr > > .endr > > > > bilin_h_v put, h, a5 > > diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h > > index 8fb326dae0..5fd64a1b8c 100644 > > --- a/libavcodec/riscv/vp9dsp.h > > +++ b/libavcodec/riscv/vp9dsp.h > > @@ -81,33 +81,39 @@ void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, > const > > uint8_t *l, void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const > > uint8_t *l, const uint8_t *a); > > > > -#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) > > > \ -void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t > > dststride, \ +#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx, > > min_vlen) \ +void > > ff_put_vp9_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, > \ + > > ptrdiff_t dststride, > > > \ const uint8_t *src, \ ptrdiff_t srcstride, > > > \ int h, int mx, int my); \ \ -void > > ff_put_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \ > > +void ff_put_vp9_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, > > > \ + ptrdiff_t dststride, > > > \ const uint8_t *src, \ ptrdiff_t srcstride, > > > \ int h, int mx, int my); \ \ -void > > ff_put_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \ > > +void ff_put_vp9_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, > > > \ + ptrdiff_t dststride, > > > \ const uint8_t *src, \ ptrdiff_t srcstride, > > > \ int h, int mx, int my); \ \ -void > > ff_avg_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ > > +void ff_avg_vp9_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, > > > \ + ptrdiff_t dststride, > > > \ const uint8_t *src, \ ptrdiff_t srcstride, > > > \ int h, int mx, int my); \ \ -void > > ff_avg_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \ > > +void ff_avg_vp9_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, > > > \ + ptrdiff_t dststride, > > > \ const uint8_t *src, \ ptrdiff_t srcstride, > > > \ int h, int mx, int my); \ \ -void > > ff_avg_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \ > > +void ff_avg_vp9_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, > > > \ + ptrdiff_t dststride, > > > \ const uint8_t *src, \ ptrdiff_t srcstride, > > > \ int h, int mx, int my); > > @@ -146,23 +152,41 @@ void ff_vp9_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t > > dststride, \ const uint8_t *src, ptrdiff_t srcstride, \ int h, int > > mx, int my); > > > > -VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR); > > -VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR); > > -VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR); > > -VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR); > > -VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR); > > - > > -VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP); > > -VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP); > > -VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP); > > -VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP); > > -VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP); > > - > > -VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH); > > -VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH); > > -VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH); > > -VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH); > > -VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH); > > +VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 128); > > + > > +VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 128); > > + > > +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 128); > > +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 128); > > + > > +VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 256); > > + > > +VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 256); > > + > > +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 256); > > +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 256); > > > > VP9_BILINEAR_RISCV_RVV_FUNC(64); > > VP9_BILINEAR_RISCV_RVV_FUNC(32); > > diff --git a/libavcodec/riscv/vp9dsp_init.c > b/libavcodec/riscv/vp9dsp_init.c > > index b3700dfb08..3669070fca 100644 > > --- a/libavcodec/riscv/vp9dsp_init.c > > +++ b/libavcodec/riscv/vp9dsp_init.c > > @@ -49,7 +49,9 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext > > *dsp, int bpp) # endif > > > > #if HAVE_RVV > > - if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && > ff_rv_vlen_least(128)) > > { + if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32)) { > > + int vlenb = ff_get_rv_vlenb(); > > + if (vlenb >= 16) { > > > > #define init_fpel(idx1, sz) \ > > dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = > ff_vp9_avg##sz##_rvv; \ > > @@ -95,6 +97,40 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext > > *dsp, int bpp) dsp->mc[4][FILTER_BILINEAR ][1][1][1] = > > ff_avg_vp9_bilin_4hv_rvv; > > > > #undef init_fpel > > + > > +#define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type, vlen) \ > > + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = \ > > + ff_##type##_vp9_8tap_smooth_##sz##dir##_rvv##vlen; \ > > + dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = \ > > + ff_##type##_vp9_8tap_regular_##sz##dir##_rvv##vlen; \ > > + dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = \ > > + ff_##type##_vp9_8tap_sharp_##sz##dir##_rvv##vlen; > > + > > +#define init_subpel2(idx, idxh, idxv, dir, type, vlen) \ > > + init_subpel1(0, idx, idxh, idxv, 64, dir, type, vlen); \ > > + init_subpel1(1, idx, idxh, idxv, 32, dir, type, vlen); \ > > + init_subpel1(2, idx, idxh, idxv, 16, dir, type, vlen); \ > > + init_subpel1(3, idx, idxh, idxv, 8, dir, type, vlen); \ > > + init_subpel1(4, idx, idxh, idxv, 4, dir, type, vlen) > > + > > + init_subpel2(0, 1, 0, h, put, 128); > > + init_subpel2(1, 1, 0, h, avg, 128); > > + > > + if (flags & AV_CPU_FLAG_RVB_ADDR) { > > + init_subpel2(0, 0, 1, v, put, 128); > > + init_subpel2(1, 0, 1, v, avg, 128); > > + } > > + > > + } > > + if (vlenb >= 32) { > > + init_subpel2(0, 1, 0, h, put, 256); > > + init_subpel2(1, 1, 0, h, avg, 256); > > + > > + if (flags & AV_CPU_FLAG_RVB_ADDR) { > > + init_subpel2(0, 0, 1, v, put, 256); > > + init_subpel2(1, 0, 1, v, avg, 256); > > + } > > + } > > } > > #endif > > #endif > > > -- > 雷米‧德尼-库尔蒙 > http://www.remlab.net/ > > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2024-07-23 8:57 UTC|newest] Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <20240615115034.3891490-1-uk7b@foxmail.com> 2024-06-15 11:50 ` [FFmpeg-devel] [PATCH v4 2/4] lavc/vp9dsp: R-V V mc bilin hv uk7b 2024-06-15 11:52 ` flow gg 2024-06-24 20:07 ` Rémi Denis-Courmont 2024-06-30 11:39 ` flow gg 2024-06-15 11:50 ` [FFmpeg-devel] [PATCH v4 3/4] lavc/vp9dsp: R-V V mc tap h v uk7b 2024-06-15 11:52 ` flow gg 2024-07-13 9:02 ` Rémi Denis-Courmont 2024-07-23 8:51 ` uk7b 2024-07-29 15:20 ` Rémi Denis-Courmont 2024-07-31 10:36 ` flow gg 2024-07-31 19:10 ` Rémi Denis-Courmont 2024-07-23 8:56 ` flow gg [this message] 2024-06-15 11:50 ` [FFmpeg-devel] [PATCH v4 4/4] lavc/vp9dsp: R-V V mc tap hv uk7b 2024-07-23 8:58 ` uk7b 2024-07-23 9:03 ` flow gg
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAEa-L+vCycGn9=qmEgUrQLSxxmSsi5j6_78PqqaNcwYUoDNfrw@mail.gmail.com' \ --to=hlefthleft@gmail.com \ --cc=ffmpeg-devel@ffmpeg.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git