Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32

From: 沈佩婷 <shenpeiting@eswincomputing.com>
To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
Date: Fri, 16 Jun 2023 18:15:13 +0800 (GMT+08:00)
Message-ID: <59152899.4ccf.188c3b3b2d7.Coremail.shenpeiting@eswincomputing.com>
In-Reply-To: <6971742.LrdTnZPg15@basile.remlab.net>

Hi,

> -----Original Message-----
> From: "Rémi Denis-Courmont" <remi@remlab.net>
> Sent: 2023-06-16 03:25:07 (Friday)
> To: ffmpeg-devel@ffmpeg.org
> Cc: "Shen Peiting" <shenpeiting@eswincomputing.com>
> Subject: Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
> 
> On Thursday 15 June 2023, 13:36:42 EEST, Peiting Shen wrote:
> > From: Shen Peiting <shenpeiting@eswincomputing.com>
> > 
> > Optimize the scalar int32 sum_square_butterfly calculation using RVV instructions.
> > 
> > Benchmarks on Spike (cycles):
> > len=128
> > ac3_sum_square_butterfly_int32_c: 8497
> > ac3_sum_square_butterfly_int32_rvv: 258
> > len=1280
> > ac3_sum_square_butterfly_int32_c: 84529
> > ac3_sum_square_butterfly_int32_rvv: 2274
> > 
> > Co-authored-by: Yang Xiaojun <yangxiaojun@eswincomputing.com>
> > Co-authored-by: Huang Xing <huangxing1@eswincomputing.com>
> > Co-authored-by: Zeng Fanchen <zengfanchen@eswincomputing.com>
> > Signed-off-by: Shen Peiting <shenpeiting@eswincomputing.com>
> > ---
> >  libavcodec/riscv/ac3dsp_init.c |  8 +++++
> >  libavcodec/riscv/ac3dsp_rvv.S  | 53 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 61 insertions(+)
> > 
> > diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> > index a4e75a7541..4fd4abe83e 100644
> > --- a/libavcodec/riscv/ac3dsp_init.c
> > +++ b/libavcodec/riscv/ac3dsp_init.c
> > @@ -26,6 +26,10 @@
> > 
> >  void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> >  void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> > +void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
> > +                                            const int32_t *coef0,
> > +                                            const int32_t *coef1,
> > +                                            int len);
> > 
> >  av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> >  {
> > @@ -35,6 +39,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> >          c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
> >          c->float_to_fixed24 = ff_float_to_fixed24_rvv;
> >      }
> > +#if (__riscv_xlen >= 64)
> > +    if (flags & AV_CPU_FLAG_RVV_I64)
> > +        c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
> > +#endif
> >  #endif
> >  }
> > 
> > diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> > index d98e72c12c..4e0d238f85 100644
> > --- a/libavcodec/riscv/ac3dsp_rvv.S
> > +++ b/libavcodec/riscv/ac3dsp_rvv.S
> > @@ -63,3 +63,56 @@ func ff_float_to_fixed24_rvv, zve32x
> >      bgtz            a2, 1b
> >      ret
> >  endfunc
> > +
> > +
> > +func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
> > +    vsetvli         t0, a3, e32, m2
> > +    vle32.v         v0, (a1)
> > +    vle32.v         v2, (a2)
> > +    vadd.vv         v4, v0, v2
> > +    vsub.vv         v6, v0, v2
> > +    vwmul.vv        v8, v0, v0
> > +    vwmul.vv        v12, v2, v2
> > +    vwmul.vv        v16, v4, v4
> > +    vwmul.vv        v20, v6, v6
> > +    sub             a3, a3, t0
> > +    slli            t0, t0, 2
> > +    add             a1, a1, t0
> > +    add             a2, a2, t0
> > +    beq             a3, x0, 2f
> > +1:
> > +    vsetvli         t0, a3, e32, m2
> > +    vle32.v         v0, (a1)
> > +    vle32.v         v2, (a2)
> > +    vadd.vv         v4, v0, v2
> > +    vsub.vv         v6, v0, v2
> > +    vwmacc.vv       v8, v0, v0
> > +    vwmacc.vv       v12, v2, v2
> > +    vwmacc.vv       v16, v4, v4
> > +    vwmacc.vv       v20, v6, v6
> > +    sub             a3, a3, t0
> > +    slli            t0, t0, 2
> > +    add             a1, a1, t0
> > +    add             a2, a2, t0
> > +    bnez            a3, 1b
> > +2:
> > +    vsetvli         t0, x0, e64, m4
> > +    vmv.s.x         v24, x0
> > +    vmv.s.x         v25, x0
> > +    vmv.s.x         v26, x0
> > +    vmv.s.x         v27, x0
> > +    vredsum.vs      v24, v8, v24
> > +    vredsum.vs      v25, v12, v25
> > +    vredsum.vs      v26, v16, v26
> > +    vredsum.vs      v27, v20, v27
> 
> As far as I can tell this is a reserved encoding (cf. RVV 1.0 §3.4.2), and I
> believe that QEMU throws an Illegal instruction in this case. (I would check
> but there is no checkasm test case for this function.) Does this actually work
> on your simulator? Because if so, then your simulator is probably broken/
> buggy.
> 
RVV 1.0 §14. Vector Reduction Operations
Vector reduction operations take a vector register group of elements and a scalar held in 
element 0 of a vector register, and perform a reduction using some binary operator, to produce
a scalar result in element 0 of a vector register. The scalar input and output operands 
are held in element 0 of a single vector register, not a vector register group, so any vector
register can be the scalar source or destination of a vector reduction regardless of LMUL setting.

RVV 1.0 §16.1. Integer Scalar Move Instructions
The integer scalar read/write instructions transfer a single value between a scalar x register and
element 0 of a vector register. The instructions ignore LMUL and vector register groups.

According to the above, I think this encoding is legal.
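To illustrate the two rules quoted above, here is a small C model of vmv.s.x and vredsum.vs as they are used in the epilogue. It is a sketch for illustration only, not the ISA reference: the helper names model_vmv_s_x and model_vredsum_vs are hypothetical, only the unmasked forms are modelled, and tail elements of the destination are ignored. It shows that the scalar operands (vd and vs1) are read or written only at element 0, while vs2 is the whole LMUL register group, modelled as an array of vl lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a vector register is an array of int64_t lanes (SEW=64).
 * Only unmasked forms are modelled; tail elements of vd are left untouched. */

/* vmv.s.x vd, rs1: writes rs1 into element 0 of vd only.
 * LMUL is ignored, so vd names a single register (e.g. v25 under m4). */
static void model_vmv_s_x(int64_t *vd, int64_t rs1)
{
    vd[0] = rs1;
}

/* vredsum.vs vd, vs2, vs1: vd[0] = vs1[0] + vs2[0] + ... + vs2[vl - 1].
 * vs2 is the LMUL register group (vl lanes); vs1 and vd use element 0 only.
 * Unsigned arithmetic mirrors the modulo-2^64 wraparound of the hardware. */
static void model_vredsum_vs(int64_t *vd, const int64_t *vs2,
                             const int64_t *vs1, size_t vl)
{
    uint64_t acc = (uint64_t)vs1[0];   /* read the seed before vd is written */

    for (size_t i = 0; i < vl; i++)
        acc += (uint64_t)vs2[i];
    vd[0] = (int64_t)acc;
}
```

With vd and vs1 naming the same register, as in `vredsum.vs v24, v8, v24`, the model reads the seed from element 0 before overwriting it, so the register can serve as both the zero seed and the result.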

Actually, we have passed all the FATE tests on QEMU 6.0.0, with FFmpeg compiled by riscv-unknown-linux-gnu-gcc 13.0.1 and configured as follows:
./configure --enable-cross-compile --cross-prefix=riscv64-unknown-linux-gnu- --arch=riscv \
    --extra-cflags="-march=rv64imafdcbv -mabi=lp64d --static -I/home/user/code/iconv/iconv-riscv/include" \
    --prefix=ffshare --extra-libs="-static -liconv" --extra-ldflags="-L/home/user/code/iconv/iconv-riscv/lib" \
    --target-os=linux --target-exec="qemu-riscv64 -cpu rv64,x-v=true,x-b=true,x-zpn=true,x-zbpbo=true,x-zpsfoperand=true,x-arith=true" \
    --enable-gpl --enable-memory-poisoning

We will fix the non-standard encodings mentioned in these emails and add the checkasm test in patch v2.
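A checkasm test would compare the RVV routine against the C version on random input. As a rough stand-alone stand-in (plain C, not the checkasm API), the sketch below compares a scalar reference with a C model of the RVV loop structure: per-lane 64-bit accumulators updated strip by strip, then one reduction per accumulator. The names ssb_ref, ssb_stripmined and MODEL_VLMAX are hypothetical; MODEL_VLMAX = 8 assumes VLEN = 128 with e32/m2, and inputs are assumed to be 24-bit so that l + r and l - r fit in int32.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lanes per strip: VLMAX for e32/m2 on an assumed VLEN of 128. */
#define MODEL_VLMAX 8

/* Scalar reference: sum[0] = sum(l^2), sum[1] = sum(r^2),
 * sum[2] = sum((l+r)^2), sum[3] = sum((l-r)^2). */
static void ssb_ref(int64_t sum[4], const int32_t *coef0,
                    const int32_t *coef1, int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;
    for (int i = 0; i < len; i++) {
        int32_t lt = coef0[i], rt = coef1[i];
        int32_t md = lt + rt, sd = lt - rt;

        sum[0] += (int64_t)lt * lt;
        sum[1] += (int64_t)rt * rt;
        sum[2] += (int64_t)md * md;
        sum[3] += (int64_t)sd * sd;
    }
}

/* Model of the RVV loop: acc[k] plays the role of v8/v12/v16/v20. Every lane
 * starts at zero here; the assembly gets the same effect for the active lanes
 * by using vwmul.vv on the first strip and vwmacc.vv afterwards. */
static void ssb_stripmined(int64_t sum[4], const int32_t *coef0,
                           const int32_t *coef1, int len)
{
    int64_t acc[4][MODEL_VLMAX] = {{0}};

    while (len > 0) {
        /* vsetvli: vl = min(remaining, VLMAX), one of the choices RVV allows */
        int vl = len < MODEL_VLMAX ? len : MODEL_VLMAX;

        for (int i = 0; i < vl; i++) {
            int32_t lt = coef0[i], rt = coef1[i];
            int32_t md = lt + rt, sd = lt - rt;   /* vadd.vv, vsub.vv */

            acc[0][i] += (int64_t)lt * lt;        /* widening multiply-add */
            acc[1][i] += (int64_t)rt * rt;
            acc[2][i] += (int64_t)md * md;
            acc[3][i] += (int64_t)sd * sd;
        }
        coef0 += vl;
        coef1 += vl;
        len   -= vl;
    }
    for (int k = 0; k < 4; k++) {
        int64_t s = 0;                            /* vmv.s.x vN, x0 */

        for (int i = 0; i < MODEL_VLMAX; i++)     /* vredsum.vs over VLMAX */
            s += acc[k][i];
        sum[k] = s;
    }
}
```

Since (l+r)^2 + (l-r)^2 = 2l^2 + 2r^2, the identity sum[2] + sum[3] == 2 * (sum[0] + sum[1]) is a cheap extra sanity check on any implementation.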
> > +    vsetivli        t0, 1, e64, m1
> > +    vse64.v         v24, (a0)
> > +    addi            a0, a0, 8
> > +    vse64.v         v25, (a0)
> > +    addi            a0, a0, 8
> > +    vse64.v         v26, (a0)
> > +    addi            a0, a0, 8
> > +    vse64.v         v27, (a0)
> > +    addi            a0, a0, 8
> > +    ret
> > +endfunc
> 
> 
> -- 
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
> 
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".