Okay. With zero-extension (zext), two of the vset instructions can be removed,
which is better than the splat version. I have updated the patch in this reply.
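
To illustrate why the two sequences are interchangeable, here is a scalar C
model of them. It is only a sketch and not part of the patch: the function
names are made up, each loop iteration stands for one 16-bit lane, and mapping
"zext" to vzext.vf2 followed by vadd.vx is my reading of the suggestion.

```c
#include <stddef.h>
#include <stdint.h>

/* Splat variant: fill the wide vector with dc (vmv.v.x), then add the
 * 8-bit pixels with a widening add (vwaddu.wv). */
static void add_dc_splat(uint16_t *dst, const uint8_t *src,
                         uint16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = dc;                 /* models vmv.v.x   */
        dst[i] = (uint16_t)(wide + src[i]); /* models vwaddu.wv */
    }
}

/* Zero-extension variant: widen the pixels first (assumed vzext.vf2),
 * then add the scalar dc (vadd.vx). */
static void add_dc_zext(uint16_t *dst, const uint8_t *src,
                        uint16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = src[i];             /* models vzext.vf2 */
        dst[i] = (uint16_t)(wide + dc);     /* models vadd.vx   */
    }
}
```

Both functions wrap modulo 2^16 like the e16 lanes do, so a negative dc can be
passed as its two's complement bit pattern.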

Rémi Denis-Courmont <remi@remlab.net> wrote on Monday, 4 December 2023 at 23:15:

> On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > > Probably missing VLENB checks.
> >
> > Changed.
> >
> > > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > > then shift, and by 17 with shift then add. You don't need multiplications.
> >
> > Changed.
> >
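
For reference, a scalar C sketch of the decompositions described above. The
helper names (shl_add, mul3, ...) are hypothetical and not taken from the
patch. On RISC-V the shift-and-add step would presumably map to the Zba
sh1add/sh2add/sh3add instructions, but that mapping is my assumption.

```c
#include <stdint.h>

/* x * 2^shift + y. The arithmetic is done on uint32_t so that negative
 * inputs avoid the undefined behaviour of left-shifting a negative
 * signed value in C. */
static inline int32_t shl_add(int32_t x, unsigned shift, int32_t y)
{
    return (int32_t)(((uint32_t)x << shift) + (uint32_t)y);
}

/* 3x, 5x and 9x: one shift-and-add each. */
static inline int32_t mul3(int32_t x)  { return shl_add(x, 1, x); }
static inline int32_t mul5(int32_t x)  { return shl_add(x, 2, x); }
static inline int32_t mul9(int32_t x)  { return shl_add(x, 3, x); }

/* 12x: shift-and-add to get 3x, then shift left by 2. */
static inline int32_t mul12(int32_t x) { return shl_add(mul3(x), 2, 0); }

/* 17x: shift left by 4, then add x. */
static inline int32_t mul17(int32_t x) { return shl_add(x, 4, x); }
```
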
> > > Do you really need to splat? Can't .vx or .wx be used instead?
> >
> > Okay, for example in ff_vc1_inv_trans_8x8_dc_rvv:
> >
> > + vsetvli      zero, t0, e8, m2, ta, ma
> > + vwaddu.vx    v4, v0, zero
> > + vsetvli      zero, t0, e16, m4, ta, ma
> > + vadd.vx      v4, v4, t2
> > - vsetvli      zero, t0, e16, m4, ta, ma
> > - vmv.v.x      v4, t2
> > - vsetvli      zero, t0, e8, m2, ta, ma
> > - vwaddu.wv    v4, v4, v0
> >
> > However, this is slightly slower on the C910, so
> > I'm not sure whether I should make this change.
>
> OK, unfortunately, there is no widening addition with wide scalar operand. But
> you can do zero-extension then addition here. In the end, I doubt that you can
> reasonably optimise whilst working with a C910-based board. This function
> deviates too much on non-conformant hardware.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel