Okay. With zext, two vset instructions can be removed, which is better than
the splat version. I have updated the patch in this reply.

On Mon, 4 Dec 2023 at 23:15, Rémi Denis-Courmont wrote:

> On Monday, 4 December 2023 at 10:48:56 EET, flow gg wrote:
> > > Probably missing VLENB checks.
> >
> > Changed.
> >
> > > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > > then shift, and by 17 with shift then add. You don't need multiplications.
> >
> > Changed.
> >
> > > Do you really need to splat? Can't .vx or .wx be used instead?
> >
> > Okay, for example in ff_vc1_inv_trans_8x8_dc_rvv
> >
> > +        vsetvli       zero, t0, e8, m2, ta, ma
> > +        vwaddu.vx     v4, v0, zero
> > +        vsetvli       zero, t0, e16, m4, ta, ma
> > +        vadd.vx       v4, v4, t2
> > -        vsetvli       zero, t0, e16, m4, ta, ma
> > -        vmv.v.x       v4, t2
> > -        vsetvli       zero, t0, e8, m2, ta, ma
> > -        vwaddu.wv     v4, v4, v0
> >
> > But the speed has slowed down slightly on the c910,
> > I'm not sure if I should modify it.
>
> OK, unfortunately, there is no widening addition with wide scalar operand. But
> you can do zero-extension then addition here. In the end, I doubt that you can
> reasonably optimise whilst working with a C910-based board. This function
> deviates too much on non-conformant hardware.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
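
For reference, here is a minimal scalar sketch in C of why the zero-extend-then-add sequence should give the same result as the splat version. It assumes that "zext" means vzext.vf2 followed by vadd.vx at e16, and it models the 16-bit lanes as uint16_t with wrap-around arithmetic. The function names (add_dc_splat, add_dc_zext, sequences_match) are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splat version: vmv.v.x fills the 16-bit lanes with dc, then
 * vwaddu.wv adds the zero-extended 8-bit pixels to them. */
static void add_dc_splat(uint16_t *dst, const uint8_t *src,
                         uint16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t acc = dc;                  /* vmv.v.x   v4, t2     */
        dst[i] = (uint16_t)(acc + src[i]);  /* vwaddu.wv v4, v4, v0 */
    }
}

/* Zero-extension version (assumed): vzext.vf2 widens the pixels to
 * 16 bits, then vadd.vx adds the scalar dc, all under one e16 setting. */
static void add_dc_zext(uint16_t *dst, const uint8_t *src,
                        uint16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = src[i];             /* vzext.vf2 v4, v0     */
        dst[i] = (uint16_t)(wide + dc);     /* vadd.vx   v4, v4, t2 */
    }
}

/* Returns 1 if both sequences agree on the first n pixels (n <= 64,
 * the size of one 8x8 block), 0 otherwise. */
static int sequences_match(const uint8_t *src, uint16_t dc, size_t n)
{
    uint16_t a[64], b[64];

    assert(n <= 64);
    add_dc_splat(a, src, dc, n);
    add_dc_zext(b, src, dc, n);
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

A negative dc can be passed as (uint16_t)dc, since the 16-bit vector add is modular as well. This only models the arithmetic; it says nothing about speed on the C910 or on conformant hardware.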