Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h

From: flow gg <hlefthleft@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h
Date: Sat, 24 Feb 2024 16:31:36 +0800
Message-ID: <CAEa-L+vSsrs4NrS7SK0OeXBwbAgA+PaJgfZXdfqEFH9TsT6T+g@mail.gmail.com>
In-Reply-To: <B4E257C0-2429-423F-B06A-70935BB3D900@remlab.net>

Okay, thanks for clarifying.

I have used fractional multipliers in many places, mostly not for
correctness but because they often improve performance (though I don't know
why). I see no obvious downsides, so how about leaving this code as it is?

Rémi Denis-Courmont <remi@remlab.net> wrote on Sat, 24 Feb 2024 at 15:39:

> Hi,
>
> On 24 February 2024 03:07:36 GMT+02:00, flow gg <hlefthleft@gmail.com>
> wrote:
> > .ifc \len,4
> >-        vsetivli        zero, 5, e8, mf2, ta, ma
> >+        vsetivli        zero, 5, e8, m1, ta, ma
> > .elseif \len == 8
> >         vsetivli        zero, 9, e8, m1, ta, ma
> > .else
> >@@ -112,9 +112,9 @@ endfunc
> >         vslide1down.vx  v2, \dst, t5
> >
> > .ifc \len,4
> >-        vsetivli        zero, 4, e8, mf4, ta, ma
> >+        vsetivli        zero, 4, e8, m1, ta, ma
> > .elseif \len == 8
> >-        vsetivli        zero, 8, e8, mf2, ta, ma
> >+        vsetivli        zero, 8, e8, m1, ta, ma
> >
> >What are the benefits of not using fractional multipliers here?
>
> Insofar as E8/MF4 is guaranteed to work for Zve32x, there are no benefits
> per se.
>
> However, fractional multipliers were added to the specification to enable
> addressing individual vectors whilst the effective multiplier is larger
> than one. This can only happen with mixed widths. Fractions were not
> intended to make vectors shorter - there is the vector length for that
> already.
>
> That's why "E64/MF2" doesn't work, even though it's the same vector bit
> size as "E8/MF2".
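
To make the rule above concrete, here is a minimal sketch of the support
check as I understand it from the RVV 1.0 specification: for a fractional
LMUL, only SEW values up to LMUL * ELEN are guaranteed. The function name is
made up for illustration, and ELEN=32 for Zve32x / ELEN=64 for Zve64x are the
only inputs assumed.

```python
from fractions import Fraction


def vtype_guaranteed(sew: int, lmul: Fraction, elen: int) -> bool:
    """Return True if this SEW/LMUL pair must be supported (hypothetical helper).

    Assumption: for fractional LMUL, implementations only have to support
    SEW values between 8 and LMUL * ELEN inclusive.
    """
    if sew < 8 or sew > elen:
        return False
    if lmul >= 1:
        # Integer multipliers place no extra limit on SEW.
        return True
    return sew <= lmul * elen


# E8/MF4 on Zve32x (ELEN=32): 1/4 * 32 = 8, so SEW=8 is guaranteed.
print(vtype_guaranteed(8, Fraction(1, 4), 32))
# E64/MF2 on ELEN=64: 1/2 * 64 = 32 < 64, so it is not guaranteed.
print(vtype_guaranteed(64, Fraction(1, 2), 64))
```

An implementation may accept more combinations than this, so the check only
says what portable code can rely on.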
>
> > Making this
> >change would result in a 10%-20% slowdown.
>
> That's kind of odd. This may be caused by the slides, but it's strange to
> go out of the way for hardware to optimise a case that's not even intended.
>
> >                            mf2/4   m1
> >vp8_put_bilin4_h_rvv_i32:   158.7   193.7
> >vp8_put_bilin4_hv_rvv_i32:  255.7   302.7
> >vp8_put_bilin8_h_rvv_i32:   318.7   358.7
> >vp8_put_bilin8_hv_rvv_i32:  528.7   569.7
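
For reference, the relative cost of m1 in the table above can be worked out
per function. This is only a sketch: it assumes the figures are benchmark
timings where lower is better, and the helper name is made up.

```python
def slowdown_pct(fractional: float, m1: float) -> float:
    """Percentage by which the m1 timing exceeds the mf2/4 timing."""
    return 100.0 * (m1 / fractional - 1.0)


# Timings copied from the table: (mf2/4, m1).
timings = {
    "vp8_put_bilin4_h_rvv_i32": (158.7, 193.7),
    "vp8_put_bilin4_hv_rvv_i32": (255.7, 302.7),
    "vp8_put_bilin8_h_rvv_i32": (318.7, 358.7),
    "vp8_put_bilin8_hv_rvv_i32": (528.7, 569.7),
}

for name, (frac, m1) in timings.items():
    print(f"{name}: {slowdown_pct(frac, m1):.1f}% slower with m1")
```

That gives roughly 22.1%, 18.4%, 12.6% and 7.8%, so the smaller blocks are
hit hardest.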
> >
> >Rémi Denis-Courmont <remi@remlab.net> wrote on Sat, 24 Feb 2024 at 01:18:
> >
> >> Hi,
> >>
> >> +
> >> +.macro bilin_h_load dst len
> >> +.ifc \len,4
> >> +        vsetivli        zero, 5, e8, mf2, ta, ma
> >>
> >> Don't use fractional multipliers if you don't mix element widths.
> >>
> >> +.elseif \len == 8
> >> +        vsetivli        zero, 9, e8, m1, ta, ma
> >> +.else
> >> +        vsetivli        zero, 17, e8, m2, ta, ma
> >> +.endif
> >> +
> >> +        vle8.v          \dst, (a2)
> >> +        vslide1down.vx  v2, \dst, t5
> >> +
> >>
> >> +.ifc \len,4
> >> +        vsetivli        zero, 4, e8, mf4, ta, ma
> >>
> >> Same as above.
> >>
> >> +.elseif \len == 8
> >> +        vsetivli        zero, 8, e8, mf2, ta, ma
> >>
> >> Also.
> >>
> >> +.else
> >> +        vsetivli        zero, 16, e8, m1, ta, ma
> >> +.endif
> >>
> >> +        vwmulu.vx       v28, \dst, t1
> >> +        vwmaccu.vx      v28, a5, v2
> >> +        vwaddu.wx       v24, v28, t4
> >> +        vnsra.wi        \dst, v24, 3
> >> +.endm
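
As a scalar cross-check of what the macro above computes per row, here is a
minimal sketch. It assumes a5 holds the horizontal fraction mx in 0..7 and
that v2 ends up holding the source shifted by one pixel; the function name
is invented for illustration.

```python
def put_bilin_h_row(src: list[int], width: int, mx: int) -> list[int]:
    """Scalar model of one output row: two-tap filter, rounded, shifted by 3."""
    assert 0 <= mx < 8
    # One extra source pixel is needed, which matches the 5/9/17-element loads.
    assert len(src) >= width + 1
    a, b = 8 - mx, mx                  # t1 = 8 - a5, and a5 itself
    return [(a * src[x] + b * src[x + 1] + 4) >> 3 for x in range(width)]


print(put_bilin_h_row([0, 8, 16, 24, 32], 4, 4))  # [4, 12, 20, 28]
```

With mx = 0 the row is copied unchanged, since (8 * s + 4) >> 3 == s.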
> >> +
> >> +.macro put_vp8_bilin_h len
> >> +        li              t1, 8
> >> +        li              t4, 4
> >> +        li              t5, 1
> >> +        sub             t1, t1, a5
> >> +1:
> >> +        addi            a4, a4, -1
> >> +        bilin_h_load    v0, \len
> >> +        vse8.v          v0, (a0)
> >> +        add             a2, a2, a3
> >> +        add             a0, a0, a1
> >> +        bnez            a4, 1b
> >> +
> >> +        ret
> >> +.endm
> >> +
> >> +func ff_put_vp8_bilin16_h_rvv, zve32x
> >> +        put_vp8_bilin_h 16
> >> +endfunc
> >> +
> >> +func ff_put_vp8_bilin8_h_rvv, zve32x
> >> +        put_vp8_bilin_h 8
> >> +endfunc
> >> +
> >> +func ff_put_vp8_bilin4_h_rvv, zve32x
> >> +        put_vp8_bilin_h 4
> >> +endfunc
> >>
> >> --
> >> レミ・デニ-クールモン
> >> http://www.remlab.net/
> >>
> >>
> >>
> >> _______________________________________________
> >> ffmpeg-devel mailing list
> >> ffmpeg-devel@ffmpeg.org
> >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >> To unsubscribe, visit link above, or email
> >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> >>