From: flow gg <hlefthleft@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels
Date: Sun, 5 May 2024 17:18:56 +0800
Message-ID: <CAEa-L+tR-Ou=zqwvPnBocrEdXvSf=SXQnOP=Nm0HU5XtcghtqA@mail.gmail.com>
In-Reply-To: <7116774.evdviihFTz@basile.remlab.net>

> Is it not faster to compute the address ahead of time, e.g.:
> Ditto below and in other patches.

Yes, updated here, and I will check the other patches.

> Copying 64-bit quantities should not need RVV at all. Maybe the C version
> needs to be improved instead, but if that is not possible, then an RVI
> version may be more portable and work just as well.

The C version shares its logic with other places, so it might be difficult
to modify. I've updated the patch to use RVI instead.

> Does MF2 actually improve perfs over M1 here?

The difference here seems very small, but both mf2 and m1 are correct, and
the test results have only ever shown mf2 to be better, so I want to use mf2.

Rémi Denis-Courmont <remi@remlab.net> wrote on Sun, 5 May 2024 at 01:53:

> On Saturday, 4 May 2024, 13:01:05 EEST, uk7b@foxmail.com wrote:
> > From: sunyuechi <sunyuechi@iscas.ac.cn>
> >
> > vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 869.7
> > vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 148.7
> > vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 220.5
> > vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i64: 56.2
> > vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 523.7
> > vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvv_i32: 82.0
> > vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 138.5
> > vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvv_i64: 23.7
> > ---
> >  libavcodec/riscv/vc1dsp_init.c |  8 +++++
> >  libavcodec/riscv/vc1dsp_rvv.S  | 66 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 74 insertions(+)
> >
> > diff --git a/libavcodec/riscv/vc1dsp_init.c b/libavcodec/riscv/vc1dsp_init.c
> > index e47b644f80..610c43a1a3 100644
> > --- a/libavcodec/riscv/vc1dsp_init.c
> > +++ b/libavcodec/riscv/vc1dsp_init.c
> > @@ -29,6 +29,10 @@ void ff_vc1_inv_trans_8x8_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block
> >  void ff_vc1_inv_trans_4x8_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block);
> >  void ff_vc1_inv_trans_8x4_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block);
> >  void ff_vc1_inv_trans_4x4_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block);
> > +void ff_put_pixels16x16_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
> > +void ff_put_pixels8x8_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
> > +void ff_avg_pixels16x16_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
> > +void ff_avg_pixels8x8_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
> >
> >  av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
> >  {
> > @@ -38,9 +42,13 @@ av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
> >      if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
> >          dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
> >          dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
> > +        dsp->put_vc1_mspel_pixels_tab[0][0] = ff_put_pixels16x16_rvv;
> > +        dsp->avg_vc1_mspel_pixels_tab[0][0] = ff_avg_pixels16x16_rvv;
> >          if (flags & AV_CPU_FLAG_RVV_I64) {
> >              dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
> >              dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
> > +            dsp->put_vc1_mspel_pixels_tab[1][0] = ff_put_pixels8x8_rvv;
> > +            dsp->avg_vc1_mspel_pixels_tab[1][0] = ff_avg_pixels8x8_rvv;
> >          }
> >      }
> >  #endif
> > diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
> > index 4a00945ead..48244f91aa 100644
> > --- a/libavcodec/riscv/vc1dsp_rvv.S
> > +++ b/libavcodec/riscv/vc1dsp_rvv.S
> > @@ -111,3 +111,69 @@ func ff_vc1_inv_trans_4x4_dc_rvv, zve32x
> >          vsse32.v      v0, (a0), a1
> >          ret
> >  endfunc
> > +
> > +func ff_put_pixels16x16_rvv, zve32x
> > +        vsetivli      zero, 16, e8, m1, ta, ma
> > +        .irp n 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
> > +        vle8.v        v\n, (a1)
> > +        add           a1, a1, a2
> > +        .endr
> > +        vle8.v        v31, (a1)
>
> Is it not faster to compute the address ahead of time, e.g.:
>
>     add    t1, a2, a1
>     vle8.v vN, (a1)
>     sh1add a1, a2, a1
>     vle8.v vN+1, (t1)
>
> ...and so on? Even on a reordering core, you can't eliminate stall on data
> dependency if there is nothing else to be done.
>
> (Ditto below and in other patches.)
>
> > +        .irp n 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
> > +        vse8.v        v\n, (a0)
> > +        add           a0, a0, a2
> > +        .endr
> > +        vse8.v        v31, (a0)
> > +
> > +        ret
> > +endfunc
> > +
> > +func ff_put_pixels8x8_rvv, zve64x
> > +        vsetivli      zero, 8, e8, mf2, ta, ma
> > +        vlse64.v      v8, (a1), a2
> > +        vsse64.v      v8, (a0), a2
>
> Copying 64-bit quantities should not need RVV at all. Maybe the C version
> needs to be improved instead, but if that is not possible, then an RVI
> version may be more portable and work just as well.
>
> > +
> > +        ret
> > +endfunc
> > +
> > +func ff_avg_pixels16x16_rvv, zve32x
> > +        csrwi         vxrm, 0
> > +        vsetivli      zero, 16, e8, m1, ta, ma
> > +        li            t0, 128
> > +
> > +        .irp n 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
> > +        vle8.v        v\n, (a1)
> > +        add           a1, a1, a2
> > +        .endr
> > +        vle8.v        v31, (a1)
> > +        .irp n 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
> > +        vle8.v        v\n, (a0)
> > +        add           a0, a0, a2
> > +        .endr
> > +        vle8.v        v15, (a0)
> > +        vsetvli       zero, t0, e8, m8, ta, ma
> > +        vaaddu.vv     v0, v0, v16
> > +        vaaddu.vv     v8, v8, v24
> > +        vsetivli      zero, 16, e8, m1, ta, ma
> > +        .irp n 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
> > +        vse8.v        v\n, (a0)
> > +        sub           a0, a0, a2
> > +        .endr
> > +        vse8.v        v0, (a0)
> > +
> > +        ret
> > +endfunc
> > +
> > +func ff_avg_pixels8x8_rvv, zve64x
> > +        csrwi         vxrm, 0
> > +        li            t0, 64
> > +        vsetivli      zero, 8, e8, mf2, ta, ma
>
> Does MF2 actually improve perfs over M1 here?
>
> > +        vlse64.v      v16, (a1), a2
> > +        vlse64.v      v8, (a0), a2
> > +        vsetvli       zero, t0, e8, m4, ta, ma
> > +        vaaddu.vv     v16, v16, v8
> > +        vsetivli      zero, 8, e8, mf2, ta, ma
> > +        vsse64.v      v16, (a0), a2
> > +
> > +        ret
> > +endfunc
> >
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
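For readers following the review, here is a rough scalar C sketch of what the two 8x8 routines discussed above are expected to compute. It is not FFmpeg's actual C code: the helper names put_pixels8x8_ref and avg_pixels8x8_ref are hypothetical, and the unused rnd argument from the patch prototypes is dropped for brevity. The put case illustrates the point that each row is a single 64-bit quantity, so no vector unit is needed; the avg case uses (a + b + 1) >> 1, which is the result vaaddu.vv produces when vxrm is 0 (round-to-nearest-up).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helpers, for illustration only. */

/* Copy an 8x8 block of bytes. Each row is exactly one 64-bit quantity,
 * so one scalar load/store pair per row suffices. memcpy() keeps the
 * access well-defined even when the pointers are not 8-byte aligned. */
static void put_pixels8x8_ref(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t line_size)
{
    for (int y = 0; y < 8; y++) {
        uint64_t row;
        memcpy(&row, src, sizeof(row));
        memcpy(dst, &row, sizeof(row));
        src += line_size;
        dst += line_size;
    }
}

/* Average an 8x8 block from src into dst, rounding halves up:
 * (a + b + 1) >> 1. The sum is computed in int, so it cannot overflow. */
static void avg_pixels8x8_ref(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        src += line_size;
        dst += line_size;
    }
}
```

A reference like this could serve as the comparison baseline when checking an RVI or RVV version, since both helpers only touch the 8 bytes of each row and leave the rest of the stride alone.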