* [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
From: flow gg @ 2023-12-18 15:16 UTC
To: FFmpeg development discussions and patches

[-- Attachment #1: Type: text/plain, Size: 59 bytes --]

C908:
decorrelate_sm_c: 130.0
decorrelate_sm_rvv_i32: 43.7

[-- Attachment #2: 0006-lavc-takdsp-R-V-V-decorrelate_sm.patch --]
[-- Type: text/x-patch, Size: 1901 bytes --]

From 3dc613feaa6c38a7df47a3fc385e2140716e0ae2 Mon Sep 17 00:00:00 2001
From: sunyuechi <sunyuechi@iscas.ac.cn>
Date: Mon, 18 Dec 2023 22:53:39 +0800
Subject: [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

C908:
decorrelate_sm_c: 130.0
decorrelate_sm_rvv_i32: 43.7
---
 libavcodec/riscv/takdsp_init.c |  2 ++
 libavcodec/riscv/takdsp_rvv.S  | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/libavcodec/riscv/takdsp_init.c b/libavcodec/riscv/takdsp_init.c
index 0b4ec18086..85634d6db6 100644
--- a/libavcodec/riscv/takdsp_init.c
+++ b/libavcodec/riscv/takdsp_init.c
@@ -27,6 +27,7 @@

 void ff_decorrelate_ls_rvv(int32_t *p1, int32_t *p2, int length);
 void ff_decorrelate_sr_rvv(int32_t *p1, int32_t *p2, int length);
+void ff_decorrelate_sm_rvv(int32_t *p1, int32_t *p2, int length);

 av_cold void ff_takdsp_init_riscv(TAKDSPContext *dsp)
 {
@@ -36,6 +37,7 @@ av_cold void ff_takdsp_init_riscv(TAKDSPContext *dsp)
     if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         dsp->decorrelate_ls = ff_decorrelate_ls_rvv;
         dsp->decorrelate_sr = ff_decorrelate_sr_rvv;
+        dsp->decorrelate_sm = ff_decorrelate_sm_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/takdsp_rvv.S b/libavcodec/riscv/takdsp_rvv.S
index 65c79e1aa9..816e765039 100644
--- a/libavcodec/riscv/takdsp_rvv.S
+++ b/libavcodec/riscv/takdsp_rvv.S
@@ -47,3 +47,20 @@ func ff_decorrelate_sr_rvv, zve32x
         bnez            a2, 1b
         ret
 endfunc
+
+func ff_decorrelate_sm_rvv, zve32x
+1:
+        vsetvli         t0, a2, e32, m8, ta, ma
+        vle32.v         v0, (a0)
+        sub             a2, a2, t0
+        vle32.v         v8, (a1)
+        vsra.vi         v16, v8, 1
+        vsub.vv         v0, v0, v16
+        vse32.v         v0, (a0)
+        sh2add          a0, t0, a0
+        vadd.vv         v0, v0, v8
+        vse32.v         v0, (a1)
+        sh2add          a1, t0, a1
+        bnez            a2, 1b
+        ret
+endfunc
--
2.43.0

[-- Attachment #3: Type: text/plain, Size: 251 bytes --]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
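For readers less familiar with RVV, the loop posted in the patch can be modelled in scalar C roughly as follows. This is only a sketch and not FFmpeg code: `decorrelate_sm_model` and `VLMAX_MODEL` are hypothetical names, the fixed chunk size stands in for whatever `vsetvli` would return in `t0` on real hardware, and `>>` on a negative `int32_t` is assumed to be an arithmetic shift, like `vsra`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the element count vsetvli grants per
 * iteration (e32, m8); the real value depends on the hardware VLEN. */
#define VLMAX_MODEL 32

/* Scalar model of the posted ff_decorrelate_sm_rvv loop.
 * Assumption: >> on a negative int32_t is an arithmetic shift (as vsra).
 * Additions and subtractions go through uint32_t so that they wrap
 * like the vector instructions instead of overflowing in C. */
static void decorrelate_sm_model(int32_t *p1, int32_t *p2, int length)
{
    assert(length >= 0);
    while (length > 0) {                               /* bnez a2, 1b */
        /* vsetvli t0, a2, e32, m8, ta, ma */
        int vl = length < VLMAX_MODEL ? length : VLMAX_MODEL;
        length -= vl;                                  /* sub a2, a2, t0 */
        for (int i = 0; i < vl; i++) {
            int32_t a = p1[i];                         /* vle32.v v0, (a0) */
            int32_t b = p2[i];                         /* vle32.v v8, (a1) */
            int32_t half = b >> 1;                     /* vsra.vi v16, v8, 1 */
            a = (int32_t)((uint32_t)a - (uint32_t)half); /* vsub.vv */
            p1[i] = a;                                 /* vse32.v v0, (a0) */
            p2[i] = (int32_t)((uint32_t)a + (uint32_t)b); /* vadd.vv, vse32.v */
        }
        p1 += vl;                                      /* sh2add a0, t0, a0 */
        p2 += vl;                                      /* sh2add a1, t0, a1 */
    }
}
```

The outer loop mirrors the strip-mining done by `vsetvli`/`sub`/`bnez`, and the pointer increments correspond to the two `sh2add` instructions (advance by `4 * t0` bytes).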
* Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
From: Rémi Denis-Courmont @ 2023-12-21 16:07 UTC
To: FFmpeg development discussions and patches

On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote:
> C908:
> decorrelate_sm_c: 130.0
> decorrelate_sm_rvv_i32: 43.7

+
+func ff_decorrelate_sm_rvv, zve32x
+1:
+        vsetvli         t0, a2, e32, m8, ta, ma
+        vle32.v         v0, (a0)
+        sub             a2, a2, t0
+        vle32.v         v8, (a1)
+        vsra.vi         v16, v8, 1

You should load v8 first, since it is used as input before v0.

+        vsub.vv         v0, v0, v16
+        vse32.v         v0, (a0)
+        sh2add          a0, t0, a0
+        vadd.vv         v0, v0, v8

You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

+        vse32.v         v0, (a1)
+        sh2add          a1, t0, a1
+        bnez            a2, 1b
+        ret
+endfunc

--
雷米‧德尼-库尔蒙
http://www.remlab.net/
* Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
From: Rémi Denis-Courmont @ 2023-12-21 16:11 UTC
To: FFmpeg development discussions and patches

On Thursday, 21 December 2023, 18:07:55 EET, Rémi Denis-Courmont wrote:
> You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

P.S.: I have NOT checked which approach is actually faster.

--
Rémi Denis-Courmont
http://www.remlab.net/
* Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
From: flow gg @ 2023-12-22 1:34 UTC
To: FFmpeg development discussions and patches

func ff_decorrelate_sm_rvv, zve32x
1:
        vsetvli         t0, a2, e32, m8, ta, ma
        vle32.v         v8, (a1)
        sub             a2, a2, t0
        vle32.v         v0, (a0)
        vssra.vi        v8, v8, 1
        vsub.vv         v16, v0, v8
        vse32.v         v16, (a0)
        sh2add          a0, t0, a0
        vadd.vv         v16, v0, v8
        vse32.v         v16, (a1)
        sh2add          a1, t0, a1
        bnez            a2, 1b
        ret
endfunc

Is this what you meant? With this version, or when using vsra, some tests
fail and the result differs by 1. I'm not sure where the problem is.

Rémi Denis-Courmont <remi@remlab.net> wrote on Fri, 22 Dec 2023 at 00:08:

> On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote:
> > C908:
> > decorrelate_sm_c: 130.0
> > decorrelate_sm_rvv_i32: 43.7
>
> +
> +func ff_decorrelate_sm_rvv, zve32x
> +1:
> +        vsetvli         t0, a2, e32, m8, ta, ma
> +        vle32.v         v0, (a0)
> +        sub             a2, a2, t0
> +        vle32.v         v8, (a1)
> +        vsra.vi         v16, v8, 1
>
> You should load v8 first, since it is used as input before v0.
>
> +        vsub.vv         v0, v0, v16
> +        vse32.v         v0, (a0)
> +        sh2add          a0, t0, a0
> +        vadd.vv         v0, v0, v8
>
> You can use VSSRA, and then VADD won't need to depend on the output of
> VSUB.
>
> +        vse32.v         v0, (a1)
> +        sh2add          a1, t0, a1
> +        bnez            a2, 1b
> +        ret
> +endfunc
* Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
From: Rémi Denis-Courmont @ 2023-12-22 15:34 UTC
To: FFmpeg development discussions and patches

On Friday, 22 December 2023, 3:34:39 EET, flow gg wrote:
> func ff_decorrelate_sm_rvv, zve32x
> 1:
>         vsetvli         t0, a2, e32, m8, ta, ma
>         vle32.v         v8, (a1)
>         sub             a2, a2, t0
>         vle32.v         v0, (a0)
>         vssra.vi        v8, v8, 1
>         vsub.vv         v16, v0, v8
>         vse32.v         v16, (a0)
>         sh2add          a0, t0, a0
>         vadd.vv         v16, v0, v8
>         vse32.v         v16, (a1)
>         sh2add          a1, t0, a1
>         bnez            a2, 1b
>         ret
> endfunc
>
> Is this way? In this situation, or when using vsra, there will be some
> tests that fail, and the result value differs by 1. I'm not sure where the
> problem..

No, I meant something like this, but it turns out slightly slower anyway.
Saving the data dependency is not worth adding an instruction.

func ff_decorrelate_sm_rvv, zve32x
        csrwi           vxrm, 0
1:
        vsetvli         t0, a2, e32, m8, ta, ma
        vle32.v         v8, (a1)
        sub             a2, a2, t0
        vle32.v         v0, (a0)
        vsra.vi         v16, v8, 1
        vssra.vi        v8, v8, 1
        vsub.vv         v16, v0, v16
        vadd.vv         v8, v0, v8
        vse32.v         v16, (a0)
        sh2add          a0, t0, a0
        vse32.v         v8, (a1)
        sh2add          a1, t0, a1
        bnez            a2, 1b
        ret
endfunc

--
雷米‧德尼-库尔蒙
http://www.remlab.net/
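The two-shift variant above relies on the identity b - (b >> 1) == (b + 1) >> 1, so the second output can be computed as a + vssra(b, 1) without waiting for the subtraction. The scalar sketch below compares the three per-sample computations discussed in this thread. It is an illustration under stated assumptions, not FFmpeg code: all function names are hypothetical, `>>` on negative values is assumed to be arithmetic, and `vssra` is modelled with `vxrm = 0` (round-to-nearest-up), which is what `csrwi vxrm, 0` selects. The rounding mode in effect for the single-`vssra` attempt is not stated in the thread, so modelling it with `vxrm = 0` is a guess.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t p1, p2; } sm_pair;

/* Wrapping add/sub, matching the modular behaviour of vadd/vsub. */
static int32_t wrap_add(int32_t x, int32_t y) { return (int32_t)((uint32_t)x + (uint32_t)y); }
static int32_t wrap_sub(int32_t x, int32_t y) { return (int32_t)((uint32_t)x - (uint32_t)y); }

/* Models vsra.vi x, 1 (assumes arithmetic >> on negative values). */
static int32_t sra1(int32_t b) { return b >> 1; }

/* Models vssra.vi x, 1 with vxrm = 0: (b + 1) >> 1 without overflow. */
static int32_t ssra1_rnu(int32_t b) { return (b >> 1) + (b & 1); }

/* Originally posted patch: p2 depends on the result of the subtraction. */
static sm_pair sm_original(int32_t a, int32_t b)
{
    sm_pair r;
    r.p1 = wrap_sub(a, sra1(b));
    r.p2 = wrap_add(r.p1, b);
    return r;
}

/* Two-shift variant: p1 and p2 are computed independently. */
static sm_pair sm_two_shifts(int32_t a, int32_t b)
{
    sm_pair r;
    r.p1 = wrap_sub(a, sra1(b));
    r.p2 = wrap_add(a, ssra1_rnu(b));
    return r;
}

/* Single-vssra attempt: the same rounded half feeds both outputs. */
static sm_pair sm_single_vssra(int32_t a, int32_t b)
{
    sm_pair r;
    int32_t h = ssra1_rnu(b);
    r.p1 = wrap_sub(a, h);
    r.p2 = wrap_add(a, h);
    return r;
}

/* Asserts that the two-shift variant matches the original for one sample. */
static void check_two_shift_variant(int32_t a, int32_t b)
{
    sm_pair x = sm_original(a, b), y = sm_two_shifts(a, b);
    assert(x.p1 == y.p1 && x.p2 == y.p2);
}
```

Under these assumptions, for a = 10 and b = 5 the original and the two-shift variant both give (8, 13), while the single-`vssra` attempt gives (7, 13). Using a plain `vsra` for both outputs would instead give (8, 12). Either way, one output is off by 1 whenever b is odd, which would be consistent with the failing tests reported earlier in the thread.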