* [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add
@ 2023-12-19 2:53 flow gg
2023-12-21 20:52 ` Rémi Denis-Courmont
0 siblings, 1 reply; 4+ messages in thread
From: flow gg @ 2023-12-19 2:53 UTC (permalink / raw)
To: FFmpeg development discussions and patches
[-- Attachment #1: Type: text/plain, Size: 175 bytes --]
c908:
dcmul_add_c: 88.0
dcmul_add_rvv_f64: 46.2
Did not use vlseg2e64, because it is much slower than vlse64
Did not use vsseg2e64, because it is slightly slower than vsse64
[-- Attachment #2: libavfilter-af_afir-R-V-V-dcmul_add.patch --]
[-- Type: text/x-patch, Size: 2767 bytes --]
From 80b6694bc29ed1c37852dc079a6d91a24dd6f18e Mon Sep 17 00:00:00 2001
From: sunyuechi <sunyuechi@iscas.ac.cn>
Date: Tue, 19 Dec 2023 09:11:28 +0800
Subject: [PATCH] libavfilter/af_afir: R-V V dcmul_add
c908:
dcmul_add_c: 88.0
dcmul_add_rvv_f64: 46.2
---
libavfilter/riscv/af_afir_init.c | 3 +++
libavfilter/riscv/af_afir_rvv.S | 41 ++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/libavfilter/riscv/af_afir_init.c b/libavfilter/riscv/af_afir_init.c
index 52aa18c126..f9a76f108b 100644
--- a/libavfilter/riscv/af_afir_init.c
+++ b/libavfilter/riscv/af_afir_init.c
@@ -27,6 +27,8 @@
void ff_fcmul_add_rvv(float *sum, const float *t, const float *c,
ptrdiff_t len);
+void ff_dcmul_add_rvv(double *sum, const double *t, const double *c,
+ ptrdiff_t len);
av_cold void ff_afir_init_riscv(AudioFIRDSPContext *s)
{
@@ -36,6 +38,7 @@ av_cold void ff_afir_init_riscv(AudioFIRDSPContext *s)
if (flags & AV_CPU_FLAG_RVV_F64) {
if (flags & AV_CPU_FLAG_RVB_ADDR) {
s->fcmul_add = ff_fcmul_add_rvv;
+ s->dcmul_add = ff_dcmul_add_rvv;
}
}
#endif
diff --git a/libavfilter/riscv/af_afir_rvv.S b/libavfilter/riscv/af_afir_rvv.S
index 04ec2e50d8..d1fa6e22e5 100644
--- a/libavfilter/riscv/af_afir_rvv.S
+++ b/libavfilter/riscv/af_afir_rvv.S
@@ -53,3 +53,44 @@ func ff_fcmul_add_rvv, zve64f
ret
endfunc
+
+func ff_dcmul_add_rvv, zve64f
+1:
+ vsetvli t0, a3, e64, m4, ta, ma
+ li t1, 16
+ li t2, 8
+ vlse64.v v0, (a1), t1
+ add a1, a1, t2
+ vlse64.v v4, (a2), t1
+ add a2, a2, t2
+ vlse64.v v12, (a0), t1
+ add a0, a0, t2
+ vfmacc.vv v12, v0, v4
+ sub a3, a3, t0
+ vlse64.v v8, (a2), t1
+ sub a2, a2, t2
+ sh3add a2, t0, a2
+ vlse64.v v16, (a0), t1
+ sub a0, a0, t2
+ vfmacc.vv v16, v0, v8
+ sh3add a2, t0, a2
+ vlse64.v v0, (a1), t1
+ sub a1, a1, t2
+ sh3add a1, t0, a1
+ vfnmsac.vv v12, v0, v8
+ sh3add a1, t0, a1
+ vfmacc.vv v16, v0, v4
+ vsse64.v v12, (a0), t1
+ add a0, a0, t2
+ vsse64.v v16, (a0), t1
+ sub a0, a0, t2
+ sh3add a0, t0, a0
+ sh3add a0, t0, a0
+ bgtz a3, 1b
+ fld fa0, 0(a1)
+ fld fa1, 0(a2)
+ fld fa2, 0(a0)
+ fmadd.d fa2, fa0, fa1, fa2
+ fsd fa2, 0(a0)
+ ret
+endfunc
--
2.43.0
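
For readers following the assembly, here is a scalar C sketch of what the routine appears to compute. It is inferred from the arithmetic in the patch (two vfmacc.vv and one vfnmsac.vv per complex pair, then a scalar fmadd.d after the loop). The name dcmul_add_ref is hypothetical and the code is not copied from FFmpeg's C implementation.

```c
#include <stddef.h>

/*
 * Hypothetical scalar reference for the vector routine in the patch.
 * sum, t and c hold `len` interleaved (re, im) pairs followed by one
 * extra real-only element, so each buffer has 2 * len + 1 doubles.
 */
void dcmul_add_ref(double *sum, const double *t, const double *c,
                   ptrdiff_t len)
{
    ptrdiff_t n;

    for (n = 0; n < len; n++) {
        const double tre = t[2 * n], tim = t[2 * n + 1];
        const double cre = c[2 * n], cim = c[2 * n + 1];

        /* complex multiply-accumulate: sum += t * c */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }

    /* trailing real-only element, mirroring the fld/fmadd.d/fsd tail */
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```

With len = 1, sum = {1, 2, 3}, t = {1, 2, 4} and c = {3, 4, 0.5}, the complex pair contributes (1*3 - 2*4, 1*4 + 2*3) and the tail contributes 4 * 0.5.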
[-- Attachment #3: Type: text/plain, Size: 251 bytes --]
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add
2023-12-19 2:53 [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add flow gg
@ 2023-12-21 20:52 ` Rémi Denis-Courmont
2023-12-22 1:41 ` flow gg
0 siblings, 1 reply; 4+ messages in thread
From: Rémi Denis-Courmont @ 2023-12-21 20:52 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Tuesday, 19 December 2023, 4:53:12 EET, flow gg wrote:
> c908:
> dcmul_add_c: 88.0
> dcmul_add_rvv_f64: 46.2
>
> Did not use vlseg2e64, because it is much slower than vlse64
> Did not use vsseg2e64, because it is slightly slower than vsse64
Is this about C910 or C908? I have not checked this specific function, but the
general understanding for C908 has been the exact opposite so far, i.e.
segmented accesses are fast, while strided accesses are (unsurprisingly) slow.
See also https://camel-cdr.github.io/rvv-bench-results/canmv_k230/index.html
--
レミ・デニ-クールモン
http://www.remlab.net/
* Re: [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add
2023-12-21 20:52 ` Rémi Denis-Courmont
@ 2023-12-22 1:41 ` flow gg
2023-12-22 15:45 ` Rémi Denis-Courmont
0 siblings, 1 reply; 4+ messages in thread
From: flow gg @ 2023-12-22 1:41 UTC (permalink / raw)
To: FFmpeg development discussions and patches
It's on the C908.
According to the benchmark results, using vlseg2e64 makes the function
almost as slow as the C version (dcmul_add_rvv_f64: 86.2), while using
vsseg2e64 makes it only slightly slower (dcmul_add_rvv_f64: 50.2).
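
For reference, the two access patterns being compared can be modelled in scalar C. This is a sketch of the load semantics only, assuming the usual reading of the RVV strided (vlse64.v) and two-field segment (vlseg2e64.v) loads. The function names are hypothetical, and the model says nothing about timing, which is what the numbers above measure.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of a strided load such as vlse64.v: read vl doubles that are
 * stride_bytes apart. Two such loads (base and base + 1, stride 16)
 * are needed to split interleaved (re, im) data.
 */
void load_strided_f64(double *dst, const double *base,
                      size_t stride_bytes, size_t vl)
{
    const unsigned char *p = (const unsigned char *)base;

    for (size_t i = 0; i < vl; i++)
        memcpy(&dst[i], p + i * stride_bytes, sizeof(double));
}

/*
 * Model of a two-field segment load such as vlseg2e64.v: a single pass
 * over contiguous memory that splits field 0 and field 1 of each
 * segment into two separate destinations.
 */
void load_seg2_f64(double *re, double *im, const double *base, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        re[i] = base[2 * i];
        im[i] = base[2 * i + 1];
    }
}
```

Both approaches yield the same re/im vectors; the patch uses the strided form twice per input because that measured faster on this particular core.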
On Fri, 22 Dec 2023 at 04:52, Rémi Denis-Courmont <remi@remlab.net> wrote:
> On Tuesday, 19 December 2023, 4:53:12 EET, flow gg wrote:
> > c908:
> > dcmul_add_c: 88.0
> > dcmul_add_rvv_f64: 46.2
> >
> > Did not use vlseg2e64, because it is much slower than vlse64
> > Did not use vsseg2e64, because it is slightly slower than vsse64
>
> Is this about C910 or C908? I have not checked this specific function, but
> the general understanding for C908 has been the exact opposite so far, i.e.
> segmented accesses are fast, while strided accesses are (unsurprisingly)
> slow.
>
> See also https://camel-cdr.github.io/rvv-bench-results/canmv_k230/index.html
>
* Re: [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add
2023-12-22 1:41 ` flow gg
@ 2023-12-22 15:45 ` Rémi Denis-Courmont
0 siblings, 0 replies; 4+ messages in thread
From: Rémi Denis-Courmont @ 2023-12-22 15:45 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Friday, 22 December 2023, 3:41:29 EET, flow gg wrote:
> It's at c908
>
> According to the benchmark results, if vlseg2e64 is used, the speed is
> almost as slow as C language (dcmul_add_rvv_f64: 86.2), if vsseg2e64 is
> used, it will be only a bit slower (dcmul_add_rvv_f64: 50.2).
Fair enough but yikes. I doubt that this is going to turn out well on other
vendors' upcoming IPs.
--
雷米‧德尼-库尔蒙
http://www.remlab.net/