From: "Rémi Denis-Courmont" <remi@remlab.net>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: flow gg <hlefthleft@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
Date: Mon, 13 Nov 2023 17:35:35 +0200
Message-ID: <3257813.aeNJFYEL58@basile.remlab.net>
In-Reply-To: <CAEa-L+uLB2dnEV3UEHERvpgB2aUjmWOjps_K9d2U037nBqDy4g@mail.gmail.com>

Hi,

On Monday, 13 November 2023, 11.43.01 EET, flow gg wrote:
> Sorry for the long delay in responding.

No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave
up on that and switched over to Kendryte K230 (based on C908) now.

> How is the modified patch now?

It looks better, but some minute improvements are still possible.

> no longer using register stride(learn from your code) and have switched to
> shNadd instead.
>
> (using m4 and m2 as they are slightly faster than m8 and m4)
>
> benchmark:
> fcmul_add_c: 2179
> fcmul_add_rvv_f32: 1652

> diff --git a/libavfilter/af_afirdsp.h b/libavfilter/af_afirdsp.h
> index 4208501393..d2d1e909c1 100644
> --- a/libavfilter/af_afirdsp.h
> +++ b/libavfilter/af_afirdsp.h
> @@ -34,6 +34,7 @@ typedef struct AudioFIRDSPContext {
>  } AudioFIRDSPContext;
>
>  void ff_afir_init_x86(AudioFIRDSPContext *s);
> +void ff_afir_init_riscv(AudioFIRDSPContext *s);

Nit: please stick to alphabetical order like most similar code.

>
>  static void fcmul_add_c(float *sum, const float *t, const float *c,
>  ptrdiff_t len)
>  {
> @@ -76,6 +77,8 @@ static av_unused void ff_afir_init(AudioFIRDSPContext
> *dsp)
>
>  #if ARCH_X86
>      ff_afir_init_x86(dsp);
> +#elif ARCH_RISCV
> +    ff_afir_init_riscv(dsp);

Ditto.
>  #endif
>  }
>
> diff --git a/libavfilter/riscv/Makefile b/libavfilter/riscv/Makefile
> new file mode 100644
> index 0000000000..0b968a9c0d
> --- /dev/null
> +++ b/libavfilter/riscv/Makefile
> @@ -0,0 +1,2 @@
> +OBJS += riscv/af_afir_init.o
> +RVV-OBJS += riscv/af_afir_rvv.o
> diff --git a/libavfilter/riscv/af_afir_init.c
> b/libavfilter/riscv/af_afir_init.c new file mode 100644
> index 0000000000..13df8341e7
> --- /dev/null
> +++ b/libavfilter/riscv/af_afir_init.c
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences
> (ISCAS).
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA
> + */
> +
> +#include <stdint.h>
> +
> +#include "config.h"
> +#include "libavutil/attributes.h"
> +#include "libavutil/cpu.h"
> +#include "libavfilter/af_afirdsp.h"
> +
> +void ff_fcmul_add_rvv(float *sum, const float *t, const float *c,
> +                      ptrdiff_t len);
> +
> +av_cold void ff_afir_init_riscv(AudioFIRDSPContext *s)
> +{
> +#if HAVE_RVV
> +    int flags = av_get_cpu_flags();
> +
> +    if (flags & AV_CPU_FLAG_RVV_F32)

You need to check for Zba as well here. I doubt that we'll see hardware with V
and without Zba in real life, but for the sake of correctness...
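
For illustration, the combined check could look roughly like the sketch below.
The flag values are local stand-ins so that the snippet compiles on its own;
the real constants come from libavutil/cpu.h, where I believe Zba is reported
as AV_CPU_FLAG_RVB_ADDR (an assumption, please double-check). The helper name
can_use_fcmul_add_rvv is hypothetical and not part of the patch.

```c
/* Stand-in flag bits for this sketch only. The real definitions live in
 * libavutil/cpu.h and their values may differ. */
#define SKETCH_CPU_FLAG_RVV_F32  (1 << 4)  /* vectors of floats */
#define SKETCH_CPU_FLAG_RVB_ADDR (1 << 8)  /* Zba: sh1add/sh2add/sh3add */

/* Hypothetical helper: returns 1 only when both the vector float
 * extension and Zba (needed for sh3add) are reported, 0 otherwise. */
static int can_use_fcmul_add_rvv(int flags)
{
    return (flags & SKETCH_CPU_FLAG_RVV_F32) &&
           (flags & SKETCH_CPU_FLAG_RVB_ADDR);
}
```

In the init function, the same condition would simply guard the assignment of
s->fcmul_add.
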
> +        s->fcmul_add = ff_fcmul_add_rvv;
> +#endif
> +}
> diff --git a/libavfilter/riscv/af_afir_rvv.S
> b/libavfilter/riscv/af_afir_rvv.S new file mode 100644
> index 0000000000..078cac8e7e
> --- /dev/null
> +++ b/libavfilter/riscv/af_afir_rvv.S
> @@ -0,0 +1,61 @@
> +/*
> + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences
> (ISCAS).
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA
> + */
> +
> +#include "libavutil/riscv/asm.S"
> +
> +// void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
> +func ff_fcmul_add_rvv, zve32f
> +    li t1, 32
> +1:
> +    vsetvli t0, a3, e64, m4, ta, ma

You can set SEW=32 and corresponding LMUL here. Then you can remove all other
VSETVLI instances below. (Note that this will NOT work on draft 0.7.1 hardware,
but it does work on conformant hardware.)

> +    vle64.v v12, (a0)

This requires 64-bit alignment. I don't know if this is correct for this
specific filter, so I leave it to other people to comment here.
> +    sub a3, a3, t0
> +    vsetvli zero, zero, e32, m2, ta, ma
> +    vnsrl.vx v8, v12, zero
> +    vnsrl.vx v10, v12, t1
> +    vsetvli zero, zero, e64, m4, ta, ma
> +    vle64.v v12, (a1)
> +    sh3add a1, t0, a1
> +    vsetvli zero, zero, e32, m2, ta, ma
> +    vnsrl.vx v0, v12, zero
> +    vnsrl.vx v2, v12, t1
> +    vsetvli zero, zero, e64, m4, ta, ma
> +    vle64.v v12, (a2)
> +    sh3add a2, t0, a2
> +    vsetvli zero, zero, e32, m2, ta, ma
> +    vnsrl.vx v4, v12, zero
> +    vnsrl.vx v6, v12, t1
> +    vfmacc.vv v8, v0, v4
> +    vfnmsac.vv v8, v2, v6
> +    vfmacc.vv v10, v0, v6

Swap the two instructions above for better pipeline utilisation on in-order
CPUs.

> +    vfmacc.vv v10, v2, v4
> +    vsseg2e32.v v8, (a0)
> +    sh3add a0, t0, a0
> +    bgtz a3, 1b
> +
> +    flw fa0, 0(a1)
> +    flw fa1, 0(a2)
> +    flw fa2, 0(a0)
> +    fmul.s fa0, fa0, fa1
> +    fadd.s fa2, fa2, fa0

It won't make much difference, but you can use a fused multiply-add here.

> +    fsw fa2, 0(a0)
> +
> +    ret
> +endfunc

While you're at it, this looks like it could easily be adapted for the double
precision version. In fact, it will be simpler, since you will have to use
vlseg2e64 rather than vle128.v+vnsrl.vx+vnsrl.vx. But if you decide to
implement that too, please keep it a separate patch.

-- 
レミ・デニ-クールモン
http://www.remlab.net/
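
For readers following the register usage, here is a scalar sketch of what the
quoted assembly computes, reconstructed from the instruction sequence rather
than copied from the tree. The name fcmul_add_ref is made up for this note;
v0/v2 correspond to the real/imaginary parts of t, v4/v6 to those of c, and
v8/v10 to those of sum.

```c
#include <stddef.h>

/* Illustrative scalar reference: sum += t * c over len interleaved
 * complex floats, followed by one trailing real element. */
static void fcmul_add_ref(float *sum, const float *t, const float *c,
                          ptrdiff_t len)
{
    ptrdiff_t n;

    for (n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        /* vfmacc v8, v0, v4 then vfnmsac v8, v2, v6 */
        sum[2 * n]     += tre * cre - tim * cim;
        /* vfmacc v10, v0, v6 then vfmacc v10, v2, v4 */
        sum[2 * n + 1] += tre * cim + tim * cre;
    }

    /* Scalar tail after the loop. In the assembly, the fmul.s + fadd.s
     * pair could be a single fmadd.s fa2, fa0, fa1, fa2. */
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```

Note that each of the three buffers holds 2 * len + 1 floats, because of the
trailing real element handled after the loop.
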