Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: Paul B Mahol <onemda@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: flow gg <hlefthleft@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
Date: Mon, 13 Nov 2023 17:01:18 +0100
Message-ID: <CAPYw7P52oDNfUUx-oUKavhczTbhJsTAyuAmy1RFJ5BX4N3h-Hg@mail.gmail.com>
In-Reply-To: <3257813.aeNJFYEL58@basile.remlab.net>

On Mon, Nov 13, 2023 at 4:35 PM Rémi Denis-Courmont <remi@remlab.net> wrote:

>    Hi,
>
> On Monday, 13 November 2023 at 11:43:01 EET, flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave
> up on that and switched over to Kendryte K230 (based on C908) now.
>
> > How is the modified patch now?
>
> It looks better, but some minute improvements are still possible.
>
> > no longer using register stride (learned from your code) and have switched to
> > shNadd instead.
> >
> > (using m4 and m2 as they are slightly faster than m8 and m4)
> >
> > benchmark:
> > fcmul_add_c: 2179
> > fcmul_add_rvv_f32: 1652
>
> > diff --git a/libavfilter/af_afirdsp.h b/libavfilter/af_afirdsp.h
> > index 4208501393..d2d1e909c1 100644
> > --- a/libavfilter/af_afirdsp.h
> > +++ b/libavfilter/af_afirdsp.h
> > @@ -34,6 +34,7 @@ typedef struct AudioFIRDSPContext {
> >  } AudioFIRDSPContext;
> >
> >  void ff_afir_init_x86(AudioFIRDSPContext *s);
> > +void ff_afir_init_riscv(AudioFIRDSPContext *s);
>
> Nit: please stick to alphabetical order like most similar code.
>
> >
> >  static void fcmul_add_c(float *sum, const float *t, const float *c, ptrdiff_t len)
> >  {
> > @@ -76,6 +77,8 @@ static av_unused void ff_afir_init(AudioFIRDSPContext *dsp)
> >
> >  #if ARCH_X86
> >      ff_afir_init_x86(dsp);
> > +#elif ARCH_RISCV
> > +    ff_afir_init_riscv(dsp);
>
> Ditto.
>
> >  #endif
> >  }
> >
> > diff --git a/libavfilter/riscv/Makefile b/libavfilter/riscv/Makefile
> > new file mode 100644
> > index 0000000000..0b968a9c0d
> > --- /dev/null
> > +++ b/libavfilter/riscv/Makefile
> > @@ -0,0 +1,2 @@
> > +OBJS += riscv/af_afir_init.o
> > +RVV-OBJS += riscv/af_afir_rvv.o
> > diff --git a/libavfilter/riscv/af_afir_init.c b/libavfilter/riscv/af_afir_init.c
> > new file mode 100644
> > index 0000000000..13df8341e7
> > --- /dev/null
> > +++ b/libavfilter/riscv/af_afir_init.c
> > @@ -0,0 +1,39 @@
> > +/*
> > + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences (ISCAS).
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <stdint.h>
> > +
> > +#include "config.h"
> > +#include "libavutil/attributes.h"
> > +#include "libavutil/cpu.h"
> > +#include "libavfilter/af_afirdsp.h"
> > +
> > +void ff_fcmul_add_rvv(float *sum, const float *t, const float *c,
> > +                       ptrdiff_t len);
> > +
> > +av_cold void ff_afir_init_riscv(AudioFIRDSPContext *s)
> > +{
> > +#if HAVE_RVV
> > +    int flags = av_get_cpu_flags();
> > +
> > +    if (flags & AV_CPU_FLAG_RVV_F32)
>
> You need to check for Zba as well here. I doubt that we'll see hardware with V
> and without Zba in real life, but for the sake of correctness...
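
The Zba remark can be sketched in plain C. The quoted assembly uses sh3add, which belongs to Zba, so the vector routine should only be selected when both capabilities are reported. The flag names and values below are stand-ins invented for this sketch (the real flags, such as AV_CPU_FLAG_RVV_F32, come from libavutil/cpu.h), and can_use_rvv_fcmul_add() is a hypothetical helper, not part of the patch.

```c
/* Stand-in capability bits for this sketch only; they are NOT the real
 * libavutil values. */
#define SKETCH_CPU_RVV_F32 (1 << 0) /* vector unit with 32-bit floats */
#define SKETCH_CPU_ZBA     (1 << 1) /* address generation: sh1add/sh2add/sh3add */

/* Hypothetical helper: non-zero only if every extension the assembly
 * relies on is present. */
static int can_use_rvv_fcmul_add(int flags)
{
    const int needed = SKETCH_CPU_RVV_F32 | SKETCH_CPU_ZBA;

    return (flags & needed) == needed;
}
```

Testing both bits against a combined mask avoids accepting a CPU that reports only one of the two extensions.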
>
> > +        s->fcmul_add = ff_fcmul_add_rvv;
> > +#endif
> > +}
> > diff --git a/libavfilter/riscv/af_afir_rvv.S b/libavfilter/riscv/af_afir_rvv.S
> > new file mode 100644
> > index 0000000000..078cac8e7e
> > --- /dev/null
> > +++ b/libavfilter/riscv/af_afir_rvv.S
> > @@ -0,0 +1,61 @@
> > +/*
> > + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences (ISCAS).
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include "libavutil/riscv/asm.S"
> > +
> > +//  void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
> > +func ff_fcmul_add_rvv, zve32f
> > +        li          t1, 32
> > +1:
> > +        vsetvli     t0, a3, e64, m4, ta, ma
>
> You can set SEW=32 and corresponding LMUL here. Then you can remove all other
> VSETVLI instances below. (Note that this will NOT work on draft 0.7.1
> hardware, but it does work on conformant hardware.)
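
A worked example may help show why a single VSETVLI can suffice. In the ratified vector specification, VLMAX = LMUL * VLEN / SEW, so e64/m4 and e32/m2 yield the same vector length. A vle64.v issued under e32/m2 then occupies EMUL = (64 / 32) * 2 = 4 registers, the same group as before. The helper names below are hypothetical, VLEN = 128 is only an example value, and the sketch covers integer LMUL only.

```c
/* VLMAX = LMUL * VLEN / SEW (integer LMUL only in this sketch). */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    return lmul * vlen_bits / sew_bits;
}

/* Register group size used by a load whose element width (EEW) differs
 * from the configured SEW: EMUL = (EEW / SEW) * LMUL. */
static unsigned emul(unsigned eew_bits, unsigned sew_bits, unsigned lmul)
{
    return eew_bits * lmul / sew_bits;
}
```

With VLEN = 128, both vlmax(128, 64, 4) and vlmax(128, 32, 2) evaluate to 8 elements, and emul(64, 32, 2) evaluates to 4.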
>
> > +        vle64.v     v12, (a0)
>
> This requires 64-bit alignment. I don't know if this is correct for this
> specific filter, so I leave it to other people to comment here.
>

The arrays should be aligned, since they are allocated by libavutil calls.
The buffer sizes are aligned using av_cpu_align(), so if that returns
the correct size it should work.
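
As a rough illustration of the alignment argument: a 64-bit element load needs an address that is a multiple of 8 bytes, and padding sizes to a multiple of the alignment keeps consecutive buffers on such boundaries. The helpers below are hypothetical stand-ins written for this sketch; they are not libavutil functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Non-zero if p is suitably aligned for 64-bit element loads. */
static int is_aligned_64bit(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}
```

For float data this means the pointer has to sit on a complex-pair boundary: an address ending in 0 or 8 passes the check, one ending in 4 or c does not.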


>
> > +        sub         a3, a3, t0
> > +        vsetvli     zero, zero, e32, m2, ta, ma
> > +        vnsrl.vx    v8, v12, zero
> > +        vnsrl.vx    v10, v12, t1
> > +        vsetvli     zero, zero, e64, m4, ta, ma
> > +        vle64.v     v12, (a1)
> > +        sh3add      a1, t0, a1
> > +        vsetvli     zero, zero, e32, m2, ta, ma
> > +        vnsrl.vx    v0, v12, zero
> > +        vnsrl.vx    v2, v12, t1
> > +        vsetvli     zero, zero, e64, m4, ta, ma
> > +        vle64.v     v12, (a2)
> > +        sh3add      a2, t0, a2
> > +        vsetvli     zero, zero, e32, m2, ta, ma
> > +        vnsrl.vx    v4, v12, zero
> > +        vnsrl.vx    v6, v12, t1
> > +        vfmacc.vv   v8, v0, v4
> > +        vfnmsac.vv  v8, v2, v6
> > +        vfmacc.vv   v10, v0, v6
>
> Swap the two instructions above for better pipeline utilisation on in-order
> CPUs.
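
For readers following the arithmetic, here is a scalar C sketch of what the vector loop appears to compute, inferred from the quoted instructions rather than copied from the reference fcmul_add_c. The updates are written in the swapped order, so that consecutive operations alternate between the real and imaginary accumulators and do not depend on each other. The function name is hypothetical.

```c
#include <stddef.h>

/* Complex multiply-accumulate over len interleaved (re, im) pairs,
 * followed by one trailing real-only element. */
static void fcmul_add_sketch(float *sum, const float *t, const float *c,
                             ptrdiff_t len)
{
    ptrdiff_t n;

    for (n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];
        float re = sum[2 * n];
        float im = sum[2 * n + 1];

        re += tre * cre; /* vfmacc.vv  v8,  v0, v4 */
        im += tre * cim; /* vfmacc.vv  v10, v0, v6 */
        re -= tim * cim; /* vfnmsac.vv v8,  v2, v6 */
        im += tim * cre; /* vfmacc.vv  v10, v2, v4 */

        sum[2 * n]     = re;
        sum[2 * n + 1] = im;
    }

    /* Trailing real-only element, as in the scalar tail of the assembly. */
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```

Each buffer therefore has to hold 2 * len + 1 floats.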
>
> > +        vfmacc.vv   v10, v2, v4
> > +        vsseg2e32.v v8, (a0)
> > +        sh3add      a0, t0, a0
> > +        bgtz        a3, 1b
> > +
> > +        flw         fa0, 0(a1)
> > +        flw         fa1, 0(a2)
> > +        flw         fa2, 0(a0)
> > +        fmul.s      fa0, fa0, fa1
> > +        fadd.s      fa2, fa2, fa0
>
> It won't make much difference, but you can use a fused multiply-add here.
>
> > +        fsw         fa2, 0(a0)
> > +
> > +        ret
> > +endfunc
>
> While you're at it, this looks like it could easily be adapted for the double
> precision version. In fact, it will be simpler, since you will have to use
> vlseg2e64 rather than vle128.v+vnsrl.vx+vnsrl.vx. But if you decide to
> implement that too, please keep it a separate patch.
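
The single-precision routine treats each (re, im) pair of floats as one 64-bit element and splits it with two narrowing shifts. For doubles the pair would be 128 bits wide, which is presumably why a segment load is suggested instead. Below is a minimal C sketch of the single-precision split, assuming a little-endian host (as RISC-V is); all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read one interleaved (re, im) float pair as a single 64-bit value,
 * comparable to one element of a vle64.v load. */
static uint64_t load_pair(const float pair[2])
{
    uint64_t wide;

    memcpy(&wide, pair, sizeof(wide));
    return wide;
}

/* Reinterpret 32 bits as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Low half: comparable to a narrowing shift by 0 (real part). */
static float pair_re(uint64_t wide)
{
    return bits_to_float((uint32_t)wide);
}

/* High half: comparable to a narrowing shift by 32 (imaginary part). */
static float pair_im(uint64_t wide)
{
    return bits_to_float((uint32_t)(wide >> 32));
}
```

On a big-endian host the two halves would come out swapped, so the sketch only mirrors the assembly under the little-endian assumption.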
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>