From: remi@remlab.net To: ffmpeg-devel@ffmpeg.org Subject: [FFmpeg-devel] [PATCH 28/31] lavc/aacpsdsp: RISC-V V hybrid_analysis Date: Sun, 25 Sep 2022 17:26:16 +0300 Message-ID: <20220925142619.67917-28-remi@remlab.net> (raw) In-Reply-To: <5861881.lOV4Wx5bFT@basile.remlab.net> From: Rémi Denis-Courmont <remi@remlab.net> This starts with one-time initialisation of the 26 constant factors like 08edacc248bce3f8946d75e97188d189c74a6de6. That is done with the scalar instruction set. While the formula can readily be vectored, the gains would (probably) be more than lost in transfering the results back to FP registers (or suitably reshuffling them into vector registers). Note that the main loop could likely be scheduled sligthly better by expanding the filter macro and interleaving loads with arithmetic. It is not clear yet if that would be relevant for vector processing (as opposed to traditional SIMD). We could also use fewer vectors, but there is not much point in sparing them (they are *all* callee-clobbered). --- libavcodec/riscv/aacpsdsp_init.c | 3 + libavcodec/riscv/aacpsdsp_rvv.S | 97 ++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 90c9c501c3..6222d6f787 100644 --- a/libavcodec/riscv/aacpsdsp_init.c +++ b/libavcodec/riscv/aacpsdsp_init.c @@ -27,6 +27,8 @@ void ff_ps_add_squares_rvv(float *dst, const float (*src)[2], int n); void ff_ps_mul_pair_single_rvv(float (*dst)[2], float (*src0)[2], float *src1, int n); +void ff_ps_hybrid_analysis_rvv(float (*out)[2], float (*in)[2], + const float (*filter)[8][2], ptrdiff_t, int n); av_cold void ff_psdsp_init_riscv(PSDSPContext *c) { @@ -36,6 +38,7 @@ av_cold void ff_psdsp_init_riscv(PSDSPContext *c) if (flags & AV_CPU_FLAG_RV_ZVE32F) { c->add_squares = ff_ps_add_squares_rvv; c->mul_pair_single = ff_ps_mul_pair_single_rvv; + c->hybrid_analysis = ff_ps_hybrid_analysis_rvv; } #endif } diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S index 70b7b72218..65e5e0be4f 100644 --- a/libavcodec/riscv/aacpsdsp_rvv.S +++ b/libavcodec/riscv/aacpsdsp_rvv.S @@ -52,3 +52,100 @@ func ff_ps_mul_pair_single_rvv, zve32f ret endfunc + +func ff_ps_hybrid_analysis_rvv, zve32f + /* We need 26 FP registers, for 20 scratch ones. Spill fs0-fs5. */ + addi sp, sp, -32 + .irp n, 0, 1, 2, 3, 4, 5 + fsw fs\n, (4 * \n)(sp) + .endr + + .macro input, j, fd0, fd1, fd2, fd3 + flw \fd0, (4 * ((\j * 2) + 0))(a1) + flw fs4, (4 * (((12 - \j) * 2) + 0))(a1) + flw \fd1, (4 * ((\j * 2) + 1))(a1) + fsub.s \fd3, \fd0, fs4 + flw fs5, (4 * (((12 - \j) * 2) + 1))(a1) + fadd.s \fd2, \fd1, fs5 + fadd.s \fd0, \fd0, fs4 + fsub.s \fd1, \fd1, fs5 + .endm + + // re0, re1, im0, im1 + input 0, ft0, ft1, ft2, ft3 + input 1, ft4, ft5, ft6, ft7 + input 2, ft8, ft9, ft10, ft11 + input 3, fa0, fa1, fa2, fa3 + input 4, fa4, fa5, fa6, fa7 + input 5, fs0, fs1, fs2, fs3 + flw fs4, (4 * ((6 * 2) + 0))(a1) + flw fs5, (4 * ((6 * 2) + 1))(a1) + + add a2, a2, 6 * 2 * 4 // point to filter[i][6][0] + li t4, 8 * 2 * 4 // filter byte stride + slli a3, a3, 3 // output byte stride +1: + .macro filter, vs0, vs1, fo0, fo1, fo2, fo3 + vfmacc.vf v8, \fo0, \vs0 + vfmacc.vf v9, \fo2, \vs0 + vfnmsac.vf v8, \fo1, \vs1 + vfmacc.vf v9, \fo3, \vs1 + .endm + + vsetvli t0, a4, e32, m1, ta, ma + /* + * The filter (a2) has 16 segments, of which 13 need to be extracted. + * R-V V supports only up to 8 segments, so unrolling is unavoidable. + */ + addi t1, a2, -48 + vlse32.v v22, (a2), t4 + addi t2, a2, -44 + vlse32.v v16, (t1), t4 + addi t1, a2, -40 + vfmul.vf v8, v22, fs4 + vlse32.v v24, (t2), t4 + addi t2, a2, -36 + vfmul.vf v9, v22, fs5 + vlse32.v v17, (t1), t4 + addi t1, a2, -32 + vlse32.v v25, (t2), t4 + addi t2, a2, -28 + filter v16, v24, ft0, ft1, ft2, ft3 + vlse32.v v18, (t1), t4 + addi t1, a2, -24 + vlse32.v v26, (t2), t4 + addi t2, a2, -20 + filter v17, v25, ft4, ft5, ft6, ft7 + vlse32.v v19, (t1), t4 + addi t1, a2, -16 + vlse32.v v27, (t2), t4 + addi t2, a2, -12 + filter v18, v26, ft8, ft9, ft10, ft11 + vlse32.v v20, (t1), t4 + addi t1, a2, -8 + vlse32.v v28, (t2), t4 + addi t2, a2, -4 + filter v19, v27, fa0, fa1, fa2, fa3 + vlse32.v v21, (t1), t4 + sub a4, a4, t0 + vlse32.v v29, (t2), t4 + slli t1, t0, 3 + 1 + 2 // ctz(8 * 2 * 4) + add a2, a2, t1 + filter v20, v28, fa4, fa5, fa6, fa7 + filter v21, v29, fs0, fs1, fs2, fs3 + + add t2, a0, 4 + vsse32.v v8, (a0), a3 + mul t0, t0, a3 + vsse32.v v9, (t2), a3 + add a0, a0, t0 + bnez a4, 1b + + .irp n, 5, 4, 3, 2, 1, 0 + flw fs\n, (4 * \n)(sp) + .endr + addi sp, sp, 32 + ret + .purgem input + .purgem filter +endfunc -- 2.37.2 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2022-09-25 14:30 UTC|newest] Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-09-25 14:25 [FFmpeg-devel] [PATCHv5 00/31] RISC-V CPU extensions Rémi Denis-Courmont 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 01/31] lavu/cpu: detect RISC-V base extensions remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 02/31] lavu/riscv: initial common header for assembler macros remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 03/31] lavc/audiodsp: RISC-V F vector_clipf remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 04/31] lavc/pixblockdsp: RISC-V I get_pixels remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 05/31] lavu/cpu: CPU flags for the RISC-V Vector extension remi 2022-09-26 6:51 ` Lynne 2022-09-26 8:02 ` Andreas Rheinhardt 2022-09-26 9:38 ` Rémi Denis-Courmont 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 06/31] configure: probe " remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 07/31] lavu/riscv: fallback macros for SH{1, 2, 3}ADD remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 08/31] lavu/floatdsp: RISC-V V vector_fmul_scalar remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 09/31] lavu/floatdsp: RISC-V V vector_dmul_scalar remi 2022-09-26 6:53 ` Lynne 2022-09-26 9:42 ` Rémi Denis-Courmont 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 10/31] lavu/floatdsp: RISC-V V vector_fmul remi 2022-09-25 14:25 ` [FFmpeg-devel] [PATCH 11/31] lavu/floatdsp: RISC-V V vector_dmul remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 12/31] lavu/floatdsp: RISC-V V vector_fmac_scalar remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 13/31] lavu/floatdsp: RISC-V V vector_dmac_scalar remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 14/31] lavu/floatdsp: RISC-V V vector_fmul_add remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 15/31] lavu/floatdsp: RISC-V V butterflies_float remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 16/31] lavu/floatdsp: RISC-V V vector_fmul_reverse remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 17/31] lavu/floatdsp: RISC-V V vector_fmul_window remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 18/31] lavu/floatdsp: RISC-V V scalarproduct_float remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 19/31] lavu/fixeddsp: RISC-V V butterflies_fixed remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 20/31] lavc/audiodsp: RISC-V V vector_clip_int32 remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 21/31] lavc/audiodsp: RISC-V V vector_clipf remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 22/31] lavc/audiodsp: RISC-V V scalarproduct_int16 remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 23/31] lavc/fmtconvert: RISC-V V int32_to_float_fmul_scalar remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 24/31] lavc/fmtconvert: RISC-V V int32_to_float_fmul_array8 remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 25/31] lavc/vorbisdsp: RISC-V V inverse_coupling remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 26/31] lavc/aacpsdsp: RISC-V V add_squares remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 27/31] lavc/aacpsdsp: RISC-V V mul_pair_single remi 2022-09-25 14:26 ` remi [this message] 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 29/31] lavc/aacpsdsp: RISC-V V hybrid_analysis_ileave remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 30/31] lavc/aacpsdsp: RISC-V V hybrid_synthesis_deint remi 2022-09-25 14:26 ` [FFmpeg-devel] [PATCH 31/31] lavc/aacpsdsp: RISC-V V stereo_interpolate[0] remi 2022-09-26 7:05 ` [FFmpeg-devel] [PATCHv5 00/31] RISC-V CPU extensions Lynne 2022-09-26 12:01 ` Rémi Denis-Courmont 2022-09-26 14:52 [FFmpeg-devel] [PATCHv6 00/31] initial " Rémi Denis-Courmont 2022-09-26 14:52 ` [FFmpeg-devel] [PATCH 28/31] lavc/aacpsdsp: RISC-V V hybrid_analysis remi
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20220925142619.67917-28-remi@remlab.net \ --to=remi@remlab.net \ --cc=ffmpeg-devel@ffmpeg.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git