From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 Nov 2023 13:39:42 +0200
Message-ID: <20231119113942.10269-2-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window

Roll the loop to avoid slow gathers.

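For readers unfamiliar with the routine, the reversed access pattern behind the gathers can be modeled in scalar C. This is only a sketch under assumptions: `fmul_reverse_ref` and `fmul_reverse_chunked` are hypothetical names, `vlmax` stands in for the number of elements one vector iteration can handle, and the per-chunk index `vl - 1 - i` mirrors the `vid.v` / `vadd.vi` / `vrsub.vx` index sequence visible in the diff, on the assumption that those indices feed a gather.

```c
#include <stddef.h>

/* Scalar model of the operation described by the patch comment
 * "(a0) = (a1) * reverse(a2) [0..a3-1]":
 * dst[i] = src0[i] * src1[len - 1 - i]. */
void fmul_reverse_ref(float *dst, const float *src0,
                      const float *src1, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Chunked model of a vector loop (hypothetical helper).
 * src1 is consumed from its end in blocks of at most vlmax elements,
 * and each block is reversed in place of a hardware gather.
 * The caller must pass vlmax >= 1. */
void fmul_reverse_chunked(float *dst, const float *src0,
                          const float *src1, size_t len, size_t vlmax)
{
    const float *tail = src1 + len;   /* one past the last element */

    while (len > 0) {
        size_t vl = len < vlmax ? len : vlmax;

        tail -= vl;                   /* step back one block */
        for (size_t i = 0; i < vl; i++)
            dst[i] = src0[i] * tail[vl - 1 - i];  /* reversed index */

        dst  += vl;
        src0 += vl;
        len  -= vl;
    }
}
```

In this model a smaller `vlmax` means more loop iterations but a shorter reversal per iteration, which loosely corresponds to lowering the group multiplier in the `vsetvli` lines of the patch.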
Before:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32: 2410.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1879.5

After:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32:  916.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1202.5
---
 libavutil/riscv/float_dsp_rvv.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index a2f9488249..ce5b6823d4 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -75,18 +75,19 @@ endfunc
 
 func ff_vector_fmul_window_rvv, zve32f
         // a0: dst, a1: src0, a2: src1, a3: window, a4: length
-        vsetvli t0, zero, e16, m2, ta, ma
+        // e16/m2 and e32/m4 are possible but slower due to gather.
+        vsetvli t0, zero, e16, m1, ta, ma
         sh2add a2, a4, a2
         vid.v v0
         sh3add t3, a4, a3
         vadd.vi v0, v0, 1
         sh3add t0, a4, a0
 1:
-        vsetvli t2, a4, e16, m2, ta, ma
+        vsetvli t2, a4, e16, m1, ta, ma
         slli t4, t2, 2
         vrsub.vx v2, v0, t2
         sub t3, t3, t4
-        vsetvli zero, zero, e32, m4, ta, ma
+        vsetvli zero, zero, e32, m2, ta, ma
         sub a2, a2, t4
         vle32.v v8, (t3)
         sub t0, t0, t4
@@ -133,6 +134,7 @@ endfunc
 // TODO factor vrsub, separate last iteration?
 // (a0) = (a1) * reverse(a2) [0..a3-1]
 func ff_vector_fmul_reverse_rvv, zve32f
+        // e16/m4 and e32/m8 are possible but slower due to gather.
         vsetvli t0, zero, e16, m4, ta, ma
         sh2add a2, a3, a2
         vid.v v0
-- 
2.42.0
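As a reading aid for the first hunk, the windowing operation can be sketched in scalar C using the argument order from the register comment (`dst`, `src0`, `src1`, `window`, `length`). The name `fmul_window_ref` is hypothetical, and the sketch assumes the usual overlap-add semantics in which `dst` and `win` each hold `2 * len` floats while `src0` and `src1` hold `len`; that assumption is consistent with the `sh3add` (8 bytes per unit of length) and `sh2add` (4 bytes per unit) pointer offsets in the diff, but it is not stated in the patch itself.

```c
#include <stddef.h>

/* Scalar sketch of a windowed overlap-add (hypothetical helper).
 * dst and win have 2 * len elements; src0 and src1 have len elements.
 * Each step pairs element k from the front with element 2*len-1-k
 * from the back, so src1 and the second half of win are read in
 * reverse order. */
void fmul_window_ref(float *dst, const float *src0, const float *src1,
                     const float *win, size_t len)
{
    for (size_t k = 0; k < len; k++) {
        size_t j  = 2 * len - 1 - k;    /* mirrored index */
        float  s0 = src0[k];
        float  s1 = src1[len - 1 - k];  /* src1 read backwards */
        float  wi = win[k];
        float  wj = win[j];             /* window read backwards */

        dst[k] = s0 * wj - s1 * wi;
        dst[j] = s0 * wi + s1 * wj;
    }
}
```

The backwards reads of `src1` and of the upper half of `win` are where a vector implementation needs some form of element reversal, which is presumably the gather that the added comment refers to.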