From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Thu, 30 May 2024 23:26:53 +0800
Subject: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll
X-OQ-MSGID: <20240530152653.2304943-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1

From: sunyuechi

Since len < 64, there are enough vector registers to unroll the loop
directly, processing two rows per iteration (a4 is even).

Unrolling has a further benefit in the vertical case: it needs one load
fewer per iteration than the horizontal case, because the middle source
row serves both output rows.

                                  old             new
                               C908    X60     C908    X60
vp8_put_bilin4_h_c         :    6.2    5.5  :    6.2    5.5
vp8_put_bilin4_h_rvv_i32   :    2.2    2.0  :    1.5    1.5
vp8_put_bilin4_v_c         :    6.5    5.7  :    6.2    5.7
vp8_put_bilin4_v_rvv_i32   :    2.2    2.0  :    1.2    1.5
vp8_put_bilin8_h_c         :   24.2   21.5  :   24.2   21.5
vp8_put_bilin8_h_rvv_i32   :    5.2    4.7  :    3.5    3.5
vp8_put_bilin8_v_c         :   24.5   21.7  :   24.5   21.7
vp8_put_bilin8_v_rvv_i32   :    5.2    4.7  :    3.5    3.2
vp8_put_bilin16_h_c        :   48.0   42.7  :   48.0   42.7
vp8_put_bilin16_h_rvv_i32  :    5.7    5.0  :    5.2    4.5
vp8_put_bilin16_v_c        :   48.2   43.0  :   48.2   42.7
vp8_put_bilin16_v_rvv_i32  :    5.7    5.2  :    4.5    4.2
---
 libavcodec/riscv/vp8dsp_rvv.S | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 3360a38cac..5bea6cba9c 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -172,11 +172,35 @@ func ff_put_vp8_bilin4_\type\()_rvv, zve32x
         li              t4, 4
         sub             t1, t1, \mn
 1:
-        addi            a4, a4, -1
-        bilin_load      v0, \type, \mn
-        vse8.v          v0, (a0)
-        add             a2, a2, a3
-        add             a0, a0, a1
+        add             t0, a2, a3
+        add             t2, a0, a1
+        addi            a4, a4, -2
+.ifc \type,v
+        add             t3, t0, a3
+.else
+        addi            t5, a2, 1
+        addi            t3, t0, 1
+        vle8.v          v2, (t5)
+.endif
+        vle8.v          v0, (a2)
+        vle8.v          v4, (t0)
+        vle8.v          v6, (t3)
+        vwmulu.vx       v28, v0, t1
+        vwmulu.vx       v26, v4, t1
+.ifc \type,v
+        vwmaccu.vx      v28, \mn, v4
+.else
+        vwmaccu.vx      v28, \mn, v2
+.endif
+        vwmaccu.vx      v26, \mn, v6
+        vwaddu.wx       v24, v28, t4
+        vwaddu.wx       v22, v26, t4
+        vnsra.wi        v30, v24, 3
+        vnsra.wi        v0, v22, 3
+        vse8.v          v30, (a0)
+        vse8.v          v0, (t2)
+        add             a2, t0, a3
+        add             a0, t2, a1
         bnez            a4, 1b
 
         ret
-- 
2.45.1
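
For readers who do not read RVV assembly, the following C sketch models what the unrolled loop appears to compute. It is not part of the patch. All function names (bilin_px, bilin_ref, bilin_v_unrolled, bilin_h_unrolled, bilin_selfcheck) and the BILIN_DEMO_MAIN macro are hypothetical, and the sketch assumes that mn is a filter weight in the range 0..8 with t1 = 8 - mn, as the context lines `li t4, 4` and `sub t1, t1, \mn` suggest. It compares a one-row-per-iteration loop with the two-row form and makes the load counts explicit: three source rows for the vertical filter, four row loads for the horizontal one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One output pixel: ((8 - mn) * a + mn * b + 4) >> 3, with a 16-bit
 * intermediate. Assumes 0 <= mn <= 8 so the sum fits in 16 bits. */
static uint8_t bilin_px(uint8_t a, uint8_t b, int mn)
{
    uint16_t acc = (uint16_t)((8 - mn) * a); /* vwmulu.vx,  t1 = 8 - mn */
    acc = (uint16_t)(acc + mn * b);          /* vwmaccu.vx              */
    acc = (uint16_t)(acc + 4);               /* vwaddu.wx,  t4 = 4      */
    return (uint8_t)(acc >> 3);              /* vnsra.wi ..., 3         */
}

/* One row per iteration. step is 1 for the horizontal filter and
 * sstride for the vertical one. */
static void bilin_ref(uint8_t *dst, ptrdiff_t dstride, const uint8_t *src,
                      ptrdiff_t sstride, int w, int h, int mn, ptrdiff_t step)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = bilin_px(src[x], src[x + step], mn);
        dst += dstride;
        src += sstride;
    }
}

/* Vertical, two rows per iteration: row1 is the second tap of the first
 * output row and the first tap of the second, so only 3 rows are read. */
static void bilin_v_unrolled(uint8_t *dst, ptrdiff_t dstride,
                             const uint8_t *src, ptrdiff_t sstride,
                             int w, int h, int mn)
{
    assert(h % 2 == 0);                       /* "a4 is even" */
    for (int y = 0; y < h; y += 2) {
        const uint8_t *row0 = src;            /* a2 */
        const uint8_t *row1 = row0 + sstride; /* t0 */
        const uint8_t *row2 = row1 + sstride; /* t3 */
        uint8_t *dst1 = dst + dstride;        /* t2 */
        for (int x = 0; x < w; x++) {
            dst[x]  = bilin_px(row0[x], row1[x], mn);
            dst1[x] = bilin_px(row1[x], row2[x], mn);
        }
        src = row1 + sstride;                 /* add a2, t0, a3 */
        dst = dst1 + dstride;                 /* add a0, t2, a1 */
    }
}

/* Horizontal, two rows per iteration: each output row needs its own
 * pair of loads (x and x + 1), so 4 row loads are issued. */
static void bilin_h_unrolled(uint8_t *dst, ptrdiff_t dstride,
                             const uint8_t *src, ptrdiff_t sstride,
                             int w, int h, int mn)
{
    assert(h % 2 == 0);
    for (int y = 0; y < h; y += 2) {
        const uint8_t *row0 = src, *row1 = src + sstride;
        uint8_t *dst1 = dst + dstride;
        for (int x = 0; x < w; x++) {
            dst[x]  = bilin_px(row0[x], row0[x + 1], mn);
            dst1[x] = bilin_px(row1[x], row1[x + 1], mn);
        }
        src = row1 + sstride;
        dst = dst1 + dstride;
    }
}

/* Returns 1 if both unrolled loops match the one-row reference. */
int bilin_selfcheck(void)
{
    enum { STRIDE = 32, ROWS = 17 };
    static const int widths[] = { 4, 8, 16 };
    uint8_t src[ROWS * STRIDE], ref[16 * STRIDE], out[16 * STRIDE];

    for (int i = 0; i < ROWS * STRIDE; i++)
        src[i] = (uint8_t)(i * 37 + 11);      /* arbitrary test pattern */

    for (int k = 0; k < 3; k++) {
        int w = widths[k], h = w;
        for (int mn = 1; mn < 8; mn++) {
            memset(ref, 0, sizeof(ref));
            memset(out, 0, sizeof(out));
            bilin_ref(ref, STRIDE, src, STRIDE, w, h, mn, STRIDE);
            bilin_v_unrolled(out, STRIDE, src, STRIDE, w, h, mn);
            if (memcmp(ref, out, sizeof(ref)))
                return 0;
            bilin_ref(ref, STRIDE, src, STRIDE, w, h, mn, 1);
            bilin_h_unrolled(out, STRIDE, src, STRIDE, w, h, mn);
            if (memcmp(ref, out, sizeof(ref)))
                return 0;
        }
    }
    return 1;
}

#ifdef BILIN_DEMO_MAIN
int main(void)
{
    puts(bilin_selfcheck() ? "unrolled == reference" : "MISMATCH");
    return 0;
}
#endif
```

Compiling with -DBILIN_DEMO_MAIN gives a standalone check. The model only covers the arithmetic and the memory access pattern; it says nothing about the cycle counts in the table above, which come from the actual RVV code on C908 and X60.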