From: Zhao Zhili
Date: Thu, 20 Feb 2025 16:08:04 +0800
Subject: Re: [FFmpeg-devel] [PATCH 2/2] avcodec/aarch64/vvc: Use rounding shift NEON instruction
To: FFmpeg development discussions and patches
Cc: Krzysztof Pyrkosz
In-Reply-To: <20250219174010.3911-4-ffmpeg@szaka.eu>
References: <20250219174010.3911-2-ffmpeg@szaka.eu> <20250219174010.3911-4-ffmpeg@szaka.eu>

> On Feb 20, 2025, at 01:40, Krzysztof Pyrkosz via ffmpeg-devel wrote:
>
> ---
>
> Before and after on A78
>
> dmvr_8_12x20_neon:      86.2 ( 6.90x)
> dmvr_8_20x12_neon:      94.8 ( 5.93x)
> dmvr_8_20x20_neon:     141.5 ( 6.50x)
> dmvr_12_12x20_neon:    158.0 ( 3.76x)
> dmvr_12_20x12_neon:    151.2 ( 3.73x)
> dmvr_12_20x20_neon:    247.2 ( 3.71x)
> dmvr_hv_8_12x20_neon:  423.2 ( 3.75x)
> dmvr_hv_8_20x12_neon:  434.0 ( 3.69x)
> dmvr_hv_8_20x20_neon:  706.0 ( 3.69x)
>
> dmvr_8_12x20_neon:      77.2 ( 7.70x)
> dmvr_8_20x12_neon:      66.5 ( 8.49x)
> dmvr_8_20x20_neon:      92.2 ( 9.90x)
> dmvr_12_12x20_neon:     80.2 ( 7.38x)
> dmvr_12_20x12_neon:     58.2 ( 9.59x)
> dmvr_12_20x20_neon:     90.0 (10.15x)
> dmvr_hv_8_12x20_neon:  369.0 ( 4.34x)
> dmvr_hv_8_20x12_neon:  355.8 ( 4.49x)
> dmvr_hv_8_20x20_neon:  574.2 ( 4.51x)
>
>  libavcodec/aarch64/vvc/inter.S | 72 ++++++++++------------------------
>  1 file changed, 20 insertions(+), 52 deletions(-)
>
> diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
> index c9d698ee29..45add44b6e 100644
> --- a/libavcodec/aarch64/vvc/inter.S
> +++ b/libavcodec/aarch64/vvc/inter.S
> @@ -369,22 +369,18 @@ function ff_vvc_dmvr_8_neon, export=1
>  1:
>          cbz             w15, 2f
>          ldr             q0, [src], #16
> -        uxtl            v1.8h, v0.8b
> -        uxtl2           v2.8h, v0.16b
> -        ushl            v1.8h, v1.8h, v16.8h
> -        ushl            v2.8h, v2.8h, v16.8h

Please remove assignment to v16. LGTM otherwise.

> +        ushll           v1.8h, v0.8b, #2
> +        ushll2          v2.8h, v0.16b, #2
>          stp             q1, q2, [dst], #32
>          b               3f
>  2:
>          ldr             d0, [src], #8
> -        uxtl            v1.8h, v0.8b
> -        ushl            v1.8h, v1.8h, v16.8h
> +        ushll           v1.8h, v0.8b, #2
>          str             q1, [dst], #16
>  3:
>          subs            height, height, #1
>          ldr             s3, [src], #4
> -        uxtl            v4.8h, v3.8b
> -        ushl            v4.4h, v4.4h, v16.4h
> +        ushll           v4.8h, v3.8b, #2
>          st1             {v4.4h}, [dst], x7
>
>          add             src, src, src_stride
> @@ -399,42 +395,24 @@ function ff_vvc_dmvr_12_neon, export=1
>          cmp             width, #16
>          sub             src_stride, src_stride, x6, lsl #1
>          cset            w15, gt // width > 16
> -        movi            v16.8h, #2 // offset4
>          sub             x7, x7, x6, lsl #1
>  1:
>          cbz             w15, 2f
>          ldp             q0, q1, [src], #32
> -        uaddl           v2.4s, v0.4h, v16.4h
> -        uaddl2          v3.4s, v0.8h, v16.8h
> -        uaddl           v4.4s, v1.4h, v16.4h
> -        uaddl2          v5.4s, v1.8h, v16.8h
> -        ushr            v2.4s, v2.4s, #2
> -        ushr            v3.4s, v3.4s, #2
> -        ushr            v4.4s, v4.4s, #2
> -        ushr            v5.4s, v5.4s, #2
> -        uqxtn           v2.4h, v2.4s
> -        uqxtn2          v2.8h, v3.4s
> -        uqxtn           v4.4h, v4.4s
> -        uqxtn2          v4.8h, v5.4s
> -
> -        stp             q2, q4, [dst], #32
> +        urshr           v0.8h, v0.8h, #2
> +        urshr           v1.8h, v1.8h, #2
> +
> +        stp             q0, q1, [dst], #32
>          b               3f
>  2:
>          ldr             q0, [src], #16
> -        uaddl           v2.4s, v0.4h, v16.4h
> -        uaddl2          v3.4s, v0.8h, v16.8h
> -        ushr            v2.4s, v2.4s, #2
> -        ushr            v3.4s, v3.4s, #2
> -        uqxtn           v2.4h, v2.4s
> -        uqxtn2          v2.8h, v3.4s
> -        str             q2, [dst], #16
> +        urshr           v0.8h, v0.8h, #2
> +        str             q0, [dst], #16
>  3:
>          subs            height, height, #1
>          ldr             d0, [src], #8
> -        uaddl           v3.4s, v0.4h, v16.4h
> -        ushr            v3.4s, v3.4s, #2
> -        uqxtn           v3.4h, v3.4s
> -        st1             {v3.4h}, [dst], x7
> +        urshr           v0.4h, v0.4h, #2
> +        st1             {v0.4h}, [dst], x7
>
>          add             src, src, src_stride
>          b.ne            1b
> @@ -462,8 +440,6 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          ldrb            w10, [x12]
>          ldrb            w11, [x12, #1]
>          sxtw            x6, w6
> -        movi            v30.8h, #(1 << (8 - 7)) // offset1
> -        movi            v31.8h, #8 // offset2
>          dup             v2.8h, w10 // filter_y[0]
>          dup             v3.8h, w11 // filter_y[1]
>
> @@ -491,10 +467,8 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          mul             v16.8h, v16.8h, v0.8h
>          mla             v6.8h, v7.8h, v1.8h
>          mla             v16.8h, v17.8h, v1.8h
> -        add             v6.8h, v6.8h, v30.8h
> -        add             v16.8h, v16.8h, v30.8h
> -        ushr            v6.8h, v6.8h, #(8 - 6)
> -        ushr            v7.8h, v16.8h, #(8 - 6)
> +        urshr           v6.8h, v6.8h, #(8 - 6)
> +        urshr           v7.8h, v16.8h, #(8 - 6)
>          stp             q6, q7, [x13], #32
>
>          cbz             w10, 3f
> @@ -504,10 +478,8 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          mul             v17.8h, v17.8h, v2.8h
>          mla             v16.8h, v6.8h, v3.8h
>          mla             v17.8h, v7.8h, v3.8h
> -        add             v16.8h, v16.8h, v31.8h
> -        add             v17.8h, v17.8h, v31.8h
> -        ushr            v16.8h, v16.8h, #4
> -        ushr            v17.8h, v17.8h, #4
> +        urshr           v16.8h, v16.8h, #4
> +        urshr           v17.8h, v17.8h, #4
>          stp             q16, q17, [x14], #32
>          b               3f
>  2:
> @@ -518,8 +490,7 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          uxtl            v6.8h, v4.8b
>          mul             v6.8h, v6.8h, v0.8h
>          mla             v6.8h, v7.8h, v1.8h
> -        add             v6.8h, v6.8h, v30.8h
> -        ushr            v6.8h, v6.8h, #(8 - 6)
> +        urshr           v6.8h, v6.8h, #(8 - 6)
>          str             q6, [x13], #16
>
>          cbz             w10, 3f
> @@ -527,8 +498,7 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          ldr             q16, [x12], #16
>          mul             v16.8h, v16.8h, v2.8h
>          mla             v16.8h, v6.8h, v3.8h
> -        add             v16.8h, v16.8h, v31.8h
> -        ushr            v16.8h, v16.8h, #4
> +        urshr           v16.8h, v16.8h, #4
>          str             q16, [x14], #16
>  3:
>          ldur            s5, [src, #1]
> @@ -537,8 +507,7 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          uxtl            v6.8h, v4.8b
>          mul             v6.4h, v6.4h, v0.4h
>          mla             v6.4h, v7.4h, v1.4h
> -        add             v6.4h, v6.4h, v30.4h
> -        ushr            v6.4h, v6.4h, #(8 - 6)
> +        urshr           v6.4h, v6.4h, #(8 - 6)
>          str             d6, [x13], #8
>
>          cbz             w10, 4f
> @@ -546,8 +515,7 @@ function ff_vvc_dmvr_hv_8_neon, export=1
>          ldr             d16, [x12], #8
>          mul             v16.4h, v16.4h, v2.4h
>          mla             v16.4h, v6.4h, v3.4h
> -        add             v16.4h, v16.4h, v31.4h
> -        ushr            v16.4h, v16.4h, #4
> +        urshr           v16.4h, v16.4h, #4
>          str             d16, [x14], #8
>  4:
>          subs            height, height, #1
> --
> 2.47.2
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
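For readers following the thread, here is a rough scalar sketch in C of why the quoted patch can swap the "add rounding offset, then ushr" sequences for a single urshr. It is not code from the patch: the helper names (urshr16, widen_add_shift_narrow, add_shift_wrapping and the two counters) are hypothetical, and the model assumes that URSHR on a 16-bit lane computes (x + (1 << (n - 1))) >> n without the intermediate sum wrapping.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of URSHR #n (1 <= n <= 15).
 * Assumption: the rounding add is done at wider precision, so it cannot wrap. */
static uint16_t urshr16(uint16_t x, unsigned n)
{
    return (uint16_t)(((uint32_t)x + (1u << (n - 1))) >> n);
}

/* Model of the removed dmvr_12 sequence: uaddl (widen + add), ushr, uqxtn. */
static uint16_t widen_add_shift_narrow(uint16_t x, uint16_t offset, unsigned n)
{
    uint32_t wide = (uint32_t)x + offset;                    /* uaddl */
    wide >>= n;                                              /* ushr  */
    return wide > UINT16_MAX ? UINT16_MAX : (uint16_t)wide;  /* uqxtn */
}

/* Model of the removed dmvr_hv sequence: add in 16-bit lanes (wraps), ushr. */
static uint16_t add_shift_wrapping(uint16_t x, uint16_t offset, unsigned n)
{
    uint16_t sum = (uint16_t)(x + offset);  /* add, modulo 2^16 */
    return (uint16_t)(sum >> n);            /* ushr */
}

/* Count 16-bit inputs where the widened sequence and urshr16 disagree. */
static unsigned count_mismatches_widened(unsigned n)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++)
        if (widen_add_shift_narrow((uint16_t)x, (uint16_t)(1u << (n - 1)), n) !=
            urshr16((uint16_t)x, n))
            bad++;
    return bad;
}

/* Count 16-bit inputs where the wrapping sequence and urshr16 disagree. */
static unsigned count_mismatches_wrapping(unsigned n)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++)
        if (add_shift_wrapping((uint16_t)x, (uint16_t)(1u << (n - 1)), n) !=
            urshr16((uint16_t)x, n))
            bad++;
    return bad;
}
```

Under this model, count_mismatches_widened(2) finds no differing input, which matches the dmvr_12 change. The wrapping variant differs only for the few inputs at the very top of the 16-bit range where the old add would overflow; whether such values can reach the dmvr_hv code is not something this thread establishes.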