From: welder via ffmpeg-devel
Cc: welder
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 14 Sep 2025 18:20:10 -0000
Subject: [FFmpeg-devel] [PATCH] avcodec/aarch64/vvc: Optimize dmvr_hv_10 (PR #20517)

PR #20517 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20517
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20517.patch

Nothing spectacular: this merges a few add + shift sequences into rounding shifts.
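The equivalence the patch relies on can be checked with a small scalar model. The C sketch below models one 32-bit lane. The helper names (mvni_lane, stage1_old, stage1_new, stage2_old, stage2_new, count_mismatches) are hypothetical, and bd stands for the bit depth (10 or 12). It compares the old add-then-shift sequences against the rounding-shift forms.

The sketch assumes Arm semantics for the instructions involved:

- URSHL, URSHR and UQRSHRN add 1 << (s - 1) before shifting right by s, with no intermediate overflow.
- MVNI stores the bitwise NOT of its immediate. So ~(bd - 6 - 1) equals -(bd - 6), the negative per-lane count that makes URSHL shift right.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane; bd is 10 or 12. */

/* MVNI Vd.4S, #imm writes ~imm to each lane (two's complement: -imm - 1). */
static int32_t mvni_lane(uint32_t imm)
{
    return -(int32_t)imm - 1;
}

/* Old first stage: add offset1 = 1 << (bd - 7), then USHL by -(bd - 6),
 * i.e. a truncating right shift by bd - 6. */
static uint32_t stage1_old(uint32_t acc, int bd)
{
    uint32_t offset1 = 1u << (bd - 7);
    return (acc + offset1) >> (bd - 6);   /* 32-bit add: may wrap */
}

/* URSHL with a negative count -s: add 1 << (s - 1), shift right by s,
 * computed in wider precision. Only the negative case is modelled. */
static uint32_t urshl_lane(uint32_t x, int32_t shift)
{
    assert(shift < 0);
    int s = -shift;
    return (uint32_t)(((uint64_t)x + (1ull << (s - 1))) >> s);
}

/* New first stage: MVNI-built count fed to a single URSHL. */
static uint32_t stage1_new(uint32_t acc, int bd)
{
    return urshl_lane(acc, mvni_lane((uint32_t)(bd - 6 - 1)));
}

/* Unsigned saturating narrow to 16 bits, as UQXTN does. */
static uint16_t sat_u16(uint64_t v)
{
    return v > UINT16_MAX ? UINT16_MAX : (uint16_t)v;
}

/* Old second stage: add offset2 = 8, USHR #4, then UQXTN. */
static uint16_t stage2_old(uint32_t acc)
{
    return sat_u16((acc + 8u) >> 4);      /* 32-bit add: may wrap */
}

/* New second stage: one UQRSHRN #4 (rounding shift + saturating narrow). */
static uint16_t stage2_new(uint32_t acc)
{
    return sat_u16(((uint64_t)acc + 8u) >> 4);
}

/* Counts disagreements over accumulators far below the 32-bit wrap point. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t acc = 0; acc < (1u << 22); acc++) {
        if (stage1_old(acc, 10) != stage1_new(acc, 10)) bad++;
        if (stage1_old(acc, 12) != stage1_new(acc, 12)) bad++;
        if (stage2_old(acc) != stage2_new(acc)) bad++;
    }
    return bad;
}
```

In this model the two forms differ only for accumulators within the rounding offset of 2^32. There the old 32-bit add wraps, while the rounding instructions do not. For example, stage2_old(0xFFFFFFF8) yields 0, whereas stage2_new saturates to 0xFFFF. The two-tap DMVR filter accumulators presumably stay far below that range for 10- and 12-bit input.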
From 7809ff9746abf83bc41c1f13d9e1b2f1da6b0fb9 Mon Sep 17 00:00:00 2001
From: Krzysztof Pyrkosz
Date: Fri, 5 Sep 2025 19:52:11 +0200
Subject: [PATCH] avcodec/aarch64/vvc: Optimize dmvr_hv_10

Before and after on A53:

Before:
dmvr_hv_10_12x20_neon:   1838.2 ( 3.02x)
dmvr_hv_10_20x12_neon:   1330.2 ( 1.83x)
dmvr_hv_10_20x20_neon:   2148.2 ( 1.85x)
dmvr_hv_12_12x20_neon:   1839.2 ( 3.02x)
dmvr_hv_12_20x12_neon:   1330.6 ( 1.83x)
dmvr_hv_12_20x20_neon:   2147.2 ( 1.85x)

After:
dmvr_hv_10_12x20_neon:   1755.0 ( 3.17x)
dmvr_hv_10_20x12_neon:   1165.8 ( 2.09x)
dmvr_hv_10_20x20_neon:   1876.1 ( 2.12x)
dmvr_hv_12_12x20_neon:   1754.4 ( 3.17x)
dmvr_hv_12_20x12_neon:   1167.8 ( 2.09x)
dmvr_hv_12_20x20_neon:   1878.8 ( 2.12x)
---
 libavcodec/aarch64/vvc/inter.S | 58 ++++++++++------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
index 01d2ff155c..79ff720cdd 100644
--- a/libavcodec/aarch64/vvc/inter.S
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -599,18 +599,13 @@ function ff_vvc_dmvr_hv_8_neon, export=1
 endfunc
 
 function ff_vvc_dmvr_hv_12_neon, export=1
-        movi            v29.4s, #(12 - 6)
-        movi            v30.4s, #(1 << (12 - 7))    // offset1
+        mvni            v29.4s, #(12 - 6 - 1)
         b               0f
 endfunc
 
 function ff_vvc_dmvr_hv_10_neon, export=1
-        movi            v29.4s, #(10 - 6)
-        movi            v30.4s, #(1 << (10 - 7))    // offset1
+        mvni            v29.4s, #(10 - 6 - 1)
 0:
-        movi            v31.4s, #8                  // offset2
-        neg             v29.4s, v29.4s
-
         sub             sp, sp, #(VVC_MAX_PB_SIZE * 4)
 
         movrel          x9, X(ff_vvc_inter_luma_dmvr_filters)
@@ -626,7 +621,6 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         add             x12, x9, my, lsl #1
         ldrb            w10, [x12]
         ldrb            w11, [x12, #1]
-        sxtw            x6, w6
         dup             v2.8h, w10                  // filter_y[0]
         dup             v3.8h, w11                  // filter_y[1]
 
@@ -635,7 +629,7 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         mov             w10, #0                     // start filter_y or not
         add             height, height, #1
         sub             dst, dst, #(VVC_MAX_PB_SIZE * 2)
-        sub             src_stride, src_stride, x6, lsl #1
+        sub             src_stride, src_stride, w6, sxtw #1
         cset            w15, gt                     // width > 16
 1:
         mov             x12, tmp0
@@ -656,14 +650,10 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         umlal           v18.4s, v17.4h, v1.4h
         umlal2          v19.4s, v17.8h, v1.8h
 
-        add             v4.4s, v4.4s, v30.4s
-        add             v5.4s, v5.4s, v30.4s
-        add             v18.4s, v18.4s, v30.4s
-        add             v19.4s, v19.4s, v30.4s
-        ushl            v4.4s, v4.4s, v29.4s
-        ushl            v5.4s, v5.4s, v29.4s
-        ushl            v18.4s, v18.4s, v29.4s
-        ushl            v19.4s, v19.4s, v29.4s
+        urshl           v4.4s, v4.4s, v29.4s
+        urshl           v5.4s, v5.4s, v29.4s
+        urshl           v18.4s, v18.4s, v29.4s
+        urshl           v19.4s, v19.4s, v29.4s
         uqxtn           v6.4h, v4.4s
         uqxtn2          v6.8h, v5.4s
         uqxtn           v7.4h, v18.4s
@@ -681,18 +671,10 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         umlal2          v18.4s, v6.8h, v3.8h
         umlal           v19.4s, v7.4h, v3.4h
         umlal2          v20.4s, v7.8h, v3.8h
-        add             v17.4s, v17.4s, v31.4s
-        add             v18.4s, v18.4s, v31.4s
-        add             v19.4s, v19.4s, v31.4s
-        add             v20.4s, v20.4s, v31.4s
-        ushr            v17.4s, v17.4s, #4
-        ushr            v18.4s, v18.4s, #4
-        ushr            v19.4s, v19.4s, #4
-        ushr            v20.4s, v20.4s, #4
-        uqxtn           v6.4h, v17.4s
-        uqxtn2          v6.8h, v18.4s
-        uqxtn           v7.4h, v19.4s
-        uqxtn2          v7.8h, v20.4s
+        uqrshrn         v6.4h, v17.4s, #4
+        uqrshrn2        v6.8h, v18.4s, #4
+        uqrshrn         v7.4h, v19.4s, #4
+        uqrshrn2        v7.8h, v20.4s, #4
         stp             q6, q7, [x14], #32
         b               3f
 2:
@@ -704,10 +686,8 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         umlal           v4.4s, v7.4h, v1.4h
         umlal2          v5.4s, v7.8h, v1.8h
 
-        add             v4.4s, v4.4s, v30.4s
-        add             v5.4s, v5.4s, v30.4s
-        ushl            v4.4s, v4.4s, v29.4s
-        ushl            v5.4s, v5.4s, v29.4s
+        urshl           v4.4s, v4.4s, v29.4s
+        urshl           v5.4s, v5.4s, v29.4s
         uqxtn           v6.4h, v4.4s
         uqxtn2          v6.8h, v5.4s
         str             q6, [x13], #16
@@ -719,10 +699,8 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         umull2          v18.4s, v16.8h, v2.8h
         umlal           v17.4s, v6.4h, v3.4h
         umlal2          v18.4s, v6.8h, v3.8h
-        add             v17.4s, v17.4s, v31.4s
-        add             v18.4s, v18.4s, v31.4s
-        ushr            v17.4s, v17.4s, #4
-        ushr            v18.4s, v18.4s, #4
+        urshr           v17.4s, v17.4s, #4
+        urshr           v18.4s, v18.4s, #4
         uqxtn           v16.4h, v17.4s
         uqxtn2          v16.8h, v18.4s
         str             q16, [x14], #16
@@ -731,8 +709,7 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         ldr             d6, [src], #8
         umull           v4.4s, v7.4h, v1.4h
         umlal           v4.4s, v6.4h, v0.4h
-        add             v4.4s, v4.4s, v30.4s
-        ushl            v4.4s, v4.4s, v29.4s
+        urshl           v4.4s, v4.4s, v29.4s
         uqxtn           v6.4h, v4.4s
         str             d6, [x13], #8
 
@@ -741,8 +718,7 @@ function ff_vvc_dmvr_hv_10_neon, export=1
         ldr             d16, [x12], #8
         umull           v17.4s, v16.4h, v2.4h
         umlal           v17.4s, v6.4h, v3.4h
-        add             v17.4s, v17.4s, v31.4s
-        ushr            v17.4s, v17.4s, #4
+        urshr           v17.4s, v17.4s, #4
         uqxtn           v16.4h, v17.4s
         str             d16, [x14], #8
 4:
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org