To: ffmpeg-devel@ffmpeg.org
Date: Wed, 31 Dec 2025 17:17:58 -0000
Subject: [FFmpeg-devel] [PR] aarch64/hpeldsp_neon: fix overread (PR #21345)
From: Zhao Zhili via ffmpeg-devel

PR #21345 opened by Zhao Zhili (quink)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21345
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21345.patch

Fix #21141

The x2 and xy2 half-pel functions need w + 1 source bytes per row (17 for
16-wide blocks, 9 for 8-wide blocks). The old code loaded 32 or 16 bytes
per row and built the shifted row with ext, so it read up to 15 (or 7)
bytes more than needed and could run past the end of the source buffer.
The new code loads each row with ld1 and the row shifted by one pixel
with an unaligned ldur at offset #1, which together touch exactly w + 1
bytes.

The change also improves performance. Times are lower-is-better, with
the speedup over the C version in parentheses; put_pixels_tab[0] is
16xH, put_pixels_tab[1] is 8xH, and the second index is 1 for x2 and 3
for xy2.

On A76:
                            Before          After
put_pixels_tab[0][1]_neon:   32.4 ( 3.91x)   31.6 ( 3.99x)
put_pixels_tab[0][3]_neon:   88.0 ( 4.50x)   74.6 ( 5.31x)
put_pixels_tab[1][1]_neon:   33.5 ( 2.52x)   31.2 ( 2.71x)
put_pixels_tab[1][3]_neon:   30.5 ( 3.61x)   21.7 ( 5.08x)

On A55:
                            Before          After
put_pixels_tab[0][1]_neon:  175.2 ( 2.41x)  138.7 ( 3.04x)
put_pixels_tab[0][3]_neon:  334.3 ( 2.71x)  296.1 ( 3.07x)
put_pixels_tab[1][1]_neon:  168.3 ( 1.78x)   94.1 ( 3.19x)
put_pixels_tab[1][3]_neon:  112.3 ( 2.20x)   90.0 ( 2.74x)

From 1d4f113c75befc460ce02ee9898705c8a4fcf882 Mon Sep 17 00:00:00 2001
From: Zhao Zhili
Date: Thu, 1 Jan 2026 00:52:44 +0800
Subject: [PATCH] aarch64/hpeldsp_neon: fix overread

Fix #21141

The change also improves performance.
On A76:
                            Before          After
put_pixels_tab[0][1]_neon:   32.4 ( 3.91x)   31.6 ( 3.99x)
put_pixels_tab[0][3]_neon:   88.0 ( 4.50x)   74.6 ( 5.31x)
put_pixels_tab[1][1]_neon:   33.5 ( 2.52x)   31.2 ( 2.71x)
put_pixels_tab[1][3]_neon:   30.5 ( 3.61x)   21.7 ( 5.08x)

On A55:
                            Before          After
put_pixels_tab[0][1]_neon:  175.2 ( 2.41x)  138.7 ( 3.04x)
put_pixels_tab[0][3]_neon:  334.3 ( 2.71x)  296.1 ( 3.07x)
put_pixels_tab[1][1]_neon:  168.3 ( 1.78x)   94.1 ( 3.19x)
put_pixels_tab[1][3]_neon:  112.3 ( 2.20x)   90.0 ( 2.74x)
---
 libavcodec/aarch64/hpeldsp_neon.S | 58 ++++++++++++++++---------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/libavcodec/aarch64/hpeldsp_neon.S b/libavcodec/aarch64/hpeldsp_neon.S
index e7c1549c40..fd2c2c98c4 100644
--- a/libavcodec/aarch64/hpeldsp_neon.S
+++ b/libavcodec/aarch64/hpeldsp_neon.S
@@ -50,12 +50,13 @@
 .endm
 
 .macro  pixels16_x2     rnd=1, avg=0
-1:      ld1             {v0.16b, v1.16b}, [x1], x2
-        ld1             {v2.16b, v3.16b}, [x1], x2
+1:
+        ldur            q1, [x1, #1]
+        ld1             {v0.16b}, [x1], x2
         subs            w3, w3, #2
-        ext             v1.16b, v0.16b, v1.16b, #1
+        ldur            q3, [x1, #1]
+        ld1             {v2.16b}, [x1], x2
         avg             v0.16b, v0.16b, v1.16b
-        ext             v3.16b, v2.16b, v3.16b, #1
         avg             v2.16b, v2.16b, v3.16b
   .if \avg
         ld1             {v1.16b}, [x0], x2
@@ -108,20 +109,20 @@
 
 .macro  pixels16_xy2    rnd=1, avg=0
         sub             w3, w3, #2
-        ld1             {v0.16b, v1.16b}, [x1], x2
-        ld1             {v4.16b, v5.16b}, [x1], x2
+        ldur            q1, [x1, #1]
+        ld1             {v0.16b}, [x1], x2
 NRND    movi            v26.8H, #1
-        ext             v1.16b, v0.16b, v1.16b, #1
-        ext             v5.16b, v4.16b, v5.16b, #1
+        ldur            q5, [x1, #1]
+        ld1             {v4.16b}, [x1], x2
         uaddl           v16.8h, v0.8b, v1.8b
         uaddl2          v20.8h, v0.16b, v1.16b
         uaddl           v18.8h, v4.8b, v5.8b
         uaddl2          v22.8h, v4.16b, v5.16b
 1:      subs            w3, w3, #2
-        ld1             {v0.16b, v1.16b}, [x1], x2
+        ldur            q30, [x1, #1]
+        ld1             {v0.16b}, [x1], x2
         add             v24.8h, v16.8h, v18.8h
 NRND    add             v24.8H, v24.8H, v26.8H
-        ext             v30.16b, v0.16b, v1.16b, #1
         add             v1.8h, v20.8h, v22.8h
         mshrn           v28.8b, v24.8h, #2
 NRND    add             v1.8H, v1.8H, v26.8H
@@ -131,12 +132,12 @@ NRND    add             v1.8H, v1.8H, v26.8H
         urhadd          v28.16b, v28.16b, v16.16b
   .endif
         uaddl           v16.8h, v0.8b, v30.8b
-        ld1             {v2.16b, v3.16b}, [x1], x2
+        ldur            q3, [x1, #1]
+        ld1             {v2.16b}, [x1], x2
         uaddl2          v20.8h, v0.16b, v30.16b
         st1             {v28.16b}, [x0], x2
         add             v24.8h, v16.8h, v18.8h
 NRND    add             v24.8H, v24.8H, v26.8H
-        ext             v3.16b, v2.16b, v3.16b, #1
         add             v0.8h, v20.8h, v22.8h
         mshrn           v30.8b, v24.8h, #2
 NRND    add             v0.8H, v0.8H, v26.8H
@@ -150,10 +151,10 @@ NRND    add             v0.8H, v0.8H, v26.8H
         st1             {v30.16b}, [x0], x2
         b.gt            1b
 
-        ld1             {v0.16b, v1.16b}, [x1], x2
+        ldur            q30, [x1, #1]
+        ld1             {v0.16b}, [x1], x2
         add             v24.8h, v16.8h, v18.8h
 NRND    add             v24.8H, v24.8H, v26.8H
-        ext             v30.16b, v0.16b, v1.16b, #1
         add             v1.8h, v20.8h, v22.8h
         mshrn           v28.8b, v24.8h, #2
 NRND    add             v1.8H, v1.8H, v26.8H
@@ -206,10 +207,11 @@ NRND    add             v0.8H, v0.8H, v26.8H
 .endm
 
 .macro  pixels8_x2      rnd=1, avg=0
-1:      ld1             {v0.8b, v1.8b}, [x1], x2
-        ext             v1.8b, v0.8b, v1.8b, #1
-        ld1             {v2.8b, v3.8b}, [x1], x2
-        ext             v3.8b, v2.8b, v3.8b, #1
+1:
+        ldur            d1, [x1, #1]
+        ld1             {v0.8b}, [x1], x2
+        ldur            d3, [x1, #1]
+        ld1             {v2.8b}, [x1], x2
         subs            w3, w3, #2
         avg             v0.8b, v0.8b, v1.8b
         avg             v2.8b, v2.8b, v3.8b
@@ -263,22 +265,23 @@ NRND    add             v0.8H, v0.8H, v26.8H
 .endm
 
 .macro  pixels8_xy2     rnd=1, avg=0
+        ldur            d4, [x1, #1]
         sub             w3, w3, #2
-        ld1             {v0.16b}, [x1], x2
-        ld1             {v1.16b}, [x1], x2
+        ld1             {v0.8b}, [x1], x2
 NRND    movi            v19.8H, #1
-        ext             v4.16b, v0.16b, v4.16b, #1
-        ext             v6.16b, v1.16b, v6.16b, #1
+        ldur            d6, [x1, #1]
+        ld1             {v1.8b}, [x1], x2
         uaddl           v16.8h, v0.8b, v4.8b
         uaddl           v17.8h, v1.8b, v6.8b
 1:      subs            w3, w3, #2
-        ld1             {v0.16b}, [x1], x2
+        ldur            d4, [x1, #1]
+        ld1             {v0.8b}, [x1], x2
         add             v18.8h, v16.8h, v17.8h
-        ext             v4.16b, v0.16b, v4.16b, #1
 NRND    add             v18.8H, v18.8H, v19.8H
         uaddl           v16.8h, v0.8b, v4.8b
         mshrn           v5.8b, v18.8h, #2
-        ld1             {v1.16b}, [x1], x2
+        ldur            d6, [x1, #1]
+        ld1             {v1.8b}, [x1], x2
         add             v18.8h, v16.8h, v17.8h
   .if \avg
         ld1             {v7.8b}, [x0]
@@ -291,14 +294,13 @@ NRND    add             v18.8H, v18.8H, v19.8H
         ld1             {v5.8b}, [x0]
         urhadd          v7.8b, v7.8b, v5.8b
   .endif
-        ext             v6.16b, v1.16b, v6.16b, #1
         uaddl           v17.8h, v1.8b, v6.8b
         st1             {v7.8b}, [x0], x2
         b.gt            1b
 
-        ld1             {v0.16b}, [x1], x2
+        ldur            d4, [x1, #1]
+        ld1             {v0.8b}, [x1], x2
         add             v18.8h, v16.8h, v17.8h
-        ext             v4.16b, v0.16b, v4.16b, #1
 NRND    add             v18.8H, v18.8H, v19.8H
         uaddl           v16.8h, v0.8b, v4.8b
         mshrn           v5.8b, v18.8h, #2
-- 
2.49.1
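A minimal C sketch of the bounds argument behind the fix, assuming the rnd=1, avg=0 variants of the x2 and xy2 macros. Each NEON load in a row is modelled as a byte range relative to the row pointer x1, and row_footprint() reports how far past x1 a row's loads reach; the two reference filters show that w + 1 bytes per row (and h + 1 rows for xy2) are all the half-pel filters ever read. RowLoad, row_footprint, NLOADS, put_pixels_x2_ref and put_pixels_xy2_ref are hypothetical illustration names, not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one NEON load: it touches bytes
 * [offset, offset + size) relative to the row pointer x1. */
typedef struct {
    int offset;
    int size;
} RowLoad;

#define NLOADS(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Furthest byte touched by a row's loads, plus one.
 * Assumes every offset is >= 0. */
static int row_footprint(const RowLoad *loads, int n)
{
    int end = 0;
    for (int i = 0; i < n; i++)
        if (loads[i].offset + loads[i].size > end)
            end = loads[i].offset + loads[i].size;
    return end;
}

/* Per-row load sequences before and after the patch. */
static const RowLoad old16[] = { { 0, 32 } };            /* ld1 {v0.16b, v1.16b} */
static const RowLoad new16[] = { { 1, 16 }, { 0, 16 } }; /* ldur q1, [x1, #1]; ld1 {v0.16b} */
static const RowLoad old8[]  = { { 0, 16 } };            /* ld1 {v0.8b, v1.8b} or ld1 {v0.16b} */
static const RowLoad new8[]  = { { 1, 8 }, { 0, 8 } };   /* ldur d1, [x1, #1]; ld1 {v0.8b} */

/* x2 with rounding (urhadd): dst = (a + b + 1) >> 1.
 * Reads src[0 .. w] of each row, i.e. w + 1 bytes. */
static void put_pixels_x2_ref(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t stride, int w, int h)
{
    assert(w == 8 || w == 16);
    for (int y = 0; y < h; y++) {
        const uint8_t *s = src + y * stride;
        uint8_t *d = dst + y * stride;
        for (int x = 0; x < w; x++)
            d[x] = (uint8_t)((s[x] + s[x + 1] + 1) >> 1);
    }
}

/* xy2 with rounding (rshrn #2): dst = (a + b + c + d + 2) >> 2.
 * Reads w + 1 bytes from each of h + 1 rows. */
static void put_pixels_xy2_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int w, int h)
{
    assert(w == 8 || w == 16);
    for (int y = 0; y < h; y++) {
        const uint8_t *a = src + y * stride; /* row y */
        const uint8_t *b = a + stride;       /* row y + 1 */
        uint8_t *d = dst + y * stride;
        for (int x = 0; x < w; x++) {
            int sum = a[x] + a[x + 1] + b[x] + b[x + 1];
            d[x] = (uint8_t)((sum + 2) >> 2);
        }
    }
}
```

For a 16-wide row, the old two-register ld1 reaches 32 bytes against the 17 required, while the overlapping ldur/ld1 pair reaches exactly 17 (9 for 8-wide rows). The overlap loads 15 (or 7) bytes twice, in exchange for dropping the ext.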