From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 663B04CF5C
	for <ffmpegdev@gitmailbox.com>; Fri, 30 May 2025 07:27:36 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id D124768D678;
	Fri, 30 May 2025 10:27:27 +0300 (EEST)
Received: from EUR03-DBA-obe.outbound.protection.outlook.com
 (mail-dbaeur03olkn2106.outbound.protection.outlook.com [40.92.58.106])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTPS id A4DA868D62E
 for <ffmpeg-devel@ffmpeg.org>; Fri, 30 May 2025 10:27:20 +0300 (EEST)
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=jN1LyuJ2nhyL/u1kIqSEAB3tZ06NBhhEoRuhTOwTJdzKchPqZOFd0uUQ7Xw8no0zPx5YaDsSK6aH/uA+RYW70d9kdmTnIEsCccJuVTcIwq3+G9E0DWm2+7u6hP5ZNFfuQsTG6DQ91/jGhSeXUMASNzSHR/6wKi5TKHxHkR9GFMQmwnBXgnaxaE+nCWnUSVEyDlc/Fi3QFZQxZxOsZfWKoDU0D5xPg8rI2axUYik/zHieD5ixW98uo1R9toCmWiC4TwWPp9/N0qScD2aGwBnxLAnmCCnNEqTK5wB3ocEkbhyXoXKrKKK8r0ypKjVg9aTRN0OA0KkKzX0R0qhXwbLWBQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=7HBIZPE7D3rkMgHfk2CfuGYI8f2mQIpxfiDe/3FCelk=;
 b=bml3F4d3wYVyrb/R9/vzg21Z7gH4ftpr8QOsyVf/ywGunz1xcb7HbSj3P00hO7HRj7gppQ8a3/aERBhYetMFQ2x9IO5TtOhz2ySTSvZJpuitWi0oWwZuGfve5UVmOcnLasS653GTHRuh1CIbh757jLiRIlTrP2CAb/GrGusL0sOB5/72RKhnH5xxFGeNEkE+lGngs6fU5WtlEUNx9nLQK+KSxUwL7CvVGAJ4w4+htzcUNkpF7TRJGHP4zalp2IW+g4r1OJzcWjR24lhJyc5fIRWjoXMGy1VTJ0gtThR7LTAi/GLYQCFDr7LzWxKoY37X9vEhl4nCJSJHH0LtnDskbw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none;
 dkim=none; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=outlook.com;
 s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=7HBIZPE7D3rkMgHfk2CfuGYI8f2mQIpxfiDe/3FCelk=;
 b=awdg86g/H9ue1L9lVbFPICjBovgeMMdl26Wjobo1mrvWnW+zOC8teWOfgHKqIARV5cv61gOSc2hyZm9e6QOIKjjvbkN/afnjQGO64Z9ElMkl2ElxkW9UVPKIa22O36F1Bp1JlR9EELZgWcypr3FFmK6Xfg4UJ2PJ3NlsAv88SV2gwynOMGlATcaVh7UOT/2FAaPsYefcSzwmYgAtQQZIDMCA+5C5eGw3gw4e+67Cl5H2IDnwGvxBymuAIZsCzxmUvaoc25eEssVSu0HW9PH5Aap5QJacleTu4Wfh0DYVslKDGyzKZNeXnSrz5GrVqEX3RyaSZCOkPX2RF19cXx5nYw==
Received: from DBAP193MB0956.EURP193.PROD.OUTLOOK.COM (2603:10a6:10:1c5::19)
 by GV1P193MB2037.EURP193.PROD.OUTLOOK.COM (2603:10a6:150:29::22) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8769.32; Fri, 30 May
 2025 07:27:14 +0000
Received: from DBAP193MB0956.EURP193.PROD.OUTLOOK.COM
 ([fe80::ed13:9f9d:e088:ae31]) by DBAP193MB0956.EURP193.PROD.OUTLOOK.COM
 ([fe80::ed13:9f9d:e088:ae31%3]) with mapi id 15.20.8769.022; Fri, 30 May 2025
 07:27:14 +0000
From: Dmitriy Kovalenko <dmtr.kovalenko@outlook.com>
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 30 May 2025 09:27:05 +0200
Message-ID: <DBAP193MB0956E4FE4B60344958D908F58D61A@DBAP193MB0956.EURP193.PROD.OUTLOOK.COM>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250530072706.15067-1-dmtr.kovalenko@outlook.com>
References: <20250530072706.15067-1-dmtr.kovalenko@outlook.com>
X-ClientProxiedBy: AM8P190CA0024.EURP190.PROD.OUTLOOK.COM
 (2603:10a6:20b:219::29) To DBAP193MB0956.EURP193.PROD.OUTLOOK.COM
 (2603:10a6:10:1c5::19)
X-Microsoft-Original-Message-ID: <20250530072706.15067-2-dmtr.kovalenko@outlook.com>
MIME-Version: 1.0
X-MS-Exchange-MessageSentRepresentingType: 1
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: DBAP193MB0956:EE_|GV1P193MB2037:EE_
X-MS-Office365-Filtering-Correlation-Id: a74489e1-b9f1-47ae-daee-08dd9f4b65eb
X-Microsoft-Antispam: BCL:0;
 ARA:14566002|7092599006|19110799006|5072599009|8060799009|15080799009|461199028|3412199025|440099028|1710799026;
X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?ShX8suFyUnP6zb42AfWaDGgyCG+uxRaGUnPELVk6G1Y9CiNCgxEmkxbhGIb7?=
 =?us-ascii?Q?b8SxbS1+z2xASRIZf1YgL1p7JWPwSia5X1bSOgKoaO9o1qGuaUJVsSTfvTBd?=
 =?us-ascii?Q?iWqycVm37XR/RDHZHvKXRqXLg+VvZUpskAEZrYXyrFkwSJEzx+VK0gXamfYA?=
 =?us-ascii?Q?0s3jRC4+40aPwtGnFpOVutPbxzSi4WlsMXLUp9ANSaL4e/z4d3Hqzb1sBx37?=
 =?us-ascii?Q?bpMitdyNev4irhP30iVuPnTRyg3vQAbVX0SAzxPXyYSYoCrIzXAfKneQfLad?=
 =?us-ascii?Q?WLURN7U70YkCN4dFYxABWcHJCp3av3oAG8YnLUcdbaMYLQBAjzLPBknV+XBh?=
 =?us-ascii?Q?WDU3R3DNufsoUYgFIQn11ukf2VX1VUirrowf0s+AmfKAMibjblM8Qny9JQFH?=
 =?us-ascii?Q?gfj7VfpZ7BPtDiEguGjtEUNzYv4btktXl+HC5O5LRXOeqiDRsZU1lxMZApCG?=
 =?us-ascii?Q?Zm4vCyjW5hQQpTrg0OjdnhAv08bTeJIXxGzyrIvkGcNaMNmhN+2/H71v0YCv?=
 =?us-ascii?Q?pw7sOAGsHsGM0pmfDTQvL2PBZcD+KnvJpocsfjCvrIK3Elebgj0Ao/OSjeSd?=
 =?us-ascii?Q?+fKQF5E0o+K6hH7stMmyK+woTof0ZOoprc/zZy2Q9PMyFD9ipQ9PuzcOqfJ1?=
 =?us-ascii?Q?UbyrPOBiHvXfchCm/udJ7UHIwP/+pp7IBtH+STS7eVEf8OL5Why9JPIxgiwO?=
 =?us-ascii?Q?VAa17+df1S6rlRMQEv+1moehhJ16KwrDcF5YOAUV0eJGfMYdli8K8ii0sv1E?=
 =?us-ascii?Q?V7cvFZ2Byl5/n52RKtKibaobbZDLczpPe+Z5o/pRz0UWKMDAOu05X6elaepz?=
 =?us-ascii?Q?VElbNa8N4k8CjByeGahqhIoNWIw/xHypsE6U1E67F6YmjFBUY6ap0rivQNXk?=
 =?us-ascii?Q?0eL0+SN/bYsc1T4QA4cnjQe0+0/XIDLhMgi2OrTPe5qLYrFkdyDEwaW/HojR?=
 =?us-ascii?Q?XgB3UrBrdP3z1XUytU+pXiZNAFKQ4q0CDG965xRHi2cENdB1j5Rq5muxRjm8?=
 =?us-ascii?Q?XEjv4cJqsUVpFWMs9FwLJSq8l+2EWJJnVRervhEmy06r70k0O6qsHAbYF6Ll?=
 =?us-ascii?Q?2/zs6DN5rd5PBQZ9bVkgk2iUteYvW9fYj5zW9fjFtNyN/Uy0OQ2mMxn9XuzG?=
 =?us-ascii?Q?gGdfTMeITSHTgmXv0/z7RYLFWKuVh4uhFQ=3D=3D?=
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?Ak/vFpttlUW/AyCYwrReX004MzoiBGmgWqIWqB8boVDTZ80PHifGhPjC0DpY?=
 =?us-ascii?Q?bhP03P1XtEcUrBlG2yc0I91FsqL0wRAW0yQ8aR5gqjAiLCnh+aeXSUd5yAyC?=
 =?us-ascii?Q?p+stuB3cMNa2RpUIbvx5sR1G9mhGoNE7k6Ejrq/Nh6auWuJ4VNkynhQ7WUiO?=
 =?us-ascii?Q?ei6asDD4vYZn7VsGKGRD/HJtzbfnMj4yVJyquRN4NusG93LnP8F1zqmEltwl?=
 =?us-ascii?Q?c7jG3/Vcy6Bgg9cdoF3Clm4Q6TznzqbXg9v7N4dCb446OG8h+3gVLTA7UxZa?=
 =?us-ascii?Q?4AyU9RAvj2ii4MS0bibKqTWBuWijxlDqCEHDcsSDrSmr+m6qXXtW18zWRDb5?=
 =?us-ascii?Q?5MLeRqzLQA46lJ6lJK16VRdPep9inKAZNAb79OcvRakzEwN9Nh/0QwYHnBgD?=
 =?us-ascii?Q?fr6KYRO1G8VSABG6y9R9WBgrH8gd9kn4ibcx6X5vL7/s1wSWqy+HEJ6ZYUxj?=
 =?us-ascii?Q?/JAFj50fRNBR/sya54ucYxd9Ly3K6rfcopLBXKBzdSDtkmEFdx6A9yIlvoLY?=
 =?us-ascii?Q?kfUpm8OZDEwMewOuC6Vpq3I8EFdrLaurXCfY+suPXlfWdLpVKEzr2tEI9QSI?=
 =?us-ascii?Q?1IQMtDZ17K9SFNTqGF/sQpnPjgkq+0iLf5udb3OHzEbg+a/GYG17pU55iEoD?=
 =?us-ascii?Q?z/WXogAE39CMuroLuWGqpoUebd0R9ZAMBXYSqv95CVhrTEj4hTowwkqtYDEf?=
 =?us-ascii?Q?HL6A99+HDamxYgxpYHN1m5NOLw8l++zFONNLrTyUVLLduQKonLFmK78Q6YOO?=
 =?us-ascii?Q?JrlqFDLmrUw0RdFTSf1urLntLjdEbls9/WWPQmU8UcR/N+inbltzYVLOVkG7?=
 =?us-ascii?Q?nG2OrM/t/FykpfwW5KHRWZudJCzKHZydgnMhKkWKDw1my2CVlBlozeiYqXRn?=
 =?us-ascii?Q?HidEYnc/SVQLSRgCyT+FRoFAjSlvQX4rqh/neWPdQhTRTDTn9LsgWmIcuMx9?=
 =?us-ascii?Q?nRRJLrCTcyqi13JolWMYg8z8CHi3oc9EuZRcMcyvKl0WbCeixOaGqd9NtXrW?=
 =?us-ascii?Q?uj6UxBSZZ4qU3yIf3XTfFTvEtytON3eE8UHd1JGhOYP8xzP2aWrG35z2V5Yj?=
 =?us-ascii?Q?8oO5VjNCYK33OqHaskuaDksV3SX21sdDtzrKbHT3YLUyfDILQP1HkrF2WflA?=
 =?us-ascii?Q?2manoqwHUvwMn1bJKtjgITFw70AZeuyiCeICWSBGF7/T2dZj6OQ3oFBSpiM1?=
 =?us-ascii?Q?A9CBfEefqpNJRYwEgJR+k8hs2dut0/qn6ZC6zRoH7AF61ZRfwqPcP5RTyQZK?=
 =?us-ascii?Q?wTGJyKbZKC93ECiThBNJ?=
X-OriginatorOrg: outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: a74489e1-b9f1-47ae-daee-08dd9f4b65eb
X-MS-Exchange-CrossTenant-AuthSource: DBAP193MB0956.EURP193.PROD.OUTLOOK.COM
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 May 2025 07:27:14.1156 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa
X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg: 00000000-0000-0000-0000-000000000000
X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV1P193MB2037
Subject: [FFmpeg-devel] [PATCH v3 2/2] swscale: Neon rgb_to_yuv_half process
 32 pixels at a time
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Dmitriy Kovalenko <dmtr.kovalenko@outlook.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/DBAP193MB0956E4FE4B60344958D908F58D61A@DBAP193MB0956.EURP193.PROD.OUTLOOK.COM/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

This patch adds so-called double buffering: the loop loads two batches
of 16 pixels at a time and then processes them in parallel. On modern
Arm processors, especially Apple Silicon, this gives a visible benefit.
It is particularly effective for subsampled (half-width) chroma, because
each iteration reads its input with two ld3/ld4 instructions and writes
each of the U and V planes with a single stp (the gain is most visible
on platforms with slower memory, such as iOS).
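
For readers less familiar with the Neon code, the computation can be
sketched in C. This is a simplified model under stated assumptions, not
FFmpeg's own C reference (which may differ in bias and rounding): each
output chroma sample sums a horizontal pair of input pixels (uaddlp),
accumulates bias + coeff * sum for U and V (smlal/smlal2), and narrows
with a saturating arithmetic shift right by 10 (sqshrn #10). The main
loop produces 16 outputs from 32 input pixels per iteration as two
independent 8-output blocks, mirroring the two ld3/ld4 loads; any
remainder goes through a one-at-a-time tail, like the scalar path. The
names UVCoeffs, sat_shr10, uv_from_pair and rgb_to_uv_half_sketch are
hypothetical, the bias field stands in for the accumulator seed that
rgb_set_uv_coeff puts in v6 (its value is not shown in this patch), and
only alpha-last layouts with 3 or 4 bytes per pixel are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coefficient bundle. The real coefficients come from the
 * rgb2yuv table the asm loads into w10..w15; `bias` stands in for the
 * accumulator seed held in v6 (its actual value is an assumption). */
typedef struct {
    int16_t ru, gu, bu;
    int16_t rv, gv, bv;
    int32_t bias;
} UVCoeffs;

/* Saturating arithmetic shift right by 10 into int16, modeled on
 * SQSHRN #10. Negative values are floored explicitly because >> on a
 * negative signed integer is implementation-defined in C. */
static int16_t sat_shr10(int32_t acc)
{
    int64_t v = acc;
    int64_t q = v >= 0 ? v >> 10 : -((-v + 1023) >> 10);
    if (q > INT16_MAX)
        return INT16_MAX;
    if (q < INT16_MIN)
        return INT16_MIN;
    return (int16_t)q;
}

/* One output sample from a horizontal pair of pixels: sum the pair
 * (UADDLP), accumulate bias + coeff * sum (SMLAL), narrow (SQSHRN). */
static void uv_from_pair(const uint8_t *p, int bpp, const UVCoeffs *c,
                         int16_t *u, int16_t *v)
{
    int r = p[0] + p[bpp + 0];
    int g = p[1] + p[bpp + 1];
    int b = p[2] + p[bpp + 2];
    *u = sat_shr10(c->bias + c->ru * r + c->gu * g + c->bu * b);
    *v = sat_shr10(c->bias + c->rv * r + c->gv * g + c->bv * b);
}

/* width counts output chroma samples; src holds 2 * width pixels of
 * bpp bytes each (3 for RGB24, 4 for RGBA with alpha last). */
static void rgb_to_uv_half_sketch(int16_t *dst_u, int16_t *dst_v,
                                  const uint8_t *src, int width, int bpp,
                                  const UVCoeffs *c)
{
    int i = 0;

    assert(bpp == 3 || bpp == 4);

    /* Main loop: 16 outputs (32 input pixels) per iteration, split into
     * two independent 8-output blocks, like the two ld3/ld4 loads. */
    for (; i + 16 <= width; i += 16) {
        const uint8_t *blk1 = src + (size_t)i * 2 * bpp;
        const uint8_t *blk2 = blk1 + 16 * bpp; /* 16 input pixels later */
        for (int k = 0; k < 8; k++) {
            uv_from_pair(blk1 + k * 2 * bpp, bpp, c,
                         &dst_u[i + k], &dst_v[i + k]);
            uv_from_pair(blk2 + k * 2 * bpp, bpp, c,
                         &dst_u[i + 8 + k], &dst_v[i + 8 + k]);
        }
    }

    /* Tail: fewer than 16 outputs remain; handle them one at a time,
     * like the scalar fallback the asm branches to at label 2. */
    for (; i < width; i++)
        uv_from_pair(src + (size_t)i * 2 * bpp, bpp, c,
                     &dst_u[i], &dst_v[i]);
}
```

In C the two blocks are only an index split; in the Neon version they
use separate register sets (v16-v19/v26-v29 for input, v7-v10 and
v11-v14 for accumulators), so the two multiply-accumulate chains have
no dependency on each other and can overlap in the pipeline.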

Together with the previous patch in this series, rgb_to_yuv_half in
checkasm reaches 2x the speed of the C version on a MacBook Pro M4 Max.
---
 libswscale/aarch64/input.S | 103 ++++++++++++++++++++++++++++---------
 1 file changed, 79 insertions(+), 24 deletions(-)

diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
index dc07bd1b48..f305d87935 100644
--- a/libswscale/aarch64/input.S
+++ b/libswscale/aarch64/input.S
@@ -197,40 +197,94 @@ function ff_\fmt_rgb()ToUV_half_neon, export=1
         ldp             w12, w13, [x6, #20]     // w12: bu, w13: rv
         ldp             w14, w15, [x6, #28]     // w14: gv, w15: bv
 4:
-        cmp             w5, #8
         rgb_set_uv_coeff half=1
-        b.lt            2f
-1:  // load 16 pixels and prefetch memory for the next block
+
+        cmp             w5, #16
+        b.lt            2f                      // Go directly to scalar if < 16
+
+1:
     .if \element == 3
-        ld3             { v16.16b, v17.16b, v18.16b }, [x3], #48
-        prfm            pldl1strm, [x3, #48]
+        ld3             { v16.16b, v17.16b, v18.16b }, [x3], #48  // First 16 pixels
+        ld3             { v26.16b, v27.16b, v28.16b }, [x3], #48  // Second 16 pixels
+        prfm            pldl1keep, [x3, #96]
     .else
-        ld4             { v16.16b, v17.16b, v18.16b, v19.16b }, [x3], #64
-        prfm            pldl1strm, [x3, #64]
+        ld4             { v16.16b, v17.16b, v18.16b, v19.16b }, [x3], #64  // First 16 pixels
+        ld4             { v26.16b, v27.16b, v28.16b, v29.16b }, [x3], #64  // Second 16 pixels
+        prfm            pldl1keep, [x3, #128]
     .endif
 
+    // Sum adjacent pixel pairs
     .if \alpha_first
-        uaddlp          v21.8h, v19.16b         // v21: summed b pairs
-        uaddlp          v20.8h, v18.16b         // v20: summed g pairs
-        uaddlp          v19.8h, v17.16b         // v19: summed r pairs
+        uaddlp          v21.8h, v19.16b         // Block 1: B sums
+        uaddlp          v20.8h, v18.16b         // Block 1: G sums
+        uaddlp          v19.8h, v17.16b         // Block 1: R sums
+        uaddlp          v31.8h, v29.16b         // Block 2: B sums
+        uaddlp          v30.8h, v28.16b         // Block 2: G sums
+        uaddlp          v29.8h, v27.16b         // Block 2: R sums
     .else
-        uaddlp          v19.8h, v16.16b         // v19: summed r pairs
-        uaddlp          v20.8h, v17.16b         // v20: summed g pairs
-        uaddlp          v21.8h, v18.16b         // v21: summed b pairs
+        uaddlp          v19.8h, v16.16b         // Block 1: R sums
+        uaddlp          v20.8h, v17.16b         // Block 1: G sums
+        uaddlp          v21.8h, v18.16b         // Block 1: B sums
+        uaddlp          v29.8h, v26.16b         // Block 2: R sums
+        uaddlp          v30.8h, v27.16b         // Block 2: G sums
+        uaddlp          v31.8h, v28.16b         // Block 2: B sums
     .endif
 
-        mov             v22.16b, v6.16b         // U first half
-        mov             v23.16b, v6.16b         // U second half
-        mov             v24.16b, v6.16b         // V first half
-        mov             v25.16b, v6.16b         // V second half
+        // Init accumulators for both blocks
+        mov             v7.16b, v6.16b          // Block 1: U_low
+        mov             v8.16b, v6.16b          // Block 1: U_high
+        mov             v9.16b, v6.16b          // Block 1: V_low
+        mov             v10.16b, v6.16b         // Block 1: V_high
+        mov             v11.16b, v6.16b         // Block 2: U_low
+        mov             v12.16b, v6.16b         // Block 2: U_high
+        mov             v13.16b, v6.16b         // Block 2: V_low
+        mov             v14.16b, v6.16b         // Block 2: V_high
+
+        smlal           v7.4s, v0.4h, v19.4h    // Block 1: U += ru * r (0-3)
+        smlal           v9.4s, v3.4h, v19.4h    // Block 1: V += rv * r (0-3)
+        smlal           v11.4s, v0.4h, v29.4h   // Block 2: U += ru * r (0-3)
+        smlal           v13.4s, v3.4h, v29.4h   // Block 2: V += rv * r (0-3)
+
+        smlal2          v8.4s, v0.8h, v19.8h    // Block 1: U += ru * r (4-7)
+        smlal2          v10.4s, v3.8h, v19.8h   // Block 1: V += rv * r (4-7)
+        smlal2          v12.4s, v0.8h, v29.8h   // Block 2: U += ru * r (4-7)
+        smlal2          v14.4s, v3.8h, v29.8h   // Block 2: V += rv * r (4-7)
+
+        smlal           v7.4s, v1.4h, v20.4h    // Block 1: U += gu * g (0-3)
+        smlal           v9.4s, v4.4h, v20.4h    // Block 1: V += gv * g (0-3)
+        smlal           v11.4s, v1.4h, v30.4h   // Block 2: U += gu * g (0-3)
+        smlal           v13.4s, v4.4h, v30.4h   // Block 2: V += gv * g (0-3)
+
+        smlal2          v8.4s, v1.8h, v20.8h    // Block 1: U += gu * g (4-7)
+        smlal2          v10.4s, v4.8h, v20.8h   // Block 1: V += gv * g (4-7)
+        smlal2          v12.4s, v1.8h, v30.8h   // Block 2: U += gu * g (4-7)
+        smlal2          v14.4s, v4.8h, v30.8h   // Block 2: V += gv * g (4-7)
+
+        smlal           v7.4s, v2.4h, v21.4h    // Block 1: U += bu * b (0-3)
+        smlal           v9.4s, v5.4h, v21.4h    // Block 1: V += bv * b (0-3)
+        smlal           v11.4s, v2.4h, v31.4h   // Block 2: U += bu * b (0-3)
+        smlal           v13.4s, v5.4h, v31.4h   // Block 2: V += bv * b (0-3)
+
+        smlal2          v8.4s, v2.8h, v21.8h    // Block 1: U += bu * b (4-7)
+        smlal2          v10.4s, v5.8h, v21.8h   // Block 1: V += bv * b (4-7)
+        smlal2          v12.4s, v2.8h, v31.8h   // Block 2: U += bu * b (4-7)
+        smlal2          v14.4s, v5.8h, v31.8h   // Block 2: V += bv * b (4-7)
+
+        sqshrn          v16.4h, v7.4s, #10      // Block 1: U (0-3)
+        sqshrn          v17.4h, v9.4s, #10      // Block 1: V (0-3)
+        sqshrn          v22.4h, v11.4s, #10     // Block 2: U (0-3)
+        sqshrn          v23.4h, v13.4s, #10     // Block 2: V (0-3)
+
+        sqshrn2         v16.8h, v8.4s, #10      // Block 1: U (0-7)
+        sqshrn2         v17.8h, v10.4s, #10     // Block 1: V (0-7)
+        sqshrn2         v22.8h, v12.4s, #10     // Block 2: U (0-7)
+        sqshrn2         v23.8h, v14.4s, #10     // Block 2: V (0-7)
+
+        stp             q16, q22, [x0], #32     // Store all 16 U values
+        stp             q17, q23, [x1], #32     // Store all 16 V values
 
-        rgb_to_uv_interleaved_product v19, v20, v21, v0, v1, v2, v3, v4, v5, v22, v23, v24, v25, v16, v17, #10
-
-        str             q16, [x0], #16          // store dst_u
-        str             q17, [x1], #16          // store dst_v
-
-        sub             w5, w5, #8              // width -= 8
-        cmp             w5, #8                  // width >= 8 ?
+        sub             w5, w5, #16             // width -= 16
+        cmp             w5, #16                 // width >= 16 ?
         b.ge            1b
         cbz             w5, 3f                  // No pixels left? Exit
 
@@ -444,3 +498,4 @@ endfunc
 
 DISABLE_DOTPROD
 #endif
+
-- 
2.49.0
