From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitriy Kovalenko
To: ffmpeg-devel@ffmpeg.org
Cc: Dmitriy Kovalenko
Date: Fri, 30 May 2025 10:40:42 +0200
Subject: [FFmpeg-devel] [PATCH v4 2/2] swscale: Neon rgb_to_yuv_half process 32 pixels at a time
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250530084042.33204-1-dmtr.kovalenko@outlook.com>
References: <20250530084042.33204-1-dmtr.kovalenko@outlook.com>
X-Microsoft-Original-Message-ID: <20250530084042.33204-2-dmtr.kovalenko@outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

=== Feedback response ===
> Also, with that fixed, this fails to properly back up and restore
> registers v8-v15; checkasm doesn't notice this on macOS, but on Linux
> and windows, checkasm has a call wrapper which does detect such issues.

I managed to rewrite the function to avoid using any callee-saved
registers. The only register I keep using is v7, which is not
callee-saved.
=== === === === === === === === ===

This patch integrates so-called double buffering: we load 2 batches of
elements at a time and then process them in parallel.
On modern Arm processors, especially Apple Silicon, it gives a visible
benefit. For subsampled pixel processing it is especially nice because
it allows reading elements with 2 instructions and writing them with a
single one (especially visible on platforms with slower memory, like
iOS).

Including the previous patch in the stack, on a MacBook Pro M4 Max
rgb_to_yuv_half in checkasm reaches 2x the speed of the C version.
---
 libswscale/aarch64/input.S | 130 ++++++++++++++++++++++++++-----------
 1 file changed, 91 insertions(+), 39 deletions(-)

diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
index cf513d43d1..9e41ff6fcd 100644
--- a/libswscale/aarch64/input.S
+++ b/libswscale/aarch64/input.S
@@ -178,7 +178,7 @@ rgbToY_neon abgr32, argb32, element=4, alpha_first=1
 .macro rgbToUV_half_neon fmt_bgr, fmt_rgb, element, alpha_first=0
 function ff_\fmt_bgr\()ToUV_half_neon, export=1
-        cbz w5, 3f // check width > 0
+        cbz w5, 3f
         ldp w12, w11, [x6, #12]
         ldp w10, w15, [x6, #20]
@@ -187,49 +187,101 @@ function ff_\fmt_bgr\()ToUV_half_neon, export=1
 endfunc
 function ff_\fmt_rgb\()ToUV_half_neon, export=1
-        cmp w5, #0 // check width > 0
+        cmp w5, #0
         b.le 3f
-        ldp w10, w11, [x6, #12] // w10: ru, w11: gu
-        ldp w12, w13, [x6, #20] // w12: bu, w13: rv
-        ldp w14, w15, [x6, #28] // w14: gv, w15: bv
+        ldp w10, w11, [x6, #12]
+        ldp w12, w13, [x6, #20]
+        ldp w14, w15, [x6, #28]
 4:
-        cmp w5, #8
         rgb_set_uv_coeff half=1
+
+        cmp w5, #16
         b.lt 2f
-1: // load 16 pixels
+
+1:
     .if \element == 3
         ld3 { v16.16b, v17.16b, v18.16b }, [x3], #48
+        ld3 { v26.16b, v27.16b, v28.16b }, [x3], #48
     .else
         ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [x3], #64
+        ld4 { v26.16b, v27.16b, v28.16b, v29.16b }, [x3], #64
     .endif
     .if \alpha_first
-        uaddlp v21.8h, v19.16b // v21: summed b pairs
-        uaddlp v20.8h, v18.16b // v20: summed g pairs
-        uaddlp v19.8h, v17.16b // v19: summed r pairs
+        uaddlp v21.8h, v19.16b
+        uaddlp v20.8h, v18.16b
+        uaddlp v19.8h, v17.16b
+        uaddlp v31.8h, v29.16b
+        uaddlp v30.8h, v28.16b
+        uaddlp v29.8h, v27.16b
     .else
-        uaddlp v19.8h, v16.16b // v19: summed r pairs
-        uaddlp v20.8h, v17.16b // v20: summed g pairs
-        uaddlp v21.8h, v18.16b // v21: summed b pairs
+        uaddlp v19.8h, v16.16b
+        uaddlp v20.8h, v17.16b
+        uaddlp v21.8h, v18.16b
+        uaddlp v29.8h, v26.16b
+        uaddlp v30.8h, v27.16b
+        uaddlp v31.8h, v28.16b
     .endif
-        mov v22.16b, v6.16b // U first half
-        mov v23.16b, v6.16b // U second half
-        mov v24.16b, v6.16b // V first half
-        mov v25.16b, v6.16b // V second half
-
-        rgb_to_uv_interleaved_product v19, v20, v21, v0, v1, v2, v3, v4, v5, v22, v23, v24, v25, v16, v17, #10
-
-        str q16, [x0], #16 // store dst_u
-        str q17, [x1], #16 // store dst_v
+        mov v7.16b, v6.16b
+        mov v16.16b, v6.16b
+        mov v17.16b, v6.16b
+        mov v18.16b, v6.16b
+        mov v26.16b, v6.16b
+        mov v27.16b, v6.16b
+        mov v28.16b, v6.16b
+        mov v25.16b, v6.16b
-        sub w5, w5, #8 // width -= 8
-        cmp w5, #8 // width >= 8 ?
+        smlal v7.4s, v0.4h, v19.4h
+        smlal v17.4s, v3.4h, v19.4h
+        smlal v26.4s, v0.4h, v29.4h
+        smlal v28.4s, v3.4h, v29.4h
+
+        smlal2 v16.4s, v0.8h, v19.8h
+        smlal2 v18.4s, v3.8h, v19.8h
+        smlal2 v27.4s, v0.8h, v29.8h
+        smlal2 v25.4s, v3.8h, v29.8h
+
+        smlal v7.4s, v1.4h, v20.4h
+        smlal v17.4s, v4.4h, v20.4h
+        smlal v26.4s, v1.4h, v30.4h
+        smlal v28.4s, v4.4h, v30.4h
+
+        smlal2 v16.4s, v1.8h, v20.8h
+        smlal2 v18.4s, v4.8h, v20.8h
+        smlal2 v27.4s, v1.8h, v30.8h
+        smlal2 v25.4s, v4.8h, v30.8h
+
+        smlal v7.4s, v2.4h, v21.4h
+        smlal v17.4s, v5.4h, v21.4h
+        smlal v26.4s, v2.4h, v31.4h
+        smlal v28.4s, v5.4h, v31.4h
+
+        smlal2 v16.4s, v2.8h, v21.8h
+        smlal2 v18.4s, v5.8h, v21.8h
+        smlal2 v27.4s, v2.8h, v31.8h
+        smlal2 v25.4s, v5.8h, v31.8h
+
+        sqshrn v19.4h, v7.4s, #10
+        sqshrn v20.4h, v17.4s, #10
+        sqshrn v22.4h, v26.4s, #10
+        sqshrn v23.4h, v28.4s, #10
+
+        sqshrn2 v19.8h, v16.4s, #10
+        sqshrn2 v20.8h, v18.4s, #10
+        sqshrn2 v22.8h, v27.4s, #10
+        sqshrn2 v23.8h, v25.4s, #10
+
+        stp q19, q22, [x0], #32
+        stp q20, q23, [x1], #32
+
+        sub w5, w5, #16
+        cmp w5, #16
         b.ge 1b
-        cbz w5, 3f // No pixels left? Exit
+        cbz w5, 3f
-2: // Scalar fallback for remaining pixels
+2:
     .if \alpha_first
         rgb_load_add_half 1, 5, 2, 6, 3, 7
     .else
@@ -239,24 +291,24 @@ function ff_\fmt_rgb\()ToUV_half_neon, export=1
         rgb_load_add_half 0, 4, 1, 5, 2, 6
     .endif
     .endif
-        smaddl x8, w2, w10, x9 // dst_u = ru * r + const_offset
-        smaddl x16, w2, w13, x9 // dst_v = rv * r + const_offset (parallel)
+        smaddl x8, w2, w10, x9
+        smaddl x16, w2, w13, x9
-        smaddl x8, w4, w11, x8 // dst_u += gu * g
-        smaddl x16, w4, w14, x16 // dst_v += gv * g (parallel)
+        smaddl x8, w4, w11, x8
+        smaddl x16, w4, w14, x16
-        smaddl x8, w7, w12, x8 // dst_u += bu * b
-        smaddl x16, w7, w15, x16 // dst_v += bv * b (parallel)
+        smaddl x8, w7, w12, x8
+        smaddl x16, w7, w15, x16
-        asr w8, w8, #10 // dst_u >>= 10
-        asr w16, w16, #10 // dst_v >>= 10
+        asr w8, w8, #10
+        asr w16, w16, #10
-        strh w8, [x0], #2 // store dst_u
-        strh w16, [x1], #2 // store dst_v
+        strh w8, [x0], #2
+        strh w16, [x1], #2
-        sub w5, w5, #1 // width--
-        add x3, x3, #(2*\element) // Advance source pointer
-        cbnz w5, 2b // Process next pixel if any left
+        sub w5, w5, #1
+        add x3, x3, #(2*\element)
+        cbnz w5, 2b
 3:
         ret
 endfunc
-- 
2.49.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
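
For readers following the arithmetic, here is a minimal scalar sketch in C of
what the patch computes per output sample. It is reconstructed only from the
comments the diff removes (summed r/g/b pairs, dst_u = ru * r + const_offset,
dst_u >>= 10). The names UVCoeffs and rgb24_to_uv_half_ref are hypothetical and
do not exist in libswscale. The value of const_offset is set by
rgb_set_uv_coeff, which this patch does not show, so it is passed in as a
parameter rather than assumed. Only the packed 3-byte case without alpha is
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the six coefficients the asm loads from x6. */
typedef struct {
    int32_t ru, gu, bu;   /* U row */
    int32_t rv, gv, bv;   /* V row */
} UVCoeffs;

/*
 * Scalar model of the half-width RGB -> U/V step for packed 3-byte pixels.
 * Each output sample consumes two horizontally adjacent pixels, and their
 * channels are summed first (the role uaddlp plays per lane in the vector
 * loop). The result is coefficient * sum accumulated on top of const_offset,
 * then shifted right by 10.
 * Assumption: the accumulated value is non-negative, so the C right shift
 * matches the arithmetic shift (asr) used by the scalar tail.
 */
static void rgb24_to_uv_half_ref(int16_t *dst_u, int16_t *dst_v,
                                 const uint8_t *src, int width,
                                 const UVCoeffs *c, int64_t const_offset)
{
    assert(width >= 0);
    for (int i = 0; i < width; i++) {
        const uint8_t *p = src + 6 * i;     /* 2 pixels * 3 bytes each */
        int32_t r = p[0] + p[3];            /* summed r pair */
        int32_t g = p[1] + p[4];            /* summed g pair */
        int32_t b = p[2] + p[5];            /* summed b pair */

        int64_t u = const_offset + (int64_t)c->ru * r
                  + (int64_t)c->gu * g + (int64_t)c->bu * b;
        int64_t v = const_offset + (int64_t)c->rv * r
                  + (int64_t)c->gv * g + (int64_t)c->bv * b;

        dst_u[i] = (int16_t)(u >> 10);
        dst_v[i] = (int16_t)(v >> 10);
    }
}
```

The vector loop in the diff performs the same computation for 16 output
samples per iteration, as two independent batches of 8 whose multiply-accumulate
chains are interleaved. Note that the vector path narrows with sqshrn
(saturating), whereas this sketch simply truncates like the scalar tail.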