From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harshitha Sarangu Suresh
To: FFmpeg development discussions and patches
Cc: Dash Santosh Sathyanarayanan, Logaprakash Ramajayam
Date: Tue, 15 Jul 2025 05:29:10 +0000
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64/output: Implement yuv2nv12cx neon assembly

Hi,

Can you please review the patch?

Thanks.
________________________________
From: Harshitha Sarangu Suresh
Sent: Thursday, July 10, 2025 10:06:19 AM
To: FFmpeg development discussions and patches
Cc: Logaprakash Ramajayam; Dash Santosh Sathyanarayanan
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64/output: Implement yuv2nv12cx neon assembly

Hi,

Can you please review the patch?

Thanks.
________________________________
From: Harshitha Sarangu Suresh
Sent: 04 July 2025 15:01
To: FFmpeg development discussions and patches
Cc: Logaprakash Ramajayam; Dash Santosh Sathyanarayanan
Subject: [FFmpeg-devel] [PATCH] swscale/aarch64/output: Implement yuv2nv12cx neon assembly

Addressed all review comments and removed the code duplication between the
swapped and non-swapped cases.

Checkasm benchmark results:

yuv2nv12cX_2_512_accurate_c:           3496.2 ( 1.00x)
yuv2nv12cX_2_512_accurate_neon:         409.5 ( 8.54x)
yuv2nv12cX_2_512_approximate_c:        3495.1 ( 1.00x)
yuv2nv12cX_2_512_approximate_neon:      409.4 ( 8.54x)
yuv2nv12cX_4_512_accurate_c:           4676.5 ( 1.00x)
yuv2nv12cX_4_512_accurate_neon:         613.1 ( 7.63x)
yuv2nv12cX_4_512_approximate_c:        4677.8 ( 1.00x)
yuv2nv12cX_4_512_approximate_neon:      607.8 ( 7.70x)
yuv2nv12cX_8_512_accurate_c:           7221.6 ( 1.00x)
yuv2nv12cX_8_512_accurate_neon:        1003.8 ( 7.19x)
yuv2nv12cX_8_512_approximate_c:        7221.2 ( 1.00x)
yuv2nv12cX_8_512_approximate_neon:     1016.4 ( 7.11x)
yuv2nv12cX_16_512_accurate_c:         13731.1 ( 1.00x)
yuv2nv12cX_16_512_accurate_neon:       1757.2 ( 7.81x)
yuv2nv12cX_16_512_approximate_c:      13740.7 ( 1.00x)
yuv2nv12cX_16_512_approximate_neon:    1757.3 ( 7.82x)

>From 4ca92570924ba42e20041feaae6d7488c02c1e6a Mon Sep 17 00:00:00 2001
From: Harshitha Suresh
Date: Fri, 4 Jul 2025 14:29:11 +0530
Subject: [PATCH] swscale/output: Implement yuv2nv12cx neon assembly

---
 libswscale/aarch64/output.S  | 233 +++++++++++++++++++++++++++++++++++
 libswscale/aarch64/swscale.c |  24 ++++
 2 files changed, 257 insertions(+)

diff --git a/libswscale/aarch64/output.S b/libswscale/aarch64/output.S
index 190c438870..92dec3f0ed 100644
--- a/libswscale/aarch64/output.S
+++ b/libswscale/aarch64/output.S
@@ -226,3 +226,236 @@ function ff_yuv2plane1_8_neon, export=1
         b.gt            2b                          // loop until width consumed
         ret
 endfunc
+
+function ff_yuv2nv12cX_neon_asm, export=1
+// w0 - isSwapped (nonzero: store U then V; zero: store V then U)
+// x1 - uint8_t *chrDither
+// x2 - int16_t *chrFilter
+// x3 - int chrFilterSize
+// x4 - int16_t **chrUSrc
+// x5 - int16_t **chrVSrc
+// x6 - uint8_t *dest
+// x7 - int chrDstW
+
+        stp             x19, x20, [sp, #-16]!
+        stp             x21, x22, [sp, #-16]!
+
+        ld1             {v0.8b}, [x1]               // chrDither[0..7]
+        ext             v1.8b, v0.8b, v0.8b, #3     // Rotate for V: (i+3)&7
+
+        uxtl            v0.8h, v0.8b
+        uxtl            v1.8h, v1.8b
+
+        ushll           v2.4s, v0.4h, #12           // U dither low
+        ushll2          v3.4s, v0.8h, #12           // U dither high
+        ushll           v4.4s, v1.4h, #12           // V dither low
+        ushll2          v5.4s, v1.8h, #12           // V dither high
+
+        mov             x8, #0                      // i = 0
+1:
+        cmp             w7, #16
+        blt             7f
+
+        mov             v16.16b, v2.16b             // U acc low
+        mov             v17.16b, v3.16b             // U acc high
+        mov             v18.16b, v4.16b             // V acc low
+        mov             v19.16b, v5.16b             // V acc high
+
+        mov             v20.16b, v2.16b
+        mov             v21.16b, v3.16b
+        mov             v22.16b, v4.16b
+        mov             v23.16b, v5.16b
+
+        mov             w9, w3                      // chrFilterSize counter
+        mov             x10, x2                     // chrFilter pointer
+        mov             x11, x4                     // chrUSrc base
+        mov             x12, x5                     // chrVSrc base
+
+2:
+        ldr             h6, [x10], #2               // Load filter coefficient
+
+        ldr             x13, [x11], #8              // chrUSrc[j]
+        ldr             x14, [x12], #8              // chrVSrc[j]
+        add             x13, x13, x8, lsl #1        // &chrUSrc[j][i]
+        add             x14, x14, x8, lsl #1        // &chrVSrc[j][i]
+        add             x15, x13, #16
+        add             x16, x14, #16
+
+        ld1             {v24.8h}, [x13]             // U samples 0-7
+        ld1             {v25.8h}, [x14]             // V samples 0-7
+
+        ld1             {v26.8h}, [x15]             // U samples 8-15
+        ld1             {v27.8h}, [x16]             // V samples 8-15
+
+        smlal           v16.4s, v24.4h, v6.h[0]
+        smlal2          v17.4s, v24.8h, v6.h[0]
+        smlal           v18.4s, v25.4h, v6.h[0]
+        smlal2          v19.4s, v25.8h, v6.h[0]
+
+        smlal           v20.4s, v26.4h, v6.h[0]
+        smlal2          v21.4s, v26.8h, v6.h[0]
+        smlal           v22.4s, v27.4h, v6.h[0]
+        smlal2          v23.4s, v27.8h, v6.h[0]
+
+        subs            w9, w9, #1
+        b.gt            2b
+
+        sqshrun         v28.4h, v16.4s, #16         // Process and store first 8 pixels
+        sqshrun2        v28.8h, v17.4s, #16
+        sqshrun         v29.4h, v18.4s, #16
+        sqshrun2        v29.8h, v19.4s, #16
+
+        cbz             w0, 3f
+        uqshrn          v24.8b, v28.8h, #3          // Storing U
+        uqshrn          v25.8b, v29.8h, #3          // Storing V
+        st2             {v24.8b, v25.8b}, [x6], #16
+        b               4f
+3:
+        uqshrn          v24.8b, v29.8h, #3          // Storing V
+        uqshrn          v25.8b, v28.8h, #3          // Storing U
+        st2             {v24.8b, v25.8b}, [x6], #16
+
+4:
+        sqshrun         v28.4h, v20.4s, #16         // Process and store next 8 pixels
+        sqshrun2        v28.8h, v21.4s, #16
+        sqshrun         v29.4h, v22.4s, #16
+        sqshrun2        v29.8h, v23.4s, #16
+
+        cbz             w0, 5f
+        uqshrn          v30.8b, v28.8h, #3          // Storing U
+        uqshrn          v31.8b, v29.8h, #3          // Storing V
+        st2             {v30.8b, v31.8b}, [x6], #16
+        b               6f
+5:
+        uqshrn          v30.8b, v29.8h, #3          // Storing V
+        uqshrn          v31.8b, v28.8h, #3          // Storing U
+        st2             {v30.8b, v31.8b}, [x6], #16
+
+6:
+        subs            w7, w7, #16
+        add             x8, x8, #16
+        b.gt            1b
+
+7:
+        cmp             w7, #8
+        blt             12f
+8:
+        mov             v16.16b, v2.16b             // U acc low
+        mov             v17.16b, v3.16b             // U acc high
+        mov             v18.16b, v4.16b             // V acc low
+        mov             v19.16b, v5.16b             // V acc high
+
+        mov             w9, w3                      // chrFilterSize counter
+        mov             x10, x2                     // chrFilter pointer
+        mov             x11, x4                     // chrUSrc base
+        mov             x12, x5                     // chrVSrc base
+
+9:
+        ldr             h6, [x10], #2               // Load filter coefficient
+
+        ldr             x13, [x11], #8              // chrUSrc[j]
+        ldr             x14, [x12], #8              // chrVSrc[j]
+        add             x13, x13, x8, lsl #1        // &chrUSrc[j][i]
+        add             x14, x14, x8, lsl #1        // &chrVSrc[j][i]
+
+        ld1             {v20.8h}, [x13]             // U samples
+        ld1             {v21.8h}, [x14]             // V samples
+
+        smlal           v16.4s, v20.4h, v6.h[0]
+        smlal2          v17.4s, v20.8h, v6.h[0]
+        smlal           v18.4s, v21.4h, v6.h[0]
+        smlal2          v19.4s, v21.8h, v6.h[0]
+
+        subs            w9, w9, #1
+        b.gt            9b
+
+        sqshrun         v26.4h, v16.4s, #16         // Final processing and store
+        sqshrun2        v26.8h, v17.4s, #16
+        sqshrun         v27.4h, v18.4s, #16
+        sqshrun2        v27.8h, v19.4s, #16
+
+        cbz             w0, 10f
+        uqshrn          v28.8b, v26.8h, #3          // Storing U
+        uqshrn          v29.8b, v27.8h, #3          // Storing V
+        st2             {v28.8b, v29.8b}, [x6], #16
+        b               11f
+10:
+        uqshrn          v28.8b, v27.8h, #3          // Storing V
+        uqshrn          v29.8b, v26.8h, #3          // Storing U
+        st2             {v28.8b, v29.8b}, [x6], #16
+11:
+        subs            w7, w7, #8
+        add             x8, x8, #8
+
+12:
+        cbz             w7, 17f                     // Scalar loop
+
+13:
+        and             x15, x8, #7
+        ldrb            w9, [x1, x15]
+        sxtw            x9, w9
+        lsl             x9, x9, #12                 // u = chrDither[i & 7] << 12;
+
+        add             x15, x8, #3
+        and             x15, x15, #7
+        ldrb            w10, [x1, x15]
+        sxtw            x10, w10
+        lsl             x10, x10, #12               // v = chrDither[(i + 3) & 7] << 12;
+
+        mov             w11, w3                     // chrFilterSize counter
+        mov             x12, x2                     // chrFilter pointer
+        mov             x13, x4                     // chrUSrc base
+        mov             x14, x5                     // chrVSrc base
+
+14:
+        ldrsh           x16, [x12], #2
+
+        ldr             x17, [x13], #8              // chrUSrc[j]
+        ldr             x19, [x14], #8              // chrVSrc[j]
+        add             x17, x17, x8, lsl #1        // &chrUSrc[j][i]
+        add             x19, x19, x8, lsl #1        // &chrVSrc[j][i]
+
+        ldrsh           x20, [x17]
+        ldrsh           x21, [x19]
+
+        madd            x9, x16, x20, x9
+        madd            x10, x16, x21, x10
+
+        subs            w11, w11, #1
+        b.gt            14b
+
+        asr             x9, x9, #19                 // Process and store U and V
+        asr             x10, x10, #19
+
+        cmp             x9, #0
+        csel            x9, x9, xzr, ge
+        cmp             x10, #0
+        csel            x10, x10, xzr, ge
+
+        mov             x22, #1
+        lsl             x22, x22, #8
+        sub             x22, x22, #1
+
+        cmp             x9, x22
+        csel            x9, x22, x9, gt
+        cmp             x10, x22
+        csel            x10, x22, x10, gt
+
+        cbz             w0, 15f
+        strb            w9, [x6], #1                // Storing U
+        strb            w10, [x6], #1               // Storing V
+        b               16f
+15:
+        strb            w10, [x6], #1               // Storing V
+        strb            w9, [x6], #1                // Storing U
+
+16:
+        subs            w7, w7, #1
+        add             x8, x8, #1
+        b.gt            13b
+17:
+        ldp             x21, x22, [sp], #16
+        ldp             x19, x20, [sp], #16
+        ret
+
+endfunc
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 6e5a721c1f..a7dcc451dc 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -168,6 +168,28 @@ void ff_yuv2plane1_8_neon(
         const uint8_t *dither,
         int offset);
 
+void ff_yuv2nv12cX_neon_asm(int isSwapped, const uint8_t *chrDither,
+                            const int16_t *chrFilter, int chrFilterSize,
+                            const int16_t **chrUSrc, const int16_t **chrVSrc,
+                            uint8_t *dest, int chrDstW);
+
+static void ff_yuv2nv12cX_neon(enum AVPixelFormat dstFormat, const uint8_t *chrDither,
+                               const int16_t *chrFilter, int chrFilterSize,
+                               const int16_t **chrUSrc, const int16_t **chrVSrc,
+                               uint8_t *dest, int chrDstW)
+{
+    if (!isSwappedChroma(dstFormat))
+    {
+        ff_yuv2nv12cX_neon_asm(1, chrDither, chrFilter, chrFilterSize,
+                               chrUSrc, chrVSrc, dest, chrDstW);
+    }
+    else
+    {
+        ff_yuv2nv12cX_neon_asm(0, chrDither, chrFilter, chrFilterSize,
+                               chrUSrc, chrVSrc, dest, chrDstW);
+    }
+}
+
 #define ASSIGN_SCALE_FUNC2(hscalefn, filtersize, opt) do {              \
     if (c->srcBpc == 8) {                                               \
         if(c->dstBpc <= 14) {                                           \
@@ -275,6 +297,8 @@ av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
         ASSIGN_VSCALE_FUNC(c->yuv2plane1, neon);
         if (c->dstBpc == 8) {
             c->yuv2planeX = ff_yuv2planeX_8_neon;
+            if (isSemiPlanarYUV(c->opts.dst_format))
+                c->yuv2nv12cX = ff_yuv2nv12cX_neon;
         }
         switch (c->opts.src_format) {
         case AV_PIX_FMT_ABGR:
-- 
2.36.0.windows.1
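
For reviewers, the per-sample computation that the scalar tail loop (labels 13 to 16) spells out in its comments can be sketched in plain C as below. This is only an illustrative sketch derived from the comments in the patch, not the libswscale C reference itself: the names nv12_chroma_ref, clip_u8 and the u_first parameter are hypothetical, and u_first mirrors the w0 argument as the wrapper passes it (nonzero stores U before V). It assumes chrFilterSize is at least 1, as the assembly's filter loop does, and that >> on a negative 64-bit value is an arithmetic shift, matching asr.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a value to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int64_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the interleaved chroma output:
 * dither << 12, accumulate src * coeff over the filter taps,
 * shift right by 19, clamp to 8 bits, then interleave U and V. */
static void nv12_chroma_ref(int u_first, const uint8_t *chr_dither,
                            const int16_t *chr_filter, int chr_filter_size,
                            const int16_t **chr_u_src, const int16_t **chr_v_src,
                            uint8_t *dest, int chr_dst_w)
{
    assert(chr_filter_size > 0 && chr_dst_w >= 0);

    for (int i = 0; i < chr_dst_w; i++) {
        /* V uses the dither table rotated by 3, as in the asm. */
        int64_t u = (int64_t)chr_dither[i & 7] << 12;
        int64_t v = (int64_t)chr_dither[(i + 3) & 7] << 12;

        for (int j = 0; j < chr_filter_size; j++) {
            u += (int64_t)chr_u_src[j][i] * chr_filter[j];
            v += (int64_t)chr_v_src[j][i] * chr_filter[j];
        }

        /* Assumes arithmetic right shift for negative values (like asr). */
        uint8_t cu = clip_u8(u >> 19);
        uint8_t cv = clip_u8(v >> 19);

        dest[2 * i]     = u_first ? cu : cv;
        dest[2 * i + 1] = u_first ? cv : cu;
    }
}
```

With a single filter tap of 4096 and zero dither, each output is simply the clamped source sample shifted right by 7, which makes the model easy to compare against the NEON output on small hand-made inputs.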