From: Logaprakash Ramajayam
To: FFmpeg development discussions and patches
Date: Thu, 12 Jun 2025 05:25:15 +0000
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64/output: Implement neon assembly for yuv2planeX_10_c_template()

Hi,

Could you please check and review this patch?
________________________________
From: ffmpeg-devel on behalf of Logaprakash Ramajayam
Sent: Friday, June 6, 2025 2:14 PM
To: Kieran Kunhya via ffmpeg-devel
Cc: Dash Santosh Sathyanarayanan; Harshitha Sarangu Suresh
Subject: [FFmpeg-devel] [PATCH] swscale/aarch64/output: Implement neon assembly for yuv2planeX_10_c_template()

Checked FATE tests and the gha-aarch64 git workflow.

From 34cdef26eaebcf98916e9881b3a04f4f698f09c6 Mon Sep 17 00:00:00 2001
From: Logaprakash Ramajayam
Date: Thu, 5 Jun 2025 01:33:39 -0700
Subject: [PATCH] swscale/aarch64/output: Implement neon assembly for yuv2planeX_10_c_template()

---
 libswscale/aarch64/output.S  | 167 +++++++++++++++++++++++++++++++++++
 libswscale/aarch64/swscale.c |  38 ++++++++
 2 files changed, 205 insertions(+)

diff --git a/libswscale/aarch64/output.S b/libswscale/aarch64/output.S
index 190c438870..e039e820ae 100644
--- a/libswscale/aarch64/output.S
+++ b/libswscale/aarch64/output.S
@@ -20,6 +20,173 @@
 
 #include "libavutil/aarch64/asm.S"
 
+function ff_yuv2planeX_10_neon, export=1
+// x0 = filter (int16_t*)
+// w1 = filterSize
+// x2 = src (int16_t**)
+// x3 = dest (uint16_t*)
+// w4 = dstW
+// w5 = big_endian
+// w6 = output_bits
+
+        mov             w8, #27
+        sub             w8, w8, w6              // shift = 11 + 16 - output_bits
+
+        sub             w9, w8, #1
+        mov             w10, #1
+        lsl             w9, w10, w9             // val = 1 << (shift - 1)
+
+        dup             v1.4s, w9
+        dup             v2.4s, w9               // Create vectors with val
+
+        mov             w17, #0
+        sub             w16, w17, w8
+        dup             v20.4s, w16             // Create (-shift) vector for right shift
+
+        movi            v21.4s, #0
+
+        mov             w10, #1
+        lsl             w10, w10, w6
+        sub             w10, w10, #1            // (1U << output_bits) - 1
+        dup             v22.4s, w10             // Create clip vector for upper bound
+
+        tst             w4, #15                 // if dstW divisible by 16, process 16 elements
+        b.ne            4f                      // else process 8 elements
+
+        mov             x7, #0                  // i = 0
+1:                                              // Loop
+
+        mov             v3.16b, v1.16b
+        mov             v4.16b, v2.16b
+        mov             v5.16b, v1.16b
+        mov             v6.16b, v2.16b
+
+        mov             w11, w1                 // tmpfilterSize = filterSize
+        mov             x12, x2                 // srcp = src
+        mov             x13, x0                 // filterp = filter
+
+2:                                              // Filter loop
+
+        ldp             x14, x15, [x12], #16    // get 2 pointers: src[j] and src[j+1]
+        ldr             s7, [x13], #4           // load filter coefficients
+        add             x14, x14, x7, lsl #1
+        add             x15, x15, x7, lsl #1
+        ld1             {v16.8h, v17.8h}, [x14]
+        ld1             {v18.8h, v19.8h}, [x15]
+
+        // Multiply-accumulate
+        smlal           v3.4s, v16.4h, v7.h[0]
+        smlal2          v4.4s, v16.8h, v7.h[0]
+        smlal           v5.4s, v17.4h, v7.h[0]
+        smlal2          v6.4s, v17.8h, v7.h[0]
+
+        smlal           v3.4s, v18.4h, v7.h[1]
+        smlal2          v4.4s, v18.8h, v7.h[1]
+        smlal           v5.4s, v19.4h, v7.h[1]
+        smlal2          v6.4s, v19.8h, v7.h[1]
+
+        subs            w11, w11, #2            // tmpfilterSize -= 2
+        b.gt            2b                      // continue filter loop
+
+        // Shift results
+        sshl            v3.4s, v3.4s, v20.4s
+        sshl            v4.4s, v4.4s, v20.4s
+        sshl            v5.4s, v5.4s, v20.4s
+        sshl            v6.4s, v6.4s, v20.4s
+
+        // Clamp to 0
+        smax            v3.4s, v3.4s, v21.4s
+        smax            v4.4s, v4.4s, v21.4s
+        smax            v5.4s, v5.4s, v21.4s
+        smax            v6.4s, v6.4s, v21.4s
+
+        // Clip upper bound
+        smin            v3.4s, v3.4s, v22.4s
+        smin            v4.4s, v4.4s, v22.4s
+        smin            v5.4s, v5.4s, v22.4s
+        smin            v6.4s, v6.4s, v22.4s
+
+        // Narrow to 16-bit
+        xtn             v23.4h, v3.4s
+        xtn2            v23.8h, v4.4s
+        xtn             v24.4h, v5.4s
+        xtn2            v24.8h, v6.4s
+
+        cbz             w5, 3f                  // Check if big endian
+        rev16           v23.16b, v23.16b
+        rev16           v24.16b, v24.16b        // Swap bytes for big endian
+3:
+        // Store 16 pixels
+        st1             {v23.8h}, [x3], #16
+        st1             {v24.8h}, [x3], #16
+
+        add             x7, x7, #16             // i = i + 16
+        subs            w4, w4, #16             // dstW = dstW - 16
+        b.gt            1b                      // Continue loop
+        b               8f                      // end
+
+4:                                              // Process 8 elements
+        mov             x7, #0
+5:                                              // Loop
+
+        mov             v3.16b, v1.16b
+        mov             v4.16b, v2.16b
+
+        mov             w11, w1
+        mov             x12, x2
+        mov             x13, x0
+
+6:                                              // Filter loop
+
+        ldp             x14, x15, [x12], #16
+        ldr             s7, [x13], #4
+        add             x14, x14, x7, lsl #1
+        add             x15, x15, x7, lsl #1
+        ld1             {v5.8h}, [x14]
+        ld1             {v6.8h}, [x15]
+
+        // Multiply-accumulate
+        smlal           v3.4s, v5.4h, v7.h[0]
+        smlal2          v4.4s, v5.8h, v7.h[0]
+        smlal           v3.4s, v6.4h, v7.h[1]
+        smlal2          v4.4s, v6.8h, v7.h[1]
+
+        subs            w11, w11, #2            // tmpfilterSize -= 2
+        b.gt            6b                      // loop until filterSize consumed
+
+        // Shift results
+        sshl            v3.4s, v3.4s, v20.4s
+        sshl            v4.4s, v4.4s, v20.4s
+
+        // Clamp to 0
+        smax            v3.4s, v3.4s, v21.4s
+        smax            v4.4s, v4.4s, v21.4s
+
+        // Clip upper bound
+        smin            v3.4s, v3.4s, v22.4s
+        smin            v4.4s, v4.4s, v22.4s
+
+        // Narrow to 16-bit
+        xtn             v25.4h, v3.4s
+        xtn             v26.4h, v4.4s
+
+        cbz             w5, 7f                  // Check if big endian
+        rev16           v25.8b, v25.8b
+        rev16           v26.8b, v26.8b          // Swap bytes for big endian
+
+7:
+        // Store 8 pixels
+        st1             {v25.4h}, [x3], #8
+        st1             {v26.4h}, [x3], #8
+
+        add             x7, x7, #8              // i = i + 8
+        subs            w4, w4, #8              // dstW = dstW - 8
+        b.gt            5b                      // Continue loop
+
+8:
+        ret
+endfunc
+
 function ff_yuv2planeX_8_neon, export=1
 // x0 - const int16_t *filter,
 // x1 - int filterSize,
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 6e5a721c1f..23cdb7d26e 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -158,6 +158,29 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
 
 ALL_SCALE_FUNCS(neon);
 
+void ff_yuv2planeX_10_neon(const int16_t *filter, int filterSize,
+                           const int16_t **src, uint16_t *dest, int dstW,
+                           int big_endian, int output_bits);
+
+#define yuv2NBPS(bits, BE_LE, is_be, template_size, typeX_t) \
+static void yuv2planeX_ ## bits ## BE_LE ## _neon(const int16_t *filter, int filterSize, \
+                              const int16_t **src, uint8_t *dest, int dstW, \
+                              const uint8_t *dither, int offset)\
+{ \
+    ff_yuv2planeX_## template_size ## _neon(filter, \
+                         filterSize, (const typeX_t **) src, \
+                         (uint16_t *) dest, dstW, is_be, bits); \
+}
+
+yuv2NBPS( 9, BE, 1, 10, int16_t)
+yuv2NBPS( 9, LE, 0, 10, int16_t)
+yuv2NBPS(10, BE, 1, 10, int16_t)
+yuv2NBPS(10, LE, 0, 10, int16_t)
+yuv2NBPS(12, BE, 1, 10, int16_t)
+yuv2NBPS(12, LE, 0, 10, int16_t)
+yuv2NBPS(14, BE, 1, 10, int16_t)
+yuv2NBPS(14, LE, 0, 10, int16_t)
+
 void ff_yuv2planeX_8_neon(const int16_t *filter, int filterSize,
                           const int16_t **src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset);
@@ -268,6 +291,8 @@ av_cold void ff_sws_init_range_convert_aarch64(SwsInternal *c)
 av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
 {
     int cpu_flags = av_get_cpu_flags();
+    enum AVPixelFormat dstFormat = c->opts.dst_format;
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(dstFormat);
 
     if (have_neon(cpu_flags)) {
         ASSIGN_SCALE_FUNC(c->hyScale, c->hLumFilterSize, neon);
@@ -276,6 +301,19 @@ av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
         if (c->dstBpc == 8) {
             c->yuv2planeX = ff_yuv2planeX_8_neon;
         }
+
+        if (isNBPS(dstFormat) && !isSemiPlanarYUV(dstFormat)) {
+            if (desc->comp[0].depth == 9) {
+                c->yuv2planeX = isBE(dstFormat) ? yuv2planeX_9BE_neon  : yuv2planeX_9LE_neon;
+            } else if (desc->comp[0].depth == 10) {
+                c->yuv2planeX = isBE(dstFormat) ? yuv2planeX_10BE_neon : yuv2planeX_10LE_neon;
+            } else if (desc->comp[0].depth == 12) {
+                c->yuv2planeX = isBE(dstFormat) ? yuv2planeX_12BE_neon : yuv2planeX_12LE_neon;
+            } else if (desc->comp[0].depth == 14) {
+                c->yuv2planeX = isBE(dstFormat) ? yuv2planeX_14BE_neon : yuv2planeX_14LE_neon;
+            } else
+                av_assert0(0);
+        }
         switch (c->opts.src_format) {
         case AV_PIX_FMT_ABGR:
             c->lumToYV12 = ff_abgr32ToY_neon;
-- 
2.36.0.windows.1
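As a reading aid for review, the per-pixel arithmetic that `ff_yuv2planeX_10_neon` vectorizes can be modeled in scalar C. This is a minimal sketch written from the comments in the patch (shift = 11 + 16 - output_bits, rounding bias 1 << (shift - 1), clamp to [0, (1 << output_bits) - 1], byte swap for big-endian output); it is not FFmpeg's `yuv2planeX_10_c_template()` source, and `yuv2planeX_nbps_ref` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of the per-pixel arithmetic in
 * ff_yuv2planeX_10_neon, written from the comments in the patch.
 * Not FFmpeg's yuv2planeX_10_c_template(); the name is illustrative.
 */
static void yuv2planeX_nbps_ref(const int16_t *filter, int filterSize,
                                const int16_t **src, uint16_t *dest, int dstW,
                                int big_endian, int output_bits)
{
    /* The patch wires up depths 9, 10, 12 and 14. */
    assert(output_bits >= 9 && output_bits <= 14);

    const int shift  = 11 + 16 - output_bits;    /* mov w8, #27; sub w8, w8, w6 */
    const int maxval = (1 << output_bits) - 1;   /* upper-bound clip value */

    for (int i = 0; i < dstW; i++) {
        int32_t val = 1 << (shift - 1);          /* rounding bias, as in v1/v2 */

        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];        /* widening MAC (smlal/smlal2) */

        /* sshl by -shift is an arithmetic right shift; C's >> on a negative
         * int is implementation-defined but arithmetic on common compilers. */
        val >>= shift;

        if (val < 0)
            val = 0;                             /* smax against zero */
        if (val > maxval)
            val = maxval;                        /* smin against clip value */

        /* Write bytes explicitly so the result does not depend on host
         * endianness (the NEON code applies rev16 before st1 instead). */
        uint8_t *p = (uint8_t *)&dest[i];
        if (big_endian) {
            p[0] = (uint8_t)(val >> 8);
            p[1] = (uint8_t)(val & 0xff);
        } else {
            p[0] = (uint8_t)(val & 0xff);
            p[1] = (uint8_t)(val >> 8);
        }
    }
}
```

For example, with output_bits = 10 (shift = 17), two taps of 2048 and inputs 1000 and 3000 give (65536 + 8192000) >> 17 = 63. Unlike the NEON loop, which consumes two taps per iteration (`subs w11, w11, #2`), this model accepts any filterSize, so it can double as a reference when comparing outputs.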