From: Harshitha Sarangu Suresh
To: FFmpeg development discussions and patches
Date: Mon, 2 Jun 2025 04:30:38 +0000
Subject: Re: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()

I have added an assembly implementation and sent it as a new patch.
Thanks

________________________________
From: ffmpeg-devel on behalf of Zhao Zhili
Sent: 26 May 2025 15:04
To: FFmpeg development discussions and patches
Cc: Dash Santosh Sathyanarayanan
Subject: Re: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()

> On May 26, 2025, at 16:40, Harshitha Sarangu Suresh wrote:
>
> Hi,
> Did you get a chance to review this patch?

Thank you for your contribution. However, we use hand-written assembly instead of intrinsics for NEON.

>
> ________________________________
> From: Harshitha Sarangu Suresh
> Sent: Thursday, May 22, 2025 7:24:15 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Dash Santosh Sathyanarayanan
> Subject: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()
>
> This optimization gives a 6x speedup for this function. The speedup was measured by adding C timers inside both the C function and the optimized NEON intrinsics function.
>
>
> From 1deceb0394a5acdf70677870dc252fd66a91dd9f Mon Sep 17 00:00:00 2001
> From: Harshitha Suresh
> Date: Mon, 19 May 2025 22:37:20 +0530
> Subject: [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()
>
> ---
>  libswscale/aarch64/swscale.c | 151 +++++++++++++++++++++++++++++++++++
>  1 file changed, 151 insertions(+)
>
> diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
> index 6e5a721c1f..fb59c3f1b0 100644
> --- a/libswscale/aarch64/swscale.c
> +++ b/libswscale/aarch64/swscale.c
> @@ -21,6 +21,9 @@
>  #include "libswscale/swscale.h"
>  #include "libswscale/swscale_internal.h"
>  #include "libavutil/aarch64/cpu.h"
> +#if defined (__aarch64__)
> +#include
> +#endif
>
>  void ff_hscale16to15_4_neon_asm(int shift, int16_t *_dst, int dstW,
>                                  const uint8_t *_src, const int16_t *filter,
> @@ -142,6 +145,153 @@ static void ff_hscale16to19_X4_neon(SwsInternal *c, int16_t *_dst, int dstW,
>
>  }
>
> +static void ff_yuv2nv12cX_neon(enum AVPixelFormat dstFormat, const uint8_t *chrDither,
> +                               const int16_t *chrFilter, int chrFilterSize,
> +                               const int16_t **chrUSrc, const int16_t **chrVSrc,
> +                               uint8_t *dest, int chrDstW)
> +{
> +
> +    int i;
> +    int u_dither[8], v_dither[8];
> +    for (i = 0; i < 8; i++) {
> +        u_dither[i] = chrDither[i & 7] << 12;
> +        v_dither[i] = chrDither[(i + 3) & 7] << 12;
> +    }
> +    int32x4_t u0 = vld1q_s32(&u_dither[0]);
> +    int32x4_t u1 = vld1q_s32(&u_dither[4]);
> +    int32x4_t v0 = vld1q_s32(&v_dither[0]);
> +    int32x4_t v1 = vld1q_s32(&v_dither[4]);
> +
> +    if (!isSwappedChroma(dstFormat))
> +    {
> +        for (i = 0; i <= chrDstW - 8; i += 8)
> +        {
> +            int32x4_t udst0 = u0;
> +            int32x4_t udst1 = u1;
> +            int32x4_t vdst0 = v0;
> +            int32x4_t vdst1 = v1;
> +
> +            for (int j = 0; j < chrFilterSize; j++)
> +            {
> +                int16x8_t usrc0 = vld1q_s16(&chrUSrc[j][i]);
> +                int16x8_t vsrc0 = vld1q_s16(&chrVSrc[j][i]);
> +
> +                int32x4_t usrc0_low = vmovl_s16(vget_low_s16(usrc0));
> +                int32x4_t usrc0_high = vmovl_s16(vget_high_s16(usrc0));
> +                int32x4_t vsrc0_low = vmovl_s16(vget_low_s16(vsrc0));
> +                int32x4_t vsrc0_high = vmovl_s16(vget_high_s16(vsrc0));
> +
> +                udst0 = vmlaq_n_s32(udst0, usrc0_low, chrFilter[j]);
> +                udst1 = vmlaq_n_s32(udst1, usrc0_high, chrFilter[j]);
> +                vdst0 = vmlaq_n_s32(vdst0, vsrc0_low, chrFilter[j]);
> +                vdst1 = vmlaq_n_s32(vdst1, vsrc0_high, chrFilter[j]);
> +
> +            }
> +            // Right shift by 19
> +            udst0 = vshrq_n_s32(udst0, 19);
> +            udst1 = vshrq_n_s32(udst1, 19);
> +            vdst0 = vshrq_n_s32(vdst0, 19);
> +            vdst1 = vshrq_n_s32(vdst1, 19);
> +
> +            // Convert to 16-bit and then to uint8, with saturation
> +            int16x8_t u16 = vcombine_s16(vqmovn_s32(udst0), vqmovn_s32(udst1));
> +            int16x8_t v16 = vcombine_s16(vqmovn_s32(vdst0), vqmovn_s32(vdst1));
> +
> +            uint8x8_t u8 = vqmovun_s16(u16);
> +            uint8x8_t v8 = vqmovun_s16(v16);
> +
> +            // Store interleaved u/v as UV UV UV...
> +            uint8x8x2_t uv;
> +            uv.val[0] = u8;
> +            uv.val[1] = v8;
> +            vst2_u8(dest + 2 * i, uv);
> +        }
> +
> +        // Handle remaining pixels with scalar fallback
> +        for (; i < chrDstW; i++)
> +        {
> +            int u = chrDither[i & 7] << 12;
> +            int v = chrDither[(i + 3) & 7] << 12;
> +
> +            for (int j = 0; j < chrFilterSize; j++)
> +            {
> +                u += chrUSrc[j][i] * chrFilter[j];
> +                v += chrVSrc[j][i] * chrFilter[j];
> +            }
> +
> +            uint8_t uu = av_clip_uint8(u >> 19);
> +            uint8_t vv = av_clip_uint8(v >> 19);
> +            dest[2 * i] = uu;
> +            dest[2 * i + 1] = vv;
> +        }
> +    }
> +    else
> +    {
> +        if (!isSwappedChroma(dstFormat))
> +        {
> +            for (i = 0; i <= chrDstW - 8; i += 8)
> +            {
> +                int32x4_t udst0 = u0;
> +                int32x4_t udst1 = u1;
> +                int32x4_t vdst0 = v0;
> +                int32x4_t vdst1 = v1;
> +
> +                for (int j = 0; j < chrFilterSize; j++)
> +                {
> +                    int16x8_t usrc = vld1q_s16(&chrUSrc[j][i]);
> +                    int16x8_t vsrc = vld1q_s16(&chrVSrc[j][i]);
> +
> +                    int32x4_t usrc_low = vmovl_s16(vget_low_s16(usrc));
> +                    int32x4_t usrc_high = vmovl_s16(vget_high_s16(usrc));
> +                    int32x4_t vsrc_low = vmovl_s16(vget_low_s16(vsrc));
> +                    int32x4_t vsrc_high = vmovl_s16(vget_high_s16(vsrc));
> +
> +                    udst0 = vmlaq_n_s32(udst0, usrc_low, chrFilter[j]);
> +                    udst1 = vmlaq_n_s32(udst1, usrc_high, chrFilter[j]);
> +                    vdst0 = vmlaq_n_s32(vdst0, vsrc_low, chrFilter[j]);
> +                    vdst1 = vmlaq_n_s32(vdst1, vsrc_high, chrFilter[j]);
> +                }
> +                // Right shift by 19
> +                udst0 = vshrq_n_s32(udst0, 19);
> +                udst1 = vshrq_n_s32(udst1, 19);
> +                vdst0 = vshrq_n_s32(vdst0, 19);
> +                vdst1 = vshrq_n_s32(vdst1, 19);
> +
> +                // Convert to 16-bit and then to uint8, with saturation
> +                int16x8_t u16 = vcombine_s16(vqmovn_s32(udst0), vqmovn_s32(udst1));
> +                int16x8_t v16 = vcombine_s16(vqmovn_s32(vdst0), vqmovn_s32(vdst1));
> +
> +                uint8x8_t u8 = vqmovun_s16(u16);
> +                uint8x8_t v8 = vqmovun_s16(v16);
> +
> +                // Store interleaved u/v as UV UV UV...
> +                uint8x8x2_t uv;
> +                uv.val[0] = v8;
> +                uv.val[1] = u8;
> +                vst2_u8(dest + 2 * i, uv);
> +            }
> +
> +            // Handle remaining pixels with scalar fallback
> +            for (; i < chrDstW; i++)
> +            {
> +                int u = chrDither[i & 7] << 12;
> +                int v = chrDither[(i + 3) & 7] << 12;
> +
> +                for (int j = 0; j < chrFilterSize; j++)
> +                {
> +                    u += chrUSrc[j][i] * chrFilter[j];
> +                    v += chrVSrc[j][i] * chrFilter[j];
> +                }
> +
> +                uint8_t uu = av_clip_uint8(u >> 19);
> +                uint8_t vv = av_clip_uint8(v >> 19);
> +                dest[2 * i] = vv;
> +                dest[2 * i + 1] = uu;
> +            }
> +        }
> +    }
> +}
> +
>  #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
>  void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
>      SwsInternal *c, int16_t *data, \
> @@ -275,6 +425,7 @@ av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
>          ASSIGN_VSCALE_FUNC(c->yuv2plane1, neon);
>          if (c->dstBpc == 8) {
>              c->yuv2planeX = ff_yuv2planeX_8_neon;
> +            c->yuv2nv12cX = ff_yuv2nv12cX_neon;
>          }
>          switch (c->opts.src_format) {
>          case AV_PIX_FMT_ABGR:
> --
> 2.36.0.windows.1
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
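A note on the quoted patch: its `else` branch, which is only reached when `isSwappedChroma(dstFormat)` is true, re-tests `!isSwappedChroma(dstFormat)`. That test is always false at that point, so for swapped-chroma formats such as NV21 the function appears to write nothing to `dest`. The portable C sketch below restates the scalar computation the NEON code is meant to match, handling both byte orders in one loop, and exhaustively checks that the narrowing chain used in the patch (`vshrq_n_s32` by 19, then `vqmovn_s32`, then `vqmovun_s16`) agrees with `av_clip_uint8()` on the shifted value. All names here (`nv12cX_ref`, `clip_uint8`, `sat_s32_to_s16`, `sat_s16_to_u8`, `check_narrowing_matches_clip`) and the `swapped` flag standing in for `isSwappedChroma()` are hypothetical; this is an illustrative sketch, not FFmpeg code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for av_clip_uint8(): clamp an int to [0, 255]. */
static uint8_t clip_uint8(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Scalar emulation of vqmovn_s32: saturate int32 to int16. */
static int16_t sat_s32_to_s16(int32_t x)
{
    return x < INT16_MIN ? INT16_MIN : x > INT16_MAX ? INT16_MAX : (int16_t)x;
}

/* Scalar emulation of vqmovun_s16: saturate signed int16 to uint8. */
static uint8_t sat_s16_to_u8(int16_t x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* An int32 accumulator shifted right by 19 lies in [-4096, 4095], so
 * checking every value in that range shows the narrowing chain
 * (vqmovn_s32 then vqmovun_s16) matches clip_uint8() exactly. */
static void check_narrowing_matches_clip(void)
{
    for (int32_t x = -4096; x <= 4095; x++)
        assert(sat_s16_to_u8(sat_s32_to_s16(x)) == clip_uint8(x));
}

/* Scalar reference for interleaved chroma output. 'swapped' is a
 * hypothetical flag standing in for isSwappedChroma(dstFormat). */
static void nv12cX_ref(bool swapped, const uint8_t *chrDither,
                       const int16_t *chrFilter, int chrFilterSize,
                       const int16_t **chrUSrc, const int16_t **chrVSrc,
                       uint8_t *dest, int chrDstW)
{
    /* Byte positions of U and V in each pair: UV normally, VU if swapped. */
    const int uoff = swapped ? 1 : 0;
    const int voff = 1 - uoff;

    for (int i = 0; i < chrDstW; i++) {
        /* Same dither pattern the patch preloads into u0/u1 and v0/v1. */
        int u = chrDither[i & 7] << 12;
        int v = chrDither[(i + 3) & 7] << 12;

        for (int j = 0; j < chrFilterSize; j++) {
            u += chrUSrc[j][i] * chrFilter[j];
            v += chrVSrc[j][i] * chrFilter[j];
        }
        dest[2 * i + uoff] = clip_uint8(u >> 19);
        dest[2 * i + voff] = clip_uint8(v >> 19);
    }
}
```

Selecting the byte offsets once keeps a single loop for both orders; a vector version could similarly choose which register goes into `uv.val[0]` instead of duplicating the whole loop body.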