From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 0EC5C4C5EF
	for <ffmpegdev@gitmailbox.com>; Mon, 26 May 2025 08:40:57 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id C76CA68D432;
	Mon, 26 May 2025 11:40:55 +0300 (EEST)
Received: from MA0PR01CU009.outbound.protection.outlook.com
 (mail-southindiaazon11020128.outbound.protection.outlook.com
 [52.101.227.128])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTPS id 9600C68D3B3
 for <ffmpeg-devel@ffmpeg.org>; Mon, 26 May 2025 11:40:53 +0300 (EEST)
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=UhgZNeY+vT6Bd4l0zLRweBFEvpz7dHTfHrFyFYpf8AY2vAhtTy1jiOVfFGUuqE5kxZ3z0ei+xq3pARB66syEZeBn4Q8eBsaoIbxiNkXCZI18XtpteyHpM7WgKMF0e2mbEYbxmi2qvDbdsdHOSJWSneABq6nX8FFtxv3W/nAoP/Xyfb0JyIRNEB/ErdCBwTp8J1VK+8DM3WzyRzeb4pWRNKFnjUVynpzJvVp0p/RRmA/4nu1ZFeaWE9X2oJdvXMuYFGh04gYSfVXVCuls7/cZDzApWO2kFf1ZQir2eZezzpa3a6hkdf6n+NVFEUhN4sOgQcLMQB2JNWdck1HoDcqTJw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=Byedas9/0+DlhsQQ/UhlmstJtmgoRbxsh9KOt0CvBJ8=;
 b=m1PZ8fJPKdw9HPtsFqPi1d2VpemLAbmE9jNEyR7WQHPeyw+tXfBkUlBqQEJutmQ3s/ApdVXT9LAt8q2ok1Y1/cQ3oi2SFV7nXO/vwBBm1QQoc74xDcKjn6hfGBzpYekgOBGwEBlXDxfJxw62/orqNuTaRGOpBKoq1CC81xJKWsCcuicUzcagWmGMDcXFMwCGtCrGVbRFHgMOBSbUXKfIo5hmOP2PRouGeP1DAVHLYIGOyBj2Vj9e49wHJkRCZSZaHsX5Uu+wpnl1/WeuziNai/7dP11Yuv1SRZkzPtrhcIdYwzl+X8T5Fm3LimDdwgFsPuHSdngYiss94Z2Xh3Zc9g==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=multicorewareinc.com; dmarc=pass action=none
 header.from=multicorewareinc.com; dkim=pass header.d=multicorewareinc.com;
 arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multicorewareinc.com; 
 s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=Byedas9/0+DlhsQQ/UhlmstJtmgoRbxsh9KOt0CvBJ8=;
 b=P3zhHmV9oUBVoeN08VZoXVmSuSeKnCAiF8GLqtDd65HWDgoYPancPQQ5QU3IuqtVe3Qhg2Xn1KOzIMCUWRtSivJAKCd+n2gcgSuD3fjq+LnXtJZQXmAYx/xJxPwWd+Pd7s2XuKh/z0MdCCpCPPKrC5YfIjL0fqNBjMeluKfNlOjXspKK+aRn4i3bOb/zJAfxzY7tA6O1tJO0X8rLvmYRa3JR9V+xRrDeupJNpFmJn4vNrooFRx9gNWb6De8bbNeeuajfFTxpxsj8M7HygjB7C0k/IWVdCaAhsdt9GQSyTL99ntBlomVAx1LY16Fn+/Lzp8tGFmRrodaUf0W1OA9y7Q==
Received: from MA0P287MB1158.INDP287.PROD.OUTLOOK.COM (2603:1096:a01:e5::5) by
 PN2P287MB2045.INDP287.PROD.OUTLOOK.COM (2603:1096:c01:1c6::11) with
 Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.8769.26; Mon, 26 May 2025 08:40:48 +0000
Received: from MA0P287MB1158.INDP287.PROD.OUTLOOK.COM
 ([fe80::d173:abc7:2297:fdc8]) by MA0P287MB1158.INDP287.PROD.OUTLOOK.COM
 ([fe80::d173:abc7:2297:fdc8%3]) with mapi id 15.20.8769.025; Mon, 26 May 2025
 08:40:48 +0000
From: Harshitha Sarangu Suresh <harshitha@multicorewareinc.com>
To: "ffmpeg-devel@ffmpeg.org" <ffmpeg-devel@ffmpeg.org>
Thread-Topic: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics
 for yuv2nv12cX_c()
Thread-Index: AQHbyyBEREKpn96YzUSk6aapvtN+JbPknWdI
Date: Mon, 26 May 2025 08:40:48 +0000
Message-ID: <MA0P287MB1158352BE49E1A61B05FECB8D665A@MA0P287MB1158.INDP287.PROD.OUTLOOK.COM>
References: <MA0P287MB1158754434858E1A7DFA1858D699A@MA0P287MB1158.INDP287.PROD.OUTLOOK.COM>
In-Reply-To: <MA0P287MB1158754434858E1A7DFA1858D699A@MA0P287MB1158.INDP287.PROD.OUTLOOK.COM>
Accept-Language: en-IN, en-US
Content-Language: en-IN
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
msip_labels: 
x-ms-reactions: allow
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=multicorewareinc.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: MA0P287MB1158:EE_|PN2P287MB2045:EE_
x-ms-office365-filtering-correlation-id: 9be6c418-5b51-469a-b06d-08dd9c310367
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|366016|376014|1800799024|7053199007|8096899003|38070700018; 
x-microsoft-antispam-message-info: =?us-ascii?Q?+wDXlYIC6bUetGvAg6I3iX1Lk/AU9Czg+i4pDH0Ub6HBU8rlmdL4z1TbSnOP?=
 =?us-ascii?Q?nXgEN7QGZdM+bRVX+duDzt8oYeVN6z7vkgUWx93ZY25XSmPAd3R7hkH3+2Os?=
 =?us-ascii?Q?yyKrXGZS0JVernsEv0DPnr6WmD80lGwq8Uy57s5Y8tbLliYopofdOHKZbpRe?=
 =?us-ascii?Q?30rSnIwK6GSXZZamW+21pE+uNAg35burmuFxHOHRXApjdzcooRkwcQqm47Xy?=
 =?us-ascii?Q?X7hc8UgzBWhnRH+mK2dZisEc/gXTwJw2v+4nTfF4Sge1dpRduRSRQbrQ79Av?=
 =?us-ascii?Q?Ced0RO+jlb7TTEavuIkBzeO8IeHd3M+/Q7fxiWgya1GGGgLoX2UZaDsdSXsH?=
 =?us-ascii?Q?yGRkizRgjOySAVmOggwK0g/bdqWERua7TRwJIZ4Agrm33g2lV0YEbyMwK+TX?=
 =?us-ascii?Q?HPnQDo5mBXsedM3oT4dZrtZi5t9c5POUMqQD400lX4cb49/zXPYpc5Ep/wZ9?=
 =?us-ascii?Q?hjK4TB3vMgeeLDJ3XV/fkey+n+A1/luyyH1uDKUdoX3iAP/HQcQNxekh3mze?=
 =?us-ascii?Q?5MgRf+15h2sZbiIgbjHNsWVb1AYdHSZL9XT1mnoO5qvX1CWKYQXoF2VKvCyN?=
 =?us-ascii?Q?l8ElSO2Y0V1rWsgoAQ9F6n0PBOpcfu1bTDyuW9GX+cR3Beg1UnKnZHjstFcd?=
 =?us-ascii?Q?UVhvlfk2sfPJ9AFSoWPdlESKqpCS1jBCtqDKLiV8qU3MNsxVJ08ZJEQppox0?=
 =?us-ascii?Q?KO0Mv/L6j2B1xCut8S2pwqH50HdtYPAUtMfHfVksCZ/pniwHzrSnqw4VlRIg?=
 =?us-ascii?Q?HWoLuZcFLAaFwaM4QDkDXyWRRBiYqmwaP7m2fVzfMeIejjIcl6GRmmz5jQGw?=
 =?us-ascii?Q?UWZ9EksAeIkZ2tdSJvW7Gw4QHzDlZt3rQp9usOXWNQp4wxmy5PLnmXTJCt/X?=
 =?us-ascii?Q?ynioMUeAKVoaAOZJpYmFvb3hy/7B3ljawnoOOx1xQjQIU6ShUHF6UPnqpUra?=
 =?us-ascii?Q?5ahnAW7Azer+Rt9xtmPDecX9WMHE8nH5LbN0PJiErsMOakgiBpRTG9n5ZagT?=
 =?us-ascii?Q?vXYX8xGHbQL/xbX7uP4l+oBCposHTezBqIgSRTWQD3kygGAZuciNg435OTLR?=
 =?us-ascii?Q?jXaP90HKIwi8kRN4l3htEltqwUrNswWnf57sC0XKpLA35HG4o7l7MKi2g8uF?=
 =?us-ascii?Q?7vPWwyzipTo2iaKJ+0nalSY0i9asM7WSPUcg7BLLD+2a1mKzses71Yjk+z7t?=
 =?us-ascii?Q?zDstGGnmO0pcKBa/NuCO1mDFphjN7rWEE2MJqtSEYVp4w+tOZlPrsWXO/+OJ?=
 =?us-ascii?Q?/hUtEL+N0nDh2Tqw5a44d4BFUV4jxZL0INm/VOw2iHv4sawaDrR3Df53jynh?=
 =?us-ascii?Q?KXXWBZAL5Ma2PvwYQXj/5gD3l336UnBSb/hcLyFUNT3ln3j65Ir6ppxNu8Ck?=
 =?us-ascii?Q?LsAjbjHxeYunMHWlXtfAlm7bvDp7otnp2n0uQLSCuXeybASw2QUDDJTEulIY?=
 =?us-ascii?Q?huVlUoPykGc=3D?=
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:MA0P287MB1158.INDP287.PROD.OUTLOOK.COM; PTR:; CAT:NONE;
 SFS:(13230040)(366016)(376014)(1800799024)(7053199007)(8096899003)(38070700018);
 DIR:OUT; SFP:1102; 
x-ms-exchange-antispam-messagedata-chunkcount: 1
x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?GTx3GsqJ6u6n08XRJQYBSENS7PZqDS0qE2jtKZTQXjOn8yWHe31yt+qa7fRk?=
 =?us-ascii?Q?HcXpT6jpSgDvNctFNVpvEMvi6kavqGRdHos3jbdvUt7F3aDKHWUe+7w9qprA?=
 =?us-ascii?Q?8wogyctMK1qUnHh3EMG4cRmaFjCDdkTpjfo1843pVXjtBUrqg5e9hdzgKQmI?=
 =?us-ascii?Q?gCBAz09WhDo+S+bCiZ2OIpYpzCXLsJ51fF52Mxofi6eJ/Uc5wdKYJMDbpQZ7?=
 =?us-ascii?Q?AKVv17zTqwsBCCl3ebFamSav9bcVyaNgGJd6maNElhgJvRBVYEVi3f7VnkOn?=
 =?us-ascii?Q?XwyNmj897jvIfc3ON6vXhr3+I+QggLXfBASImkRWMxKF08Y30ZBtnzDOfq8f?=
 =?us-ascii?Q?xJga6JBxoEfInNbf+q/HKxWtYCmVFc9JDhNTkbNc0HRIxFMJspPIZb/hdTgZ?=
 =?us-ascii?Q?diYtYql0gx0ozuU3174pqjnu6yfNIYmFpvoEbua4qxcAGXo0HGwrTEW6HWIN?=
 =?us-ascii?Q?3QWkiFK1lB3qWLV6RC/gu26oeF4F3HCiRejso4TcMsaZ4ZF0XVfnU/JJPqAO?=
 =?us-ascii?Q?K87K9UtMo5AN9+mWX3+zxwf1R5PhEWuT4aTyVwVY0NcPhZvaukG8XcWEmZiW?=
 =?us-ascii?Q?I//QCBXwXnrwv1HnyvcTa+9rjTof7Ar+lmdx4MREh9yqxhbLAu8ItGdpBf5S?=
 =?us-ascii?Q?7j7GhLl7YItZBhIZGgvnZpjC08RC3VrXRD6E2rHjyN+5gLfzHrJ2YlZikHLU?=
 =?us-ascii?Q?PuH5StSa9s+LisN7gjrwMrlg0Gff/s1yu1k56uQhq6N9TWhNfvHP2AHqo0jX?=
 =?us-ascii?Q?YaSqitaAWkAlIGzFK536tXGmwDy2X9iRDw+U0Kn46ZKq8uwYy5vn9U1QYewC?=
 =?us-ascii?Q?SJL44r8UXXuhpe+ePdMEXIDgiAyBvqTPhAAYsYqZH5i8H/34sblzOHcsiglk?=
 =?us-ascii?Q?uQvdCQRHk7HvWQd+GPzQJBulrRaPBCWJeKedep3XchI7Qbn5E2wfCzujwb70?=
 =?us-ascii?Q?4pZ+ML/PYiTT8e8e4CiJMVpKgp/LcnpIl87orpbHzEQbmGvKKou2jHWo0yMv?=
 =?us-ascii?Q?AHzSgpRHCrGM4BTzvw7qtJDBafy5Ot2pNjSXqV5gM82s1cDBp00hBDd3NDwd?=
 =?us-ascii?Q?+23YWPplC20qyY60vTfWsn72gGBCxc9qz7Ieq/t4fH+N4H0J9cjMgs+8JxXD?=
 =?us-ascii?Q?fei5zGLCYKii1GgseXGMzWULsjtlo9QbgEKWr4LaMrxKwGvcy8+AIATGt2Iy?=
 =?us-ascii?Q?Dpj88fcGJbef1eAHEif9mZrBrx6byYpzKVSL1MZE8ktU2Jl7k8gZ2Mg83w+d?=
 =?us-ascii?Q?HTx3iKR1yaq2HVem8MOqaufKpGLOhxok9ep7Kh6cbMdEuRRAyalb0y1SrGiR?=
 =?us-ascii?Q?Re2JL3ZbBbyajOWX7RONtbBIrTDDBVQIG/ztkNdwE2EPJsr0203KeRDakyKw?=
 =?us-ascii?Q?696C4meT2gWgH7X0wN1JMNhRdXR18nFV4A9qn6HyLDnHCweSNXzM1opeF1cd?=
 =?us-ascii?Q?L3vv3tNJhonAy1X3g/PpV/8rvPlryqN8sSO80xjkh2+p8bLAub7Tol7fXaiy?=
 =?us-ascii?Q?fMukb5Ly9OwwkHv/KMCyXyMqoJLVK0wuuLRGc3ExZwQy0Pn2ED4/r1tM3wd/?=
 =?us-ascii?Q?z0cPJa4OqKU3LbSEQi+FiDAMDxbtNZbhalnIvhJ2/dYguqrGivek1JHUN7fj?=
 =?us-ascii?Q?VItkQeQAfmncf3onNMVLvxc=3D?=
MIME-Version: 1.0
X-OriginatorOrg: multicorewareinc.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: MA0P287MB1158.INDP287.PROD.OUTLOOK.COM
X-MS-Exchange-CrossTenant-Network-Message-Id: 9be6c418-5b51-469a-b06d-08dd9c310367
X-MS-Exchange-CrossTenant-originalarrivaltime: 26 May 2025 08:40:48.0501 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: ffc5e88b-3fa2-4d69-a468-344b6b766e7d
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 790Vx/nsmoNX7Z6Z7V7xlWefj/FTVQEK1FtlJ73W8yoFCSQhv1TpxQSEPvMJuww6EItIs9RggqbbBtqPm5mIoyNLviwwKYsv9KEOQtn5ZcA=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: PN2P287MB2045
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [FFmpeg-devel] [PATCH] swscale/output: Implement neon
 intrinsics for yuv2nv12cX_c()
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Dash Santosh Sathyanarayanan <dash.sathyanarayanan@multicorewareinc.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/MA0P287MB1158352BE49E1A61B05FECB8D665A@MA0P287MB1158.INDP287.PROD.OUTLOOK.COM/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

Hi,
     Did you get a chance to review this patch?

________________________________
From: Harshitha Sarangu Suresh
Sent: Thursday, May 22, 2025 7:24:15 PM
To: ffmpeg-devel@ffmpeg.org <ffmpeg-devel@ffmpeg.org>
Cc: Dash Santosh Sathyanarayanan <dash.sathyanarayanan@multicorewareinc.com>
Subject: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()

This optimization gives a 6x speedup over the C implementation, yuv2nv12cX_c(). The speedup was measured by adding C timers inside both the C function and the optimized NEON intrinsics function.
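
For anyone who wants to reproduce a measurement of this kind outside the tree, here is a minimal sketch of the scalar computation being vectorized, with a clock()-based timer around it. It is not the code used for the 6x figure. The names nv12cX_ref, time_nv12cX_ref and clip_uint8 are made up for illustration, and clip_uint8 only stands in for av_clip_uint8() so the sketch builds without FFmpeg headers.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Clamp to 0..255. Stand-in for FFmpeg's av_clip_uint8() (assumption:
 * used here only so the sketch compiles on its own). */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of the chroma path: dither, vertical filter, >> 19,
 * interleave. swapped == 0 writes U,V pairs (NV12 order);
 * swapped != 0 writes V,U pairs (NV21 order). */
static void nv12cX_ref(int swapped, const uint8_t *dither,
                       const int16_t *filter, int filter_size,
                       const int16_t **u_src, const int16_t **v_src,
                       uint8_t *dest, int w)
{
    assert(filter_size > 0 && w >= 0);
    for (int i = 0; i < w; i++) {
        int u = dither[i & 7] << 12;
        int v = dither[(i + 3) & 7] << 12;
        for (int j = 0; j < filter_size; j++) {
            u += u_src[j][i] * filter[j];
            v += v_src[j][i] * filter[j];
        }
        dest[2 * i + (swapped ? 1 : 0)] = clip_uint8(u >> 19);
        dest[2 * i + (swapped ? 0 : 1)] = clip_uint8(v >> 19);
    }
}

/* Hypothetical timer: CPU seconds spent on `iters` calls with the same
 * inputs. clock() is coarse, so iters should be large enough that the
 * total run takes well over a clock tick. */
static double time_nv12cX_ref(int iters, const uint8_t *dither,
                              const int16_t *filter, int filter_size,
                              const int16_t **u_src, const int16_t **v_src,
                              uint8_t *dest, int w)
{
    clock_t start = clock();
    for (int n = 0; n < iters; n++)
        nv12cX_ref(0, dither, filter, filter_size, u_src, v_src, dest, w);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing the NEON function with the same loop and inputs, then dividing the two results, gives a speedup figure. Comparing the two output buffers byte for byte, for both the normal and the swapped chroma order, is a cheap way to check that the vector path matches the scalar one.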



>From 1deceb0394a5acdf70677870dc252fd66a91dd9f Mon Sep 17 00:00:00 2001
From: Harshitha Suresh <harshitha@multicorewareinc.com>
Date: Mon, 19 May 2025 22:37:20 +0530
Subject: [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()

---
 libswscale/aarch64/swscale.c | 151 +++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 6e5a721c1f..fb59c3f1b0 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -21,6 +21,9 @@
 #include "libswscale/swscale.h"
 #include "libswscale/swscale_internal.h"
 #include "libavutil/aarch64/cpu.h"
+#if defined (__aarch64__)
+#include <arm_neon.h>
+#endif

 void ff_hscale16to15_4_neon_asm(int shift, int16_t *_dst, int dstW,
                       const uint8_t *_src, const int16_t *filter,
@@ -142,6 +145,153 @@ static void ff_hscale16to19_X4_neon(SwsInternal *c, int16_t *_dst, int dstW,

 }

+static void ff_yuv2nv12cX_neon(enum AVPixelFormat dstFormat, const uint8_t *chrDither,
+    const int16_t *chrFilter, int chrFilterSize,
+    const int16_t **chrUSrc, const int16_t **chrVSrc,
+    uint8_t *dest, int chrDstW)
+{
+
+    int i;
+    int u_dither[8], v_dither[8];
+    for (i = 0; i < 8; i++) {
+        u_dither[i] = chrDither[i & 7] << 12;
+        v_dither[i] = chrDither[(i + 3) & 7] << 12;
+    }
+    int32x4_t u0 = vld1q_s32(&u_dither[0]);
+    int32x4_t u1 = vld1q_s32(&u_dither[4]);
+    int32x4_t v0 = vld1q_s32(&v_dither[0]);
+    int32x4_t v1 = vld1q_s32(&v_dither[4]);
+
+    if (!isSwappedChroma(dstFormat))
+    {
+        for (i = 0; i <= chrDstW - 8; i += 8)
+        {
+            int32x4_t udst0 = u0;
+            int32x4_t udst1 = u1;
+            int32x4_t vdst0 = v0;
+            int32x4_t vdst1 = v1;
+
+            for (int j = 0; j < chrFilterSize; j++)
+            {
+                int16x8_t usrc0 = vld1q_s16(&chrUSrc[j][i]);
+                int16x8_t vsrc0 = vld1q_s16(&chrVSrc[j][i]);
+
+                int32x4_t usrc0_low = vmovl_s16(vget_low_s16(usrc0));
+                int32x4_t usrc0_high = vmovl_s16(vget_high_s16(usrc0));
+                int32x4_t vsrc0_low = vmovl_s16(vget_low_s16(vsrc0));
+                int32x4_t vsrc0_high = vmovl_s16(vget_high_s16(vsrc0));
+
+                udst0 = vmlaq_n_s32(udst0, usrc0_low, chrFilter[j]);
+                udst1 = vmlaq_n_s32(udst1, usrc0_high, chrFilter[j]);
+                vdst0 = vmlaq_n_s32(vdst0, vsrc0_low, chrFilter[j]);
+                vdst1 = vmlaq_n_s32(vdst1, vsrc0_high, chrFilter[j]);
+
+            }
+            // Right shift by 19
+            udst0 = vshrq_n_s32(udst0, 19);
+            udst1 = vshrq_n_s32(udst1, 19);
+            vdst0 = vshrq_n_s32(vdst0, 19);
+            vdst1 = vshrq_n_s32(vdst1, 19);
+
+            // Convert to 16-bit and then to uint8, with saturation
+            int16x8_t u16 = vcombine_s16(vqmovn_s32(udst0), vqmovn_s32(udst1));
+            int16x8_t v16 = vcombine_s16(vqmovn_s32(vdst0), vqmovn_s32(vdst1));
+
+            uint8x8_t u8 = vqmovun_s16(u16);
+            uint8x8_t v8 = vqmovun_s16(v16);
+
+            // Store interleaved u/v as UV UV UV...
+            uint8x8x2_t uv;
+            uv.val[0] = u8;
+            uv.val[1] = v8;
+            vst2_u8(dest + 2 * i, uv);
+        }
+
+        // Handle remaining pixels with scalar fallback
+        for (; i < chrDstW; i++)
+        {
+            int u = chrDither[i & 7] << 12;
+            int v = chrDither[(i + 3) & 7] << 12;
+
+            for (int j = 0; j < chrFilterSize; j++)
+            {
+                u += chrUSrc[j][i] * chrFilter[j];
+                v += chrVSrc[j][i] * chrFilter[j];
+            }
+
+            uint8_t uu = av_clip_uint8(u >> 19);
+            uint8_t vv = av_clip_uint8(v >> 19);
+            dest[2 * i] = uu;
+            dest[2 * i + 1] = vv;
+        }
+    }
+    else
+    {
+        if (isSwappedChroma(dstFormat))
+        {
+            for (i = 0; i <= chrDstW - 8; i += 8)
+            {
+                int32x4_t udst0 = u0;
+                int32x4_t udst1 = u1;
+                int32x4_t vdst0 = v0;
+                int32x4_t vdst1 = v1;
+
+                for (int j = 0; j < chrFilterSize; j++)
+                {
+                    int16x8_t usrc = vld1q_s16(&chrUSrc[j][i]);
+                    int16x8_t vsrc = vld1q_s16(&chrVSrc[j][i]);
+
+                    int32x4_t usrc_low = vmovl_s16(vget_low_s16(usrc));
+                    int32x4_t usrc_high = vmovl_s16(vget_high_s16(usrc));
+                    int32x4_t vsrc_low = vmovl_s16(vget_low_s16(vsrc));
+                    int32x4_t vsrc_high = vmovl_s16(vget_high_s16(vsrc));
+
+                    udst0 = vmlaq_n_s32(udst0, usrc_low, chrFilter[j]);
+                    udst1 = vmlaq_n_s32(udst1, usrc_high, chrFilter[j]);
+                    vdst0 = vmlaq_n_s32(vdst0, vsrc_low, chrFilter[j]);
+                    vdst1 = vmlaq_n_s32(vdst1, vsrc_high, chrFilter[j]);
+                }
+                // Right shift by 19
+                udst0 = vshrq_n_s32(udst0, 19);
+                udst1 = vshrq_n_s32(udst1, 19);
+                vdst0 = vshrq_n_s32(vdst0, 19);
+                vdst1 = vshrq_n_s32(vdst1, 19);
+
+                // Convert to 16-bit and then to uint8, with saturation
+                int16x8_t u16 = vcombine_s16(vqmovn_s32(udst0), vqmovn_s32(udst1));
+                int16x8_t v16 = vcombine_s16(vqmovn_s32(vdst0), vqmovn_s32(vdst1));
+
+                uint8x8_t u8 = vqmovun_s16(u16);
+                uint8x8_t v8 = vqmovun_s16(v16);
+
+                // Store interleaved v/u as VU VU VU...
+                uint8x8x2_t uv;
+                uv.val[0] = v8;
+                uv.val[1] = u8;
+                vst2_u8(dest + 2 * i, uv);
+            }
+
+            // Handle remaining pixels with scalar fallback
+            for (; i < chrDstW; i++)
+            {
+                int u = chrDither[i & 7] << 12;
+                int v = chrDither[(i + 3) & 7] << 12;
+
+                for (int j = 0; j < chrFilterSize; j++)
+                {
+                    u += chrUSrc[j][i] * chrFilter[j];
+                    v += chrVSrc[j][i] * chrFilter[j];
+                }
+
+                uint8_t uu = av_clip_uint8(u >> 19);
+                uint8_t vv = av_clip_uint8(v >> 19);
+                dest[2 * i] = vv;
+                dest[2 * i + 1] = uu;
+            }
+        }
+    }
+}
+
 #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
 void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
                                                 SwsInternal *c, int16_t *data, \
@@ -275,6 +425,7 @@ av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
         ASSIGN_VSCALE_FUNC(c->yuv2plane1, neon);
         if (c->dstBpc == 8) {
             c->yuv2planeX = ff_yuv2planeX_8_neon;
+            c->yuv2nv12cX = ff_yuv2nv12cX_neon;
         }
         switch (c->opts.src_format) {
         case AV_PIX_FMT_ABGR:
--
2.36.0.windows.1

