From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harshitha Sarangu Suresh
To: "ffmpeg-devel@ffmpeg.org"
Cc: Dash Santosh Sathyanarayanan
Reply-To: FFmpeg development discussions and patches
Date: Thu, 22 May 2025 13:54:15 +0000
Subject: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()
List-Id: FFmpeg development discussions and patches
Sender: "ffmpeg-devel"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="_004_MA0P287MB1158754434858E1A7DFA1858D699AMA0P287MB1158INDP_"

--_004_MA0P287MB1158754434858E1A7DFA1858D699AMA0P287MB1158INDP_
Content-Type:
 text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

This optimization provides a 6x speedup for the module. The speedup was
measured by adding C timers inside both the C function and the optimized
NEON intrinsics function.

>From 1deceb0394a5acdf70677870dc252fd66a91dd9f Mon Sep 17 00:00:00 2001
From: Harshitha Suresh
Date: Mon, 19 May 2025 22:37:20 +0530
Subject: [PATCH] swscale/output: Implement neon intrinsics for yuv2nv12cX_c()

---
 libswscale/aarch64/swscale.c | 151 +++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 6e5a721c1f..fb59c3f1b0 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -21,6 +21,9 @@
 #include "libswscale/swscale.h"
 #include "libswscale/swscale_internal.h"
 #include "libavutil/aarch64/cpu.h"
+#if defined (__aarch64__)
+#include <arm_neon.h>
+#endif
 
 void ff_hscale16to15_4_neon_asm(int shift, int16_t *_dst, int dstW,
                       const uint8_t *_src, const int16_t *filter,
@@ -142,6 +145,153 @@ static void ff_hscale16to19_X4_neon(SwsInternal *c, int16_t *_dst, int dstW,
 
 }
 
+static void ff_yuv2nv12cX_neon(enum AVPixelFormat dstFormat, const uint8_t *chrDither,
+    const int16_t *chrFilter, int chrFilterSize,
+    const int16_t **chrUSrc, const int16_t **chrVSrc,
+    uint8_t *dest, int chrDstW)
+{
+
+    int i;
+    int u_dither[8], v_dither[8];
+    for (i = 0; i < 8; i++) {
+        u_dither[i] = chrDither[i & 7] << 12;
+        v_dither[i] = chrDither[(i + 3) & 7] << 12;
+    }
+    int32x4_t u0 = vld1q_s32(&u_dither[0]);
+    int32x4_t u1 = vld1q_s32(&u_dither[4]);
+    int32x4_t v0 = vld1q_s32(&v_dither[0]);
+    int32x4_t v1 = vld1q_s32(&v_dither[4]);
+
+    if (!isSwappedChroma(dstFormat))
+    {
+        for (i = 0; i <= chrDstW - 8; i += 8)
+        {
+            int32x4_t udst0 = u0;
+            int32x4_t udst1 = u1;
+            int32x4_t vdst0 = v0;
+            int32x4_t vdst1 = v1;
+
+            for (int j = 0; j < chrFilterSize; j++)
+            {
+                int16x8_t usrc0 = vld1q_s16(&chrUSrc[j][i]);
+                int16x8_t vsrc0 = vld1q_s16(&chrVSrc[j][i]);
+
+                int32x4_t usrc0_low = vmovl_s16(vget_low_s16(usrc0));
+                int32x4_t usrc0_high = vmovl_s16(vget_high_s16(usrc0));
+                int32x4_t vsrc0_low = vmovl_s16(vget_low_s16(vsrc0));
+                int32x4_t vsrc0_high = vmovl_s16(vget_high_s16(vsrc0));
+
+                udst0 = vmlaq_n_s32(udst0, usrc0_low, chrFilter[j]);
+                udst1 = vmlaq_n_s32(udst1, usrc0_high, chrFilter[j]);
+                vdst0 = vmlaq_n_s32(vdst0, vsrc0_low, chrFilter[j]);
+                vdst1 = vmlaq_n_s32(vdst1, vsrc0_high, chrFilter[j]);
+
+            }
+            // Right shift by 19
+            udst0 = vshrq_n_s32(udst0, 19);
+            udst1 = vshrq_n_s32(udst1, 19);
+            vdst0 = vshrq_n_s32(vdst0, 19);
+            vdst1 = vshrq_n_s32(vdst1, 19);
+
+            // Convert to 16-bit and then to uint8, with saturation
+            int16x8_t u16 = vcombine_s16(vqmovn_s32(udst0), vqmovn_s32(udst1));
+            int16x8_t v16 = vcombine_s16(vqmovn_s32(vdst0), vqmovn_s32(vdst1));
+
+            uint8x8_t u8 = vqmovun_s16(u16);
+            uint8x8_t v8 = vqmovun_s16(v16);
+
+            // Store interleaved u/v as UV UV UV...
+            uint8x8x2_t uv;
+            uv.val[0] = u8;
+            uv.val[1] = v8;
+            vst2_u8(dest + 2 * i, uv);
+        }
+
+        // Handle remaining pixels with scalar fallback
+        for (; i < chrDstW; i++)
+        {
+            int u = chrDither[i & 7] << 12;
+            int v = chrDither[(i + 3) & 7] << 12;
+
+            for (int j = 0; j < chrFilterSize; j++)
+            {
+                u += chrUSrc[j][i] * chrFilter[j];
+                v += chrVSrc[j][i] * chrFilter[j];
+            }
+
+            uint8_t uu = av_clip_uint8(u >> 19);
+            uint8_t vv = av_clip_uint8(v >> 19);
+            dest[2 * i] = uu;
+            dest[2 * i + 1] = vv;
+        }
+    }
+    else
+    {
+        if (isSwappedChroma(dstFormat))
+        {
+            for (i = 0; i <= chrDstW - 8; i += 8)
+            {
+                int32x4_t udst0 = u0;
+                int32x4_t udst1 = u1;
+                int32x4_t vdst0 = v0;
+                int32x4_t vdst1 = v1;
+
+                for (int j = 0; j < chrFilterSize; j++)
+                {
+                    int16x8_t usrc = vld1q_s16(&chrUSrc[j][i]);
+                    int16x8_t vsrc = vld1q_s16(&chrVSrc[j][i]);
+
+                    int32x4_t usrc_low = vmovl_s16(vget_low_s16(usrc));
+                    int32x4_t usrc_high = vmovl_s16(vget_high_s16(usrc));
+                    int32x4_t vsrc_low = vmovl_s16(vget_low_s16(vsrc));
+                    int32x4_t vsrc_high = vmovl_s16(vget_high_s16(vsrc));
+
+                    udst0 = vmlaq_n_s32(udst0, usrc_low, chrFilter[j]);
+                    udst1 = vmlaq_n_s32(udst1, usrc_high, chrFilter[j]);
+                    vdst0 = vmlaq_n_s32(vdst0, vsrc_low, chrFilter[j]);
+                    vdst1 = vmlaq_n_s32(vdst1, vsrc_high, chrFilter[j]);
+                }
+                // Right shift by 19
+                udst0 = vshrq_n_s32(udst0, 19);
+                udst1 = vshrq_n_s32(udst1, 19);
+                vdst0 = vshrq_n_s32(vdst0, 19);
+                vdst1 = vshrq_n_s32(vdst1, 19);
+
+                // Convert to 16-bit and then to uint8, with saturation
+                int16x8_t u16 = vcombine_s16(vqmovn_s32(udst0), vqmovn_s32(udst1));
+                int16x8_t v16 = vcombine_s16(vqmovn_s32(vdst0), vqmovn_s32(vdst1));
+
+                uint8x8_t u8 = vqmovun_s16(u16);
+                uint8x8_t v8 = vqmovun_s16(v16);
+
+                // Store interleaved v/u as VU VU VU...
+                uint8x8x2_t uv;
+                uv.val[0] = v8;
+                uv.val[1] = u8;
+                vst2_u8(dest + 2 * i, uv);
+            }
+
+            // Handle remaining pixels with scalar fallback
+            for (; i < chrDstW; i++)
+            {
+                int u = chrDither[i & 7] << 12;
+                int v = chrDither[(i + 3) & 7] << 12;
+
+                for (int j = 0; j < chrFilterSize; j++)
+                {
+                    u += chrUSrc[j][i] * chrFilter[j];
+                    v += chrVSrc[j][i] * chrFilter[j];
+                }
+
+                uint8_t uu = av_clip_uint8(u >> 19);
+                uint8_t vv = av_clip_uint8(v >> 19);
+                dest[2 * i] = vv;
+                dest[2 * i + 1] = uu;
+            }
+        }
+    }
+}
+
 #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
 void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
                                                 SwsInternal *c, int16_t *data, \
@@ -275,6 +425,7 @@ av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
         ASSIGN_VSCALE_FUNC(c->yuv2plane1, neon);
         if (c->dstBpc == 8) {
             c->yuv2planeX = ff_yuv2planeX_8_neon;
+            c->yuv2nv12cX = ff_yuv2nv12cX_neon;
         }
         switch (c->opts.src_format) {
         case AV_PIX_FMT_ABGR:
-- 
2.36.0.windows.1

--_004_MA0P287MB1158754434858E1A7DFA1858D699AMA0P287MB1158INDP_
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

--_004_MA0P287MB1158754434858E1A7DFA1858D699AMA0P287MB1158INDP_--
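
The measurement described in the message (timers around the C function and around the NEON function) could be sketched roughly as below. This is a portable illustration, not the author's harness: it uses a scalar loop with the same arithmetic as the tail loops in the patch, and standard `clock()` (CPU time) in place of whatever timer was actually used. The names `clip_uint8`, `ChromaArgs`, `nv12_chroma_ref` and `time_chroma_ns` are hypothetical and are not FFmpeg API; `swap_uv` stands in for `isSwappedChroma(dstFormat)`.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Clamp to [0, 255]. Stand-in for FFmpeg's av_clip_uint8() so that the
 * sketch builds without FFmpeg headers. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical argument bundle; the fields mirror the parameters of the
 * function added by the patch, except swap_uv, which replaces
 * isSwappedChroma(dstFormat). */
typedef struct ChromaArgs {
    int             swap_uv;       /* 0: U,V byte order; 1: V,U byte order */
    const uint8_t  *chrDither;     /* 8 dither values */
    const int16_t  *chrFilter;     /* chrFilterSize vertical filter taps */
    int             chrFilterSize;
    const int16_t **chrUSrc;       /* chrFilterSize rows of chrDstW samples */
    const int16_t **chrVSrc;
    uint8_t        *dest;          /* 2 * chrDstW output bytes */
    int             chrDstW;
} ChromaArgs;

/* Scalar reference: dither bias, filter accumulate, shift by 19, clip,
 * then interleave in the requested order. */
static void nv12_chroma_ref(const ChromaArgs *a)
{
    for (int i = 0; i < a->chrDstW; i++) {
        int u = a->chrDither[i & 7] << 12;
        int v = a->chrDither[(i + 3) & 7] << 12;

        for (int j = 0; j < a->chrFilterSize; j++) {
            u += a->chrUSrc[j][i] * a->chrFilter[j];
            v += a->chrVSrc[j][i] * a->chrFilter[j];
        }

        a->dest[2 * i + (a->swap_uv ? 1 : 0)] = clip_uint8(u >> 19);
        a->dest[2 * i + (a->swap_uv ? 0 : 1)] = clip_uint8(v >> 19);
    }
}

/* Average CPU time per call in nanoseconds over `runs` calls. */
static double time_chroma_ns(void (*fn)(const ChromaArgs *),
                             const ChromaArgs *a, int runs)
{
    assert(runs > 0);

    clock_t start = clock();
    for (int r = 0; r < runs; r++)
        fn(a);
    clock_t end = clock();

    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / runs;
}
```

Timing the NEON function through the same kind of wrapper, with identical inputs, and dividing the two averages would give a speedup figure of the sort quoted above. Comparing the two output buffers byte for byte, once with `swap_uv` set to 0 and once with it set to 1, also checks both chroma orders.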