From: Harshitha Sarangu Suresh
To: "ffmpeg-devel@ffmpeg.org"
Cc: Dash Santosh Sathyanarayanan
Date: Mon, 26 May 2025 08:40:17 +0000
Subject: Re: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2planeX_10_c_template()

Hi,

Did you get a chance to review this patch?

________________________________
From: Harshitha Sarangu Suresh
Sent: Thursday, May 22, 2025 7:27:31 PM
To: ffmpeg-devel@ffmpeg.org
Cc: Dash Santosh Sathyanarayanan
Subject: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2planeX_10_c_template()

This optimization provides a 5x improvement for the module. The performance gain was measured by adding C timers inside both the C function and the optimized NEON intrinsic function.

>From 904144c2db9e5e72d56360c4c2eb38d426852901 Mon Sep 17 00:00:00 2001
From: Harshitha Suresh
Date: Thu, 22 May 2025 10:23:55 +0530
Subject: [PATCH] swscale/output: Implement neon intrinsics for yuv2planeX_10_c_template()

---
 libswscale/output.c | 76 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index c37649e7ce..345df5ce59 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -22,7 +22,9 @@
 #include
 #include
 #include
-
+#if defined (__aarch64__)
+#include <arm_neon.h>
+#endif
 #include "libavutil/attributes.h"
 #include "libavutil/avutil.h"
 #include "libavutil/avassert.h"
@@ -337,6 +339,77 @@ yuv2plane1_10_c_template(const int16_t *src, uint16_t *dest, int dstW,
     }
 }
 
+
+#if defined (__aarch64__) && !defined(__APPLE__)
+static av_always_inline void
+yuv2planeX_10_c_template(const int16_t *filter, int filterSize,
+                         const int16_t **src, uint16_t *dest, int dstW,
+                         int big_endian, int output_bits)
+{
+    const int shift = 11 + 16 - output_bits;
+    const int bias = 1 << (shift - 1);
+    const int clip_max = (1 << output_bits) - 1;
+    int i;
+
+    for (i = 0; i + 16 <= dstW; i += 16) {
+        int32x4_t sum0_lo = vdupq_n_s32(bias);
+        int32x4_t sum0_hi = vdupq_n_s32(bias);
+        int32x4_t sum1_lo = vdupq_n_s32(bias);
+        int32x4_t sum1_hi = vdupq_n_s32(bias);
+
+        for (int j = 0; j < filterSize; j++) {
+            int16x8_t src_vec0 = vld1q_s16(&src[j][i]);
+            int16x8_t src_vec1 = vld1q_s16(&src[j][i + 8]);
+            int16x8_t filter_val = vdupq_n_s16(filter[j]);
+
+            sum0_lo = vmlal_s16(sum0_lo, vget_low_s16(src_vec0), vget_low_s16(filter_val));
+            sum0_hi = vmlal_s16(sum0_hi, vget_high_s16(src_vec0), vget_high_s16(filter_val));
+            sum1_lo = vmlal_s16(sum1_lo, vget_low_s16(src_vec1), vget_low_s16(filter_val));
+            sum1_hi = vmlal_s16(sum1_hi, vget_high_s16(src_vec1), vget_high_s16(filter_val));
+        }
+
+        // Right shift; rounding comes from the bias seeded into each sum
+        int32x4_t shift_vec = vdupq_n_s32(-shift);
+        sum0_lo = vshlq_s32(sum0_lo, shift_vec);
+        sum0_hi = vshlq_s32(sum0_hi, shift_vec);
+        sum1_lo = vshlq_s32(sum1_lo, shift_vec);
+        sum1_hi = vshlq_s32(sum1_hi, shift_vec);
+
+        // Clip to output_bits range
+        sum0_lo = vmaxq_s32(vminq_s32(sum0_lo, vdupq_n_s32(clip_max)), vdupq_n_s32(0));
+        sum0_hi = vmaxq_s32(vminq_s32(sum0_hi, vdupq_n_s32(clip_max)), vdupq_n_s32(0));
+        sum1_lo = vmaxq_s32(vminq_s32(sum1_lo, vdupq_n_s32(clip_max)), vdupq_n_s32(0));
+        sum1_hi = vmaxq_s32(vminq_s32(sum1_hi, vdupq_n_s32(clip_max)), vdupq_n_s32(0));
+
+        // Convert to 16-bit
+        uint16x8_t result0 = vcombine_u16(
+            vreinterpret_u16_s16(vmovn_s32(sum0_lo)),
+            vreinterpret_u16_s16(vmovn_s32(sum0_hi))
+        );
+        uint16x8_t result1 = vcombine_u16(
+            vreinterpret_u16_s16(vmovn_s32(sum1_lo)),
+            vreinterpret_u16_s16(vmovn_s32(sum1_hi))
+        );
+
+        // Store with proper endianness
+        if (big_endian) {
+            result0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(result0)));
+            result1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(result1)));
+        }
+        vst1q_u16(&dest[i], result0);
+        vst1q_u16(&dest[i + 8], result1);
+    }
+
+    // Handle remaining pixels
+    for (; i < dstW; i++) {
+        int val = bias;
+        for (int j = 0; j < filterSize; j++) {
+            val += src[j][i] * filter[j];
+        }
+        output_pixel(&dest[i], val);
+    }
+}
+#else
 static av_always_inline void
 yuv2planeX_10_c_template(const int16_t *filter, int filterSize,
                          const int16_t **src, uint16_t *dest, int dstW,
@@ -355,6 +428,7 @@ yuv2planeX_10_c_template(const int16_t *filter, int filterSize,
         output_pixel(&dest[i], val);
     }
 }
+#endif
 
 #undef output_pixel
 
-- 
2.36.0.windows.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
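
The per-pixel arithmetic that the patch vectorizes (seed the accumulator with a rounding bias, multiply-accumulate over the filter taps, shift right, clip to the output range) can be sketched in plain C as a reference for comparing against the NEON path. This is a minimal standalone sketch, not FFmpeg code: `ref_yuv2planeX` and `clip_uintp2` are hypothetical names introduced here, the output is written in native byte order (the `big_endian` flag is left out), and it assumes `>>` on a negative `int` is an arithmetic shift, which holds on common compilers.

```c
#include <stdint.h>

/* Hypothetical helper: clamp v to the range [0, 2^bits - 1]. */
static int clip_uintp2(int v, int bits)
{
    const int max = (1 << bits) - 1;
    if (v < 0)
        return 0;
    return v > max ? max : v;
}

/* Hypothetical scalar reference for the vertical filter step.
 * For each output pixel: start from the rounding bias, add
 * src[j][i] * filter[j] for every tap j, shift right, then clip
 * to output_bits. Values are stored in native byte order. */
static void ref_yuv2planeX(const int16_t *filter, int filterSize,
                           const int16_t **src, uint16_t *dest,
                           int dstW, int output_bits)
{
    const int shift = 11 + 16 - output_bits;
    const int bias  = 1 << (shift - 1);

    for (int i = 0; i < dstW; i++) {
        int val = bias;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        /* Assumes arithmetic right shift for negative val. */
        dest[i] = (uint16_t)clip_uintp2(val >> shift, output_bits);
    }
}
```

Running this and the intrinsic version over the same rows, with widths that are not multiples of 16, and comparing the outputs element by element would exercise both the vector body and the scalar tail.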