From: Chitra Dey Sarkar via ffmpeg-devel
To: FFmpeg development discussions and patches
Cc: Chitra Dey Sarkar
Date: Mon, 26 May 2025 00:44:13 +0000
Subject: [FFmpeg-devel] libswscale.c : ff_xyz12Torgb48 expensive unaligned 16 byte accesses

Hi,

We have been profiling FFmpeg at Microsoft and found that ff_xyz12Torgb48 has a high sample count (sampling every 1 ms).

ff_xyz12Torgb48 appears to incur performance penalties from:

1. Unaligned read and write accesses
2. Accesses to xyz2rgb_matrix
3. Multiplication

I am interested in optimizing this code and wanted to check whether there is an existing optimized version of this function, or a recommended approach to improving it.

I can move the repeated accesses to xyz2rgb_matrix outside the inner loop and load a full cache line at once to extract the X, Y, and Z values more efficiently, but I wanted to get some initial feedback before proceeding further.

File: FFmpeg/libswscale/swscale.c at a79720e10f30e9fd18bd78242ce96dde06461343 (FFmpeg/FFmpeg)

    void ff_xyz12Torgb48(const SwsInternal *c, uint8_t *dst, int dst_stride,
                         const uint8_t *src, int src_stride, int w, int h)
    {
        ...

        /* Unaligned read */
        x = AV_RL16(src16 + xp + 0);
        y = AV_RL16(src16 + xp + 1);
        z = AV_RL16(src16 + xp + 2);

        /* DRAM access and multiply */
        // convert from XYZlinear to sRGBlinear
        r = c->xyz2rgb_matrix[0][0] * x +
            c->xyz2rgb_matrix[0][1] * y +
            c->xyz2rgb_matrix[0][2] * z >> 12;
        g = c->xyz2rgb_matrix[1][0] * x +
            c->xyz2rgb_matrix[1][1] * y +
            c->xyz2rgb_matrix[1][2] * z >> 12;
        b = c->xyz2rgb_matrix[2][0] * x +
            c->xyz2rgb_matrix[2][1] * y +
            c->xyz2rgb_matrix[2][2] * z >> 12;

        /* RMW access */
        AV_WL16(dst16 + xp + 0, c->rgbgamma[r] << 4);
        AV_WL16(dst16 + xp + 1, c->rgbgamma[g] << 4);
        AV_WL16(dst16 + xp + 2, c->rgbgamma[b] << 4);

        ...
    }

Regards,
Chitra
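
P.S. To make the idea of moving the xyz2rgb_matrix accesses out of the inner loop concrete, here is a minimal standalone sketch. It is not the actual FFmpeg code. The names xyz_row_to_rgb, rl16, wl16 and clip12 are hypothetical, and I am assuming int16_t 3x3 coefficients in 4.12 fixed point and little-endian 16-bit samples carrying 12 significant bits. The gamma table lookups (such as rgbgamma) are left out, and the clip to 12 bits is my own addition to keep the stored values in range.

```c
#include <assert.h>
#include <stdint.h>

/* Portable little-endian 16-bit load/store, safe for unaligned pointers.
 * These are stand-ins for AV_RL16/AV_WL16, not the FFmpeg macros. */
static inline unsigned rl16(const uint8_t *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static inline void wl16(uint8_t *p, unsigned v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Clamp to the 12-bit range [0, 4095]. */
static inline int clip12(int v)
{
    return v < 0 ? 0 : v > 4095 ? 4095 : v;
}

/* Hypothetical helper: convert one row of w packed XYZ pixels
 * (3 x 16-bit little-endian per pixel) using a 4.12 fixed-point matrix.
 * Gamma lookups are intentionally omitted in this sketch. */
static void xyz_row_to_rgb(const int16_t m[3][3],
                           uint8_t *dst, const uint8_t *src, int w)
{
    assert(dst && src && w >= 0);

    /* Hoisted once per row: the nine coefficients live in locals, so the
     * inner loop does not reload them from memory after each store. */
    const int m00 = m[0][0], m01 = m[0][1], m02 = m[0][2];
    const int m10 = m[1][0], m11 = m[1][1], m12 = m[1][2];
    const int m20 = m[2][0], m21 = m[2][1], m22 = m[2][2];

    for (int i = 0; i < w; i++) {
        const uint8_t *s = src + 6 * i;
        uint8_t       *d = dst + 6 * i;

        /* Drop the 4 padding bits to get 12-bit samples. */
        const int x = (int)(rl16(s + 0) >> 4);
        const int y = (int)(rl16(s + 2) >> 4);
        const int z = (int)(rl16(s + 4) >> 4);

        /* Relies on arithmetic right shift for negative sums, as the
         * original expression does. */
        const int r = (m00 * x + m01 * y + m02 * z) >> 12;
        const int g = (m10 * x + m11 * y + m12 * z) >> 12;
        const int b = (m20 * x + m21 * y + m22 * z) >> 12;

        /* Scale back from 12 bits to the 16-bit container. */
        wl16(d + 0, (unsigned)clip12(r) << 4);
        wl16(d + 2, (unsigned)clip12(g) << 4);
        wl16(d + 4, (unsigned)clip12(b) << 4);
    }
}
```

A caller would invoke this once per row, advancing src and dst by their strides. Whether it helps in practice depends on whether the compiler already keeps the coefficients in registers. Byte-wise stores through uint8_t pointers may alias the matrix as far as the compiler can tell, which can force reloads, so explicit local copies remove that uncertainty. This would need to be measured.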