From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 23 Aug 2022 20:09:09 +0200
References: <20220823154215.GJ2088045@pb2> <20220823175112.GK2088045@pb2>
In-Reply-To: <20220823175112.GK2088045@pb2>
Subject: Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext

Michael Niedermayer:
> On Tue, Aug 23, 2022 at 07:28:19PM +0200, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Mon, Aug 22, 2022 at 11:59:17PM +0200, Andreas Rheinhardt wrote:
>>>> Andreas Rheinhardt:
>>>>> Fixes FATE-failures with the filter-2xbr filter-3xbr filter-4xbr
>>>>> filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x
>>>>> filter-paletteuse-bayer filter-paletteuse-bayer0
>>>>> filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests
>>>>> when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to
>>>>> "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten
>>>>> when using SSSE3).
>>>>>
>>>>> Signed-off-by: Andreas Rheinhardt
>>>>> ---
>>>>>  libswscale/x86/rgb_2_rgb.asm | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm
>>>>> index c695c61d5c..76ca1eec03 100644
>>>>> --- a/libswscale/x86/rgb_2_rgb.asm
>>>>> +++ b/libswscale/x86/rgb_2_rgb.asm
>>>>> @@ -104,6 +104,7 @@ jge .end
>>>>>      jl .loop_simd
>>>>>
>>>>>  .end:
>>>>> +    emms
>>>>>      RET
>>>>>
>>>>>  ;------------------------------------------------------------------------------
>>>>
>>>> I'd really love if someone with x86 assembly skills could look over this
>>>> trivial patch and confirm whether it is indeed correct. All I currently
>>>> know is that it works for me.
>>>
>>> emms needs to be called between MMX and float code, as far outside of loops
>>> as possible
>>> that would suggest outside the for() loops in rgbToRgbWrapper() and any
>>> other code using it.
>>
>> But there is another aspect that the above is missing: Namely that if
>> emms_c() is put outside of MMX functions, then it will be called even
>> when it is unnecessary.
>> In this case it is unnecessary for all modern
>> CPUs, as this function is overridden when SSSE3 is available.
>
> If you don't like that,
> don't call it when it's not needed or call it a few hundred times unnecessarily
> like your patch does.
> or write only code that doesn't need emms
> maybe there are more options ...
>

If emms_c() is used as now outside of MMX functions, then a "don't call
it when it's not needed" would involve a check and would therefore still
incur a cost for users who never run the MMX version. Also, it is unclear
what such a check would even look like given that one can use
av_force_cpu_flags(). See also 55fc2c5a892c50feb1b9a8f55b74ec6594755ddb.

Moreover, this patch calls it a few hundred times unnecessarily only if
one runs this without SSSE3. CPUs without SSSE3 are ancient today. For
the non-ancient CPUs, using emms_c() outside of the function adds an EMMS
that is not needed there.

- Andreas
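
To make the trade-off above concrete, here is a toy model in C. It is not
FFmpeg code and executes no MMX instructions: model_emms() merely counts
how often an EMMS would be executed, and every function name in it is a
hypothetical stand-in. It assumes the shuffle function is called once per
row and that the SSSE3 version leaves no MMX state behind.

```c
#include <assert.h>

/* Toy model only: counts EMMS executions instead of issuing them. */
static unsigned emms_count;

static void model_emms(void) { emms_count++; }

/* Hypothetical stand-ins for the per-row shuffle implementations. */
static void shuffle_mmxext_with_emms(void) { /* MMX work */ model_emms(); }
static void shuffle_mmxext_no_emms(void)   { /* MMX work, state left dirty */ }
static void shuffle_ssse3(void)            { /* XMM registers only */ }

typedef void (*shuffle_fn)(void);

/* Variant A (the patch): EMMS sits at the end of the MMXEXT function. */
unsigned emms_in_function(int have_ssse3, int rows)
{
    shuffle_fn fn = have_ssse3 ? shuffle_ssse3 : shuffle_mmxext_with_emms;

    assert(rows >= 0);
    emms_count = 0;
    for (int i = 0; i < rows; i++)
        fn();
    return emms_count;
}

/* Variant B: the caller clears the MMX state once, unconditionally,
 * after its loop (the role emms_c() would play in the wrapper). */
unsigned emms_in_caller(int have_ssse3, int rows)
{
    shuffle_fn fn = have_ssse3 ? shuffle_ssse3 : shuffle_mmxext_no_emms;

    assert(rows >= 0);
    emms_count = 0;
    for (int i = 0; i < rows; i++)
        fn();
    model_emms();   /* executed even when no MMX code ran */
    return emms_count;
}
```

With rows = 480, emms_in_function() returns 0 with SSSE3 and 480 without
it, while emms_in_caller() returns 1 in both cases. In this model the
patch costs the SSSE3 user nothing and the caller-side variant costs
everyone one EMMS per conversion; how much either matters in practice is
not something the model can tell.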