From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 23 Aug 2022 20:34:35 +0200
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext
References: <20220823154215.GJ2088045@pb2> <20220823175112.GK2088045@pb2> <20220823182217.GQ2088045@pb2>
In-Reply-To: <20220823182217.GQ2088045@pb2>
X-Microsoft-Original-Message-ID: <53fd69ab-421a-d6be-dbb3-ff9c916f43a6@outlook.com>
MIME-Version: 1.0
Content-Language: en-US
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Michael Niedermayer:
> On Tue, Aug 23, 2022 at 08:09:09PM +0200, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Tue, Aug 23, 2022 at 07:28:19PM +0200, Andreas Rheinhardt wrote:
>>>> Michael Niedermayer:
>>>>> On Mon, Aug 22, 2022 at 11:59:17PM +0200, Andreas Rheinhardt wrote:
>>>>>> Andreas Rheinhardt:
>>>>>>> Fixes FATE-failures with the filter-2xbr filter-3xbr filter-4xbr
>>>>>>> filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x
>>>>>>> filter-paletteuse-bayer filter-paletteuse-bayer0
>>>>>>> filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests
>>>>>>> when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to
>>>>>>> "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten
>>>>>>> when using SSSE3).
>>>>>>>
>>>>>>> Signed-off-by: Andreas Rheinhardt
>>>>>>> ---
>>>>>>>  libswscale/x86/rgb_2_rgb.asm | 1 +
>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm
>>>>>>> index c695c61d5c..76ca1eec03 100644
>>>>>>> --- a/libswscale/x86/rgb_2_rgb.asm
>>>>>>> +++ b/libswscale/x86/rgb_2_rgb.asm
>>>>>>> @@ -104,6 +104,7 @@ jge .end
>>>>>>>      jl .loop_simd
>>>>>>>
>>>>>>>  .end:
>>>>>>> +    emms
>>>>>>>      RET
>>>>>>>
>>>>>>>  ;------------------------------------------------------------------------------
>>>>>>
>>>>>> I'd really love if someone with x86 assembly skills could look over this
>>>>>> trivial patch and confirm whether it is indeed correct. All I currently
>>>>>> know is that it works for me.
>>>>>
>>>>> emms needs to be called between MMX and float code, as far outside of loops
>>>>> as possible
>>>>> that would suggest outside the for() loops in rgbToRgbWrapper() and any
>>>>> other code using it.
>>>>
>>>> But there is another aspect that the above is missing: Namely that if
>>>> emms_c() is put outside of MMX functions, then it will be called even
>>>> when it is unnecessary. In this case it is unnecessary for all modern
>>>> CPUs, as this function is overridden when SSSE3 is available.
>>>
>>> If you don't like that,
>>> don't call it when it's not needed or call it a few hundred times unnecessarily
>>> like your patch does.
>>> or write only code that doesn't need emms
>>> maybe there are more options ...
>>>
>>
>> If emms_c() is used as now outside of MMX functions, then a "don't call
>> it when it's not needed" would involve a check and would therefore still
>> incur cost for users who don't use this. Also it is unclear what such a
>> check would even look like given that one can use av_force_cpu_flags().
>> See also 55fc2c5a892c50feb1b9a8f55b74ec6594755ddb.
>> This patch also only calls it a few hundred times unnecessarily if one
>> runs this without SSSE3. CPUs without SSSE3 are ancient today. For the
>> non-ancient CPUs, using emms_c() adds an EMMS.
>
> do whatever you prefer.
> The best solution depends on assumptions.
> The impact is biggest on old CPUs where EMMS is also a slow instruction
> But as you say these are ancient today. very small impact on many vs
> small to moderate impact on a today rare setup
> the worst is if the bug is left open and time is wasted on bikeshedding
>

Given that Lynne already approved this on IRC, I have already applied it
as de33506e4b3e3362095aab167ad8bb87c1bd9488.
Several of your FATE-boxes are now green: E.g.
https://fate.ffmpeg.org/history.cgi?slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-sse
Rejoice!

- Andreas

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".