From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 10 Jun 2022 01:55:15 +0200
X-Mailer: git-send-email 2.34.1
X-Microsoft-Original-Message-ID: <20220609235523.458689-33-andreas.rheinhardt@outlook.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 33/41] avcodec/x86/h264_qpel: Disable overridden functions on x64
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). This commit therefore disables several
MMXEXT functions (that are overridden by SSE2 functions)
at compile-time for x64.

Notice that some 10-bit SSE2 functions are overridden by
sse2_cache64 functions in the same code block. This is suboptimal
and the functions that are overridden should either be removed
or the sse2_cache64 functions be put behind suitable checks.
This commit does neither.

Signed-off-by: Andreas Rheinhardt
---
I would love to get input on what to do with these sse2_cache64
functions. If no one says anything, I will send a patch that
retains the current behaviour and removes the functions
overridden by the sse2_cache64 functions.
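
For readers unfamiliar with the override pattern the commit message
refers to, the following is a minimal sketch of it. It is not FFmpeg
code: every name in it (DemoContext, demo_init, the DEMO_* macros and
the demo_* functions) is hypothetical and chosen only for illustration.
It assumes a function table that is filled with C fallbacks first and
then overwritten per CPU flag, with later (more capable) flags winning.
On a target where SSE2 is always present, the MMXEXT assignment to a
slot that SSE2 also sets is dead, so it can be compiled out.

```c
#include <stddef.h>

/* Hypothetical CPU flags; not FFmpeg's AV_CPU_FLAG_* values. */
#define DEMO_FLAG_MMXEXT (1 << 0)
#define DEMO_FLAG_SSE2   (1 << 1)

/* Assumption: set DEMO_ARCH_X86_64 to 0 to model a 32-bit x86 build. */
#ifndef DEMO_ARCH_X86_64
#define DEMO_ARCH_X86_64 1
#endif

typedef const char *(*demo_fn)(void);

/* Slot 0 models a horizontal filter, slot 1 a vertical filter. */
typedef struct DemoContext {
    demo_fn tab[2];
} DemoContext;

static const char *demo_h_c(void)      { return "h_c"; }
static const char *demo_v_c(void)      { return "v_c"; }
static const char *demo_h_mmxext(void) { return "h_mmxext"; }
static const char *demo_v_sse2(void)   { return "v_sse2"; }

#if !DEMO_ARCH_X86_64
/* Only built where SSE2 may be absent, i.e. where it can actually be used. */
static const char *demo_v_mmxext(void) { return "v_mmxext"; }
#endif

static void demo_init(DemoContext *c, int cpu_flags)
{
    /* C fallbacks are installed first. */
    c->tab[0] = demo_h_c;
    c->tab[1] = demo_v_c;

    if (cpu_flags & DEMO_FLAG_MMXEXT) {
        c->tab[0] = demo_h_mmxext;  /* no SSE2 version exists for this slot */
#if !DEMO_ARCH_X86_64
        c->tab[1] = demo_v_mmxext;  /* would be overwritten below on x64 */
#endif
    }
    if (cpu_flags & DEMO_FLAG_SSE2)
        c->tab[1] = demo_v_sse2;    /* later assignment wins */
}
```

In this sketch, explicitly masking out the SSE2 flag on an x64 build
leaves slot 1 at its C fallback rather than at an MMXEXT version, which
is the trade-off the "unless one e.g. explicitly disables SSE2" remark
above alludes to.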
 libavcodec/x86/h264_qpel.c        | 44 +++++++++++++++++++++----------
 libavcodec/x86/h264_qpel_8bit.asm |  4 +++
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/h264_qpel.c b/libavcodec/x86/h264_qpel.c
index fd1070247b..cb5f8a126c 100644
--- a/libavcodec/x86/h264_qpel.c
+++ b/libavcodec/x86/h264_qpel.c
@@ -236,7 +236,11 @@ static av_always_inline void ff_ ## OPNAME ## h264_qpel16_hv_lowpass_ ## MMX(uin
 #define ff_put_h264_qpel8or16_hv2_lowpass_sse2 ff_put_h264_qpel8or16_hv2_lowpass_mmxext
 #define ff_avg_h264_qpel8or16_hv2_lowpass_sse2 ff_avg_h264_qpel8or16_hv2_lowpass_mmxext
 
-#define H264_MC(OPNAME, SIZE, MMX, ALIGN) \
+#define H264_MC_C_H(OPNAME, SIZE, MMX, ALIGN) \
+H264_MC_C(OPNAME, SIZE, MMX, ALIGN)\
+H264_MC_H(OPNAME, SIZE, MMX, ALIGN)\
+
+#define H264_MC_C_V_H_HV(OPNAME, SIZE, MMX, ALIGN) \
 H264_MC_C(OPNAME, SIZE, MMX, ALIGN)\
 H264_MC_V(OPNAME, SIZE, MMX, ALIGN)\
 H264_MC_H(OPNAME, SIZE, MMX, ALIGN)\
@@ -372,13 +376,9 @@ static void OPNAME ## h264_qpel ## SIZE ## _mc32_ ## MMX(uint8_t *dst, const uin
     ff_ ## OPNAME ## pixels ## SIZE ## _l2_shift5_mmxext(dst, halfV+3, halfHV, stride, SIZE, SIZE);\
 }\
 
-#define H264_MC_4816(MMX)\
-H264_MC(put_, 4, MMX, 8)\
-H264_MC(put_, 8, MMX, 8)\
-H264_MC(put_, 16,MMX, 8)\
-H264_MC(avg_, 4, MMX, 8)\
-H264_MC(avg_, 8, MMX, 8)\
-H264_MC(avg_, 16,MMX, 8)\
+#define H264_MC(QPEL, SIZE, MMX, ALIGN)\
+QPEL(put_, SIZE, MMX, ALIGN) \
+QPEL(avg_, SIZE, MMX, ALIGN) \
 
 #define H264_MC_816(QPEL, XMM)\
 QPEL(put_, 8, XMM, 16)\
@@ -397,7 +397,14 @@ QPEL_H264_H_XMM(avg_,AVG_MMXEXT_OP, ssse3)
 QPEL_H264_HV_XMM(put_, PUT_OP, ssse3)
 QPEL_H264_HV_XMM(avg_,AVG_MMXEXT_OP, ssse3)
 
-H264_MC_4816(mmxext)
+H264_MC(H264_MC_C_V_H_HV, 4, mmxext, 8)
+#if ARCH_X86_32
+H264_MC(H264_MC_C_V_H_HV, 8, mmxext, 8)
+H264_MC(H264_MC_C_V_H_HV, 16, mmxext, 8)
+#else
+H264_MC(H264_MC_C_H, 8, mmxext, 8)
+H264_MC(H264_MC_C_H, 16, mmxext, 8)
+#endif
 H264_MC_816(H264_MC_V, sse2)
 H264_MC_816(H264_MC_HV, sse2)
 H264_MC_816(H264_MC_H, ssse3)
@@ -499,12 +506,16 @@ QPEL16(mmxext)
 
 #endif /* HAVE_X86ASM */
 
-#define SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) \
+#define SET_QPEL_FUNCS0123(PFX, IDX, SIZE, CPU, PREFIX) \
     do { \
         c->PFX ## _pixels_tab[IDX][ 0] = PREFIX ## PFX ## SIZE ## _mc00_ ## CPU; \
         c->PFX ## _pixels_tab[IDX][ 1] = PREFIX ## PFX ## SIZE ## _mc10_ ## CPU; \
         c->PFX ## _pixels_tab[IDX][ 2] = PREFIX ## PFX ## SIZE ## _mc20_ ## CPU; \
         c->PFX ## _pixels_tab[IDX][ 3] = PREFIX ## PFX ## SIZE ## _mc30_ ## CPU; \
+    } while (0)
+#define SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) \
+    do { \
+        SET_QPEL_FUNCS0123(PFX, IDX, SIZE, CPU, PREFIX); \
         c->PFX ## _pixels_tab[IDX][ 4] = PREFIX ## PFX ## SIZE ## _mc01_ ## CPU; \
         c->PFX ## _pixels_tab[IDX][ 5] = PREFIX ## PFX ## SIZE ## _mc11_ ## CPU; \
         c->PFX ## _pixels_tab[IDX][ 6] = PREFIX ## PFX ## SIZE ## _mc21_ ## CPU; \
@@ -543,11 +554,16 @@ av_cold void ff_h264qpel_init_x86(H264QpelContext *c, int bit_depth)
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         if (!high_bit_depth) {
-            SET_QPEL_FUNCS(put_h264_qpel, 0, 16, mmxext, );
-            SET_QPEL_FUNCS(put_h264_qpel, 1, 8, mmxext, );
+#if ARCH_X86_32
+#define SET_MMXEXT_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX)
+#else
+#define SET_MMXEXT_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) SET_QPEL_FUNCS0123(PFX, IDX, SIZE, CPU, PREFIX)
+#endif
+            SET_MMXEXT_QPEL_FUNCS(put_h264_qpel, 0, 16, mmxext, );
+            SET_MMXEXT_QPEL_FUNCS(put_h264_qpel, 1, 8, mmxext, );
             SET_QPEL_FUNCS(put_h264_qpel, 2, 4, mmxext, );
-            SET_QPEL_FUNCS(avg_h264_qpel, 0, 16, mmxext, );
-            SET_QPEL_FUNCS(avg_h264_qpel, 1, 8, mmxext, );
+            SET_MMXEXT_QPEL_FUNCS(avg_h264_qpel, 0, 16, mmxext, );
+            SET_MMXEXT_QPEL_FUNCS(avg_h264_qpel, 1, 8, mmxext, );
             SET_QPEL_FUNCS(avg_h264_qpel, 2, 4, mmxext, );
         } else if (bit_depth == 10) {
 #if ARCH_X86_32
diff --git a/libavcodec/x86/h264_qpel_8bit.asm b/libavcodec/x86/h264_qpel_8bit.asm
index 03c7d88f8c..72e98248d8 100644
--- a/libavcodec/x86/h264_qpel_8bit.asm
+++ b/libavcodec/x86/h264_qpel_8bit.asm
@@ -461,9 +461,11 @@ cglobal %1_h264_qpel8or16_v_lowpass_op, 5,5,8 ; dst, src, dstStride, srcStride,
     REP_RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 QPEL8OR16_V_LOWPASS_OP put
 QPEL8OR16_V_LOWPASS_OP avg
+%endif
 
 INIT_XMM sse2
 QPEL8OR16_V_LOWPASS_OP put
@@ -581,8 +583,10 @@ cglobal %1_h264_qpel8or16_hv1_lowpass_op, 4,4,8 ; src, tmp, srcStride, size
     REP_RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 QPEL8OR16_HV1_LOWPASS_OP put
+%endif
 
 INIT_XMM sse2
 QPEL8OR16_HV1_LOWPASS_OP put
--
2.34.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".