From: Soft Works
To: FFmpeg development discussions and patches
Date: Thu, 28 Jul 2022 01:02:21 +0000
Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc 7 and higher

> -----Original Message-----
> From: ffmpeg-devel On Behalf Of
> Hendrik Leppkes
> Sent: Wednesday, July 27, 2022 10:42 PM
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc
> 7 and higher
>
> On Wed, Jul 27, 2022 at 7:39 PM James Almer wrote:
> >
> > On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
> > > I recognize that this patch is going to be somewhat controversial.
> > > I'm submitting it mostly to see what the opinions are and evaluate
> > > options. I am working on improving performance for aarch64. On that
> > > architecture, there are fewer hand written assembly implementations
> > > of hot functions than there are for x86_64 and allowing gcc to
> > > auto-vectorize yields noticeable improvements.
> > >
> > > Gcc vectorization has improved recently and it hasn't been
> > > evaluated on the mailing list for a few years. This is the latest
> > > discussion I found in my searches:
> > > http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
> >
> > Every time this was done, it was inevitably reverted after complains
> > and crash reports started piling up because gcc can't really handle
> > all the inline code our codebase has, among other things.
>
> No need to wait for issues, I just tested, and the same issues still
> persist that have existed for years with GCC now. They don't seem to
> care to make it compatible with inline asm, which might be fair
> enough, but it means it just can't work here.
>
> In file included from libavcodec/cabac_functions.h:49,
>                  from libavcodec/h264_cabac.c:36:
> libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
> libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible
> constraints

I wonder why it doesn't fail when I try the same on MINGW32:

    gcc -I. -Isrc/ -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1 \
        -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE \
        -U__STRICT_ANSI__ -D__USE_MINGW_ANSI_STDIO=1 \
        -D__printf__=__gnu_printf__ -D_POSIX_C_SOURCE=200112 \
        -D_XOPEN_SOURCE=600 -DOPJ_STATIC -DZLIB_CONST -DHAVE_AV_CONFIG_H \
        -DBUILDING_avcodec -mthreads -DLIBTWOLAME_STATIC -std=c11 \
        -IV:/ffbuild/mas/local32/include \
        -IV:/ffbuild/mas/msys64/mingw32/include \
        -I/mingw32/include -IF:/ffbuild/mas/local32/include \
        -DLIBARCHIVE_STATIC -Wdeclaration-after-statement -Wall \
        -Wdisabled-optimization -Wpointer-arith -Wredundant-decls \
        -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes \
        -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch \
        -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable \
        -Wno-bool-operation -Wno-char-subscripts -O3 \
        -Werror=format-security -Werror=implicit-function-declaration \
        -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat \
        -fdiagnostics-color=auto -Wno-maybe-uninitialized -ftree-vectorize \
        -MMD -MF libavcodec/h264_cabac.d -MT libavcodec/h264_cabac.o \
        -c -o libavcodec/h264_cabac.o src/libavcodec/h264_cabac.c

When I add garbage to line 199 in cabac.h, it errors, so I'm sure that line
gets compiled. The same goes for the av_always_inline line above it. The gcc
version is 10.3.0.

I wonder whether some difference in the compiler flags explains why it
doesn't error here, but I couldn't reproduce the error with various flag
combinations. Maybe you can spot a difference?

From my experience, tree-vectorize can provide considerable improvements in
certain cases, but I often had to rewrite the loops (primarily simplifying
them) before they actually got vectorized the way I wanted.
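To illustrate the kind of loop simplification meant here, a minimal, hypothetical C sketch follows (the function names are made up; this is not FFmpeg code and not code from this thread). The first loop has a data-dependent early exit, which GCC versions before 14 generally do not vectorize. The rewrite finds the exit point with a cheap scalar scan and then does the arithmetic in a plain counted loop over `restrict`-qualified pointers, the shape the vectorizer handles best. Whether a given loop is actually vectorized still depends on the GCC version, target and flags; `-fopt-info-vec-optimized` and `-fopt-info-vec-missed` report the outcome per loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add two byte arrays until 'a' contains a zero.
 * The break makes the trip count data-dependent, which keeps older GCC
 * versions from vectorizing this loop. Returns the number of bytes written. */
size_t add_until_zero(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (a[i] == 0)
            break;
        dst[i] = (uint8_t)(a[i] + b[i]);
    }
    return i;
}

/* Simplified rewrite: locate the exit point with a scalar scan first, then
 * run a plain counted loop. 'restrict' promises the buffers do not overlap,
 * so the compiler needs no runtime overlap check. For non-overlapping
 * buffers the result is identical to add_until_zero(). */
size_t add_until_zero_split(uint8_t *restrict dst, const uint8_t *restrict a,
                            const uint8_t *restrict b, size_t n)
{
    size_t len = 0;
    while (len < n && a[len] != 0)   /* cheap scalar scan */
        len++;

    for (size_t i = 0; i < len; i++) /* simple, vectorizer-friendly loop */
        dst[i] = (uint8_t)(a[i] + b[i]);
    return len;
}
```

Compiling such a file with something like `gcc -O2 -ftree-vectorize -fopt-info-vec-optimized -c` shows which of the two loops gcc vectorized on a given setup.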
Another conclusion from that work is that there's hardly any benefit in
using tree-vectorize in combination with -O3. When -O3 is specified, gcc
prefers loop unrolling over vectorization in the vast majority of cases,
and the result is often slower. Even worse, loop unrolling cannot be
disabled individually: neither globally (-O3 -fno-unroll-loops) nor
locally (with a function __attribute__ or #pragma GCC optimize).

I ran a small number of tests on typical workloads to compare -O2 and -O3,
and I couldn't see any relevant advantage of -O3. The tests weren't
exhaustive, and one can very likely find cases where -O3 performs better,
but the vectorization gains, on the other hand, were clearly relevant, so I
chose to switch all our builds to -O2.

Looking at my notes, I remember trying a number of ways to control gcc
optimizations at the function level. I couldn't get it to enable the
vectorization optimizations (which are disabled globally), but maybe it
works the other way round. You could try either

    #pragma GCC optimize("no-tree-vectorize")

or

    #ifdef __GNUC__
    __attribute__((optimize("-fno-tree-vectorize")))
    #endif

to decorate the function that errors on your side (or maybe even its
caller further up). Maybe this allows disabling vectorization locally for
the failing case.

Best regards,
softworkz

PS: The observations I made were for x86_64 code in the context of ffmpeg
compiled with gcc 10 (maybe 9) and analyzed with Intel tools.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
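A minimal, hypothetical C sketch of the two function-level approaches suggested in the message, applied to a stand-in summation loop (the `sum_s32_*` functions and the `NO_TREE_VECTORIZE` macro are illustrative names, not FFmpeg code). It brackets the pragma with `#pragma GCC push_options` / `#pragma GCC pop_options` so it affects only the functions in between rather than everything that follows in the file, and it guards both forms with `!defined(__clang__)`, because clang also defines `__GNUC__` but does not implement the `optimize` attribute. Per the GCC manual, a string such as "no-tree-vectorize" is taken as a suffix to `-f`. Whether this actually avoids the 'impossible constraints' error in the inline-asm case is an untested assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant 1: scoped pragma. push_options/pop_options limit the effect to
 * the functions defined in between; a bare #pragma GCC optimize would
 * apply to every function that follows in the translation unit. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("no-tree-vectorize")
#endif

int64_t sum_s32_pragma(const int32_t *src, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += src[i];
    return acc;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif

/* Variant 2: per-function attribute. Clang defines __GNUC__ as well but
 * ignores (and warns about) the optimize attribute, so exclude it. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_TREE_VECTORIZE
#endif

NO_TREE_VECTORIZE
int64_t sum_s32_attr(const int32_t *src, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += src[i];
    return acc;
}
```

Both functions compute the same result as an undecorated version; only the code gcc generates for them changes. Building with `-O3 -fopt-info-vec-optimized` should show these two loops missing from the list of vectorized loops.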