From: Michael Niedermayer
To: FFmpeg development discussions and patches
Date: Thu, 29 May 2025 21:06:50 +0200
Subject: Re: [FFmpeg-devel] [FFmpeg-devel, v2] gcc: Relaxing auto-vectorization limitation.
Message-ID: <20250529190650.GC29660@pb2>
References: <20250529070312.698302-1-jiawei@iscas.ac.cn> <20250529160224.GB29660@pb2>

On Thu, May 29, 2025 at 07:26:09PM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > Hi
> >
> > On Thu, May 29, 2025 at 04:37:16PM +0800, Zhao Zhili wrote:
> >>
> >>
> >>> On May 29, 2025, at 15:03, Jiawei wrote:
> >>>
> >>> This patch modifies the FFmpeg build system to remove the explicit disabling
> >>> of GCC's auto-vectorization feature.
> >>>
> >>> Modern GCC versions have demonstrated stable auto-vectorization capabilities
> >>> through extensive optimizations in loop analysis and SIMD code generation.
> >>> The explicit -fno-tree-vectorize flag originally added in commit 973859f
> >>> (2009) to workaround early GCC vectorization instability is no longer
> >>> necessary for recent gcc versions.
> >>>
> >>> Key improvements justifying this change:
> >>> 1. Enhanced heuristics for loop vectorization cost models
> >>> 2. Mature handling of alignment and memory access patterns
> >>> 3. Robust fallback mechanisms for unsupported architectures
> >>>
> >>> This change allows FFmpeg to benefit from automated SIMD optimizations
> >>> when built with -O3 optimization level, particularly improving
> >>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
> >>>
> >>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
> >>>
> >>> Version log:
> >>> Only allow GCC versions >= 13 to use auto-vectorization.
> >>> Discussion see:
> >>> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250521061750.54882-1-jiawei@iscas.ac.cn/
> >>>
> >>> ---
> >>> configure | 1 -
> >>> 1 file changed, 1 deletion(-)
> >>>
> >>> Signed-off-by: Jiawei
> >>> ---
> >>> configure | 6 +++++-
> >>> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/configure b/configure
> >>> index 3730b0524c..91e3e107c2 100755
> >>> --- a/configure
> >>> +++ b/configure
> >>> @@ -7656,7 +7656,11 @@ if enabled icc; then
> >>>          disable aligned_stack
> >>>      fi
> >>>  elif enabled gcc; then
> >>> -    check_optflags -fno-tree-vectorize
> >>> +    gcc_version=$($cc -dumpversion)
> >>> +    major_version=${gcc_version%%.*}
> >>> +    if [ $major_version -lt 13 ]; then
> >>> +        check_optflags -fno-tree-vectorize
> >>> +    fi
> >>>      check_cflags -Werror=format-security
> >>>      check_cflags -Werror=implicit-function-declaration
> >>>      check_cflags -Werror=missing-prototypes
> >>> --
> >>> 2.43.0
> >>>
> >>> [...]
> >>>
> >>
> >> It looks like the patch format is corrupted.
> >>
> >> I’m OK with the code change. However, the commit message is misleading. As already pointed out
> >> by multiple developers, this option doesn’t help with AVX, SVE and RVV because we can’t assume
> >> they are available at runtime, unless built and run on particular hardware.
> >
> > can gcc or clang not build code like our runtime cpudetect ?
> > i mean build functions for each major type and detect cpu once
> > and switch accordingly ?
>
> How would this "once" work in practice? If the cpu is supposed to be
> detected only once, the result needs to be stored somewhere (in static
> storage). Even if this is initialized in the library's .init function
> (so that we can be sure that it is initialized when the actual code is
> run), this would still need a check every time one of these code
> snippets is run.

I am not a compiler developer, nor an expert in ELF, but the CPU detection
could edit the PLT or use an approach similar to how our code works, that is,
function pointers.

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
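For illustration, the function-pointer approach mentioned in the reply could look roughly like the C sketch below. This is only a sketch under stated assumptions, not FFmpeg code: DemoDSPContext, demo_dsp_init, add_bytes_c and add_bytes_avx2 are hypothetical names, and the AVX2 path assumes GCC or Clang on x86, where the target attribute and __builtin_cpu_supports are available. The CPU is queried once at init time; afterwards every call is a plain indirect call with no per-call feature check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature; not an FFmpeg API. */
typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Hypothetical context holding the selected implementations. */
typedef struct DemoDSPContext {
    add_bytes_fn add_bytes;
} DemoDSPContext;

/* Baseline version, built with the default target flags. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler is allowed to use AVX2 when it
 * auto-vectorizes this one function. */
__attribute__((target("avx2")))
static void add_bytes_avx2(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
#endif

/* Detect the CPU once and store the best implementation in the context. */
void demo_dsp_init(DemoDSPContext *c)
{
    c->add_bytes = add_bytes_c;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        c->add_bytes = add_bytes_avx2;
#endif
}
```

A caller would run demo_dsp_init(&ctx) once and then call ctx.add_bytes(dst, src, n). As far as I know, GCC also offers a target_clones function attribute on ELF targets with ifunc support, which generates such variants plus a resolver automatically, so the selection happens when the dynamic linker resolves the symbol rather than in the caller.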