Date: Thu, 29 May 2025 18:02:24 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20250529160224.GB29660@pb2>
References: <20250529070312.698302-1-jiawei@iscas.ac.cn>
Subject: Re: [FFmpeg-devel] [FFmpeg-devel, v2] gcc: Relaxing auto-vectorization limitation.

Hi

On Thu, May 29, 2025 at 04:37:16PM +0800, Zhao Zhili wrote:
>
>
> > On May 29, 2025, at 15:03, Jiawei wrote:
> >
> > This patch modifies the FFmpeg build system to remove the explicit disabling
> > of GCC's auto-vectorization feature.
> >
> > Modern GCC versions have demonstrated stable auto-vectorization capabilities
> > through extensive optimizations in loop analysis and SIMD code generation.
> > The explicit -fno-tree-vectorize flag originally added in commit 973859f
> > (2009) to work around early GCC vectorization instability is no longer
> > necessary for recent gcc versions.
> >
> > Key improvements justifying this change:
> > 1. Enhanced heuristics for loop vectorization cost models
> > 2. Mature handling of alignment and memory access patterns
> > 3. Robust fallback mechanisms for unsupported architectures
> >
> > This change allows FFmpeg to benefit from automated SIMD optimizations
> > when built with -O3 optimization level, particularly improving
> > performance on x86_64 (AVX), ARM64 (SVE) and RISC-V (RVV) architectures.
> >
> > [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
> >
> > Version log:
> > Only allow GCC versions >= 13 to use auto-vectorization.
> > Discussion see:
> > https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250521061750.54882-1-jiawei@iscas.ac.cn/
> >
> > ---
> > configure | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > Signed-off-by: Jiawei
> > ---
> > configure | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/configure b/configure
> > index 3730b0524c..91e3e107c2 100755
> > --- a/configure
> > +++ b/configure
> > @@ -7656,7 +7656,11 @@ if enabled icc; then
> >          disable aligned_stack
> >      fi
> >  elif enabled gcc; then
> > -    check_optflags -fno-tree-vectorize
> > +    gcc_version=$($cc -dumpversion)
> > +    major_version=${gcc_version%%.*}
> > +    if [ $major_version -lt 13 ]; then
> > +        check_optflags -fno-tree-vectorize
> > +    fi
> >      check_cflags -Werror=format-security
> >      check_cflags -Werror=implicit-function-declaration
> >      check_cflags -Werror=missing-prototypes
> > --
> > 2.43.0
> >
>
> It looks like the patch format is corrupted.
>
> I’m OK with the code change. However, the commit message is misleading. As already pointed out
> by multiple developers, this option doesn’t help with AVX, SVE and RVV because we can’t assume
> they are available at runtime, unless FFmpeg is built for and run on one particular machine.

Can gcc or clang not build code like our runtime cpudetect?
I mean: build each function once per major CPU type, detect the CPU once,
and switch accordingly?
I cannot be the first person to think of that.

thx

[...]
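For reference, GCC's function multiversioning (the target_clones attribute)
is one mechanism of this kind: the compiler emits one clone of a function per
listed target plus a resolver that selects a clone at run time based on the
CPU. A minimal sketch, not FFmpeg code: add_bytes and the MULTIVERSION macro
are hypothetical names, and the guard assumes GCC >= 6 on x86-64 Linux with
glibc (ifunc support), falling back to a single plain build elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro (not part of FFmpeg): request clones only where
 * GCC's ifunc-based dispatch is assumed to be available. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 && \
    defined(__x86_64__) && defined(__gnu_linux__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION /* single, baseline build of the function */
#endif

/* Hypothetical example function: add src to dst byte-wise, wrapping mod 256.
 * With MULTIVERSION active, GCC builds an AVX2, an SSE4.2 and a baseline
 * version of this same loop and dispatches between them at run time. */
MULTIVERSION
void add_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```

Callers just call add_bytes(); the dispatch is invisible to them. Whether each
clone is actually vectorized still depends on the optimization flags in use,
so this complements rather than replaces the -fno-tree-vectorize question.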
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Answering whether a program halts or runs forever is,
on a Turing machine, in general impossible (Turing's halting problem).
On any real computer, it is always possible, as a real computer has a finite
number of states N and will either halt in fewer than N cycles or never halt.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".