* [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code @ 2023-10-24 12:22 Martin Storsjö 2023-10-24 12:22 ` [FFmpeg-devel] [PATCH 2/2] configure: Improve aarch64 feature detection on older, broken Clang versions Martin Storsjö 2023-10-24 12:40 ` [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Sean McGovern 0 siblings, 2 replies; 7+ messages in thread From: Martin Storsjö @ 2023-10-24 12:22 UTC (permalink / raw) To: ffmpeg-devel Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't defined. --- libavutil/aarch64/cpu.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index bd780e8591..2b50c426bc 100644 --- a/libavutil/aarch64/cpu.c +++ b/libavutil/aarch64/cpu.c @@ -30,11 +30,9 @@ static int detect_flags(void) { int flags = 0; - unsigned long hwcap; - - hwcap = getauxval(AT_HWCAP); #if defined(HWCAP_CPUID) + unsigned long hwcap = getauxval(AT_HWCAP); // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and // HWCAP2_I8MM too, avoiding to read the CPUID registers (which triggers // a trap, handled by the kernel). However the HWCAP_* defines for these @@ -53,8 +51,6 @@ static int detect_flags(void) if (((tmp >> 52) & 0xf) == 0x1) flags |= AV_CPU_FLAG_I8MM; } -#else - (void)hwcap; #endif return flags; -- 2.34.1 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
* [FFmpeg-devel] [PATCH 2/2] configure: Improve aarch64 feature detection on older, broken Clang versions 2023-10-24 12:22 [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Martin Storsjö @ 2023-10-24 12:22 ` Martin Storsjö 2023-10-31 10:23 ` Martin Storsjö 2023-10-24 12:40 ` [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Sean McGovern 1 sibling, 1 reply; 7+ messages in thread From: Martin Storsjö @ 2023-10-24 12:22 UTC (permalink / raw) To: ffmpeg-devel Clang versions before 17 (Xcode versions up to and including 15.0) had a very annoying bug in its behaviour of the ".arch" directive in assembly. If the directive only contained a level, such as ".arch armv8.2-a", it did validate the name of the level, but it didn't apply the level to what instructions are allowed. The level was applied if the directive contained an extra feature enabled, such as ".arch armv8.2-a+crc" though. It was also applied on the next ".arch_extension" directive. This bug, combined with the fact that the same versions of Clang didn't support the dotprod/i8mm extension names in either ".arch <level>+<feature>" or in ".arch_extension", could lead to unexepcted build failures. As the dotprod/i8mm extensions couldn't be enabled dynamically via the ".arch_extension" directive, someone building ffmpeg could try to enable them by configuring their build with --extra-cflags="-march=armv8.6-a". During configure, we test for support for the i8mm instructions like this: # Built with -march=armv8.6-a .arch armv8.2-a # Has no visible effect here #.arch_extension i8mm # Omitted as the extension name isn't known usdot v0.4s, v0.16b, v0.16b # Successfully assembled as armv8.6-a is the effective level, # and i8mm is enabled implicitly in armv8.6-a. Thus, we would enable assembling those instructions. However if we later check for another extension, such as sve (which those versions of Clang actually do support), we can later run into the following situation when building actual code: # Built with -march=armv8.6-a .arch armv8.2-a # Has no visible effect here #.arch_extension i8mm # Omitted as the extension name isn't known .arch_extension sve # Included as "sve" is as supported extension name # .arch_extension effectively activates the previous .arch directive, # so the effective level is armv8.2-a+sve now. usdot v0.4s, v0.16b, v0.16b # Fails to build the instructions that require i8mm. Despite the # configure check, the unrelated ".arch_extension sve" directive # breaks the functionality of the i8mm feature. This patch avoids this situation: - By adding a dummy feature such as "+crc" on the .arch directive (if supported), we make sure that it does get applied immediately, avoiding it taking effect spuriously at a later unrelated ".arch_extension" directive. - By checking for higher arch levels such as armv8.4-a and armv8.6-a, we can assemble the dotprod and i8mm extensions without the user needing to pass -march=armv8.6-a. This allows using the dotprod/i8mm codepaths via runtime detection while keeping the binary runnable on older versions. I.e. this enables the i8mm codepaths on Apple M2 machines while built with Xcode's Clang. TL;DR: Enable the I8MM extensions for Apple M2 without the user needing to do a custom configuration; avoid potential build breakage if a user does such a custom configuration. Once Xcode versions that have these issues fixed are prevalent, we can consider reverting this change. --- configure | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/configure b/configure index f494da204c..e00fb6b719 100755 --- a/configure +++ b/configure @@ -6045,7 +6045,26 @@ check_inline_asm inline_asm_nonlocal_labels '"Label:\n"' if enabled aarch64; then as_arch_level="armv8-a" check_as as_arch_directive ".arch $as_arch_level" - enabled as_arch_directive && check_arch_level armv8.2-a + if enabled as_arch_directive; then + # Check for higher .arch levels. We only need armv8.2-a in order to + # enable the extensions we want below - we primarily want to control + # them via .arch_extension. However: + # + # Clang before version 17 (Xcode versions up to and including 15.0) + # didn't support controlling the dotprod/i8mm extensions via + # .arch_extension; thus try to enable them via the .arch level as well. + for level in armv8.2-a armv8.4-a armv8.6-a; do + check_arch_level $level + done + # Clang before version 17 (Xcode versions up to and including 15.0) + # also had a bug (https://github.com/llvm/llvm-project/issues/32220) + # causing a plain ".arch <level>" to not have any effect unless it + # had an extra "+<feature>" included - but it was activate on the next + # ".arch_extension" directive. Check if we can include "+crc" as dummy + # feature to make the .arch directive behave as expected and take + # effect right away. + check_arch_level "${as_arch_level}+crc" + fi enabled armv8 && check_insn armv8 'prfm pldl1strm, [x0]' # internal assembler in clang 3.3 does not support this instruction -- 2.34.1 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/2] configure: Improve aarch64 feature detection on older, broken Clang versions 2023-10-24 12:22 ` [FFmpeg-devel] [PATCH 2/2] configure: Improve aarch64 feature detection on older, broken Clang versions Martin Storsjö @ 2023-10-31 10:23 ` Martin Storsjö 0 siblings, 0 replies; 7+ messages in thread From: Martin Storsjö @ 2023-10-31 10:23 UTC (permalink / raw) To: ffmpeg-devel On Tue, 24 Oct 2023, Martin Storsjö wrote: > Clang versions before 17 (Xcode versions up to and including 15.0) > had a very annoying bug in its behaviour of the ".arch" directive > in assembly. If the directive only contained a level, such as > ".arch armv8.2-a", it did validate the name of the level, but it > didn't apply the level to what instructions are allowed. The level > was applied if the directive contained an extra feature enabled, > such as ".arch armv8.2-a+crc" though. It was also applied on the > next ".arch_extension" directive. > > This bug, combined with the fact that the same versions of Clang > didn't support the dotprod/i8mm extension names in either > ".arch <level>+<feature>" or in ".arch_extension", could lead to > unexepcted build failures. > > As the dotprod/i8mm extensions couldn't be enabled dynamically > via the ".arch_extension" directive, someone building ffmpeg could > try to enable them by configuring their build with > --extra-cflags="-march=armv8.6-a". > > During configure, we test for support for the i8mm instructions > like this: > > # Built with -march=armv8.6-a > .arch armv8.2-a # Has no visible effect here > #.arch_extension i8mm # Omitted as the extension name isn't known > usdot v0.4s, v0.16b, v0.16b > # Successfully assembled as armv8.6-a is the effective level, > # and i8mm is enabled implicitly in armv8.6-a. > > Thus, we would enable assembling those instructions. However if > we later check for another extension, such as sve (which those > versions of Clang actually do support), we can later run into the > following situation when building actual code: > > # Built with -march=armv8.6-a > .arch armv8.2-a # Has no visible effect here > #.arch_extension i8mm # Omitted as the extension name isn't known > .arch_extension sve # Included as "sve" is as supported extension name > # .arch_extension effectively activates the previous .arch directive, > # so the effective level is armv8.2-a+sve now. > usdot v0.4s, v0.16b, v0.16b > # Fails to build the instructions that require i8mm. Despite the > # configure check, the unrelated ".arch_extension sve" directive > # breaks the functionality of the i8mm feature. > > This patch avoids this situation: > - By adding a dummy feature such as "+crc" on the .arch directive > (if supported), we make sure that it does get applied immediately, > avoiding it taking effect spuriously at a later unrelated > ".arch_extension" directive. > - By checking for higher arch levels such as armv8.4-a and armv8.6-a, > we can assemble the dotprod and i8mm extensions without the user > needing to pass -march=armv8.6-a. This allows using the dotprod/i8mm > codepaths via runtime detection while keeping the binary runnable > on older versions. I.e. this enables the i8mm codepaths on Apple M2 > machines while built with Xcode's Clang. > > TL;DR: Enable the I8MM extensions for Apple M2 without the user needing > to do a custom configuration; avoid potential build breakage if a user > does such a custom configuration. > > Once Xcode versions that have these issues fixed are prevalent, we > can consider reverting this change. > --- > configure | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) Will push now. // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code 2023-10-24 12:22 [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Martin Storsjö 2023-10-24 12:22 ` [FFmpeg-devel] [PATCH 2/2] configure: Improve aarch64 feature detection on older, broken Clang versions Martin Storsjö @ 2023-10-24 12:40 ` Sean McGovern 2023-10-26 21:40 ` Sean McGovern 2023-10-31 10:22 ` Martin Storsjö 1 sibling, 2 replies; 7+ messages in thread From: Sean McGovern @ 2023-10-24 12:40 UTC (permalink / raw) To: FFmpeg development discussions and patches On Tue, Oct 24, 2023, 08:23 Martin Storsjö <martin@martin.st> wrote: > Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't > defined. > --- > libavutil/aarch64/cpu.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c > index bd780e8591..2b50c426bc 100644 > --- a/libavutil/aarch64/cpu.c > +++ b/libavutil/aarch64/cpu.c > @@ -30,11 +30,9 @@ > static int detect_flags(void) > { > int flags = 0; > - unsigned long hwcap; > - > - hwcap = getauxval(AT_HWCAP); > > #if defined(HWCAP_CPUID) > + unsigned long hwcap = getauxval(AT_HWCAP); > // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and > // HWCAP2_I8MM too, avoiding to read the CPUID registers (which > triggers > // a trap, handled by the kernel). However the HWCAP_* defines for > these > @@ -53,8 +51,6 @@ static int detect_flags(void) > if (((tmp >> 52) & 0xf) == 0x1) > flags |= AV_CPU_FLAG_I8MM; > } > -#else > - (void)hwcap; > #endif > > return flags; > -- > 2.34.1 > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > Good call, and LGTM. -- Sean McGovern > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code 2023-10-24 12:40 ` [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Sean McGovern @ 2023-10-26 21:40 ` Sean McGovern 2023-10-26 21:42 ` Sean McGovern 2023-10-31 10:22 ` Martin Storsjö 1 sibling, 1 reply; 7+ messages in thread From: Sean McGovern @ 2023-10-26 21:40 UTC (permalink / raw) To: FFmpeg development discussions and patches On Tue, Oct 24, 2023 at 8:40 AM Sean McGovern <gseanmcg@gmail.com> wrote: > > > > On Tue, Oct 24, 2023, 08:23 Martin Storsjö <martin@martin.st> wrote: >> >> Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't >> defined. >> --- >> libavutil/aarch64/cpu.c | 6 +----- >> 1 file changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c >> index bd780e8591..2b50c426bc 100644 >> --- a/libavutil/aarch64/cpu.c >> +++ b/libavutil/aarch64/cpu.c >> @@ -30,11 +30,9 @@ >> static int detect_flags(void) >> { >> int flags = 0; >> - unsigned long hwcap; >> - >> - hwcap = getauxval(AT_HWCAP); >> >> #if defined(HWCAP_CPUID) >> + unsigned long hwcap = getauxval(AT_HWCAP); >> // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and >> // HWCAP2_I8MM too, avoiding to read the CPUID registers (which triggers >> // a trap, handled by the kernel). However the HWCAP_* defines for these >> @@ -53,8 +51,6 @@ static int detect_flags(void) >> if (((tmp >> 52) & 0xf) == 0x1) >> flags |= AV_CPU_FLAG_I8MM; >> } >> -#else >> - (void)hwcap; >> #endif >> >> return flags; >> -- >> 2.34.1 >> >> _______________________________________________ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> >> To unsubscribe, visit link above, or email >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > > > Good call, and LGTM. > > -- Sean McGovern Ooops! I don't have c= so maybe someone who does could push this -- and the 2nd patch in this set needs a pair of eyes more familiar with compiler bugs. -- Sean McG _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code 2023-10-26 21:40 ` Sean McGovern @ 2023-10-26 21:42 ` Sean McGovern 0 siblings, 0 replies; 7+ messages in thread From: Sean McGovern @ 2023-10-26 21:42 UTC (permalink / raw) To: FFmpeg development discussions and patches On Thu, Oct 26, 2023 at 5:40 PM Sean McGovern <gseanmcg@gmail.com> wrote: > > On Tue, Oct 24, 2023 at 8:40 AM Sean McGovern <gseanmcg@gmail.com> wrote: > > > > > > > > On Tue, Oct 24, 2023, 08:23 Martin Storsjö <martin@martin.st> wrote: > >> > >> Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't > >> defined. > >> --- > >> libavutil/aarch64/cpu.c | 6 +----- > >> 1 file changed, 1 insertion(+), 5 deletions(-) > >> > >> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c > >> index bd780e8591..2b50c426bc 100644 > >> --- a/libavutil/aarch64/cpu.c > >> +++ b/libavutil/aarch64/cpu.c > >> @@ -30,11 +30,9 @@ > >> static int detect_flags(void) > >> { > >> int flags = 0; > >> - unsigned long hwcap; > >> - > >> - hwcap = getauxval(AT_HWCAP); > >> > >> #if defined(HWCAP_CPUID) > >> + unsigned long hwcap = getauxval(AT_HWCAP); > >> // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and > >> // HWCAP2_I8MM too, avoiding to read the CPUID registers (which triggers > >> // a trap, handled by the kernel). However the HWCAP_* defines for these > >> @@ -53,8 +51,6 @@ static int detect_flags(void) > >> if (((tmp >> 52) & 0xf) == 0x1) > >> flags |= AV_CPU_FLAG_I8MM; > >> } > >> -#else > >> - (void)hwcap; > >> #endif > >> > >> return flags; > >> -- > >> 2.34.1 > >> > >> _______________________________________________ > >> ffmpeg-devel mailing list > >> ffmpeg-devel@ffmpeg.org > >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > >> > >> To unsubscribe, visit link above, or email > >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > > > > > > Good call, and LGTM. > > > > -- Sean McGovern > > Ooops! I don't have c= so maybe someone who does could push this -- > and the 2nd patch in this set needs a pair of eyes more familiar with > compiler bugs. > > -- Sean McG Looks like I missed the push notification on patch #2. -- Sean McG. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code 2023-10-24 12:40 ` [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Sean McGovern 2023-10-26 21:40 ` Sean McGovern @ 2023-10-31 10:22 ` Martin Storsjö 1 sibling, 0 replies; 7+ messages in thread From: Martin Storsjö @ 2023-10-31 10:22 UTC (permalink / raw) To: FFmpeg development discussions and patches On Tue, 24 Oct 2023, Sean McGovern wrote: > On Tue, Oct 24, 2023, 08:23 Martin Storsjö <martin@martin.st> wrote: > >> Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't >> defined. >> --- >> libavutil/aarch64/cpu.c | 6 +----- >> 1 file changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c >> index bd780e8591..2b50c426bc 100644 >> --- a/libavutil/aarch64/cpu.c >> +++ b/libavutil/aarch64/cpu.c >> @@ -30,11 +30,9 @@ >> static int detect_flags(void) >> { >> int flags = 0; >> - unsigned long hwcap; >> - >> - hwcap = getauxval(AT_HWCAP); >> >> #if defined(HWCAP_CPUID) >> + unsigned long hwcap = getauxval(AT_HWCAP); >> // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and >> // HWCAP2_I8MM too, avoiding to read the CPUID registers (which >> triggers >> // a trap, handled by the kernel). However the HWCAP_* defines for >> these >> @@ -53,8 +51,6 @@ static int detect_flags(void) >> if (((tmp >> 52) & 0xf) == 0x1) >> flags |= AV_CPU_FLAG_I8MM; >> } >> -#else >> - (void)hwcap; >> #endif >> >> return flags; >> -- >> 2.34.1 >> >> _______________________________________________ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> >> To unsubscribe, visit link above, or email >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >> > > Good call, and LGTM. Thanks, will push now. // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-10-31 10:23 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-10-24 12:22 [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Martin Storsjö 2023-10-24 12:22 ` [FFmpeg-devel] [PATCH 2/2] configure: Improve aarch64 feature detection on older, broken Clang versions Martin Storsjö 2023-10-31 10:23 ` Martin Storsjö 2023-10-24 12:40 ` [FFmpeg-devel] [PATCH 1/2] aarch64: Simplify the linux runtime cpu detection code Sean McGovern 2023-10-26 21:40 ` Sean McGovern 2023-10-26 21:42 ` Sean McGovern 2023-10-31 10:22 ` Martin Storsjö
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git