* [FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
From: Martin Storsjö @ 2023-05-30 12:30 UTC
To: ffmpeg-devel

These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.

Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are
supported, and check if the instructions can be assembled.

Current clang versions fail to support the dotprod and i8mm
features in the .arch_extension directive, but do support them
if enabled with -march=armv8.4-a on the command line. (Curiously,
lowering the arch level with ".arch armv8.2-a" doesn't make the
extensions unavailable if they were enabled with -march; if that
changes, Clang should also learn to support these extensions via
.arch_extension for them to remain usable here.)
---
Simplified the detection logic somewhat; check if ".arch armv8.2-a"
and ".arch_extension {dotprod,i8mm}" are available, then check if
the instruction can be assembled. This way, we check exactly the
same thing as we are going to assemble in the end, so there
shouldn't be any risk of build breakage due to testing and building
subtly different things.
---
 configure               | 81 ++++++++++++++++++++++++++++++++++++++++-
 libavutil/aarch64/asm.S | 11 ++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 495493aa0e..50eb27ba0e 100755
--- a/configure
+++ b/configure
@@ -454,6 +454,8 @@ Optimization options (experts only):
   --disable-armv6t2        disable armv6t2 optimizations
   --disable-vfp            disable VFP optimizations
   --disable-neon           disable NEON optimizations
+  --disable-dotprod        disable DOTPROD optimizations
+  --disable-i8mm           disable I8MM optimizations
   --disable-inline-asm     disable use of inline assembly
   --disable-x86asm         disable use of standalone x86 assembly
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
@@ -1154,6 +1156,43 @@ check_insn(){
     check_as ${1}_external "$2"
 }
 
+check_arch_level(){
+    log check_arch_level "$@"
+    level="$1"
+    check_as tested_arch_level ".arch $level"
+    enabled tested_arch_level && as_arch_level="$level"
+}
+
+check_archext_insn(){
+    log check_archext_insn "$@"
+    feature="$1"
+    instr="$2"
+    # Check if the assembly is accepted in inline assembly.
+    check_inline_asm ${feature}_inline "\"$instr\""
+    # We don't check if the instruction is supported out of the box by the
+    # external assembler (we don't try to set ${feature}_external) as we don't
+    # need to use these instructions in non-runtime detected codepaths.
+
+    disable $feature
+
+    enabled as_arch_directive && arch_directive=".arch $as_arch_level" || arch_directive=""
+
+    # Test if the assembler supports the .arch_extension $feature directive.
+    arch_extension_directive=".arch_extension $feature"
+    test_as <<EOF && enable as_archext_${feature}_directive || arch_extension_directive=""
+$arch_directive
+$arch_extension_directive
+EOF
+
+    # Test if we can assemble the instruction after potential .arch and
+    # .arch_extension directives.
+    test_as <<EOF && enable ${feature}
+$arch_directive
+$arch_extension_directive
+$instr
+EOF
+}
+
 check_x86asm(){
     log check_x86asm "$@"
     name=$1
@@ -2059,6 +2098,8 @@ ARCH_EXT_LIST_ARM="
     armv6
     armv6t2
     armv8
+    dotprod
+    i8mm
     neon
     vfp
     vfpv3
@@ -2322,6 +2363,8 @@ SYSTEM_LIBRARIES="
 
 TOOLCHAIN_FEATURES="
     as_arch_directive
+    as_archext_dotprod_directive
+    as_archext_i8mm_directive
     as_dn_directive
     as_fpu_directive
     as_func
@@ -2622,6 +2665,8 @@ intrinsics_neon_deps="neon"
 vfp_deps_any="aarch64 arm"
 vfpv3_deps="vfp"
 setend_deps="arm"
+dotprod_deps="aarch64 neon"
+i8mm_deps="aarch64 neon"
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
@@ -5988,12 +6033,27 @@ check_inline_asm inline_asm_labels '"1:\n"'
 check_inline_asm inline_asm_nonlocal_labels '"Label:\n"'
 
 if enabled aarch64; then
+    as_arch_level="armv8-a"
+    check_as as_arch_directive ".arch $as_arch_level"
+    enabled as_arch_directive && check_arch_level armv8.2-a
+
     enabled armv8 && check_insn armv8 'prfm pldl1strm, [x0]'
     # internal assembler in clang 3.3 does not support this instruction
     enabled neon && check_insn neon 'ext v0.8B, v0.8B, v1.8B, #1'
     enabled vfp && check_insn vfp 'fmadd d0, d0, d1, d2'
 
-    map 'enabled_any ${v}_external ${v}_inline || disable $v' $ARCH_EXT_LIST_ARM
+    archext_list="dotprod i8mm"
+    enabled dotprod && check_archext_insn dotprod 'udot v0.4s, v0.16b, v0.16b'
+    enabled i8mm && check_archext_insn i8mm 'usdot v0.4s, v0.16b, v0.16b'
+
+    # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external
+    # assembly support the feature out of the box. Skip this for the features
+    # checked with check_archext_insn above, as that function takes care of
+    # updating all the variables as necessary.
+    for v in $ARCH_EXT_LIST_ARM; do
+        is_in $v $archext_list && continue
+        enabled_any ${v}_external ${v}_inline || disable $v
+    done
 
 elif enabled alpha; then
 
@@ -6022,6 +6082,12 @@ EOF
         warn "Compiler does not indicate floating-point ABI, guessing $fpabi."
     fi
 
+    # Test for various instruction sets, testing support both in inline and
+    # external assembly. This sets the ${v}_inline or ${v}_external flags
+    # if the instruction can be used unconditionally in either inline or
+    # external assembly. This means that if the ${v}_external feature is set,
+    # that feature can be used unconditionally in various support macros
+    # anywhere in external assembly, in any function.
     enabled armv5te && check_insn armv5te 'qadd r0, r0, r0'
     enabled armv6 && check_insn armv6 'sadd16 r0, r0, r0'
     enabled armv6t2 && check_insn armv6t2 'movt r0, #0'
@@ -6030,6 +6096,14 @@ EOF
     enabled vfpv3 && check_insn vfpv3 'vmov.f32 s0, #1.0'
     enabled setend && check_insn setend 'setend be'
 
+    # If neither inline nor external assembly can use the feature by default,
+    # disable the main unsuffixed feature (e.g. HAVE_NEON).
+    #
+    # For targets that support runtime CPU feature detection, don't disable
+    # the main feature flag - there we assume that all supported toolchains
+    # can assemble code for all instruction set features (e.g. NEON) with
+    # suitable assembly flags (such as ".fpu neon"); we don't check
+    # specifically that they really do.
     [ $target_os = linux ] || [ $target_os = android ] ||
         map 'enabled_any ${v}_external ${v}_inline || disable $v' \
             $ARCH_EXT_LIST_ARM
@@ -7610,6 +7684,8 @@ fi
 if enabled aarch64; then
     echo "NEON enabled ${neon-no}"
     echo "VFP enabled ${vfp-no}"
+    echo "DOTPROD enabled ${dotprod-no}"
+    echo "I8MM enabled ${i8mm-no}"
 fi
 if enabled arm; then
     echo "ARMv5TE enabled ${armv5te-no}"
@@ -7900,6 +7976,9 @@ test -n "$assert_level" &&
 test -n "$malloc_prefix" &&
     echo "#define MALLOC_PREFIX $malloc_prefix" >>$TMPH
 
+enabled aarch64 &&
+    echo "#define AS_ARCH_LEVEL $as_arch_level" >>$TMPH
+
 if enabled x86asm; then
     append config_files $TMPASM
     cat > $TMPASM <<EOF
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index a7782415d7..8589cf74fc 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -36,6 +36,17 @@
 # define __has_feature(x) 0
 #endif
 
+#if HAVE_AS_ARCH_DIRECTIVE
+        .arch AS_ARCH_LEVEL
+#endif
+
+#if HAVE_AS_ARCHEXT_DOTPROD_DIRECTIVE
+        .arch_extension dotprod
+#endif
+#if HAVE_AS_ARCHEXT_I8MM_DIRECTIVE
+        .arch_extension i8mm
+#endif
+
 /* Support macros for
  * - Armv8.3-A Pointer Authentication and
-- 
2.37.1 (Apple Git-137.1)

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH v2 2/5] aarch64: Add cpu flags for the dotprod and i8mm extensions
From: Martin Storsjö @ 2023-05-30 12:30 UTC
To: ffmpeg-devel

Set these flags if the extensions are unconditionally available to
the compiler.
---
Fixed the name of the __ARM_FEATURE define used for detecting i8mm.
---
 libavutil/aarch64/cpu.c   | 15 ++++++++++++---
 libavutil/aarch64/cpu.h   |  2 ++
 libavutil/cpu.c           |  2 ++
 libavutil/cpu.h           |  2 ++
 libavutil/tests/cpu.c     |  2 ++
 tests/checkasm/checkasm.c |  2 ++
 6 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index cc641da576..0c76f5ad15 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -22,9 +22,18 @@
 
 int ff_get_cpu_flags_aarch64(void)
 {
-    return AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
-           AV_CPU_FLAG_NEON * HAVE_NEON |
-           AV_CPU_FLAG_VFP * HAVE_VFP;
+    int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
+                AV_CPU_FLAG_NEON * HAVE_NEON |
+                AV_CPU_FLAG_VFP * HAVE_VFP;
+
+#ifdef __ARM_FEATURE_DOTPROD
+    flags |= AV_CPU_FLAG_DOTPROD;
+#endif
+#ifdef __ARM_FEATURE_MATMUL_INT8
+    flags |= AV_CPU_FLAG_I8MM;
+#endif
+
+    return flags;
 }
 
 size_t ff_get_cpu_max_align_aarch64(void)
diff --git a/libavutil/aarch64/cpu.h b/libavutil/aarch64/cpu.h
index 2ee3f9323a..64d703be37 100644
--- a/libavutil/aarch64/cpu.h
+++ b/libavutil/aarch64/cpu.h
@@ -25,5 +25,7 @@
 #define have_armv8(flags) CPUEXT(flags, ARMV8)
 #define have_neon(flags) CPUEXT(flags, NEON)
 #define have_vfp(flags) CPUEXT(flags, VFP)
+#define have_dotprod(flags) CPUEXT(flags, DOTPROD)
+#define have_i8mm(flags) CPUEXT(flags, I8MM)
 
 #endif /* AVUTIL_AARCH64_CPU_H */
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 2c5f7f4958..2ffc3986aa 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -174,6 +174,8 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
         { "armv8", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ARMV8 }, .unit = "flags" },
         { "neon", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_NEON }, .unit = "flags" },
         { "vfp", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_VFP }, .unit = "flags" },
+        { "dotprod", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_DOTPROD }, .unit = "flags" },
+        { "i8mm", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_I8MM }, .unit = "flags" },
 #elif ARCH_MIPS
         { "mmi", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MMI }, .unit = "flags" },
         { "msa", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MSA }, .unit = "flags" },
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index 8fa5ea9199..da486f9c7a 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -69,6 +69,8 @@
 #define AV_CPU_FLAG_NEON (1 << 5)
 #define AV_CPU_FLAG_ARMV8 (1 << 6)
 #define AV_CPU_FLAG_VFP_VM (1 << 7) ///< VFPv2 vector mode, deprecated in ARMv7-A and unavailable in various CPUs implementations
+#define AV_CPU_FLAG_DOTPROD (1 << 8)
+#define AV_CPU_FLAG_I8MM (1 << 9)
 #define AV_CPU_FLAG_SETEND (1 <<16)
 
 #define AV_CPU_FLAG_MMI (1 << 0)
diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
index dadadb31dc..a52637339d 100644
--- a/libavutil/tests/cpu.c
+++ b/libavutil/tests/cpu.c
@@ -38,6 +38,8 @@ static const struct {
     { AV_CPU_FLAG_ARMV8, "armv8" },
     { AV_CPU_FLAG_NEON, "neon" },
     { AV_CPU_FLAG_VFP, "vfp" },
+    { AV_CPU_FLAG_DOTPROD, "dotprod" },
+    { AV_CPU_FLAG_I8MM, "i8mm" },
 #elif ARCH_ARM
     { AV_CPU_FLAG_ARMV5TE, "armv5te" },
     { AV_CPU_FLAG_ARMV6, "armv6" },
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7389ebaee9..4311a8ffcb 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -230,6 +230,8 @@ static const struct {
 #if ARCH_AARCH64
     { "ARMV8", "armv8", AV_CPU_FLAG_ARMV8 },
     { "NEON", "neon", AV_CPU_FLAG_NEON },
+    { "DOTPROD", "dotprod", AV_CPU_FLAG_DOTPROD },
+    { "I8MM", "i8mm", AV_CPU_FLAG_I8MM },
 #elif ARCH_ARM
     { "ARMV5TE", "armv5te", AV_CPU_FLAG_ARMV5TE },
     { "ARMV6", "armv6", AV_CPU_FLAG_ARMV6 },
-- 
2.37.1 (Apple Git-137.1)
* [FFmpeg-devel] [PATCH v2 3/5] aarch64: Add Linux runtime cpu feature detection using getauxval(AT_HWCAP)
From: Martin Storsjö @ 2023-05-30 12:30 UTC
To: ffmpeg-devel

Based partially on code by Janne Grunau.

---
Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A
not unreasonably old distribution like Ubuntu 20.04 does have
HWCAP_CPUID but not HWCAP2_I8MM in the distribution provided headers.

Alternatively I guess we could carry our own fallback hardcoded values
for the HWCAP* values we use and skip HWCAP_CPUID.
---
 configure               |  2 ++
 libavutil/aarch64/cpu.c | 63 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/configure b/configure
index 50eb27ba0e..b39de74de5 100755
--- a/configure
+++ b/configure
@@ -2209,6 +2209,7 @@ HAVE_LIST_PUB="
 
 HEADERS_LIST="
     arpa_inet_h
+    asm_hwcap_h
     asm_types_h
     cdio_paranoia_h
     cdio_paranoia_paranoia_h
@@ -6432,6 +6433,7 @@ check_headers io.h
 enabled libdrm &&
     check_headers linux/dma-buf.h
 
+check_headers asm/hwcap.h
 check_headers linux/perf_event.h
 check_headers libcrystalhd/libcrystalhd_if.h
 check_headers malloc.h
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 0c76f5ad15..4563959ffd 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -20,6 +20,67 @@
 #include "libavutil/cpu_internal.h"
 #include "config.h"
 
+#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H
+#include <stdint.h>
+#include <asm/hwcap.h>
+#include <sys/auxv.h>
+
+#define get_cpu_feature_reg(reg, val) \
+        __asm__("mrs %0, " #reg : "=r" (val))
+
+static int detect_flags(void)
+{
+    int flags = 0;
+    unsigned long hwcap, hwcap2;
+
+    // Check for support using direct individual HWCAPs
+    hwcap = getauxval(AT_HWCAP);
+#ifdef HWCAP_ASIMDDP
+    if (hwcap & HWCAP_ASIMDDP)
+        flags |= AV_CPU_FLAG_DOTPROD;
+#endif
+
+#ifdef AT_HWCAP2
+    hwcap2 = getauxval(AT_HWCAP2);
+#ifdef HWCAP2_I8MM
+    if (hwcap2 & HWCAP2_I8MM)
+        flags |= AV_CPU_FLAG_I8MM;
+#endif
+#endif
+
+    // Silence warnings if none of the hwcaps to check are known.
+    (void)hwcap;
+    (void)hwcap2;
+
+#if defined(HWCAP_CPUID)
+    // The HWCAP_* defines for individual extensions may become available late, as
+    // they require updates to userland headers. As a fallback, see if we can access
+    // the CPUID registers (trapped via the kernel).
+    // See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html
+    if (hwcap & HWCAP_CPUID) {
+        uint64_t tmp;
+
+        get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
+        if (((tmp >> 44) & 0xf) == 0x1)
+            flags |= AV_CPU_FLAG_DOTPROD;
+        get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
+        if (((tmp >> 52) & 0xf) == 0x1)
+            flags |= AV_CPU_FLAG_I8MM;
+    }
+#endif
+
+    return flags;
+}
+
+#else
+
+static int detect_flags(void)
+{
+    return 0;
+}
+
+#endif
+
 int ff_get_cpu_flags_aarch64(void)
 {
     int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
@@ -33,6 +94,8 @@ int ff_get_cpu_flags_aarch64(void)
     flags |= AV_CPU_FLAG_I8MM;
 #endif
 
+    flags |= detect_flags();
+
     return flags;
 }
 
-- 
2.37.1 (Apple Git-137.1)
* Re: [FFmpeg-devel] [PATCH v2 3/5] aarch64: Add Linux runtime cpu feature detection using getauxval(AT_HWCAP)
From: Rémi Denis-Courmont @ 2023-05-31 16:54 UTC
To: ffmpeg-devel

On Tuesday, 30 May 2023, 15:30:41 EEST, Martin Storsjö wrote:
> Based partially on code by Janne Grunau.
>
> ---
> Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A
> not unreasonably old distribution like Ubuntu 20.04 does have
> HWCAP_CPUID but not HWCAP2_I8MM in the distribution provided headers.
>
> Alternatively I guess we could carry our own fallback hardcoded values
> for the HWCAP* values we use and skip HWCAP_CPUID.
> ---
>  configure               |  2 ++
>  libavutil/aarch64/cpu.c | 63 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
> diff --git a/configure b/configure
> index 50eb27ba0e..b39de74de5 100755
> --- a/configure
> +++ b/configure
> @@ -2209,6 +2209,7 @@ HAVE_LIST_PUB="
>
>  HEADERS_LIST="
>      arpa_inet_h
> +    asm_hwcap_h
>      asm_types_h
>      cdio_paranoia_h
>      cdio_paranoia_paranoia_h
> @@ -6432,6 +6433,7 @@ check_headers io.h
>  enabled libdrm &&
>      check_headers linux/dma-buf.h
>
> +check_headers asm/hwcap.h
>  check_headers linux/perf_event.h
>  check_headers libcrystalhd/libcrystalhd_if.h
>  check_headers malloc.h
> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
> index 0c76f5ad15..4563959ffd 100644
> --- a/libavutil/aarch64/cpu.c
> +++ b/libavutil/aarch64/cpu.c
> @@ -20,6 +20,67 @@
>  #include "libavutil/cpu_internal.h"
>  #include "config.h"
>
> +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H
> +#include <stdint.h>
> +#include <asm/hwcap.h>
> +#include <sys/auxv.h>
> +
> +#define get_cpu_feature_reg(reg, val) \
> +        __asm__("mrs %0, " #reg : "=r" (val))
> +
> +static int detect_flags(void)
> +{
> +    int flags = 0;
> +    unsigned long hwcap, hwcap2;
> +
> +    // Check for support using direct individual HWCAPs
> +    hwcap = getauxval(AT_HWCAP);
> +#ifdef HWCAP_ASIMDDP
> +    if (hwcap & HWCAP_ASIMDDP)
> +        flags |= AV_CPU_FLAG_DOTPROD;
> +#endif
> +
> +#ifdef AT_HWCAP2
> +    hwcap2 = getauxval(AT_HWCAP2);
> +#ifdef HWCAP2_I8MM
> +    if (hwcap2 & HWCAP2_I8MM)
> +        flags |= AV_CPU_FLAG_I8MM;
> +#endif
> +#endif
> +
> +    // Silence warnings if none of the hwcaps to check are known.
> +    (void)hwcap;
> +    (void)hwcap2;
> +
> +#if defined(HWCAP_CPUID)
> +    // The HWCAP_* defines for individual extensions may become available late, as
> +    // they require updates to userland headers. As a fallback, see if we can access
> +    // the CPUID registers (trapped via the kernel).
> +    // See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html

I don't actually care which method is used and whether to hard-code the
missing constants or not. But doing both methods is weird. If you are going to
trigger the TID3 traps anyway, there is no point checking the auxiliary
vectors before, AFAICT.

You *could* check the auxiliary vectors as a run-time fallback if HWCAP_CPUID
is *not* set, but that only really makes sense for HWCAP_FP and HWCAP_ASIMD,
not for HWCAP_ASIMDDP (Linux 4.15) and HWCAP2_I8MM (Linux 5.6) which are more
recent than HWCAP_CPUID (Linux 4.11). And then, that would be only in the
corner case that FP and/or AdvSIMD were explicitly disabled since they are on
by default for all AArch64 targets.

-- 
Реми Дёни-Курмон
http://www.remlab.net/
* Re: [FFmpeg-devel] [PATCH v2 3/5] aarch64: Add Linux runtime cpu feature detection using getauxval(AT_HWCAP)
From: Martin Storsjö @ 2023-05-31 19:37 UTC
To: FFmpeg development discussions and patches

On Wed, 31 May 2023, Rémi Denis-Courmont wrote:

> On Tuesday, 30 May 2023, 15:30:41 EEST, Martin Storsjö wrote:
>> Based partially on code by Janne Grunau.
>>
>> ---
>> Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A
>> not unreasonably old distribution like Ubuntu 20.04 does have
>> HWCAP_CPUID but not HWCAP2_I8MM in the distribution provided headers.
>>
>> Alternatively I guess we could carry our own fallback hardcoded values
>> for the HWCAP* values we use and skip HWCAP_CPUID.
>> ---
>>  configure               |  2 ++
>>  libavutil/aarch64/cpu.c | 63 +++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 65 insertions(+)
>>
>> diff --git a/configure b/configure
>> index 50eb27ba0e..b39de74de5 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2209,6 +2209,7 @@ HAVE_LIST_PUB="
>>
>>  HEADERS_LIST="
>>      arpa_inet_h
>> +    asm_hwcap_h
>>      asm_types_h
>>      cdio_paranoia_h
>>      cdio_paranoia_paranoia_h
>> @@ -6432,6 +6433,7 @@ check_headers io.h
>>  enabled libdrm &&
>>      check_headers linux/dma-buf.h
>>
>> +check_headers asm/hwcap.h
>>  check_headers linux/perf_event.h
>>  check_headers libcrystalhd/libcrystalhd_if.h
>>  check_headers malloc.h
>> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
>> index 0c76f5ad15..4563959ffd 100644
>> --- a/libavutil/aarch64/cpu.c
>> +++ b/libavutil/aarch64/cpu.c
>> @@ -20,6 +20,67 @@
>>  #include "libavutil/cpu_internal.h"
>>  #include "config.h"
>>
>> +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H
>> +#include <stdint.h>
>> +#include <asm/hwcap.h>
>> +#include <sys/auxv.h>
>> +
>> +#define get_cpu_feature_reg(reg, val) \
>> +        __asm__("mrs %0, " #reg : "=r" (val))
>> +
>> +static int detect_flags(void)
>> +{
>> +    int flags = 0;
>> +    unsigned long hwcap, hwcap2;
>> +
>> +    // Check for support using direct individual HWCAPs
>> +    hwcap = getauxval(AT_HWCAP);
>> +#ifdef HWCAP_ASIMDDP
>> +    if (hwcap & HWCAP_ASIMDDP)
>> +        flags |= AV_CPU_FLAG_DOTPROD;
>> +#endif
>> +
>> +#ifdef AT_HWCAP2
>> +    hwcap2 = getauxval(AT_HWCAP2);
>> +#ifdef HWCAP2_I8MM
>> +    if (hwcap2 & HWCAP2_I8MM)
>> +        flags |= AV_CPU_FLAG_I8MM;
>> +#endif
>> +#endif
>> +
>> +    // Silence warnings if none of the hwcaps to check are known.
>> +    (void)hwcap;
>> +    (void)hwcap2;
>> +
>> +#if defined(HWCAP_CPUID)
>> +    // The HWCAP_* defines for individual extensions may become available late, as
>> +    // they require updates to userland headers. As a fallback, see if we can access
>> +    // the CPUID registers (trapped via the kernel).
>> +    // See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html
>
> I don't actually care which method is used and whether to hard-code the
> missing constants or not. But doing both methods is weird. If you are going to
> trigger the TID3 traps anyway, there is no point checking the auxiliary
> vectors before, AFAICT.

Yeah, that's true.

> You *could* check the auxiliary vectors as a run-time fallback if HWCAP_CPUID
> is *not* set, but that only really makes sense for HWCAP_FP and HWCAP_ASIMD,
> not for HWCAP_ASIMDDP (Linux 4.15) and HWCAP2_I8MM (Linux 5.6) which are more
> recent than HWCAP_CPUID (Linux 4.11). And then, that would be only in the
> corner case that FP and/or AdvSIMD were explicitly disabled since they are on
> by default for all AArch64 targets.

Yeah - I guess there's no potential configuration where a kernel does know
about HWCAP_CPUID and newer HWCAPs but has decided to set HWCAP_CPUID to 0
and not handle the trapping?

I considered falling back on the trapping CPUID codepath only if the
individual HWCAPs weren't detected/supported, but that soon becomes quite a
mess if we're adding more than a couple extensions.

So I guess after all that it's simplest to just go with CPUID, possibly with
a code comment that we could go with individual HWCAPs at some point in the
future if we want to simplify things and don't care about older
systems/toolchains.

// Martin
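For illustration, a CPUID-only variant along the lines discussed above might look roughly like the following. This is only a sketch, not the patch as pushed: the SKETCH_* macros and the function names are hypothetical (the flag values mirror the AV_CPU_FLAG_* values from patch 2/5), and the field decoding is split out into a pure helper, which is an assumption made here so it can be exercised on any host. The register read itself is guarded so that it only compiles on aarch64 Linux.

```c
#include <stdint.h>

/* Hypothetical stand-ins for AV_CPU_FLAG_DOTPROD / AV_CPU_FLAG_I8MM. */
#define SKETCH_CPU_FLAG_DOTPROD (1 << 8)
#define SKETCH_CPU_FLAG_I8MM    (1 << 9)

/* Decode the feature fields from raw ID register values, using the same
 * shifts and comparisons as the patch. */
static int flags_from_id_regs(uint64_t isar0, uint64_t isar1)
{
    int flags = 0;

    /* ID_AA64ISAR0_EL1, bits [47:44]: dot product. */
    if (((isar0 >> 44) & 0xf) == 0x1)
        flags |= SKETCH_CPU_FLAG_DOTPROD;
    /* ID_AA64ISAR1_EL1, bits [55:52]: int8 matrix multiply. */
    if (((isar1 >> 52) & 0xf) == 0x1)
        flags |= SKETCH_CPU_FLAG_I8MM;
    return flags;
}

#if defined(__linux__) && defined(__aarch64__)
#include <asm/hwcap.h>
#include <sys/auxv.h>

static int detect_flags_cpuid_only(void)
{
#ifdef HWCAP_CPUID
    /* Only read the ID registers if the kernel advertises that it
     * emulates the (trapping) accesses. */
    if (getauxval(AT_HWCAP) & HWCAP_CPUID) {
        uint64_t isar0, isar1;

        __asm__("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
        __asm__("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
        return flags_from_id_regs(isar0, isar1);
    }
#endif
    return 0;
}
#else
/* Other targets: nothing detected at runtime in this sketch. */
static int detect_flags_cpuid_only(void)
{
    return 0;
}
#endif
```

With this shape, the individual HWCAP_ASIMDDP / HWCAP2_I8MM checks drop out entirely, and adding another extension only means adding one more field test to the helper.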
* [FFmpeg-devel] [PATCH v2 4/5] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl
From: Martin Storsjö @ 2023-05-30 12:30 UTC
To: ffmpeg-devel

For now, there's not much value in this since Clang doesn't support
enabling the dotprod or i8mm features with either .arch_extension
or .arch (it has to be enabled by the base arch flags passed to
the compiler). But it may be supported in the future.
---
 configure               |  2 ++
 libavutil/aarch64/cpu.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/configure b/configure
index b39de74de5..001287c169 100755
--- a/configure
+++ b/configure
@@ -2348,6 +2348,7 @@ SYSTEM_FUNCS="
     strerror_r
     sysconf
     sysctl
+    sysctlbyname
     usleep
     UTGetOSTypeFromString
     VirtualAlloc
@@ -6394,6 +6395,7 @@ check_func_headers mach/mach_time.h mach_absolute_time
 check_func_headers stdlib.h getenv
 check_func_headers sys/stat.h lstat
 check_func_headers sys/auxv.h getauxval
+check_func_headers sys/sysctl.h sysctlbyname
 
 check_func_headers windows.h GetModuleHandle
 check_func_headers windows.h GetProcessAffinityMask
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 4563959ffd..ffb00f6dd2 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -72,6 +72,28 @@ static int detect_flags(void)
     return flags;
 }
 
+#elif defined(__APPLE__) && HAVE_SYSCTLBYNAME
+#include <sys/sysctl.h>
+
+static int detect_flags(void)
+{
+    uint32_t value = 0;
+    size_t size;
+    int flags = 0;
+
+    size = sizeof(value);
+    if (!sysctlbyname("hw.optional.arm.FEAT_DotProd", &value, &size, NULL, 0)) {
+        if (value)
+            flags |= AV_CPU_FLAG_DOTPROD;
+    }
+    size = sizeof(value);
+    if (!sysctlbyname("hw.optional.arm.FEAT_I8MM", &value, &size, NULL, 0)) {
+        if (value)
+            flags |= AV_CPU_FLAG_I8MM;
+    }
+    return flags;
+}
+
 #else
 
 static int detect_flags(void)
-- 
2.37.1 (Apple Git-137.1)
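As a side note, if more sysctl-detected features were to be added later, the two near-identical blocks in the patch above could be folded into a small table. The following is only a sketch of that idea, not part of the patch: the SKETCH_* macros, the table and the function name are hypothetical, and the whole query is compiled out on non-Apple targets (where the function simply returns 0). The sysctl key names are the ones used in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical stand-ins for AV_CPU_FLAG_DOTPROD / AV_CPU_FLAG_I8MM. */
#define SKETCH_CPU_FLAG_DOTPROD (1 << 8)
#define SKETCH_CPU_FLAG_I8MM    (1 << 9)

/* One row per feature: sysctl key and the flag it maps to. */
static const struct {
    const char *name;
    int flag;
} sketch_features[] = {
    { "hw.optional.arm.FEAT_DotProd", SKETCH_CPU_FLAG_DOTPROD },
    { "hw.optional.arm.FEAT_I8MM",    SKETCH_CPU_FLAG_I8MM    },
};

static int detect_flags_sysctl_sketch(void)
{
    int flags = 0;
#if defined(__APPLE__)
    for (size_t i = 0; i < sizeof(sketch_features) / sizeof(sketch_features[0]); i++) {
        uint32_t value = 0;
        size_t size = sizeof(value);

        /* sysctlbyname() returns 0 on success; an unknown key is treated
         * as "feature not present". */
        if (!sysctlbyname(sketch_features[i].name, &value, &size, NULL, 0) && value)
            flags |= sketch_features[i].flag;
    }
#endif
    return flags;
}
```

For just two features the explicit form in the patch is arguably just as readable; the table only starts to pay off once the list grows.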
* [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions
From: Martin Storsjö @ 2023-05-30 12:30 UTC
To: ffmpeg-devel

For Windows, there's no publicly defined constant for checking for
the i8mm extension yet.
---
 libavutil/aarch64/cpu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index ffb00f6dd2..4b97530240 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -94,6 +94,16 @@ static int detect_flags(void)
     return flags;
 }
 
+#elif defined(_WIN32)
+#include <windows.h>
+
+static int detect_flags(void)
+{
+    int flags = 0;
+    if (IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE))
+        flags |= AV_CPU_FLAG_DOTPROD;
+    return flags;
+}
 #else
 
 static int detect_flags(void)
-- 
2.37.1 (Apple Git-137.1)
* Re: [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions
From: Martin Storsjö @ 2023-06-03 20:51 UTC
To: ffmpeg-devel

On Tue, 30 May 2023, Martin Storsjö wrote:

> For Windows, there's no publicly defined constant for checking for
> the i8mm extension yet.
> ---
>  libavutil/aarch64/cpu.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

If there are no objections or further comments on this patchset, I'll push
it early next week (with the unnecessary checks in patch 3/5 removed and a
comment amended).

// Martin
* Re: [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions
From: James Zern @ 2023-06-05 17:36 UTC
To: FFmpeg development discussions and patches

On Tue, May 30, 2023 at 5:31 AM Martin Storsjö <martin@martin.st> wrote:
>
> For Windows, there's no publicly defined constant for checking for
> the i8mm extension yet.
> ---
>  libavutil/aarch64/cpu.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
> index ffb00f6dd2..4b97530240 100644
> --- a/libavutil/aarch64/cpu.c
> +++ b/libavutil/aarch64/cpu.c
> @@ -94,6 +94,16 @@ static int detect_flags(void)
>      return flags;
>  }
>
> +#elif defined(_WIN32)
> +#include <windows.h>
> +
> +static int detect_flags(void)
> +{
> +    int flags = 0;
> +    if (IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE))

I think this requires a recent Windows SDK, is compatibility a concern?
lgtm otherwise.
* Re: [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions
  2023-06-05 17:36 ` James Zern
@ 2023-06-06  9:32   ` Martin Storsjö
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Storsjö @ 2023-06-06 9:32 UTC
To: FFmpeg development discussions and patches

On Mon, 5 Jun 2023, James Zern wrote:

> On Tue, May 30, 2023 at 5:31 AM Martin Storsjö <martin@martin.st> wrote:
>>
>> For Windows, there's no publicly defined constant for checking for
>> the i8mm extension yet.
>> ---
>>  libavutil/aarch64/cpu.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
>> index ffb00f6dd2..4b97530240 100644
>> --- a/libavutil/aarch64/cpu.c
>> +++ b/libavutil/aarch64/cpu.c
>> @@ -94,6 +94,16 @@ static int detect_flags(void)
>>      return flags;
>>  }
>>
>> +#elif defined(_WIN32)
>> +#include <windows.h>
>> +
>> +static int detect_flags(void)
>> +{
>> +    int flags = 0;
>> +    if (IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE))
>
> I think this requires a recent Windows SDK, is compatibility a concern?

Good catch, thanks! Yes, this requires a fairly recent SDK; it was added to
mingw-w64 in 2021, and has been in the MS WinSDK since 10.0.22000.0. So this
would indeed have failed in my own fate config with an older MSVC...

It's easy to wrap this up in an #ifdef though, since the identifier is a
define (both in mingw-w64 and WinSDK).

Other than that, this code is only used for aarch64, so the target audience
is much smaller than for generic Windows code, but nevertheless, let's not
break older toolchains if we don't need to.

I'll amend the patch locally with an ifdef before pushing.

// Martin
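The #ifdef guard described in the reply above could look roughly like the following. This is a minimal sketch under stated assumptions, not the patch as it was pushed: SKETCH_CPU_FLAG_DOTPROD is a hypothetical stand-in for the cpu flag added in patch 2/5, and the function is arranged so that it also compiles on non-Windows hosts, where it simply reports no features.

```c
/* Hypothetical stand-in for the dotprod cpu flag from patch 2/5. */
#define SKETCH_CPU_FLAG_DOTPROD (1 << 0)

#if defined(_WIN32)
#include <windows.h>
#endif

/* Returns a bitmask of detected features; 0 if nothing could be detected. */
static int detect_flags(void)
{
    int flags = 0;
    /* PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE is a preprocessor define, so
     * older SDK headers that lack it can be handled with a plain #ifdef:
     * the check is compiled out instead of breaking the build. */
#if defined(_WIN32) && defined(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE)
    if (IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE))
        flags |= SKETCH_CPU_FLAG_DOTPROD;
#endif
    return flags;
}
```

With an older SDK the function still builds and returns 0, so the dotprod code paths are just never selected at runtime on such builds.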
* Re: [FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-30 12:30 [FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
                   ` (3 preceding siblings ...)
  2023-05-30 12:30 ` [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions Martin Storsjö
@ 2023-06-06 10:25 ` Martin Storsjö
  4 siblings, 0 replies; 11+ messages in thread
From: Martin Storsjö @ 2023-06-06 10:25 UTC
To: ffmpeg-devel

On Tue, 30 May 2023, Martin Storsjö wrote:

> Current clang versions fail to support the dotprod and i8mm
> features in the .arch_extension directive, but do support them
> if enabled with -march=armv8.4-a on the command line. (Curiously,
> lowering the arch level with ".arch armv8.2-a" doesn't make the
> extensions unavailable if they were enabled with -march; if that
> changes, Clang should also learn to support these extensions via
> .arch_extension for them to remain usable here.)

FWIW, since today, Clang does support enabling both of these extensions
via .arch_extension, see
https://github.com/llvm/llvm-project/commit/4b8d9abca7d0280878fb12de331e688ee85d7cd8.

It turns out that it is possible to enable these extensions with older
Clang via assembly too, but due to a bug, it would require using e.g.
".arch armv8.6-a+crc" (that is, appending a "+<ext>" suffix naming any
unrelated extension). I won't try to support using that in our assembly,
as the proper mechanism should be supported going forward.

As there was no further opposition, I'll push this patchset now with the
last modifications that were suggested.

// Martin
end of thread, other threads:[~2023-06-06 10:25 UTC | newest]

Thread overview: 11+ messages
2023-05-30 12:30 [FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
2023-05-30 12:30 ` [FFmpeg-devel] [PATCH v2 2/5] aarch64: Add cpu flags for the dotprod and i8mm extensions Martin Storsjö
2023-05-30 12:30 ` [FFmpeg-devel] [PATCH v2 3/5] aarch64: Add Linux runtime cpu feature detection using getauxval(AT_HWCAP) Martin Storsjö
2023-05-31 16:54   ` Rémi Denis-Courmont
2023-05-31 19:37     ` Martin Storsjö
2023-05-30 12:30 ` [FFmpeg-devel] [PATCH v2 4/5] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl Martin Storsjö
2023-05-30 12:30 ` [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions Martin Storsjö
2023-06-03 20:51   ` Martin Storsjö
2023-06-05 17:36   ` James Zern
2023-06-06  9:32     ` Martin Storsjö
2023-06-06 10:25 ` [FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö