Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
@ 2023-05-26  8:03 Martin Storsjö
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 2/4] aarch64: Add cpu flags for the dotprod and i8mm extensions Martin Storsjö
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Martin Storsjö @ 2023-05-26  8:03 UTC (permalink / raw)
  To: ffmpeg-devel

These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.

Check if these are available for use unconditionally (e.g. if compiling
with -march=armv8.6-a), or if they can be enabled with specific
assembler directives.

Use ".arch_extension <ext>" for enabling a specific extension in
assembly; the same can also be achieved with ".arch armv8.2-a+<ext>",
but with .arch_extension it is easier to combine multiple separate
features.

Enabling these extensions requires setting a base architecture level
of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
if the base level is high enough (which might unlock other extensions
without .arch_extension), we don't want to lower it.

Only add .arch/.arch_extension if needed, e.g. current clang fails
to recognize the dotprod and i8mm features in .arch_extension, but
can successfully assemble these instructions if part of the baseline
set with -march.
---
 configure               | 77 ++++++++++++++++++++++++++++++++++++++++-
 libavutil/aarch64/asm.S | 13 +++++++
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 87f7afc2e1..3c7473efb2 100755
--- a/configure
+++ b/configure
@@ -454,6 +454,8 @@ Optimization options (experts only):
   --disable-armv6t2        disable armv6t2 optimizations
   --disable-vfp            disable VFP optimizations
   --disable-neon           disable NEON optimizations
+  --disable-dotprod        disable DOTPROD optimizations
+  --disable-i8mm           disable I8MM optimizations
   --disable-inline-asm     disable use of inline assembly
   --disable-x86asm         disable use of standalone x86 assembly
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
@@ -1154,6 +1156,41 @@ check_insn(){
     check_as ${1}_external "$2"
 }
 
+check_archext_insn(){
+    log check_archext_insn "$@"
+    feature="$1"
+    base_arch="$2"
+    archext="$3"
+    instr="$4"
+    # Check if the assembly is accepted unconditionally in either inline or
+    # external assembly.
+    check_inline_asm ${feature}_inline "\"$instr\""
+    check_as ${feature}_external "$instr"
+
+    enabled_any ${feature}_inline ${feature}_external || disable ${feature}
+
+    if disabled ${feature}_external; then
+        # If not accepted unconditionally, check if we can assemble it
+        # with a suitable .arch_extension directive.
+        test_as <<EOF && enable ${feature} as_archext_${archext}_directive
+.arch_extension $archext
+$instr
+EOF
+        if disabled ${feature}; then
+            # If the base arch level is too low, .arch_extension can require setting
+            # a higher arch level with .arch too. Only do this if strictly needed;
+            # if the base level is e.g. armv8.4-a and some features are available
+            # without any .arch_extension, we don't want to set ".arch armv8.2-a"
+            # for some other .arch_extension.
+            test_as <<EOF && enable ${feature} as_archext_${archext}_directive as_archext_${archext}_needs_arch
+.arch $base_arch
+.arch_extension $archext
+$instr
+EOF
+        fi
+    fi
+}
+
 check_x86asm(){
     log check_x86asm "$@"
     name=$1
@@ -2059,6 +2096,8 @@ ARCH_EXT_LIST_ARM="
     armv6
     armv6t2
     armv8
+    dotprod
+    i8mm
     neon
     vfp
     vfpv3
@@ -2322,6 +2361,10 @@ SYSTEM_LIBRARIES="
 
 TOOLCHAIN_FEATURES="
     as_arch_directive
+    as_archext_dotprod_directive
+    as_archext_dotprod_needs_arch
+    as_archext_i8mm_directive
+    as_archext_i8mm_needs_arch
     as_dn_directive
     as_fpu_directive
     as_func
@@ -2622,6 +2665,8 @@ intrinsics_neon_deps="neon"
 vfp_deps_any="aarch64 arm"
 vfpv3_deps="vfp"
 setend_deps="arm"
+dotprod_deps="aarch64 neon"
+i8mm_deps="aarch64 neon"
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
@@ -5979,12 +6024,26 @@ check_inline_asm inline_asm_labels '"1:\n"'
 check_inline_asm inline_asm_nonlocal_labels '"Label:\n"'
 
 if enabled aarch64; then
+    check_as as_arch_directive ".arch armv8.2-a"
+
     enabled armv8 && check_insn armv8 'prfm   pldl1strm, [x0]'
     # internal assembler in clang 3.3 does not support this instruction
     enabled neon && check_insn neon 'ext   v0.8B, v0.8B, v1.8B, #1'
     enabled vfp  && check_insn vfp  'fmadd d0,    d0,    d1,    d2'
 
-    map 'enabled_any ${v}_external ${v}_inline || disable $v' $ARCH_EXT_LIST_ARM
+    archext_list="dotprod i8mm"
+    enabled dotprod && check_archext_insn dotprod armv8.2-a dotprod 'udot v0.4s, v0.16b, v0.16b'
+    enabled i8mm    && check_archext_insn i8mm    armv8.2-a i8mm    'usdot v0.4s, v0.16b, v0.16b'
+
+    # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external
+    # assembly support the feature out of the box. Skip this for the features
+    # checked with check_archext_insn above; they are checked separately whether
+    # they can be built out of the box or enabled with an .arch_extension
+    # flag.
+    for v in $ARCH_EXT_LIST_ARM; do
+        is_in $v $archext_list && continue
+        enabled_any ${v}_external ${v}_inline || disable $v
+    done
 
 elif enabled alpha; then
 
@@ -6013,6 +6072,12 @@ EOF
         warn "Compiler does not indicate floating-point ABI, guessing $fpabi."
     fi
 
+    # Test for various instruction sets, testing support both in inline and
+    # external assembly. This sets the ${v}_inline or ${v}_external flags
+    # if the instruction can be used unconditionally in either inline or
+    # external assembly. This means that if the ${v}_external feature is set,
+    # that feature can be used unconditionally in various support macros
+    # anywhere in external assembly, in any function.
     enabled armv5te && check_insn armv5te 'qadd r0, r0, r0'
     enabled armv6   && check_insn armv6   'sadd16 r0, r0, r0'
     enabled armv6t2 && check_insn armv6t2 'movt r0, #0'
@@ -6021,6 +6086,14 @@ EOF
     enabled vfpv3   && check_insn vfpv3   'vmov.f32 s0, #1.0'
     enabled setend  && check_insn setend  'setend be'
 
+    # If neither inline nor external assembly can use the feature by default,
+    # disable the main unsuffixed feature (e.g. HAVE_NEON).
+    #
+    # For targets that support runtime CPU feature detection, don't disable
+    # the main feature flag - there we assume that all supported toolchains
+    # can assemble code for all instruction set features (e.g. NEON) with
+    # suitable assembly flags (such as ".fpu neon"); we don't check
+    # specifically that they really do.
     [ $target_os = linux ] || [ $target_os = android ] ||
         map 'enabled_any ${v}_external ${v}_inline || disable $v' \
             $ARCH_EXT_LIST_ARM
@@ -7601,6 +7674,8 @@ fi
 if enabled aarch64; then
     echo "NEON enabled              ${neon-no}"
     echo "VFP enabled               ${vfp-no}"
+    echo "DOTPROD enabled           ${dotprod-no}"
+    echo "I8MM enabled              ${i8mm-no}"
 fi
 if enabled arm; then
     echo "ARMv5TE enabled           ${armv5te-no}"
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index a7782415d7..7cf907f93c 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -36,6 +36,19 @@
 #   define __has_feature(x) 0
 #endif
 
+#if HAVE_AS_ARCH_DIRECTIVE
+#if HAVE_AS_ARCHEXT_DOTPROD_NEEDS_ARCH || HAVE_AS_ARCHEXT_I8MM_NEEDS_ARCH
+        .arch           armv8.2-a
+#endif
+#endif
+
+#if HAVE_AS_ARCHEXT_DOTPROD_DIRECTIVE
+        .arch_extension dotprod
+#endif
+#if HAVE_AS_ARCHEXT_I8MM_DIRECTIVE
+        .arch_extension i8mm
+#endif
+
 
 /* Support macros for
  *   - Armv8.3-A Pointer Authentication and
-- 
2.37.1 (Apple Git-137.1)

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

* [FFmpeg-devel] [PATCH 2/4] aarch64: Add cpu flags for the dotprod and i8mm extensions
  2023-05-26  8:03 [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
@ 2023-05-26  8:03 ` Martin Storsjö
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP) Martin Storsjö
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Martin Storsjö @ 2023-05-26  8:03 UTC (permalink / raw)
  To: ffmpeg-devel

Set these flags if the extensions are available unconditionally for
the compiler.
---
 libavutil/aarch64/cpu.c   | 15 ++++++++++++---
 libavutil/aarch64/cpu.h   |  2 ++
 libavutil/cpu.c           |  2 ++
 libavutil/cpu.h           |  2 ++
 libavutil/tests/cpu.c     |  2 ++
 tests/checkasm/checkasm.c |  2 ++
 6 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index cc641da576..42b33e4a2d 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -22,9 +22,18 @@
 
 int ff_get_cpu_flags_aarch64(void)
 {
-    return AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
-           AV_CPU_FLAG_NEON  * HAVE_NEON  |
-           AV_CPU_FLAG_VFP   * HAVE_VFP;
+    int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
+                AV_CPU_FLAG_NEON  * HAVE_NEON  |
+                AV_CPU_FLAG_VFP   * HAVE_VFP;
+
+#ifdef __ARM_FEATURE_DOTPROD
+    flags |= AV_CPU_FLAG_DOTPROD;
+#endif
+#ifdef __ARM_FEATURE_I8MM
+    flags |= AV_CPU_FLAG_I8MM;
+#endif
+
+    return flags;
 }
 
 size_t ff_get_cpu_max_align_aarch64(void)
diff --git a/libavutil/aarch64/cpu.h b/libavutil/aarch64/cpu.h
index 2ee3f9323a..64d703be37 100644
--- a/libavutil/aarch64/cpu.h
+++ b/libavutil/aarch64/cpu.h
@@ -25,5 +25,7 @@
 #define have_armv8(flags) CPUEXT(flags, ARMV8)
 #define have_neon(flags) CPUEXT(flags, NEON)
 #define have_vfp(flags)  CPUEXT(flags, VFP)
+#define have_dotprod(flags) CPUEXT(flags, DOTPROD)
+#define have_i8mm(flags)    CPUEXT(flags, I8MM)
 
 #endif /* AVUTIL_AARCH64_CPU_H */
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 2c5f7f4958..2ffc3986aa 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -174,6 +174,8 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
         { "armv8",    NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ARMV8    },    .unit = "flags" },
         { "neon",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_NEON     },    .unit = "flags" },
         { "vfp",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_VFP      },    .unit = "flags" },
+        { "dotprod",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_DOTPROD  },    .unit = "flags" },
+        { "i8mm",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_I8MM     },    .unit = "flags" },
 #elif ARCH_MIPS
         { "mmi",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MMI      },    .unit = "flags" },
         { "msa",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MSA      },    .unit = "flags" },
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index 8fa5ea9199..da486f9c7a 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -69,6 +69,8 @@
 #define AV_CPU_FLAG_NEON         (1 << 5)
 #define AV_CPU_FLAG_ARMV8        (1 << 6)
 #define AV_CPU_FLAG_VFP_VM       (1 << 7) ///< VFPv2 vector mode, deprecated in ARMv7-A and unavailable in various CPUs implementations
+#define AV_CPU_FLAG_DOTPROD      (1 << 8)
+#define AV_CPU_FLAG_I8MM         (1 << 9)
 #define AV_CPU_FLAG_SETEND       (1 <<16)
 
 #define AV_CPU_FLAG_MMI          (1 << 0)
diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
index dadadb31dc..a52637339d 100644
--- a/libavutil/tests/cpu.c
+++ b/libavutil/tests/cpu.c
@@ -38,6 +38,8 @@ static const struct {
     { AV_CPU_FLAG_ARMV8,     "armv8"      },
     { AV_CPU_FLAG_NEON,      "neon"       },
     { AV_CPU_FLAG_VFP,       "vfp"        },
+    { AV_CPU_FLAG_DOTPROD,   "dotprod"    },
+    { AV_CPU_FLAG_I8MM,      "i8mm"       },
 #elif ARCH_ARM
     { AV_CPU_FLAG_ARMV5TE,   "armv5te"    },
     { AV_CPU_FLAG_ARMV6,     "armv6"      },
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7389ebaee9..4311a8ffcb 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -230,6 +230,8 @@ static const struct {
 #if   ARCH_AARCH64
     { "ARMV8",    "armv8",    AV_CPU_FLAG_ARMV8 },
     { "NEON",     "neon",     AV_CPU_FLAG_NEON },
+    { "DOTPROD",  "dotprod",  AV_CPU_FLAG_DOTPROD },
+    { "I8MM",     "i8mm",     AV_CPU_FLAG_I8MM },
 #elif ARCH_ARM
     { "ARMV5TE",  "armv5te",  AV_CPU_FLAG_ARMV5TE },
     { "ARMV6",    "armv6",    AV_CPU_FLAG_ARMV6 },
-- 
2.37.1 (Apple Git-137.1)
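
For illustration only, here is a minimal sketch of how a caller might use the new flag bits for runtime dispatch. The flag value is taken from the patch above; every EX_ and dot_u8_ name is hypothetical, and dot_u8_dotprod is a plain C stand-in for an assembly routine that would use udot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit as AV_CPU_FLAG_DOTPROD in the patch; the EX_ name is local
 * to this sketch. */
#define EX_CPU_FLAG_DOTPROD (1 << 8)

typedef int32_t (*dot_u8_fn)(const uint8_t *a, const uint8_t *b, size_t n);

/* Portable reference implementation. */
static int32_t dot_u8_c(const uint8_t *a, const uint8_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for a hypothetical udot-based assembly routine; here it simply
 * forwards to the C version so the sketch builds on any target. */
static int32_t dot_u8_dotprod(const uint8_t *a, const uint8_t *b, size_t n)
{
    return dot_u8_c(a, b, n);
}

/* Pick the best implementation that the reported CPU flags allow. */
static dot_u8_fn select_dot_u8(int cpu_flags)
{
    dot_u8_fn fn = dot_u8_c;
    if (cpu_flags & EX_CPU_FLAG_DOTPROD)
        fn = dot_u8_dotprod;
    return fn;
}

static void example_dispatch(void)
{
    const uint8_t a[4] = { 1, 2, 3, 4 }, b[4] = { 5, 6, 7, 8 };

    assert(select_dot_u8(0) == dot_u8_c);
    assert(select_dot_u8(EX_CPU_FLAG_DOTPROD)(a, b, 4) == 70);
}
```

In the tree itself, such checks would presumably go through the have_dotprod()/have_i8mm() macros added in libavutil/aarch64/cpu.h rather than testing the bits directly.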

* [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
  2023-05-26  8:03 [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 2/4] aarch64: Add cpu flags for the dotprod and i8mm extensions Martin Storsjö
@ 2023-05-26  8:03 ` Martin Storsjö
  2023-05-27  9:04   ` Rémi Denis-Courmont
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 4/4] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl Martin Storsjö
  2023-05-27  8:44 ` [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Rémi Denis-Courmont
  3 siblings, 1 reply; 13+ messages in thread
From: Martin Storsjö @ 2023-05-26  8:03 UTC (permalink / raw)
  To: ffmpeg-devel

Based on code by Janne Grunau.

Using HWCAP_CPUID for user space access to the CPU feature registers. See
https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html.
---
 configure               |  2 ++
 libavutil/aarch64/cpu.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/configure b/configure
index 3c7473efb2..b5357b8d27 100755
--- a/configure
+++ b/configure
@@ -2207,6 +2207,7 @@ HAVE_LIST_PUB="
 
 HEADERS_LIST="
     arpa_inet_h
+    asm_hwcap_h
     asm_types_h
     cdio_paranoia_h
     cdio_paranoia_paranoia_h
@@ -6422,6 +6423,7 @@ check_headers io.h
 enabled libdrm &&
     check_headers linux/dma-buf.h
 
+check_headers asm/hwcap.h
 check_headers linux/perf_event.h
 check_headers libcrystalhd/libcrystalhd_if.h
 check_headers malloc.h
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 42b33e4a2d..34c838c2f5 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -20,6 +20,42 @@
 #include "libavutil/cpu_internal.h"
 #include "config.h"
 
+#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H
+#include <stdint.h>
+#include <asm/hwcap.h>
+#include <sys/auxv.h>
+
+#define get_cpu_feature_reg(reg, val) \
+        __asm__("mrs %0, " #reg : "=r" (val))
+
+static int detect_flags(void)
+{
+    unsigned long ret = getauxval(AT_HWCAP);
+    int flags = 0;
+#if defined(HWCAP_CPUID)
+    uint64_t tmp;
+    if (!(ret & HWCAP_CPUID))
+        return flags;
+    get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
+    if (((tmp >> 44) & 0xf) == 0x1)
+        flags |= AV_CPU_FLAG_DOTPROD;
+    get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
+    if (((tmp >> 52) & 0xf) == 0x1)
+        flags |= AV_CPU_FLAG_I8MM;
+#endif
+
+    return flags;
+}
+
+#else
+
+static int detect_flags(void)
+{
+    return 0;
+}
+
+#endif
+
 int ff_get_cpu_flags_aarch64(void)
 {
     int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
@@ -33,6 +69,8 @@ int ff_get_cpu_flags_aarch64(void)
     flags |= AV_CPU_FLAG_I8MM;
 #endif
 
+    flags |= detect_flags();
+
     return flags;
 }
 
-- 
2.37.1 (Apple Git-137.1)

* [FFmpeg-devel] [PATCH 4/4] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl
  2023-05-26  8:03 [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 2/4] aarch64: Add cpu flags for the dotprod and i8mm extensions Martin Storsjö
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP) Martin Storsjö
@ 2023-05-26  8:03 ` Martin Storsjö
  2023-05-27  8:44 ` [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Rémi Denis-Courmont
  3 siblings, 0 replies; 13+ messages in thread
From: Martin Storsjö @ 2023-05-26  8:03 UTC (permalink / raw)
  To: ffmpeg-devel

---
 configure               |  2 ++
 libavutil/aarch64/cpu.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/configure b/configure
index b5357b8d27..45bdc16c7d 100755
--- a/configure
+++ b/configure
@@ -2346,6 +2346,7 @@ SYSTEM_FUNCS="
     strerror_r
     sysconf
     sysctl
+    sysctlbyname
     usleep
     UTGetOSTypeFromString
     VirtualAlloc
@@ -6384,6 +6385,7 @@ check_func_headers mach/mach_time.h mach_absolute_time
 check_func_headers stdlib.h getenv
 check_func_headers sys/stat.h lstat
 check_func_headers sys/auxv.h getauxval
+check_func_headers sys/sysctl.h sysctlbyname
 
 check_func_headers windows.h GetModuleHandle
 check_func_headers windows.h GetProcessAffinityMask
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 34c838c2f5..f35e4356df 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -47,6 +47,28 @@ static int detect_flags(void)
     return flags;
 }
 
+#elif defined(__APPLE__) && HAVE_SYSCTLBYNAME
+#include <sys/sysctl.h>
+
+static int detect_flags(void)
+{
+    uint32_t value = 0;
+    size_t size;
+    int flags = 0;
+
+    size = sizeof(value);
+    if (!sysctlbyname("hw.optional.arm.FEAT_DotProd", &value, &size, NULL, 0)) {
+        if (value)
+            flags |= AV_CPU_FLAG_DOTPROD;
+    }
+    size = sizeof(value);
+    if (!sysctlbyname("hw.optional.arm.FEAT_I8MM", &value, &size, NULL, 0)) {
+        if (value)
+            flags |= AV_CPU_FLAG_I8MM;
+    }
+    return flags;
+}
+
 #else
 
 static int detect_flags(void)
-- 
2.37.1 (Apple Git-137.1)

* Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-26  8:03 [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
                   ` (2 preceding siblings ...)
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 4/4] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl Martin Storsjö
@ 2023-05-27  8:44 ` Rémi Denis-Courmont
  2023-05-27 21:34   ` Martin Storsjö
  3 siblings, 1 reply; 13+ messages in thread
From: Rémi Denis-Courmont @ 2023-05-27  8:44 UTC (permalink / raw)
  To: ffmpeg-devel

On Friday, 26 May 2023, 11:03:12 EEST, Martin Storsjö wrote:
> These are available since ARMv8.4-a and ARMv8.6-a respectively,
> but can also be available optionally since ARMv8.2-a.
> 
> Check if these are available for use unconditionally (e.g. if compiling
> with -march=armv8.6-a), or if they can be enabled with specific
> assembler directives.
> 
> Use ".arch_extension <ext>" for enabling a specific extension in
> assembly; the same can also be achieved with ".arch armv8.2-a+<ext>",
> but with .arch_extension it is easier to combine multiple separate
> features.
> 
> Enabling these extensions requires setting a base architecture level
> of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
> if the base level is high enough (which might unlock other extensions
> without .arch_extension), we don't want to lower it.

I don't follow how that would actually happen, TBH. Even if the default target 
version is, say, 8.5, the assembler won't magically start emitting 8.5 
instructions.

Someone would have to write assembler code that would fail to build under a 
toolchain with a lower target version. That sounds like a bug that should be 
spotted and fixed, rather than papered over. Conversely the logic here seems 
unnecessarily, if not counter-productively, intricate.

> Only add .arch/.arch_extension if needed, e.g. current clang fails
> to recognize the dotprod and i8mm features in .arch_extension, but
> can successfully assemble these instructions if part of the baseline
> set with -march.

IME, Clang is utterly useless for assembling. That has become one of my pet 
peeves with Rust inline assembler, which is otherwise much nicer than C inline 
assembler, since there you can't just work around it with `-no-integrated-as`. 
But I digress.

If the problem is to avoid `.arch_extension`, then I don't really see why you 
can't just use `.arch` with plus, and simplify a lot.


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/



* Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
  2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP) Martin Storsjö
@ 2023-05-27  9:04   ` Rémi Denis-Courmont
  2023-05-27 20:35     ` Martin Storsjö
  0 siblings, 1 reply; 13+ messages in thread
From: Rémi Denis-Courmont @ 2023-05-27  9:04 UTC (permalink / raw)
  To: ffmpeg-devel

On Friday, 26 May 2023, 11:03:14 EEST, Martin Storsjö wrote:
> Based on code by Janne Grunau.
> 
> Using HWCAP_CPUID for user space access to the CPU feature registers. See
> https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html.
> ---
>  configure               |  2 ++
>  libavutil/aarch64/cpu.c | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/configure b/configure
> index 3c7473efb2..b5357b8d27 100755
> --- a/configure
> +++ b/configure
> @@ -2207,6 +2207,7 @@ HAVE_LIST_PUB="
> 
>  HEADERS_LIST="
>      arpa_inet_h
> +    asm_hwcap_h
>      asm_types_h
>      cdio_paranoia_h
>      cdio_paranoia_paranoia_h
> @@ -6422,6 +6423,7 @@ check_headers io.h
>  enabled libdrm &&
>      check_headers linux/dma-buf.h
> 
> +check_headers asm/hwcap.h
>  check_headers linux/perf_event.h
>  check_headers libcrystalhd/libcrystalhd_if.h
>  check_headers malloc.h
> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
> index 42b33e4a2d..34c838c2f5 100644
> --- a/libavutil/aarch64/cpu.c
> +++ b/libavutil/aarch64/cpu.c
> @@ -20,6 +20,42 @@
>  #include "libavutil/cpu_internal.h"
>  #include "config.h"
> 
> +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H
> +#include <stdint.h>
> +#include <asm/hwcap.h>
> +#include <sys/auxv.h>
> +
> +#define get_cpu_feature_reg(reg, val) \
> +        __asm__("mrs %0, " #reg : "=r" (val))

Strictly speaking, this can read any system register. One way to prevent that 
would be to include the ID_ prefix and _EL1 suffix in the macro. I would have 
used a pure static inline instead, but that's just a matter of taste.
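
A rough sketch of that suggestion, for illustration only: the macro pastes the ID_ prefix and _EL1 suffix itself, and the field decoding is a pure static inline. The names read_id_reg_el1, id_reg_field and the EX_ constants are hypothetical; the bit positions are the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used by the patch: the DotProd field starts at bit 44
 * of ID_AA64ISAR0_EL1, the I8MM field at bit 52 of ID_AA64ISAR1_EL1. */
#define EX_ISAR0_DP_SHIFT   44
#define EX_ISAR1_I8MM_SHIFT 52

/* Only ID_*_EL1 registers can be named through this macro, since the
 * prefix and suffix are fixed, e.g. read_id_reg_el1(AA64ISAR0, tmp).
 * It is only meaningful on AArch64, and only safe from user space when
 * HWCAP_CPUID is reported. */
#define read_id_reg_el1(name, val) \
        __asm__("mrs %0, ID_" #name "_EL1" : "=r" (val))

/* Extract the 4-bit ID field that starts at bit `shift`. */
static inline unsigned id_reg_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xf);
}

static void example_id_field(void)
{
    /* Synthetic register value with only the DotProd field set to 1. */
    uint64_t isar0 = (uint64_t)1 << EX_ISAR0_DP_SHIFT;

    assert(id_reg_field(isar0, EX_ISAR0_DP_SHIFT) == 1);
    assert(id_reg_field(isar0, EX_ISAR1_I8MM_SHIFT) == 0);
}
```

Keeping the decoding separate from the register read means the field logic can be exercised with synthetic values on any host.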

> +
> +static int detect_flags(void)
> +{
> +    unsigned long ret = getauxval(AT_HWCAP);
> +    int flags = 0;
> +#if defined(HWCAP_CPUID)
> +    uint64_t tmp;
> +    if (!(ret & HWCAP_CPUID))
> +        return flags;
> +    get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
> +    if (((tmp >> 44) & 0xf) == 0x1)
> +        flags |= AV_CPU_FLAG_DOTPROD;
> +    get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
> +    if (((tmp >> 52) & 0xf) == 0x1)
> +        flags |= AV_CPU_FLAG_I8MM;
> +#endif

NEON detection could be added here, though I've yet to see an Armv8 
implementation without AdvSIMD.

FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via HWCAP2_I8MM, 
using trapped ID registers is not (yet) necessary.

> +
> +    return flags;
> +}
> +
> +#else
> +
> +static int detect_flags(void)
> +{
> +    return 0;
> +}
> +
> +#endif
> +
>  int ff_get_cpu_flags_aarch64(void)
>  {
>      int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
> @@ -33,6 +69,8 @@ int ff_get_cpu_flags_aarch64(void)
>      flags |= AV_CPU_FLAG_I8MM;
>  #endif
> 
> +    flags |= detect_flags();
> +
>      return flags;
>  }


-- 
レミ・デニ-クールモン
http://www.remlab.net/



* Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
  2023-05-27  9:04   ` Rémi Denis-Courmont
@ 2023-05-27 20:35     ` Martin Storsjö
  2023-05-28  5:58       ` Rémi Denis-Courmont
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Storsjö @ 2023-05-27 20:35 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Sat, 27 May 2023, Rémi Denis-Courmont wrote:

> On Friday, 26 May 2023, 11:03:14 EEST, Martin Storsjö wrote:
>> Based on code by Janne Grunau.
>> 
>> Using HWCAP_CPUID for user space access to the CPU feature registers. See
>> https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html.
>> ---
>>  configure               |  2 ++
>>  libavutil/aarch64/cpu.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 40 insertions(+)
>> 
>> diff --git a/configure b/configure
>> index 3c7473efb2..b5357b8d27 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2207,6 +2207,7 @@ HAVE_LIST_PUB="
>>
>>  HEADERS_LIST="
>>      arpa_inet_h
>> +    asm_hwcap_h
>>      asm_types_h
>>      cdio_paranoia_h
>>      cdio_paranoia_paranoia_h
>> @@ -6422,6 +6423,7 @@ check_headers io.h
>>  enabled libdrm &&
>>      check_headers linux/dma-buf.h
>> 
>> +check_headers asm/hwcap.h
>>  check_headers linux/perf_event.h
>>  check_headers libcrystalhd/libcrystalhd_if.h
>>  check_headers malloc.h
>> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
>> index 42b33e4a2d..34c838c2f5 100644
>> --- a/libavutil/aarch64/cpu.c
>> +++ b/libavutil/aarch64/cpu.c
>> @@ -20,6 +20,42 @@
>>  #include "libavutil/cpu_internal.h"
>>  #include "config.h"
>> 
>> +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H
>> +#include <stdint.h>
>> +#include <asm/hwcap.h>
>> +#include <sys/auxv.h>
>> +
>> +#define get_cpu_feature_reg(reg, val) \
>> +        __asm__("mrs %0, " #reg : "=r" (val))
>
> Strictly speaking, this can read any system register. One way to prevent that 
> would be to include the ID_ prefix and _EL1 suffix in the macro. I would have 
> used a pure static inline instead, but that's just a matter of taste.
>
>> +
>> +static int detect_flags(void)
>> +{
>> +    unsigned long ret = getauxval(AT_HWCAP);
>> +    int flags = 0;
>> +#if defined(HWCAP_CPUID)
>> +    uint64_t tmp;
>> +    if (!(ret & HWCAP_CPUID))
>> +        return flags;
>> +    get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
>> +    if (((tmp >> 44) & 0xf) == 0x1)
>> +        flags |= AV_CPU_FLAG_DOTPROD;
>> +    get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
>> +    if (((tmp >> 52) & 0xf) == 0x1)
>> +        flags |= AV_CPU_FLAG_I8MM;
>> +#endif
>
> NEON detection could be added here, though I've yet to see an Armv8 
> implementation without AdvSIMD.

I guess we could, but as it's part of the required baseline for armv8-a I 
don't think there's much need for it? If configured with --disable-neon we 
don't return that cpuflag though.

> FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via 
> HWCAP2_I8MM, using trapped ID registers is not (yet) necessary.

Ah, I see. I guess using those would be more straightforward.

OTOH, HWCAP_CPUID is available much earlier than HWCAP_ASIMDDP or 
HWCAP2_I8MM (I do some amount of cross building with a fairly old 
sysroot). I'll think about it, whether it's worth complicating things to 
try both approaches, or if we should just go with the plain HWCAPs here.

// Martin
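
As a sketch of the plain-HWCAP approach discussed above (not the code from this series): the two bit values below are believed to match HWCAP_ASIMDDP and HWCAP2_I8MM in the arm64 <asm/hwcap.h> UAPI header, and are given local EX_ names here on the assumption that an old sysroot may not define them. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed to match HWCAP_ASIMDDP and HWCAP2_I8MM from the arm64 UAPI
 * header; defined locally so the sketch builds with older headers. */
#define EX_HWCAP_ASIMDDP    (1UL << 20)
#define EX_HWCAP2_I8MM      (1UL << 13)

/* Flag values from patch 2/4 (libavutil/cpu.h). */
#define EX_CPU_FLAG_DOTPROD (1 << 8)
#define EX_CPU_FLAG_I8MM    (1 << 9)

/* Pure mapping from the auxiliary vector words to CPU flags. */
static int flags_from_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;

    if (hwcap & EX_HWCAP_ASIMDDP)
        flags |= EX_CPU_FLAG_DOTPROD;
    if (hwcap2 & EX_HWCAP2_I8MM)
        flags |= EX_CPU_FLAG_I8MM;
    return flags;
}

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>

static int detect_flags_hwcap(void)
{
#ifdef AT_HWCAP2
    return flags_from_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    /* Without AT_HWCAP2 only DotProd can be reported this way. */
    return flags_from_hwcaps(getauxval(AT_HWCAP), 0);
#endif
}
#else
/* Other targets: nothing to detect in this sketch. */
static int detect_flags_hwcap(void)
{
    return 0;
}
#endif

static void example_hwcaps(void)
{
    assert(flags_from_hwcaps(EX_HWCAP_ASIMDDP, 0) == EX_CPU_FLAG_DOTPROD);
    assert(flags_from_hwcaps(0, EX_HWCAP2_I8MM) == EX_CPU_FLAG_I8MM);
}
```

The trade-off is the one mentioned above: this avoids the trapped register reads, but depends on bit definitions that older headers may lack.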

* Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-27  8:44 ` [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Rémi Denis-Courmont
@ 2023-05-27 21:34   ` Martin Storsjö
  2023-05-28  5:50     ` Rémi Denis-Courmont
  2023-06-01 11:04     ` Martin Storsjö
  0 siblings, 2 replies; 13+ messages in thread
From: Martin Storsjö @ 2023-05-27 21:34 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Sat, 27 May 2023, Rémi Denis-Courmont wrote:

> On Friday, 26 May 2023, 11:03:12 EEST, Martin Storsjö wrote:
>> These are available since ARMv8.4-a and ARMv8.6-a respectively,
>> but can also be available optionally since ARMv8.2-a.
>> 
>> Check if these are available for use unconditionally (e.g. if compiling
>> with -march=armv8.6-a), or if they can be enabled with specific
>> assembler directives.
>> 
>> Use ".arch_extension <ext>" for enabling a specific extension in
>> assembly; the same can also be achieved with ".arch armv8.2-a+<ext>",
>> but with .arch_extension it is easier to combine multiple separate
>> features.
>> 
>> Enabling these extensions requires setting a base architecture level
>> of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
>> if the base level is high enough (which might unlock other extensions
>> without .arch_extension), we don't want to lower it.
>
> I don't follow how that would actually happen, TBH. Even if the default target 
> version is, say, 8.5, the assembler won't magically start emitting 8.5 
> instructions.
>
> Someone would have to write assembler code that would fail to build under a 
> toolchain with a lower target version. That sounds like a bug that should be 
> spotted and fixed, rather than papered over.

I don't see how anything here suggests papering over such an issue?

I'm not sure exactly which parts of the message you refer to here, but 
I'll elaborate on the point about why we only should set .arch if we 
really need to.


Consider a build configuration with -march=armv8.4-a. We test that the 
dotprod extension is available and usable without adding any directives - 
so we won't add any directives for that. We also test that the assembler 
does support i8mm, with ".arch armv8.2-a" plus ".arch_extension i8mm".

But if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we 
break the dotprod extension, because the explicit .arch lowers the base 
level below armv8.4-a. If we only add ".arch_extension i8mm" without the 
.arch directive, we get what we want, though.

> If the problem is to avoid `.arch_extension`, then I don't really see 
> why you can't just use `.arch` with plus, and simplify a lot.

Well Clang doesn't quite support that currently either. For 
".arch_extension dotprod" it errors out since it doesn't recognize the 
dotprod feature in that directive. It does accept ".arch 
armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod 
extension in the assembly despite that. (I'll look into fixing this in 
upstream LLVM afterwards.)

As Clang/LLVM has these limitations/issues currently, one main design 
criterion here is that we shouldn't add any extra .arch/.arch_extension 
directives unless we need and can (and gain some instruction support from 
it).
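
As a rough illustration of that criterion (a standalone sketch, not the 
configure code from this patch), the probing could go from the least 
intrusive option to the most intrusive one. The names AS_CMD, try_asm and 
pick_directives are made up for this example, and the default cross 
compiler name is an assumption:

```shell
#!/bin/sh
# Hypothetical helper, not FFmpeg's configure code.
# AS_CMD is the command used to assemble a .S file; the default is an assumption.
AS_CMD=${AS_CMD:-aarch64-linux-gnu-gcc -c}

# try_asm DIRECTIVES INSN: succeed if the assembler accepts INSN after DIRECTIVES.
try_asm() {
    tmp=$(mktemp -d) || return 1
    printf '%s\n%s\n' "$1" "$2" > "$tmp/test.S"
    $AS_CMD -o "$tmp/test.o" "$tmp/test.S" >/dev/null 2>&1
    ret=$?
    rm -rf "$tmp"
    return $ret
}

# pick_directives EXT INSN: print the weakest set of directives that works:
# "none", "ext" (.arch_extension only) or "arch+ext" (.arch plus .arch_extension).
# Fails without output if none of them works.
pick_directives() {
    ext=$1 insn=$2
    if try_asm "" "$insn"; then
        echo none
    elif try_asm ".arch_extension $ext" "$insn"; then
        echo ext
    elif try_asm "$(printf '.arch armv8.2-a\n.arch_extension %s' "$ext")" "$insn"; then
        echo arch+ext
    else
        return 1
    fi
}
```

With a toolchain whose baseline already includes dotprod, one would expect 
pick_directives dotprod 'udot v0.4s, v1.16b, v2.16b' to print "none", so 
no directive gets added for it.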


Taking it back to the drawing board: So for enabling e.g. i8mm, we could 
either do
     .arch armv8.6-a
or
     .arch armv8.2-a+i8mm
or
     .arch armv8.2-a
     .arch_extension i8mm


Out of these, I initially preferred doing the third approach.

There's no functional difference between the second and third one, except 
the single-line form is more messy to handle, as we can have various 
combinations of what actually is supported. And with the single-line .arch 
form, we can't just add e.g. i8mm on top of a -march= setting that already 
supports dotprod, without respecifying what the toolchain itself defaults 
to.


The documentation for .arch_extension hints at it being possible to 
disable support for extensions with it too, but that doesn't seem to be 
the case in practice. If it was, we could add macros to only enable 
specifically the extensions we want around those functions that should use 
them and nothing more. But I guess if that's not actually supported we 
can't do that.


I guess the alternative would be to just try to set .arch 
<highest-supported-that-we-care-about>. I was worried that support for 
e.g. armv8.6-a appeared later in toolchains than support for the 
individual extension i8mm, but at least from a quick browse in binutils 
history, they seem to have been added at the same time, so there's 
probably no such drawback.

Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" 
supported significantly earlier than ".arch armv9-a"? It looks like 
binutils learnt about sve2 in 2019, but about armv9-a in 2021? OTOH that's 
probably not too much of a real issue either.

If we'd do that, it does simplify the configure logic a fair bit and 
reduces the number of configure variables we need by a lot. It does enable 
a few more instruction set extensions than what we need though, but that's 
probably not a real issue.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-27 21:34   ` Martin Storsjö
@ 2023-05-28  5:50     ` Rémi Denis-Courmont
  2023-05-30 12:25       ` Martin Storsjö
  2023-06-01 11:04     ` Martin Storsjö
  1 sibling, 1 reply; 13+ messages in thread
From: Rémi Denis-Courmont @ 2023-05-28  5:50 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Sunday, 28 May 2023, 0.34.15 EEST, Martin Storsjö wrote:
> > Someone would have to write assembler code that would fail to build under
> > a toolchain with a lower target version. That sounds like a bug that
> > should be spotted and fixed, rather than papered over.
> 
> I don't see how anything here suggests papering over such an issue?

For instance, say the toolchain, or the FFmpeg build flags, sets ARMv8.6 as 
target architecture. It makes perfect sense for the C compiler to emit 
non-hint ARMv8.3 PAuth instructions, which would crash on an ARMv8.0-8.2 
processor. But it makes zero sense for the assembler to change its behaviour. 
So all existing ASIMD assembler files written for ARMv8.0-A should still 
generate ARMv8.0-A code.

But now somebody can accidentally write an ARMv8.1-8.6 instruction into their 
ARMv8.0 assembler code, and it will not trigger a build error (to them). There 
is no clear reason to try to avoid *lowering* the target version for 
*assembler* (as opposed to C).

If a file needs ARMv8.2, it should just set ARMv8.2 as its target, whether the 
overall build targets ARMv8.0, ARMv8.2 or ARMv8.6. The C code around it should 
ensure that the requirements are met. There are no benefits to preserving a 
higher version; it just makes the configure checks needlessly intricate.

> if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we
> break the dotprod extension. If we only add ".arch_extension i8mm" without
> the .arch directive, we get what we want to though.

Yes, but you can also just set `.arch armv8.4-a`, which is ostensibly The Way 
to enable DotProd on a craptastic assembler. If even that doesn't work, the 
assembler presumably doesn't support DotProd at all (or it's some obscure 
unstable development version that was snapshot between adding DotProd and 
adding complete ARMv8.4, which we really should not care about).

Likewise, for I8MM, you can just do `.arch armv8.6-a`.

So what if the target version was 8.7 or 9.2? Well, in an assembler file, we 
don't really care, as noted above.

> > If the problem is to avoid `.arch_extension`, then I don't really see
> > why you can't just use `.arch` with plus, and simplify a lot.
> 
> Well Clang doesn't quite support that currently either. For
> ".arch_extension dotprod" it errors out since it doesn't recognize the
> dotprod feature in that directive. It does accept ".arch
> armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod
> extension in the assembly despite that. (I'll look into fixing this in
> upstream LLVM afterwards.)

I don't know if `.arch_extension` is specified by Arm or just a GCCism. But if 
you accept DotProd in `.arch` and then don't enable it, then that's 
clearly a bug. But then again, that's moot if you can just do 
`.arch armv8.4-a` instead.

> Taking it back to the drawing board: So for enabling e.g. i8mm, we could
> either do
>      .arch armv8.6-a
> or
>      .arch armv8.2-a+i8mm
> or
>      .arch armv8.2-a
>      .arch_extension i8mm
> 
> 
> Out of these, I initially preferred doing the third approach.

I agree that that's the cleanest option. But that's not the point here. The 
point is that this patch is a hell of a lot more involved than doing just 
that, and it seems unnecessarily intricate for the purpose of enabling 
DotProd.

> There's no functional difference between the second and third one, except
> the single-line form is more messy to handle, as we can have various
> combinations of what actually is supported.

Yes but a pile of #if's is even more messy to handle.

> And with the single-line .arch
> form, we can't just add e.g. i8mm on top of a -march= setting that already
> supports dotprod, without respecifying what the toolchain itself defaults
> to.
> 
> 
> The documentation for .arch_extension hints at it being possible to
> disable support for extensions with it too, but that doesn't seem to be
> the case in practice. If it was, we could add macros to only enable
> specifically the extensions we want around those functions that should use
> them and nothing more. But I guess if that's not actually supported we
> can't do that.

GCC supports it with RV64 (`.option arch` plus/minus). I haven't tried it on 
AArch64. Of course, LLVM does not support it, even though the patch has been 
out there for a long time and even though it is part of the ABI spec rather 
than made up by GNU/binutils. Thanks to that, RVV is unusable on LLVM (Linux 
kernel specifically requires GNU/as due to that).

> I guess the alternative would be to just try to set .arch
> <highest-supported-that-we-care-about>. I was worried that support for
> e.g. armv8.6-a appeared later in toolchains than support for the
> individual extension i8mm, but at least from a quick browse in binutils
> history, they seem to have been added at the same time, so there's
> probably no such drawback.
> 
> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
> supported significantly earlier than ".arch armv9-a"?

I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 and 
9.0 onward respectively, and not mandatory in any version, so if your 
toolchain supports neither .arch with plus sign, nor .arch_extension, it is 
game over.

> If we'd do that, it does simplify the configure logic a fair bit and
> reduces the number of configure variables we need by a lot. It does enable
> a few more instruction set extensions than what we need though, but that's
> probably not a real issue.

Yes.

-- 
Реми Дёни-Курмон
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
  2023-05-27 20:35     ` Martin Storsjö
@ 2023-05-28  5:58       ` Rémi Denis-Courmont
  0 siblings, 0 replies; 13+ messages in thread
From: Rémi Denis-Courmont @ 2023-05-28  5:58 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Saturday, 27 May 2023, 23.35.14 EEST, Martin Storsjö wrote:
> > NEON detection could be added here, though I've yet to see an Armv8
> > implementation without AdvSIMD.
> 
> I guess we could, but as it's part of the required baseline for armv8-a I
> don't think there's much need for it?

I mean that if you have 95% of the code needed, you might as well add the last 
5%. I don't think it's used by much of anything.

> If configured with --disable-neon we
> don't return that cpuflag though.
> 
> > FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via
> > HWCAP2_I8MM, using trapped ID registers is not (yet) necessary.
> 
> Ah, I see. I guess using those would be more straightforward.
> 
> OTOH, HWCAP_CPUID is available much earlier than HWCAP_ASIMDDP or
> HWCAP2_I8MM (I do some amount of cross building with a fairly old
> sysroot).

If the sysroot shares the not-too-old kernel of the host, then ID registers 
will work anyway. The age of the userspace toolchain affects availability of 
HWCAP constant definitions, not that of ID registers, which is a purely 
in-kernel feature.
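
For a quick manual check of what the running kernel reports (a different 
mechanism than getauxval(), but on Linux/arm64 it reflects the same hwcap 
bits), the Features line of /proc/cpuinfo can be inspected; "asimddp" and 
"i8mm" are the names the arm64 kernel prints there. A small sketch, where 
has_cpu_feature is a made-up helper name:

```shell
#!/bin/sh
# has_cpu_feature NAME [FILE]: succeed if NAME is listed on the first
# "Features" line of FILE (defaults to /proc/cpuinfo).
# Hypothetical helper for manual inspection only.
has_cpu_feature() {
    awk -v want="$1" '
        # Fields are: "Features", ":", then one field per feature name.
        /^Features/ { for (i = 3; i <= NF; i++) if ($i == want) found = 1; exit }
        END { exit !found }
    ' "${2:-/proc/cpuinfo}"
}

# Example use on an arm64 Linux machine:
#   has_cpu_feature asimddp && echo "dotprod reported"
#   has_cpu_feature i8mm    && echo "i8mm reported"
```

This only matches whole feature names, so asking for "asimd" does not 
match "asimddp".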

> I'll think about it, whether it's worth complicating things to
> try both approaches, or if we should just go with the plain HWCAPs here.

I don't really mind either way.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-28  5:50     ` Rémi Denis-Courmont
@ 2023-05-30 12:25       ` Martin Storsjö
  2023-05-31 16:29         ` Rémi Denis-Courmont
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Storsjö @ 2023-05-30 12:25 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Sun, 28 May 2023, Rémi Denis-Courmont wrote:

> On Sunday, 28 May 2023, 0.34.15 EEST, Martin Storsjö wrote:
>
>> I guess the alternative would be to just try to set .arch
>> <highest-supported-that-we-care-about>. I was worried that support for
>> e.g. armv8.6-a appeared later in toolchains than support for the
>> individual extension i8mm, but at least from a quick browse in binutils
>> history, they seem to have been added at the same time, so there's
>> probably no such drawback.
>> 
>> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
>> supported significantly earlier than ".arch armv9-a"?
>
> I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2 and 
> 9.0 onward respectively, and not mandatory in any version, so if your 
> toolchain supports neither .arch with plus sign, nor .arch_extension, it is 
> game over.

I didn't mean specifically whether LLVM supports it here, just in general 
wrt binutils and how to enable the feature.

FWIW it seems like SVE2 is a mandatory part of 9.0 - assembling SVE2 
instructions can be done with ".arch armv9-a". But there are about 2 years 
worth of deployed binutils based toolchains that do recognize ".arch 
armv8.2-a; .arch_extension sve2" but don't recognize ".arch armv9-a".

So for the generic mechanism for enabling cpu features, I'd prefer to keep 
the mechanism using primarily .arch_extension (with .arch set as high as 
necessary) rather than relying solely on .arch <version> without any extra 
+<feature>.

>> If we'd do that, it does simplify the configure logic a fair bit and
>> reduces the number of configure variables we need by a lot. It does enable
>> a few more instruction set extensions than what we need though, but that's
>> probably not a real issue.
>
> Yes.

I made an attempt at simplifying the logic in configure and asm.S 
somewhat, while still primarily using .arch_extension, and while making 
sure we still can get the features assembled with current Clang with a 
high enough -march= setting. (Runtime enabled features are out of scope 
for Clang for now as we don't want to try to pass individual higher 
-march= options to the individual assembly files.)

// Martin

* Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-30 12:25       ` Martin Storsjö
@ 2023-05-31 16:29         ` Rémi Denis-Courmont
  0 siblings, 0 replies; 13+ messages in thread
From: Rémi Denis-Courmont @ 2023-05-31 16:29 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Tuesday, 30 May 2023, 15.25.25 EEST, Martin Storsjö wrote:
> On Sun, 28 May 2023, Rémi Denis-Courmont wrote:
> > On Sunday, 28 May 2023, 0.34.15 EEST, Martin Storsjö wrote:
> >> I guess the alternative would be to just try to set .arch
> >> <highest-supported-that-we-care-about>. I was worried that support for
> >> e.g. armv8.6-a appeared later in toolchains than support for the
> >> individual extension i8mm, but at least from a quick browse in binutils
> >> history, they seem to have been added at the same time, so there's
> >> probably no such drawback.
> >> 
> >> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
> >> supported significantly earlier than ".arch armv9-a"?
> > 
> > I have not tested SVE on LLVM. AFAIK, SVE and SVE2 are optional from 8.2
> > and 9.0 onward respectively, and not mandatory in any version, so if your
> > toolchain supports neither .arch with plus sign, nor .arch_extension, it
> > is game over.
> 
> I didn't mean specifically whether LLVM supports it here, just in general
> wrt binutils and how to enable the feature.
> 
> FWIW it seems like SVE2 is a mandatory part of 9.0

Yes and no. SVE requires ARMv8.2, and SVE2 requires ARMv9.0, but neither is 
ever required by any existing ARM version. DDI0487 is abundantly clear that 
SVE2 is OPTIONAL: "[The Scalable Vector Extension version 2 (SVE2)] feature is 
supported in AArch64 state only. This feature is OPTIONAL in an Armv9.0 
implementation,"

However it adds that "standard Armv9-A software platforms support FEAT_SVE2."

> - assembling SVE2 instructions can be done with ".arch armv9-a".

Presumably binutils targets "standard Armv9-A software platforms" by default, 
as opposed to just any "Armv9.0 implementation". I guess that means that any 
Cortex, Neoverse or other proper ARM design will include SVE2, but big-ass 
architecture licensees such as APPL and NVDA are not required to include SVE2 
in their own designs.

> But there are about 2 years
> worth of deployed binutils based toolchains that do recognize ".arch
> armv8.2-a; .arch_extension sve2" but don't recognize ".arch armv9-a".

This is not entirely surprising, since SVE is much older than ARMv9, even if 
SVE2 requires ARMv9.

-- 
レミ・デニ-クールモン
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
  2023-05-27 21:34   ` Martin Storsjö
  2023-05-28  5:50     ` Rémi Denis-Courmont
@ 2023-06-01 11:04     ` Martin Storsjö
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Storsjö @ 2023-06-01 11:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Sun, 28 May 2023, Martin Storsjö wrote:

> The documentation for .arch_extension hints at it being possible to disable 
> support for extensions with it too, but that doesn't seem to be the case in 
> practice. If it was, we could add macros to only enable specifically the 
> extensions we want around those functions that should use them and nothing 
> more. But I guess if that's not actually supported we can't do that.

Actually, yes, this does work, it just uses a different syntax than I 
expected. To disable the extension <ext>, one can write the directive 
".arch_extension no<ext>".
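
To illustrate the pairing (a sketch only; wrap_ext is a hypothetical helper 
that just emits assembly text, not something from the patch), enabling an 
extension around a block of code and disabling it again afterwards would 
look like this:

```shell
#!/bin/sh
# wrap_ext EXT: read assembly on stdin and print it bracketed by the
# directives that enable EXT before the code and disable it afterwards.
# Hypothetical helper, only meant to show the directive pairing.
wrap_ext() {
    printf '.arch_extension %s\n' "$1"
    cat
    printf '.arch_extension no%s\n' "$1"
}

# Example:
#   printf 'udot v0.4s, v1.16b, v2.16b\n' | wrap_ext dotprod
# prints:
#   .arch_extension dotprod
#   udot v0.4s, v1.16b, v2.16b
#   .arch_extension nodotprod
```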

Is the updated, less complex version of the configure patch more 
tolerable?

// Martin

end of thread, other threads:[~2023-06-01 11:04 UTC | newest]

Thread overview: 13+ messages
2023-05-26  8:03 [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Martin Storsjö
2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 2/4] aarch64: Add cpu flags for the dotprod and i8mm extensions Martin Storsjö
2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP) Martin Storsjö
2023-05-27  9:04   ` Rémi Denis-Courmont
2023-05-27 20:35     ` Martin Storsjö
2023-05-28  5:58       ` Rémi Denis-Courmont
2023-05-26  8:03 ` [FFmpeg-devel] [PATCH 4/4] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl Martin Storsjö
2023-05-27  8:44 ` [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions Rémi Denis-Courmont
2023-05-27 21:34   ` Martin Storsjö
2023-05-28  5:50     ` Rémi Denis-Courmont
2023-05-30 12:25       ` Martin Storsjö
2023-05-31 16:29         ` Rémi Denis-Courmont
2023-06-01 11:04     ` Martin Storsjö
