Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
       [not found] <20240710162641.7B606412682@natalya.videolan.org>
@ 2024-07-11 13:08 ` Martin Storsjö
  2024-07-11 13:13   ` James Almer
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Storsjö @ 2024-07-11 13:08 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: ffmpeg-cvslog

On Wed, 10 Jul 2024, James Almer wrote:

> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed Jul 10 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | committer: James Almer
>
> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>
> This has the benefit of removing any SSE -> AVX penalty that may happen when
> the compiler emits VEX encoded instructions.
>
> Signed-off-by: James Almer <jamrial@gmail.com>
>
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
> ---
>
> configure                    |  5 ++++-
> libavutil/x86/intreadwrite.h | 20 +++++++-------------
> 2 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/configure b/configure
> index f84fefeaab..7151ed1de3 100755
> --- a/configure
> +++ b/configure
> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>
> INTRINSICS_LIST="
>     intrinsics_neon
> +    intrinsics_sse
>     intrinsics_sse2
> "
>
> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
> armv8_deps="aarch64"
> neon_deps_any="aarch64 arm"
> intrinsics_neon_deps="neon"
> -intrinsics_sse2_deps="sse2"
> +intrinsics_sse_deps="sse"
> +intrinsics_sse2_deps="sse2 intrinsics_sse"
> vfp_deps="arm"
> vfpv3_deps="vfp"
> setend_deps="arm"
> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
> fi
>
> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
> check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"
>
> check_ldflags -Wl,--as-needed
> diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
> index 9bbef00dba..6546eb016c 100644
> --- a/libavutil/x86/intreadwrite.h
> +++ b/libavutil/x86/intreadwrite.h
> @@ -22,29 +22,25 @@
> #define AVUTIL_X86_INTREADWRITE_H
>
> #include <stdint.h>
> +#if HAVE_INTRINSICS_SSE
> +#include <immintrin.h>
> +#endif

This change seems to have broken builds for x86 with Clang 16 or newer. 
(Clang 15 and lower seems to be fine.)

See e.g. 
http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile 
for an example of the error. The issue is that a clang internal intrinsics 
header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name 
"A" (which toolchain provided headers shouldn't use). This clashes with 
libavcodec/huffuyv.h, which has a "#define A 3".

This is obviously a Clang intrinsics header bug, but we can't fix the 
existing Clang 16-18 releases that are out there, so I guess what we can 
do is change our "define A" to something more elaborate. (IIRC there are 
some similar issues with names with ncurses and/or android headers too.)

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
  2024-07-11 13:08 ` [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128 Martin Storsjö
@ 2024-07-11 13:13   ` James Almer
  2024-07-11 13:42     ` Martin Storsjö
  0 siblings, 1 reply; 7+ messages in thread
From: James Almer @ 2024-07-11 13:13 UTC (permalink / raw)
  To: ffmpeg-devel

On 7/11/2024 10:08 AM, Martin Storsjö wrote:
> On Wed, 10 Jul 2024, James Almer wrote:
> 
>> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed Jul 10 
>> 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | 
>> committer: James Almer
>>
>> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>>
>> This has the benefit of removing any SSE -> AVX penalty that may 
>> happen when
>> the compiler emits VEX encoded instructions.
>>
>> Signed-off-by: James Almer <jamrial@gmail.com>
>>
>>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>> ---
>>
>> configure                    |  5 ++++-
>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/configure b/configure
>> index f84fefeaab..7151ed1de3 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>
>> INTRINSICS_LIST="
>>     intrinsics_neon
>> +    intrinsics_sse
>>     intrinsics_sse2
>> "
>>
>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>> armv8_deps="aarch64"
>> neon_deps_any="aarch64 arm"
>> intrinsics_neon_deps="neon"
>> -intrinsics_sse2_deps="sse2"
>> +intrinsics_sse_deps="sse"
>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>> vfp_deps="arm"
>> vfpv3_deps="vfp"
>> setend_deps="arm"
>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>> fi
>>
>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
>> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"
>>
>> check_ldflags -Wl,--as-needed
>> diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
>> index 9bbef00dba..6546eb016c 100644
>> --- a/libavutil/x86/intreadwrite.h
>> +++ b/libavutil/x86/intreadwrite.h
>> @@ -22,29 +22,25 @@
>> #define AVUTIL_X86_INTREADWRITE_H
>>
>> #include <stdint.h>
>> +#if HAVE_INTRINSICS_SSE
>> +#include <immintrin.h>
>> +#endif
> 
> This change seems to have broken builds for x86 with Clang 16 or newer. 
> (Clang 15 and lower seems to be fine.)
> 
> See e.g. 
> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile for an example of the error. The issue is that a clang internal intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name "A" (which toolchain provided headers shouldn't use). This clashes with libavcodec/huffuyv.h, which has a "#define A 3".
> 
> This is obviously a Clang intrinsics header bug, but we can't fix the 
> existing Clang 16-18 releases that are out there, so I guess what we can 
> do is change our "define A" to something more elaborate. (IIRC there are 
> some similar issues with names with ncurses and/or android headers too.)
> 
> // Martin

We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
probably other places, so changing huffyuv.h may not be enough.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
  2024-07-11 13:13   ` James Almer
@ 2024-07-11 13:42     ` Martin Storsjö
  2024-07-11 13:54       ` Martin Storsjö
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Storsjö @ 2024-07-11 13:42 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Thu, 11 Jul 2024, James Almer wrote:

> On 7/11/2024 10:08 AM, Martin Storsjö wrote:
>> On Wed, 10 Jul 2024, James Almer wrote:
>> 
>>> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed Jul 10 
>>> 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | 
>>> committer: James Almer
>>>
>>> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>>>
>>> This has the benefit of removing any SSE -> AVX penalty that may 
>>> happen when
>>> the compiler emits VEX encoded instructions.
>>>
>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>>
>>>> 
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>>> ---
>>>
>>> configure                    |  5 ++++-
>>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/configure b/configure
>>> index f84fefeaab..7151ed1de3 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>>
>>> INTRINSICS_LIST="
>>>     intrinsics_neon
>>> +    intrinsics_sse
>>>     intrinsics_sse2
>>> "
>>>
>>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>>> armv8_deps="aarch64"
>>> neon_deps_any="aarch64 arm"
>>> intrinsics_neon_deps="neon"
>>> -intrinsics_sse2_deps="sse2"
>>> +intrinsics_sse_deps="sse"
>>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>>> vfp_deps="arm"
>>> vfpv3_deps="vfp"
>>> setend_deps="arm"
>>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>>> fi
>>>
>>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
>>> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
>>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
> _mm_setzero_si128()"
>>>
>>> check_ldflags -Wl,--as-needed
>>> diff --git a/libavutil/x86/intreadwrite.h 
> b/libavutil/x86/intreadwrite.h
>>> index 9bbef00dba..6546eb016c 100644
>>> --- a/libavutil/x86/intreadwrite.h
>>> +++ b/libavutil/x86/intreadwrite.h
>>> @@ -22,29 +22,25 @@
>>> #define AVUTIL_X86_INTREADWRITE_H
>>>
>>> #include <stdint.h>
>>> +#if HAVE_INTRINSICS_SSE
>>> +#include <immintrin.h>
>>> +#endif
>> 
>> This change seems to have broken builds for x86 with Clang 16 or newer. 
>> (Clang 15 and lower seems to be fine.)
>> 
>> See e.g. 
>> 
> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile 
> for an example of the error. The issue is that a clang internal 
> intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter 
> with the name "A" (which toolchain provided headers shouldn't use). This 
> clashes with libavcodec/huffuyv.h, which has a "#define A 3".
>> 
>> This is obviously a Clang intrinsics header bug, but we can't fix the 
>> existing Clang 16-18 releases that are out there, so I guess what we 
> can 
>> do is change our "define A" to something more elaborate. (IIRC there 
> are 
>> some similar issues with names with ncurses and/or android headers 
> too.)
>> 
>> // Martin
>
> We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
> probably other places, so changing huffyuv.h may not be enough.

That's quite possible, but those cases may be including intreadwrite.h 
before that, do it's possible it might not trigger there.

I'll see how many places need to be changed here.

I sent a fix to Clang in https://github.com/llvm/llvm-project/pull/98478.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
  2024-07-11 13:42     ` Martin Storsjö
@ 2024-07-11 13:54       ` Martin Storsjö
  2024-07-11 17:53         ` James Almer
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Storsjö @ 2024-07-11 13:54 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Thu, 11 Jul 2024, Martin Storsjö wrote:

> On Thu, 11 Jul 2024, James Almer wrote:
>
>> On 7/11/2024 10:08 AM, Martin Storsjö wrote:
>>> On Wed, 10 Jul 2024, James Almer wrote:
>>> 
>>>> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed Jul 
>>>> 10 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | 
>>>> committer: James Almer
>>>> 
>>>> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>>>> 
>>>> This has the benefit of removing any SSE -> AVX penalty that may 
>>>> happen when
>>>> the compiler emits VEX encoded instructions.
>>>> 
>>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>>> 
>>>>> 
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>>>> ---
>>>> 
>>>> configure                    |  5 ++++-
>>>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>>>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>>> 
>>>> diff --git a/configure b/configure
>>>> index f84fefeaab..7151ed1de3 100755
>>>> --- a/configure
>>>> +++ b/configure
>>>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>>> 
>>>> INTRINSICS_LIST="
>>>>     intrinsics_neon
>>>> +    intrinsics_sse
>>>>     intrinsics_sse2
>>>> "
>>>> 
>>>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>>>> armv8_deps="aarch64"
>>>> neon_deps_any="aarch64 arm"
>>>> intrinsics_neon_deps="neon"
>>>> -intrinsics_sse2_deps="sse2"
>>>> +intrinsics_sse_deps="sse"
>>>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>>>> vfp_deps="arm"
>>>> vfpv3_deps="vfp"
>>>> setend_deps="arm"
>>>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>>>> fi
>>>> 
>>>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
>>>> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
>>>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
>> _mm_setzero_si128()"
>>>> 
>>>> check_ldflags -Wl,--as-needed
>>>> diff --git a/libavutil/x86/intreadwrite.h 
>> b/libavutil/x86/intreadwrite.h
>>>> index 9bbef00dba..6546eb016c 100644
>>>> --- a/libavutil/x86/intreadwrite.h
>>>> +++ b/libavutil/x86/intreadwrite.h
>>>> @@ -22,29 +22,25 @@
>>>> #define AVUTIL_X86_INTREADWRITE_H
>>>> 
>>>> #include <stdint.h>
>>>> +#if HAVE_INTRINSICS_SSE
>>>> +#include <immintrin.h>
>>>> +#endif
>>> 
>>> This change seems to have broken builds for x86 with Clang 16 or 
>>> newer. (Clang 15 and lower seems to be fine.)
>>> 
>>> See e.g. 
>> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile 
>> for an example of the error. The issue is that a clang internal 
>> intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter 
>> with the name "A" (which toolchain provided headers shouldn't use). 
>> This clashes with libavcodec/huffuyv.h, which has a "#define A 3".
>>> 
>>> This is obviously a Clang intrinsics header bug, but we can't fix the 
>>> existing Clang 16-18 releases that are out there, so I guess what we 
>> can 
>>> do is change our "define A" to something more elaborate. (IIRC there 
>> are 
>>> some similar issues with names with ncurses and/or android headers 
>> too.)
>>> 
>>> // Martin
>> 
>> We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
>> probably other places, so changing huffyuv.h may not be enough.
>
> That's quite possible, but those cases may be including intreadwrite.h 
> before that, do it's possible it might not trigger there.
>
> I'll see how many places need to be changed here.

huffyuvenc.c seems to be the only file that runs into the issue; it 
includes put_bits.h (which brings in libavutil/intreadwrite.h) after 
huffyuv.h.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
  2024-07-11 13:54       ` Martin Storsjö
@ 2024-07-11 17:53         ` James Almer
  2024-07-11 21:00           ` Martin Storsjö
  0 siblings, 1 reply; 7+ messages in thread
From: James Almer @ 2024-07-11 17:53 UTC (permalink / raw)
  To: ffmpeg-devel

On 7/11/2024 10:54 AM, Martin Storsjö wrote:
> On Thu, 11 Jul 2024, Martin Storsjö wrote:
> 
>> On Thu, 11 Jul 2024, James Almer wrote:
>>
>>> On 7/11/2024 10:08 AM, Martin Storsjö wrote:
>>>> On Wed, 10 Jul 2024, James Almer wrote:
>>>>
>>>>> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed Jul 
>>>>> 10 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] 
>>>>> | committer: James Almer
>>>>>
>>>>> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>>>>>
>>>>> This has the benefit of removing any SSE -> AVX penalty that may 
>>>>> happen when
>>>>> the compiler emits VEX encoded instructions.
>>>>>
>>>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>>>>
>>>>>>
>>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>>>>> ---
>>>>>
>>>>> configure                    |  5 ++++-
>>>>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>>>>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/configure b/configure
>>>>> index f84fefeaab..7151ed1de3 100755
>>>>> --- a/configure
>>>>> +++ b/configure
>>>>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>>>>
>>>>> INTRINSICS_LIST="
>>>>>     intrinsics_neon
>>>>> +    intrinsics_sse
>>>>>     intrinsics_sse2
>>>>> "
>>>>>
>>>>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>>>>> armv8_deps="aarch64"
>>>>> neon_deps_any="aarch64 arm"
>>>>> intrinsics_neon_deps="neon"
>>>>> -intrinsics_sse2_deps="sse2"
>>>>> +intrinsics_sse_deps="sse"
>>>>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>>>>> vfp_deps="arm"
>>>>> vfpv3_deps="vfp"
>>>>> setend_deps="arm"
>>>>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>>>>> fi
>>>>>
>>>>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
>>>>> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
>>>>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
>>> _mm_setzero_si128()"
>>>>>
>>>>> check_ldflags -Wl,--as-needed
>>>>> diff --git a/libavutil/x86/intreadwrite.h 
>>> b/libavutil/x86/intreadwrite.h
>>>>> index 9bbef00dba..6546eb016c 100644
>>>>> --- a/libavutil/x86/intreadwrite.h
>>>>> +++ b/libavutil/x86/intreadwrite.h
>>>>> @@ -22,29 +22,25 @@
>>>>> #define AVUTIL_X86_INTREADWRITE_H
>>>>>
>>>>> #include <stdint.h>
>>>>> +#if HAVE_INTRINSICS_SSE
>>>>> +#include <immintrin.h>
>>>>> +#endif
>>>>
>>>> This change seems to have broken builds for x86 with Clang 16 or 
>>>> newer. (Clang 15 and lower seems to be fine.)
>>>>
>>>> See e.g. 
>>> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile for an example of the error. The issue is that a clang internal intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name "A" (which toolchain provided headers shouldn't use). This clashes with libavcodec/huffuyv.h, which has a "#define A 3".
>>>>
>>>> This is obviously a Clang intrinsics header bug, but we can't fix 
>>>> the existing Clang 16-18 releases that are out there, so I guess 
>>>> what we 
>>> can
>>>> do is change our "define A" to something more elaborate. (IIRC there 
>>> are
>>>> some similar issues with names with ncurses and/or android headers 
>>> too.)
>>>>
>>>> // Martin
>>>
>>> We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
>>> probably other places, so changing huffyuv.h may not be enough.
>>
>> That's quite possible, but those cases may be including intreadwrite.h 
>> before that, do it's possible it might not trigger there.
>>
>> I'll see how many places need to be changed here.
> 
> huffyuvenc.c seems to be the only file that runs into the issue; it 
> includes put_bits.h (which brings in libavutil/intreadwrite.h) after 
> huffyuv.h.

Can we move the huffyuv.h include right after put_bits.h then? I 
personally prefer that over changing the RGBA defines to workaround a 
compiler bug.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
  2024-07-11 17:53         ` James Almer
@ 2024-07-11 21:00           ` Martin Storsjö
  2024-07-11 21:11             ` James Almer
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Storsjö @ 2024-07-11 21:00 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Thu, 11 Jul 2024, James Almer wrote:

> On 7/11/2024 10:54 AM, Martin Storsjö wrote:
>> On Thu, 11 Jul 2024, Martin Storsjö wrote:
>> 
>>> On Thu, 11 Jul 2024, James Almer wrote:
>>>
>>>> On 7/11/2024 10:08 AM, Martin Storsjö wrote:
>>>>> On Wed, 10 Jul 2024, James Almer wrote:
>>>>>
>>>>>> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed Jul 
>>>>>> 10 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] 
>>>>>> | committer: James Almer
>>>>>>
>>>>>> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>>>>>>
>>>>>> This has the benefit of removing any SSE -> AVX penalty that may 
>>>>>> happen when
>>>>>> the compiler emits VEX encoded instructions.
>>>>>>
>>>>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>>>>>
>>>>>>>
>>>> 
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>>>>>> ---
>>>>>>
>>>>>> configure                    |  5 ++++-
>>>>>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>>>>>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/configure b/configure
>>>>>> index f84fefeaab..7151ed1de3 100755
>>>>>> --- a/configure
>>>>>> +++ b/configure
>>>>>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>>>>>
>>>>>> INTRINSICS_LIST="
>>>>>>     intrinsics_neon
>>>>>> +    intrinsics_sse
>>>>>>     intrinsics_sse2
>>>>>> "
>>>>>>
>>>>>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>>>>>> armv8_deps="aarch64"
>>>>>> neon_deps_any="aarch64 arm"
>>>>>> intrinsics_neon_deps="neon"
>>>>>> -intrinsics_sse2_deps="sse2"
>>>>>> +intrinsics_sse_deps="sse"
>>>>>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>>>>>> vfp_deps="arm"
>>>>>> vfpv3_deps="vfp"
>>>>>> setend_deps="arm"
>>>>>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>>>>>> fi
>>>>>>
>>>>>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
>>>>>> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
>>>>>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
>>>> _mm_setzero_si128()"
>>>>>>
>>>>>> check_ldflags -Wl,--as-needed
>>>>>> diff --git a/libavutil/x86/intreadwrite.h 
>>>> b/libavutil/x86/intreadwrite.h
>>>>>> index 9bbef00dba..6546eb016c 100644
>>>>>> --- a/libavutil/x86/intreadwrite.h
>>>>>> +++ b/libavutil/x86/intreadwrite.h
>>>>>> @@ -22,29 +22,25 @@
>>>>>> #define AVUTIL_X86_INTREADWRITE_H
>>>>>>
>>>>>> #include <stdint.h>
>>>>>> +#if HAVE_INTRINSICS_SSE
>>>>>> +#include <immintrin.h>
>>>>>> +#endif
>>>>>
>>>>> This change seems to have broken builds for x86 with Clang 16 or 
>>>>> newer. (Clang 15 and lower seems to be fine.)
>>>>>
>>>>> See e.g. 
>>>> 
> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile 
> for an example of the error. The issue is that a clang internal intrinsics 
> header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name 
> "A" (which toolchain provided headers shouldn't use). This clashes with 
> libavcodec/huffuyv.h, which has a "#define A 3".
>>>>>
>>>>> This is obviously a Clang intrinsics header bug, but we can't fix 
>>>>> the existing Clang 16-18 releases that are out there, so I guess 
>>>>> what we 
>>>> can
>>>>> do is change our "define A" to something more elaborate. (IIRC there 
>>>> are
>>>>> some similar issues with names with ncurses and/or android headers 
>>>> too.)
>>>>>
>>>>> // Martin
>>>>
>>>> We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
>>>> probably other places, so changing huffyuv.h may not be enough.
>>>
>>> That's quite possible, but those cases may be including intreadwrite.h 
>>> before that, do it's possible it might not trigger there.
>>>
>>> I'll see how many places need to be changed here.
>> 
>> huffyuvenc.c seems to be the only file that runs into the issue; it 
>> includes put_bits.h (which brings in libavutil/intreadwrite.h) after 
>> huffyuv.h.
>
> Can we move the huffyuv.h include right after put_bits.h then? I 
> personally prefer that over changing the RGBA defines to workaround a 
> compiler bug.

That would probably work too.

However your other recent patch, which removes the include of immintrin.h, 
also avoids the issue altogether.

(FWIW, the upstream bug is fixed now in 
https://github.com/llvm/llvm-project/commit/6f04f46927cf54d19cc2a1470f47d5db4b3b96bb.)

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
  2024-07-11 21:00           ` Martin Storsjö
@ 2024-07-11 21:11             ` James Almer
  0 siblings, 0 replies; 7+ messages in thread
From: James Almer @ 2024-07-11 21:11 UTC (permalink / raw)
  To: ffmpeg-devel

On 7/11/2024 6:00 PM, Martin Storsjö wrote:
> On Thu, 11 Jul 2024, James Almer wrote:
> 
>> On 7/11/2024 10:54 AM, Martin Storsjö wrote:
>>> On Thu, 11 Jul 2024, Martin Storsjö wrote:
>>>
>>>> On Thu, 11 Jul 2024, James Almer wrote:
>>>>
>>>>> On 7/11/2024 10:08 AM, Martin Storsjö wrote:
>>>>>> On Wed, 10 Jul 2024, James Almer wrote:
>>>>>>
>>>>>>> ffmpeg | branch: master | James Almer <jamrial@gmail.com> | Wed 
>>>>>>> Jul 10 13:00:20 2024 -0300| 
>>>>>>> [bd1bcb07e0f29c135103a402d71b343a09ad1690] | committer: James Almer
>>>>>>>
>>>>>>> x86/intreadwrite: use intrinsics instead of inline asm for 
>>>>>>> AV_COPY128
>>>>>>>
>>>>>>> This has the benefit of removing any SSE -> AVX penalty that may 
>>>>>>> happen when
>>>>>>> the compiler emits VEX encoded instructions.
>>>>>>>
>>>>>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>>>>>>
>>>>>>>>
>>>>>
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>>>>>>> ---
>>>>>>>
>>>>>>> configure                    |  5 ++++-
>>>>>>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>>>>>>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>>>>>>
>>>>>>> diff --git a/configure b/configure
>>>>>>> index f84fefeaab..7151ed1de3 100755
>>>>>>> --- a/configure
>>>>>>> +++ b/configure
>>>>>>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>>>>>>
>>>>>>> INTRINSICS_LIST="
>>>>>>>     intrinsics_neon
>>>>>>> +    intrinsics_sse
>>>>>>>     intrinsics_sse2
>>>>>>> "
>>>>>>>
>>>>>>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>>>>>>> armv8_deps="aarch64"
>>>>>>> neon_deps_any="aarch64 arm"
>>>>>>> intrinsics_neon_deps="neon"
>>>>>>> -intrinsics_sse2_deps="sse2"
>>>>>>> +intrinsics_sse_deps="sse"
>>>>>>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>>>>>>> vfp_deps="arm"
>>>>>>> vfpv3_deps="vfp"
>>>>>>> setend_deps="arm"
>>>>>>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>>>>>>> fi
>>>>>>>
>>>>>>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = 
>>>>>>> vdupq_n_s16(0)"
>>>>>>> +check_cc intrinsics_sse immintrin.h "__m128 test = 
>>>>>>> _mm_setzero_ps()"
>>>>>>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
>>>>> _mm_setzero_si128()"
>>>>>>>
>>>>>>> check_ldflags -Wl,--as-needed
>>>>>>> diff --git a/libavutil/x86/intreadwrite.h 
>>>>> b/libavutil/x86/intreadwrite.h
>>>>>>> index 9bbef00dba..6546eb016c 100644
>>>>>>> --- a/libavutil/x86/intreadwrite.h
>>>>>>> +++ b/libavutil/x86/intreadwrite.h
>>>>>>> @@ -22,29 +22,25 @@
>>>>>>> #define AVUTIL_X86_INTREADWRITE_H
>>>>>>>
>>>>>>> #include <stdint.h>
>>>>>>> +#if HAVE_INTRINSICS_SSE
>>>>>>> +#include <immintrin.h>
>>>>>>> +#endif
>>>>>>
>>>>>> This change seems to have broken builds for x86 with Clang 16 or 
>>>>>> newer. (Clang 15 and lower seems to be fine.)
>>>>>>
>>>>>> See e.g. 
>>>>>
>> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile for an example of the error. The issue is that a clang internal intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name "A" (which toolchain provided headers shouldn't use). This clashes with libavcodec/huffuyv.h, which has a "#define A 3".
>>>>>>
>>>>>> This is obviously a Clang intrinsics header bug, but we can't fix 
>>>>>> the existing Clang 16-18 releases that are out there, so I guess 
>>>>>> what we 
>>>>> can
>>>>>> do is change our "define A" to something more elaborate. (IIRC there 
>>>>> are
>>>>>> some similar issues with names with ncurses and/or android headers 
>>>>> too.)
>>>>>>
>>>>>> // Martin
>>>>>
>>>>> We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h 
>>>>> and probably other places, so changing huffyuv.h may not be enough.
>>>>
>>>> That's quite possible, but those cases may be including 
>>>> intreadwrite.h before that, do it's possible it might not trigger 
>>>> there.
>>>>
>>>> I'll see how many places need to be changed here.
>>>
>>> huffyuvenc.c seems to be the only file that runs into the issue; it 
>>> includes put_bits.h (which brings in libavutil/intreadwrite.h) after 
>>> huffyuv.h.
>>
>> Can we move the huffyuv.h include right after put_bits.h then? I 
>> personally prefer that over changing the RGBA defines to workaround a 
>> compiler bug.
> 
> That would probably work too.
> 
> However your other recent patch, which removes the include of 
> immintrin.h, also avoids the issue altogether.

So immintrin.h includes all the avx512 headers, but emmintrin.h doesn't? 
Two birds, one stone then :p
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-07-11 21:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240710162641.7B606412682@natalya.videolan.org>
2024-07-11 13:08 ` [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128 Martin Storsjö
2024-07-11 13:13   ` James Almer
2024-07-11 13:42     ` Martin Storsjö
2024-07-11 13:54       ` Martin Storsjö
2024-07-11 17:53         ` James Almer
2024-07-11 21:00           ` Martin Storsjö
2024-07-11 21:11             ` James Almer

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git