Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions
@ 2025-06-02 18:41 James Almer
  2025-06-03 16:15 ` Niklas Haas
  0 siblings, 1 reply; 6+ messages in thread
From: James Almer @ 2025-06-02 18:41 UTC
  To: ffmpeg-devel

GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
The only theoretical benefit was in x86_32, where x87 floats are used, but the
penalty of making the clipping opaque to the compiler's scheduler plus moving
values from mmx regs to xmm and back will offset any potential speedup.
x86_32 builds targeting anything made in the last two and a half decades
should use -msse -mfpmath=sse anyway.

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavutil/x86/intmath.h | 62 -----------------------------------------
 1 file changed, 62 deletions(-)

diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h
index 4893a1f1b4..735945ca95 100644
--- a/libavutil/x86/intmath.h
+++ b/libavutil/x86/intmath.h
@@ -114,68 +114,6 @@ static av_always_inline av_const unsigned av_zero_extend_bmi2(unsigned a, unsign
 
 #endif /* __BMI2__ */
 
-#if defined(__SSE2__) && !defined(__INTEL_COMPILER)
-
-#define av_clipd av_clipd_sse2
-static av_always_inline av_const double av_clipd_sse2(double a, double amin, double amax)
-{
-#if defined(ASSERT_LEVEL) && ASSERT_LEVEL >= 2
-    if (amin > amax) abort();
-#endif
-    __asm__ ("maxsd %1, %0 \n\t"
-             "minsd %2, %0 \n\t"
-             : "+&x"(a) : "xm"(amin), "xm"(amax));
-    return a;
-}
-
-#endif /* __SSE2__ */
-
-#if defined(__SSE__) && !defined(__INTEL_COMPILER)
-
-#define av_clipf av_clipf_sse
-static av_always_inline av_const float av_clipf_sse(float a, float amin, float amax)
-{
-#if defined(ASSERT_LEVEL) && ASSERT_LEVEL >= 2
-    if (amin > amax) abort();
-#endif
-    __asm__ ("maxss %1, %0 \n\t"
-             "minss %2, %0 \n\t"
-             : "+&x"(a) : "xm"(amin), "xm"(amax));
-    return a;
-}
-
-#endif /* __SSE__ */
-
-#if defined(__AVX__) && !defined(__INTEL_COMPILER)
-
-#undef av_clipd
-#define av_clipd av_clipd_avx
-static av_always_inline av_const double av_clipd_avx(double a, double amin, double amax)
-{
-#if defined(ASSERT_LEVEL) && ASSERT_LEVEL >= 2
-    if (amin > amax) abort();
-#endif
-    __asm__ ("vmaxsd %1, %0, %0 \n\t"
-             "vminsd %2, %0, %0 \n\t"
-             : "+&x"(a) : "xm"(amin), "xm"(amax));
-    return a;
-}
-
-#undef av_clipf
-#define av_clipf av_clipf_avx
-static av_always_inline av_const float av_clipf_avx(float a, float amin, float amax)
-{
-#if defined(ASSERT_LEVEL) && ASSERT_LEVEL >= 2
-    if (amin > amax) abort();
-#endif
-    __asm__ ("vmaxss %1, %0, %0 \n\t"
-             "vminss %2, %0, %0 \n\t"
-             : "+&x"(a) : "xm"(amin), "xm"(amax));
-    return a;
-}
-
-#endif /* __AVX__ */
-
 #endif /* __GNUC__ */
 
 #endif /* AVUTIL_X86_INTMATH_H */
-- 
2.49.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
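
The claim that the compiler already produces the same instructions can be illustrated in plain C. The sketch below is assumed to mirror the generic C fallback, FFMIN(FFMAX(a, amin), amax); the names sketch_clipf/sketch_clipd are hypothetical, and the av_const/av_always_inline attributes are left out to keep it self-contained. With SSE math in effect (x86_64, or x86_32 with -msse2 -mfpmath=sse), GCC and Clang typically lower each compare-and-select to a single maxss/minss (maxsd/minsd for double), which is what the removed inline asm hard-coded.

```c
#include <assert.h>

/* Hypothetical helpers: plain-C clips in the assumed
 * FFMIN(FFMAX(a, amin), amax) form, with no inline asm. */
static inline float sketch_clipf(float a, float amin, float amax)
{
    assert(amin <= amax);            /* cf. the ASSERT_LEVEL >= 2 abort() */
    float lo = a > amin ? a : amin;  /* FFMAX; typically a single maxss */
    return lo > amax ? amax : lo;    /* FFMIN; typically a single minss */
}

static inline double sketch_clipd(double a, double amin, double amax)
{
    assert(amin <= amax);
    double lo = a > amin ? a : amin; /* typically maxsd */
    return lo > amax ? amax : lo;    /* typically minsd */
}
```

Looking at gcc -O2 -S output for a given target is a quick way to confirm this for a specific compiler version. The operand order matters: a > amin ? a : amin has the same semantics as maxss (the second operand wins on equal or NaN inputs), so the compiler can use the instruction without -ffast-math.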


* Re: [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions
  2025-06-02 18:41 [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions James Almer
@ 2025-06-03 16:15 ` Niklas Haas
  2025-06-03 16:22   ` Andreas Rheinhardt
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Niklas Haas @ 2025-06-03 16:15 UTC
  To: ffmpeg-devel

On Mon, 02 Jun 2025 15:41:33 -0300 James Almer <jamrial@gmail.com> wrote:
> GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
> The only theoretical benefit was in x86_32, where x87 floats are used, but the
> penalty of making the clipping opaque to the compiler's scheduler plus moving
> values from mmx regs to xmm and back will offset any potential speedup.
> x86_32 builds targeting anything made in the last two and a half decades
> should use -msse -mfpmath=sse anyway.

As mentioned in another thread, x87 FPU usage causes non-bitexact results in
swscale. Should we at this point consider setting -mfpmath=sse by default for
x86_32 builds?
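
The mechanism behind x87 non-bitexactness can be shown without an x87: the unit keeps intermediates in 80-bit registers (reported as FLT_EVAL_METHOD == 2 by <float.h>), so a value may be rounded once at the end instead of after every operation. The sketch below emulates that portably by widening to double explicitly. The function names are hypothetical, the inputs were chosen so that a*b is an exact tie between two floats, and nothing here says which swscale code paths are affected.

```c
#include <assert.h>
#include <float.h>

/* Round to float after every operation: each volatile store forces the
 * value out of any wider register, as SSE float math does naturally. */
static float eval_per_op(float a, float b, float c)
{
    volatile float p = a * b;
    volatile float r = p + c;
    return r;
}

/* Keep the intermediate in a wider format and round to float only once:
 * an explicit stand-in for x87 excess precision. */
static float eval_widened(float a, float b, float c)
{
    double w = (double)a * (double)b + (double)c;
    return (float)w;
}

/* 0: each type evaluated in its own precision (typical SSE build);
 * 2: float/double evaluated as long double (classic x87 build);
 * -1: indeterminate; other values are implementation-defined. */
static int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* With a = 1 + 2^-12, a*a = 1 + 2^-11 + 2^-24 exactly: a float tie. */
static void demo(void)
{
    const float a = 1.0f + 0x1p-12f, c = -1.0f;
    assert(eval_per_op(a, a, c) == 0x1p-11f);             /* tie to even, low bit lost */
    assert(eval_widened(a, a, c) == 0x1p-11f + 0x1p-24f); /* low bit survives */
}
```

On a real x87 build the widened behaviour can happen implicitly, depending on when the compiler spills intermediates to memory, which is why results can differ between builds and optimization levels; with SSE math (and absent FMA contraction) every float operation is rounded as in eval_per_op.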


* Re: [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions
  2025-06-03 16:15 ` Niklas Haas
@ 2025-06-03 16:22   ` Andreas Rheinhardt
  2025-06-03 16:32     ` Niklas Haas
  2025-06-03 16:40   ` Martin Storsjö
  2025-06-04 11:33   ` Rémi Denis-Courmont
  2 siblings, 1 reply; 6+ messages in thread
From: Andreas Rheinhardt @ 2025-06-03 16:22 UTC
  To: ffmpeg-devel

Niklas Haas:
> On Mon, 02 Jun 2025 15:41:33 -0300 James Almer <jamrial@gmail.com> wrote:
>> GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
>> The only theoretical benefit was in x86_32, where x87 floats are used, but the
>> penalty of making the clipping opaque to the compiler's scheduler plus moving
>> values from mmx regs to xmm and back will offset any potential speedup.
>> x86_32 builds targeting anything made in the last two and a half decades
>> should use -msse -mfpmath=sse anyway.
> 
> As mentioned in another thread, x87 FPU usage causes non-bitexact results in
> swscale. Should we at this point consider setting -mfpmath=sse by default for
> x86_32 builds?
> 
What about CPUs without SSE?

- Andreas

* Re: [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions
  2025-06-03 16:22   ` Andreas Rheinhardt
@ 2025-06-03 16:32     ` Niklas Haas
  0 siblings, 0 replies; 6+ messages in thread
From: Niklas Haas @ 2025-06-03 16:32 UTC
  To: ffmpeg-devel

On Tue, 03 Jun 2025 18:22:30 +0200 Andreas Rheinhardt <andreas.rheinhardt@outlook.com> wrote:
> Niklas Haas:
> > On Mon, 02 Jun 2025 15:41:33 -0300 James Almer <jamrial@gmail.com> wrote:
> >> GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
> >> The only theoretical benefit was in x86_32, where x87 floats are used, but the
> >> penalty of making the clipping opaque to the compiler's scheduler plus moving
> >> values from mmx regs to xmm and back will offset any potential speedup.
> >> x86_32 builds targeting anything made in the last two and a half decades
> >> should use -msse -mfpmath=sse anyway.
> >
> > As mentioned in another thread, x87 FPU usage causes non-bitexact results in
> > swscale. Should we at this point consider setting -mfpmath=sse by default for
> > x86_32 builds?
> >
> What about CPUs without SSE?

They have been dropped by all major distros, as far as I can tell, so I'm not sure
what userbase we would be alienating by not supporting them out of the box.

If you have a valid use case for combining FFmpeg git master builds from 2025
with obsolete hardware from 1999, you can just disable the flag and live with the
fact that your output will not be bitexact.

* Re: [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions
  2025-06-03 16:15 ` Niklas Haas
  2025-06-03 16:22   ` Andreas Rheinhardt
@ 2025-06-03 16:40   ` Martin Storsjö
  2025-06-04 11:33   ` Rémi Denis-Courmont
  2 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2025-06-03 16:40 UTC
  To: FFmpeg development discussions and patches

On Tue, 3 Jun 2025, Niklas Haas wrote:

> On Mon, 02 Jun 2025 15:41:33 -0300 James Almer <jamrial@gmail.com> wrote:
>> GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
>> The only theoretical benefit was in x86_32, where x87 floats are used, but the
>> penalty of making the clipping opaque to the compiler's scheduler plus moving
>> values from mmx regs to xmm and back will offset any potential speedup.
>> x86_32 builds targeting anything made in the last two and a half decades
>> should use -msse -mfpmath=sse anyway.
>
> As mentioned in another thread, x87 FPU usage causes non-bitexact results in
> swscale. Should we at this point consider setting -mfpmath=sse by default for
> x86_32 builds?

I don't object to doing that - however, if we have float code that is 
expected to be bitexact, the root issue still remains even if we force SSE 
fpmath on x86. The same issue could crop up on any 
architecture/OS/compiler combo, it's just that x87 math shows the issues 
much easier. (Not making a judgement whether that's an issue we want to 
deal with, or whether we're ok with having bitexact code that relies on 
float math that behaves close enough to some specific reference.)

// Martin
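
One way to make "float math that behaves close enough to some specific reference" concrete is to compare results in units in the last place (ULPs) rather than bit for bit. This is a minimal sketch; ordered_bits, ulp_distance and within_ulps are hypothetical helpers, not FFmpeg or FATE APIs, and NaN inputs are rejected because they have no meaningful distance.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto a signed integer line on which adjacent
 * floats differ by exactly 1 (-0.0f and +0.0f both map to 0). */
static int32_t ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);          /* type-pun without undefined behaviour */
    return i < 0 ? INT32_MIN - i : i;  /* sign-magnitude -> two's complement */
}

/* Number of representable floats between a and b. */
static uint32_t ulp_distance(float a, float b)
{
    assert(!isnan(a) && !isnan(b));
    int64_t d = (int64_t)ordered_bits(a) - ordered_bits(b);
    return (uint32_t)(d < 0 ? -d : d);
}

/* Hypothetical tolerance check: accept x if it is within max_ulps of ref. */
static int within_ulps(float x, float ref, uint32_t max_ulps)
{
    return ulp_distance(x, ref) <= max_ulps;
}
```

A tolerance placed after a cancellation may need to be large: 2^-11 and 2^-11 + 2^-24 are 1024 ULPs apart even though they differ by only 2^-24 in absolute terms.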

* Re: [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions
  2025-06-03 16:15 ` Niklas Haas
  2025-06-03 16:22   ` Andreas Rheinhardt
  2025-06-03 16:40   ` Martin Storsjö
@ 2025-06-04 11:33   ` Rémi Denis-Courmont
  2 siblings, 0 replies; 6+ messages in thread
From: Rémi Denis-Courmont @ 2025-06-04 11:33 UTC
  To: FFmpeg development discussions and patches

On 3 June 2025 19:15:57 GMT+03:00, Niklas Haas <ffmpeg@haasn.xyz> wrote:
>On Mon, 02 Jun 2025 15:41:33 -0300 James Almer <jamrial@gmail.com> wrote:
>> GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
>> The only theoretical benefit was in x86_32, where x87 floats are used, but the
>> penalty of making the clipping opaque to the compiler's scheduler plus moving
>> values from mmx regs to xmm and back will offset any potential speedup.
>> x86_32 builds targeting anything made in the last two and a half decades
>> should use -msse -mfpmath=sse anyway.
>
>As mentioned in another thread, x87 FPU usage causes non-bitexact results in
>swscale. Should we at this point consider setting -mfpmath=sse by default for
>x86_32 builds?

As a general rule, I prefer to leave the choice of compiler flags to the distributor than to hard-code them in project specific build scripts.

Thread overview: 6 messages
2025-06-02 18:41 [FFmpeg-devel] [PATCH] avutil/x86/intmath: remove inline asm implementations for clip functions James Almer
2025-06-03 16:15 ` Niklas Haas
2025-06-03 16:22   ` Andreas Rheinhardt
2025-06-03 16:32     ` Niklas Haas
2025-06-03 16:40   ` Martin Storsjö
2025-06-04 11:33   ` Rémi Denis-Courmont