On Tue, Aug 23, 2022 at 08:09:09PM +0200, Andreas Rheinhardt wrote: > Michael Niedermayer: > > On Tue, Aug 23, 2022 at 07:28:19PM +0200, Andreas Rheinhardt wrote: > >> Michael Niedermayer: > >>> On Mon, Aug 22, 2022 at 11:59:17PM +0200, Andreas Rheinhardt wrote: > >>>> Andreas Rheinhardt: > >>>>> Fixes FATE-failures with the the filter-2xbr filter-3xbr filter-4xbr > >>>>> filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x > >>>>> filter-paletteuse-bayer filter-paletteuse-bayer0 > >>>>> filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests > >>>>> when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to > >>>>> "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten > >>>>> when using SSSE3). > >>>>> > >>>>> Signed-off-by: Andreas Rheinhardt > >>>>> --- > >>>>> libswscale/x86/rgb_2_rgb.asm | 1 + > >>>>> 1 file changed, 1 insertion(+) > >>>>> > >>>>> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm > >>>>> index c695c61d5c..76ca1eec03 100644 > >>>>> --- a/libswscale/x86/rgb_2_rgb.asm > >>>>> +++ b/libswscale/x86/rgb_2_rgb.asm > >>>>> @@ -104,6 +104,7 @@ jge .end > >>>>> jl .loop_simd > >>>>> > >>>>> .end: > >>>>> + emms > >>>>> RET > >>>>> > >>>>> ;------------------------------------------------------------------------------ > >>>> > >>>> I'd really love if someone with x86 assembly skills could look over this > >>>> trivial patch and confirm whether it is indeed correct. All I currently > >>>> know is that is works for me. > >>> > >>> emms needs to be called between MMX and float code, as far outside of loops > >>> as possible > >>> that would suggest outside the for() loops in rgbToRgbWrapper() and any > >>> other code using it. > >> > >> But there is another aspect that the above is missing: Namely that if > >> emms_c() is put outside of MMX functions, then it will be called even > >> when it is unnecessary. In this case it is unnecessary for all modern > >> CPUs, as this function is overridden when SSSE3 is available. > > > > If you dont like that, > > dont call it when its not needed or call it a few hundread times unnecessary > > like your patch does. > > or write only code that doesnt need emms > > maybe there are more options ... > > > > If emms_c() is used as now outside of MMX functions, then a "dont call > it when its not needed" would involve a check and would therefore still > incur cost for users who don't use this. Also it is unclear how such a > check would even look like given that one can use av_force_cpu_flags(). > See also 55fc2c5a892c50feb1b9a8f55b74ec6594755ddb. > This patch also only calls it a few hundred times unnecessarily if one > runs this without SSSE3. CPUs without SSSE3 are ancient today. For the > non-ancient CPUs, using emms_c() adds an EMMS. do whatever you prefer. The best solution depends on assumptions. The impact is biggest on old CPUs where EMMS is also a slow instruction But as you say these are ancient today. very small impact on many vs small to moderate impact on a today rare setup the worst is if the bug is left open and time is wasted on bikesheding thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Avoid a single point of failure, be that a person or equipment.