From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id F2CF943FB4 for ; Tue, 23 Aug 2022 18:22:27 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 1E33568B995; Tue, 23 Aug 2022 21:22:25 +0300 (EEST) Received: from relay7-d.mail.gandi.net (relay7-d.mail.gandi.net [217.70.183.200]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A067068B934 for ; Tue, 23 Aug 2022 21:22:18 +0300 (EEST) Received: (Authenticated sender: michael@niedermayer.cc) by mail.gandi.net (Postfix) with ESMTPSA id 02FD620003 for ; Tue, 23 Aug 2022 18:22:17 +0000 (UTC) Date: Tue, 23 Aug 2022 20:22:17 +0200 From: Michael Niedermayer To: FFmpeg development discussions and patches Message-ID: <20220823182217.GQ2088045@pb2> References: <20220823154215.GJ2088045@pb2> <20220823175112.GK2088045@pb2> MIME-Version: 1.0 In-Reply-To: Subject: Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Content-Type: multipart/mixed; boundary="===============8086760039600192027==" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: --===============8086760039600192027== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="VypWdzyUxItf8lKe" Content-Disposition: inline --VypWdzyUxItf8lKe Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Aug 23, 2022 at 08:09:09PM +0200, Andreas Rheinhardt wrote: > Michael Niedermayer: > > On Tue, Aug 23, 2022 at 07:28:19PM +0200, Andreas Rheinhardt wrote: > >> Michael Niedermayer: > >>> On Mon, Aug 22, 2022 at 11:59:17PM +0200, Andreas Rheinhardt wrote: > >>>> Andreas Rheinhardt: > >>>>> Fixes FATE-failures with the the filter-2xbr filter-3xbr filter-4xbr > >>>>> filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x > >>>>> filter-paletteuse-bayer filter-paletteuse-bayer0 > >>>>> filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests > >>>>> when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to > >>>>> "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritt= en > >>>>> when using SSSE3). > >>>>> > >>>>> Signed-off-by: Andreas Rheinhardt > >>>>> --- > >>>>> libswscale/x86/rgb_2_rgb.asm | 1 + > >>>>> 1 file changed, 1 insertion(+) > >>>>> > >>>>> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rg= b.asm > >>>>> index c695c61d5c..76ca1eec03 100644 > >>>>> --- a/libswscale/x86/rgb_2_rgb.asm > >>>>> +++ b/libswscale/x86/rgb_2_rgb.asm > >>>>> @@ -104,6 +104,7 @@ jge .end > >>>>> jl .loop_simd > >>>>> =20 > >>>>> .end: > >>>>> + emms > >>>>> RET > >>>>> =20 > >>>>> ;-----------------------------------------------------------------= ------------- > >>>> > >>>> I'd really love if someone with x86 assembly skills could look over = this > >>>> trivial patch and confirm whether it is indeed correct. All I curren= tly > >>>> know is that is works for me. > >>> > >>> emms needs to be called between MMX and float code, as far outside of= loops > >>> as possible > >>> that would suggest outside the for() loops in rgbToRgbWrapper() and a= ny > >>> other code using it. > >> > >> But there is another aspect that the above is missing: Namely that if > >> emms_c() is put outside of MMX functions, then it will be called even > >> when it is unnecessary. In this case it is unnecessary for all modern > >> CPUs, as this function is overridden when SSSE3 is available. > >=20 > > If you dont like that, > > dont call it when its not needed or call it a few hundread times unnece= ssary > > like your patch does. > > or write only code that doesnt need emms=20 > > maybe there are more options ... > >=20 >=20 > If emms_c() is used as now outside of MMX functions, then a "dont call > it when its not needed" would involve a check and would therefore still > incur cost for users who don't use this. Also it is unclear how such a > check would even look like given that one can use av_force_cpu_flags(). > See also 55fc2c5a892c50feb1b9a8f55b74ec6594755ddb. > This patch also only calls it a few hundred times unnecessarily if one > runs this without SSSE3. CPUs without SSSE3 are ancient today. For the > non-ancient CPUs, using emms_c() adds an EMMS. do whatever you prefer. The best solution depends on assumptions. The impact is biggest on old CPUs where EMMS is also a slow instruction But as you say these are ancient today. very small impact on many vs small to moderate impact on a today rare setup the worst is if the bug is left open and time is wasted on bikesheding thx [...] --=20 Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Avoid a single point of failure, be that a person or equipment. --VypWdzyUxItf8lKe Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iF0EABEIAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCYwUa1QAKCRBhHseHBAsP q5ZhAJ9FKtHE9stS4JL4TIGTvKJkAj+8KACdFEZZAnFId3s/76/KQLAdK6wBvt0= =QoGV -----END PGP SIGNATURE----- --VypWdzyUxItf8lKe-- --===============8086760039600192027== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". --===============8086760039600192027==--