From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id 721AB49B78 for ; Sat, 4 May 2024 00:21:24 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C513F68D6FA; Sat, 4 May 2024 03:21:21 +0300 (EEST) Received: from relay7-d.mail.gandi.net (relay7-d.mail.gandi.net [217.70.183.200]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A231C68D637 for ; Sat, 4 May 2024 03:21:14 +0300 (EEST) Received: by mail.gandi.net (Postfix) with ESMTPSA id CF11B20003 for ; Sat, 4 May 2024 00:21:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=niedermayer.cc; s=gm1; t=1714782074; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=nqTSO0eMY3HWsxzCyFA42kg+atc2+zhEksdIePZTqSE=; b=PNymBsFDgARd9orItai9Ee2tz321jDl33LRKOS0zvrhs/90zKmGKuZEmA/hzj3JmsLyKBc Q2clv90HEZzq9e1Iix2qPkqX8tFkbRp6M6DdPcBfJj/wB4FQxvdqM2eYUfz3FpJkQgnbDo zz91wVnP1Q7+CFe7YO/jSK7LtPhGFvwiYXPTB3MzIudiZHW3Nn50X43umUGuDAgl0+Uj58 QlSkZHajUTqKL533Yi+TvxE21GNU1sRXS+Z3M6ncUDVjU6U4mVtWlliSPZJX7du+UnIjX8 w+yBwVBUSKhwwiraJ4bTsJdsZ2dbFW3R5hils4Jaf11qvSU0LOTnmId8AJBCVg== Date: Sat, 4 May 2024 02:21:13 +0200 From: Michael Niedermayer To: FFmpeg development discussions and patches Message-ID: <20240504002113.GP6420@pb2> References: <20240426122142.60282-1-heng.hu.1989@gmail.com> MIME-Version: 1.0 In-Reply-To: X-GND-Sasl: michael@niedermayer.cc Subject: Re: [FFmpeg-devel] [PATCH v1] scale: Bring back the old yuv2yuvX, use it when disable-x86asm. X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Content-Type: multipart/mixed; boundary="===============4745632554062147579==" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: --===============4745632554062147579== Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="qIZl56LoxzOo3jo2" Content-Disposition: inline --qIZl56LoxzOo3jo2 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 03, 2024 at 04:07:36AM +0800, hu heng wrote: > =E4=BA=8E2024=E5=B9=B44=E6=9C=8826=E6=97=A5=E5= =91=A8=E4=BA=94 20:21=E5=86=99=E9=81=93=EF=BC=9A > > > > From: huheng > > > > rename old inline yuv2yuvX to yuv2yuv_X, to avoid conflicts with > > the names of standalone asm functions. When ffmpeg is compiled with > > --disable-x86asm, using the scale function will cause the video to > > be blurred. The reason is that when disable-x86asm, INLINE_MMXEXT > > is 1 and use_mmx_vfilter is 1, but c->yuv2planeX uses the c language > > version, which causes a problem of mismatch with the vfilter. This > > problem has persisted from version 4.4 to the present. Fix it by using > > inline yuv2yuv_X_mmxext, that can maintain the consistency of > > use_mmx_vfilter. > > > > reproduce the issue: > > 1. ./configure --disable-x86asm --enable-gpl --enable-libx264 > > 2. ./ffmpeg -i input.mp4 -vf "scale=3D1280x720" -c:v libx264 output.mp4 > > the output.mp4 is abnormal > > > > Signed-off-by: huheng > > --- > > libswscale/x86/swscale.c | 6 +++- > > libswscale/x86/swscale_template.c | 53 +++++++++++++++++++++++++++++++ > > 2 files changed, 58 insertions(+), 1 deletion(-) > > > > diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c > > index ff16398988..1bb9d1d51a 100644 > > --- a/libswscale/x86/swscale.c > > +++ b/libswscale/x86/swscale.c > > @@ -452,8 +452,12 @@ av_cold void ff_sws_init_swscale_x86(SwsContext *c) > > int cpu_flags =3D av_get_cpu_flags(); > > > > #if HAVE_MMXEXT_INLINE > > - if (INLINE_MMXEXT(cpu_flags)) > > + if (INLINE_MMXEXT(cpu_flags)) { > > sws_init_swscale_mmxext(c); > > + if (c->use_mmx_vfilter && !(c->flags & SWS_ACCURATE_RND)) { > > + c->yuv2planeX =3D yuv2yuv_X_mmxext; > > + } > > + } > > #endif > > if(c->use_mmx_vfilter && !(c->flags & SWS_ACCURATE_RND)) { > > #if HAVE_MMXEXT_EXTERNAL > > diff --git a/libswscale/x86/swscale_template.c b/libswscale/x86/swscale= _template.c > > index 6190fcb4fe..1b8794480d 100644 > > --- a/libswscale/x86/swscale_template.c > > +++ b/libswscale/x86/swscale_template.c > > @@ -33,6 +33,59 @@ > > #define MOVNTQ2 "movntq " > > #define MOVNTQ(a,b) REAL_MOVNTQ(a,b) > > > > +static void RENAME(yuv2yuv_X)(const int16_t *filter, int filterSize, > > + const int16_t **src, uint8_t *dest, int dst= W, > > + const uint8_t *dither, int offset) > > +{ > > + filterSize--; > > + __asm__ volatile( > > + "movd %0, %%mm1\n\t" > > + "punpcklwd %%mm1, %%mm1\n\t" > > + "punpckldq %%mm1, %%mm1\n\t" > > + "psllw $3, %%mm1\n\t" > > + "paddw %%mm1, %%mm3\n\t" > > + "paddw %%mm1, %%mm4\n\t" > > + "psraw $4, %%mm3\n\t" > > + "psraw $4, %%mm4\n\t" > > + ::"m"(filterSize) > > + ); > > + > > + __asm__ volatile(\ > > + "movq %%mm3, %%mm6\n\t" > > + "movq %%mm4, %%mm7\n\t" > > + "movl %3, %%ecx\n\t" > > + "mov %0, %%"FF_REG_d" \n= \t"\ > > + "mov (%%"FF_REG_d"), %%"FF_REG_S" \n= \t"\ > > + ".p2align 4 \n= \t" /* FIXME Unroll? */\ > > + "1: \n= \t"\ > > + "movq 8(%%"FF_REG_d"), %%mm0 \n= \t" /* filterCoeff */\ > > + "movq (%%"FF_REG_S", %%"FF_REG_c", 2), %%mm2 \n= \t" /* srcData */\ > > + "movq 8(%%"FF_REG_S", %%"FF_REG_c", 2), %%mm5 \n= \t" /* srcData */\ > > + "add $16, %%"FF_REG_d" \n= \t"\ > > + "mov (%%"FF_REG_d"), %%"FF_REG_S" \n= \t"\ > > + "test %%"FF_REG_S", %%"FF_REG_S" \n= \t"\ > > + "pmulhw %%mm0, %%mm2 \n\t"\ > > + "pmulhw %%mm0, %%mm5 \n\t"\ > > + "paddw %%mm2, %%mm3 \n\t"\ > > + "paddw %%mm5, %%mm4 \n\t"\ > > + " jnz 1b \n\t"\ > > + "psraw $3, %%mm3 \n\t"\ > > + "psraw $3, %%mm4 \n\t"\ > > + "packuswb %%mm4, %%mm3 \n\t" > > + MOVNTQ2 " %%mm3, (%1, %%"FF_REG_c")\n\= t" > > + "add $8, %%"FF_REG_c" \n\t"\ > > + "cmp %2, %%"FF_REG_c" \n\t"\ > > + "movq %%mm6, %%mm3\n\t" > > + "movq %%mm7, %%mm4\n\t" > > + "mov %0, %%"FF_REG_d" \n\t= "\ > > + "mov (%%"FF_REG_d"), %%"FF_REG_S" \n\t= "\ > > + "jb 1b \n\t= "\ > > + :: "g" (filter), > > + "r" (dest-offset), "g" ((x86_reg)(dstW+offset)), "m" (offse= t) > > + : "%"FF_REG_d, "%"FF_REG_S, "%"FF_REG_c > > + ); > > +} > > + > > #define YSCALEYUV2PACKEDX_UV \ > > __asm__ volatile(\ > > "xor %%"FF_REG_a", %%"FF_REG_a" \n\t"\ > > -- > > 2.20.1 > > > Request for review and some additional information: > I found that when compiling ffmpeg with the '-disable-x86asm' option, > there is an issue with the scale function causing video distortion. > This issue is very easily reproducible. I used bisect to identify the > problem and found that it was introduced after commit 554c2bc7086f49. > This commit moved the inline assembly function yuv2yuvX_sse3 to > standalone assembly, which broke the scale function under the > 'disable-x86asm' option. My patch restores the original inline > assembly code under the 'disable-x86asm' option, fixing this issue. Having a inline asm AND a external asm implementation of the same thing is not ideal I think moving all asm to yasm/nasm fits more into FFmpegs direction. that would likely make things consistent and this sort of bug disappear thx [...] --=20 Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Any man who breaks a law that conscience tells him is unjust and willingly= =20 accepts the penalty by staying in jail in order to arouse the conscience of= =20 the community on the injustice of the law is at that moment expressing the= =20 very highest respect for law. - Martin Luther King Jr --qIZl56LoxzOo3jo2 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iF0EABEKAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCZjV/bwAKCRBhHseHBAsP q++sAJ9rAwZ8iM/oIgNoY+FKERMasVv4XwCgk3p/cl9w7s8kWQKEP4cbTo1QwNk= =7LV7 -----END PGP SIGNATURE----- --qIZl56LoxzOo3jo2-- --===============4745632554062147579== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". --===============4745632554062147579==--