Date: Fri, 15 Apr 2022 18:43:48 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions
Message-ID: <20220415164348.GN2829255@pb2>
In-Reply-To: <50530740b25747fbbfd138adabdc4a8f@EX13D07UWB004.ant.amazon.com>

On Thu, Apr 14, 2022 at 04:22:58PM +0000, Swinney, Jonathan wrote:
> - ff_pix_abs16_neon
> - ff_pix_abs16_xy2_neon
>
> In direct micro benchmarks of these ff functions versus their C implementations,
> these functions performed as follows on AWS Graviton 2:
>
> ff_pix_abs16_neon:
> c: benchmark ran 100000 iterations in 0.955383 seconds
> ff: benchmark ran 100000 iterations in 0.097669 seconds
>
> ff_pix_abs16_xy2_neon:
> c: benchmark ran 100000 iterations in 1.916759 seconds
> ff: benchmark ran 100000 iterations in 0.370729 seconds
>
> Signed-off-by: Jonathan Swinney
> ---
>  libavcodec/aarch64/Makefile              |   2 +
>  libavcodec/aarch64/me_cmp_init_aarch64.c |  39 +++++
>  libavcodec/aarch64/me_cmp_neon.S         | 209 +++++++++++++++++++++++
>  libavcodec/me_cmp.c                      |   2 +
>  libavcodec/me_cmp.h                      |   1 +
>  libavcodec/x86/me_cmp.asm                |   7 +
>  libavcodec/x86/me_cmp_init.c             |   3 +
>  tests/checkasm/Makefile                  |   2 +-
>  tests/checkasm/checkasm.c                |   1 +
>  tests/checkasm/checkasm.h                |   1 +
>  tests/checkasm/motion.c                  | 155 +++++++++++++++++
>  11 files changed, 421 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/aarch64/me_cmp_init_aarch64.c
>  create mode 100644 libavcodec/aarch64/me_cmp_neon.S
>  create mode 100644 tests/checkasm/motion.c
>
[...]
> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> index ad06d485ab..f73b9f9161 100644
> --- a/libavcodec/x86/me_cmp.asm
> +++ b/libavcodec/x86/me_cmp.asm
> @@ -255,6 +255,7 @@ hadamard8x8_diff %+ SUFFIX:
>
>      HSUM m0, m1, eax
>      and rax, 0xFFFF
> +    emms
>      ret
>
>  hadamard8_16_wrapper 0, 14
> @@ -345,6 +346,7 @@ cglobal sse%1, 5,5,8, v, pix1, pix2, lsize, h
>
>      HADDD m7, m1
>      movd eax, m7 ; return value
> +    emms
>      RET
>  %endmacro

On which ARM chip did you test this?

[...]
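For readers less familiar with these functions: both compute a sum of absolute differences (SAD) between a 16-pixel-wide block and a reference block, and the _xy2 variant first interpolates the reference at the half-pel position in both x and y. The following is a minimal standalone C sketch of that computation, assuming the rounding (a + b + c + d + 2) >> 2 used by the C fallbacks in libavcodec/me_cmp.c. The names sad16_ref and sad16_xy2_ref are hypothetical, and the MpegEncContext argument of the real ff_ functions is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounded average of four pixels (half-pel interpolation in x and y). */
static inline int avg4(int a, int b, int c, int d)
{
    return (a + b + c + d + 2) >> 2;
}

/* Hypothetical reference: SAD over a 16-pixel-wide block of h rows. */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;
    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical reference: same SAD, but pix2 is interpolated at the
 * half-pel position in both directions, so it reads 17 columns and
 * h + 1 rows of pix2. */
static int sad16_xy2_ref(const uint8_t *pix1, const uint8_t *pix2,
                         ptrdiff_t stride, int h)
{
    int sum = 0;
    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        const uint8_t *next = pix2 + stride; /* row below the current one */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg4(pix2[x], pix2[x + 1],
                                      next[x], next[x + 1]));
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

Note that the _xy2 variant touches one extra column and one extra row of the reference block, which an optimized version has to respect as well. Dividing the quoted timings, the NEON versions come out at roughly 9.8x (0.955383 / 0.097669) and 5.2x (1.916759 / 0.370729) the speed of the C code on Graviton 2.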
> diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
> index 9af911bb88..b330868a38 100644
> --- a/libavcodec/x86/me_cmp_init.c
> +++ b/libavcodec/x86/me_cmp_init.c
> @@ -186,6 +186,8 @@ static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
>        : "r" (stride), "m" (h)
>        : "%ecx");
>
> +    emms_c();
> +
>      return tmp & 0xFFFF;
>  }
>  #undef SUM
> @@ -418,6 +420,7 @@ static inline int sum_mmx(void)
>          "paddw %%mm0, %%mm6 \n\t"
>          "movd %%mm6, %0 \n\t"
>          : "=r" (ret));
> +    emms_c();
>      return ret & 0xFFFF;
>  }

hmmm

Also, before the patch:
checkasm: all 6153 tests passed
after it:
checkasm: all 3198 tests passed
That's on x86-64.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact