From: "Swinney, Jonathan" <jswinney@amazon.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: "Michael Niedermayer" <michael@niedermayer.cc>, "Martin Storsjö" <martin@martin.st>
Subject: Re: [FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions
Date: Mon, 25 Apr 2022 22:43:25 +0000
Message-ID: <F87A6A9F-0E71-4C8A-AA6B-EE2B39EC6275@amazon.com>
In-Reply-To: <20220415164348.GN2829255@pb2>

Thanks to Michael and Martin for your reviews on several of my patches. I've made many of the changes you have requested, but I'm not yet ready to resubmit the patches. I'll be out of the office until next week and I will submit updated versions then.

Thanks!

--
Jonathan Swinney

On 4/15/22, 11:45 AM, "ffmpeg-devel on behalf of Michael Niedermayer" <ffmpeg-devel-bounces@ffmpeg.org on behalf of michael@niedermayer.cc> wrote:

On Thu, Apr 14, 2022 at 04:22:58PM +0000, Swinney, Jonathan wrote:
> - ff_pix_abs16_neon
> - ff_pix_abs16_xy2_neon
>
> In direct micro benchmarks of these ff functions versus their C implementations,
> these functions performed as follows on AWS Graviton 2:
>
> ff_pix_abs16_neon:
> c: benchmark ran 100000 iterations in 0.955383 seconds
> ff: benchmark ran 100000 iterations in 0.097669 seconds
>
> ff_pix_abs16_xy2_neon:
> c: benchmark ran 100000 iterations in 1.916759 seconds
> ff: benchmark ran 100000 iterations in 0.370729 seconds
>
> Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
> ---
>  libavcodec/aarch64/Makefile              |   2 +
>  libavcodec/aarch64/me_cmp_init_aarch64.c |  39 +++++
>  libavcodec/aarch64/me_cmp_neon.S         | 209 +++++++++++++++++++++++
>  libavcodec/me_cmp.c                      |   2 +
>  libavcodec/me_cmp.h                      |   1 +
>  libavcodec/x86/me_cmp.asm                |   7 +
>  libavcodec/x86/me_cmp_init.c             |   3 +
>  tests/checkasm/Makefile                  |   2 +-
>  tests/checkasm/checkasm.c                |   1 +
>  tests/checkasm/checkasm.h                |   1 +
>  tests/checkasm/motion.c                  | 155 +++++++++++++++++
>  11 files changed, 421 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/aarch64/me_cmp_init_aarch64.c
>  create mode 100644 libavcodec/aarch64/me_cmp_neon.S
>  create mode 100644 tests/checkasm/motion.c
>
[...]
> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> index ad06d485ab..f73b9f9161 100644
> --- a/libavcodec/x86/me_cmp.asm
> +++ b/libavcodec/x86/me_cmp.asm
> @@ -255,6 +255,7 @@ hadamard8x8_diff %+ SUFFIX:
>
>      HSUM m0, m1, eax
>      and rax, 0xFFFF
> +    emms
>      ret
>
>  hadamard8_16_wrapper 0, 14
> @@ -345,6 +346,7 @@ cglobal sse%1, 5,5,8, v, pix1, pix2, lsize, h
>
>      HADDD m7, m1
>      movd eax, m7 ; return value
> +    emms
>      RET
>  %endmacro

on which arm chip did you test this ?

[...]

> diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
> index 9af911bb88..b330868a38 100644
> --- a/libavcodec/x86/me_cmp_init.c
> +++ b/libavcodec/x86/me_cmp_init.c
> @@ -186,6 +186,8 @@ static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
>          : "r" (stride), "m" (h)
>          : "%ecx");
>
> +    emms_c();
> +
>      return tmp & 0xFFFF;
>  }
>  #undef SUM
> @@ -418,6 +420,7 @@ static inline int sum_mmx(void)
>          "paddw %%mm0, %%mm6 \n\t"
>          "movd %%mm6, %0 \n\t"
>          : "=r" (ret));
> +    emms_c();
>      return ret & 0xFFFF;
>  }

hmmm

Also before the patch
checkasm: all 6153 tests passed
after it
checkasm: all 3198 tests passed

that's on an x86-64

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact