Re: [FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions

From: "Swinney, Jonathan" <jswinney@amazon.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: "Michael Niedermayer" <michael@niedermayer.cc>,
	"Martin Storsjö" <martin@martin.st>
Subject: Re: [FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions
Date: Mon, 25 Apr 2022 22:43:25 +0000
Message-ID: <F87A6A9F-0E71-4C8A-AA6B-EE2B39EC6275@amazon.com>
In-Reply-To: <20220415164348.GN2829255@pb2>

Thanks to Michael and Martin for your reviews of several of my patches. I've made many of the changes you requested, but I'm not yet ready to resubmit the patches. I'll be out of the office until next week and will submit updated versions then. Thanks!

-- 

Jonathan Swinney

On 4/15/22, 11:45 AM, "ffmpeg-devel on behalf of Michael Niedermayer" <ffmpeg-devel-bounces@ffmpeg.org on behalf of michael@niedermayer.cc> wrote:

    On Thu, Apr 14, 2022 at 04:22:58PM +0000, Swinney, Jonathan wrote:
    >  - ff_pix_abs16_neon
    >  - ff_pix_abs16_xy2_neon
    >
    > In direct micro benchmarks of these ff functions versus their C implementations,
    > these functions performed as follows on AWS Graviton 2:
    >
    > ff_pix_abs16_neon:
    > c:  benchmark ran 100000 iterations in 0.955383 seconds
    > ff: benchmark ran 100000 iterations in 0.097669 seconds
    >
    > ff_pix_abs16_xy2_neon:
    > c:  benchmark ran 100000 iterations in 1.916759 seconds
    > ff: benchmark ran 100000 iterations in 0.370729 seconds
    >
    > Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
    > ---
    >  libavcodec/aarch64/Makefile              |   2 +
    >  libavcodec/aarch64/me_cmp_init_aarch64.c |  39 +++++
    >  libavcodec/aarch64/me_cmp_neon.S         | 209 +++++++++++++++++++++++
    >  libavcodec/me_cmp.c                      |   2 +
    >  libavcodec/me_cmp.h                      |   1 +
    >  libavcodec/x86/me_cmp.asm                |   7 +
    >  libavcodec/x86/me_cmp_init.c             |   3 +
    >  tests/checkasm/Makefile                  |   2 +-
    >  tests/checkasm/checkasm.c                |   1 +
    >  tests/checkasm/checkasm.h                |   1 +
    >  tests/checkasm/motion.c                  | 155 +++++++++++++++++
    >  11 files changed, 421 insertions(+), 1 deletion(-)
    >  create mode 100644 libavcodec/aarch64/me_cmp_init_aarch64.c
    >  create mode 100644 libavcodec/aarch64/me_cmp_neon.S
    >  create mode 100644 tests/checkasm/motion.c
    >
    [...]
    > diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
    > index ad06d485ab..f73b9f9161 100644
    > --- a/libavcodec/x86/me_cmp.asm
    > +++ b/libavcodec/x86/me_cmp.asm
    > @@ -255,6 +255,7 @@ hadamard8x8_diff %+ SUFFIX:
    >
    >      HSUM                         m0, m1, eax
    >      and                         rax, 0xFFFF
    > +    emms
    >      ret
    >
    >  hadamard8_16_wrapper 0, 14
    > @@ -345,6 +346,7 @@ cglobal sse%1, 5,5,8, v, pix1, pix2, lsize, h
    >
    >      HADDD     m7, m1
    >      movd     eax, m7         ; return value
    > +    emms
    >      RET
    >  %endmacro

    On which Arm chip did you test this?

    [...]
    > diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
    > index 9af911bb88..b330868a38 100644
    > --- a/libavcodec/x86/me_cmp_init.c
    > +++ b/libavcodec/x86/me_cmp_init.c
    > @@ -186,6 +186,8 @@ static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
    >          : "r" (stride), "m" (h)
    >          : "%ecx");
    >
    > +    emms_c();
    > +
    >      return tmp & 0xFFFF;
    >  }
    >  #undef SUM
    > @@ -418,6 +420,7 @@ static inline int sum_mmx(void)
    >          "paddw %%mm0, %%mm6             \n\t"
    >          "movd %%mm6, %0                 \n\t"
    >          : "=r" (ret));
    > +    emms_c();
    >      return ret & 0xFFFF;
    >  }

    hmmm

    Also before the patch
    checkasm: all 6153 tests passed
    after it
    checkasm: all 3198 tests passed

    That's on x86-64.

    [...]

    --
    Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

    Complexity theory is the science of finding the exact solution to an
    approximation. Benchmarking OTOH is finding an approximation of the exact
