Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH 0/2] lavc/aarch64: Provide neon implementations
@ 2022-06-29  8:24 Hubert Mazur
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function Hubert Mazur
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation Hubert Mazur
  0 siblings, 2 replies; 10+ messages in thread
From: Hubert Mazur @ 2022-06-29  8:24 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: gjb, jswinney, Hubert Mazur, martin, mw, spop

Provide neon implementations for motion estimation functions.

Hubert Mazur (2):
  lavc/aarch64: Assign callback with function
  lavc/aarch64: Add pix_abs16_x2 neon implementation

 libavcodec/aarch64/me_cmp_init_aarch64.c |   5 +
 libavcodec/aarch64/me_cmp_neon.S         | 134 +++++++++++++++++++++++
 2 files changed, 139 insertions(+)

-- 
2.34.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

* [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function
  2022-06-29  8:24 [FFmpeg-devel] [PATCH 0/2] lavc/aarch64: Provide neon implementations Hubert Mazur
@ 2022-06-29  8:24 ` Hubert Mazur
  2022-07-11 20:58   ` Martin Storsjö
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation Hubert Mazur
  1 sibling, 1 reply; 10+ messages in thread
From: Hubert Mazur @ 2022-06-29  8:24 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: gjb, jswinney, Hubert Mazur, martin, mw, spop

Assign the already existing neon implementation of the pix_abs16
function to the c->sad[0] callback.
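
For reference, here is a minimal C sketch of what pix_abs16 computes: the sum of absolute differences (SAD) over a block 16 pixels wide and h rows tall, which is why the same function can also serve as c->sad[0]. The name sad16_ref is hypothetical, and the MpegEncContext argument of the real prototype is left out as an assumption of this standalone model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical standalone model of a 16-pixel-wide SAD, the operation
 * ff_pix_abs16_neon accelerates. The MpegEncContext pointer of the real
 * FFmpeg prototype is omitted here. */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    assert(h > 0);
    int sum = 0;
    for (int y = 0; y < h; y++) {
        /* accumulate |pix1[x] - pix2[x]| over one 16-pixel row */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```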

Signed-off-by: Hubert Mazur <hum@semihalf.com>
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 9fb63e9973..bec9148a1a 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -35,5 +35,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
     if (have_neon(cpu_flags)) {
         c->pix_abs[0][0] = ff_pix_abs16_neon;
         c->pix_abs[0][3] = ff_pix_abs16_xy2_neon;
+
+        c->sad[0] = ff_pix_abs16_neon;
     }
 }
-- 
2.34.1

* [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-06-29  8:24 [FFmpeg-devel] [PATCH 0/2] lavc/aarch64: Provide neon implementations Hubert Mazur
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function Hubert Mazur
@ 2022-06-29  8:24 ` Hubert Mazur
  2022-07-11 12:22   ` Hubert Mazur
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Hubert Mazur @ 2022-06-29  8:24 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: gjb, jswinney, Hubert Mazur, martin, mw, spop

Provide neon implementation for pix_abs16_x2 function.

Performance results of the implementation are below (checkasm timings,
lower is better):
 - pix_abs_0_1_c: 291.9
 - pix_abs_0_1_neon: 73.7

Benchmarks and tests were run with the checkasm tool on AWS Graviton 3.
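
For context, 291.9 / 73.7 is about 3.96, i.e. roughly a 4x speedup in checkasm's timing units. The scalar operation being accelerated is the SAD between pix1 and the horizontal half-pel average of pix2, as spelled out in the avg2/abs comments of the assembly below. A standalone C sketch of that reference computation follows; the name sad16_x2_ref is hypothetical and the MpegEncContext argument is omitted (the assembly marks it unused).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounding average of two pixels: avg2(a, b) = (a + b + 1) >> 1 */
static inline int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* Hypothetical model of pix_abs16_x2: SAD between pix1 and the horizontal
 * half-pel interpolation of pix2. Each row reads 17 bytes of pix2
 * (pix2[0] .. pix2[16]), matching the loads from x2 and x5 = x2 + 1. */
static int sad16_x2_ref(const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h)
{
    assert(h > 0);
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix2[x + 1]));
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```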

Signed-off-by: Hubert Mazur <hum@semihalf.com>
---
 libavcodec/aarch64/me_cmp_init_aarch64.c |   3 +
 libavcodec/aarch64/me_cmp_neon.S         | 134 +++++++++++++++++++++++
 2 files changed, 137 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index bec9148a1a..136b008eb7 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -27,6 +27,8 @@ int ff_pix_abs16_neon(MpegEncContext *s, uint8_t *blk1, uint8_t *blk2,
                       ptrdiff_t stride, int h);
 int ff_pix_abs16_xy2_neon(MpegEncContext *s, uint8_t *blk1, uint8_t *blk2,
                       ptrdiff_t stride, int h);
+int ff_pix_abs16_x2_neon(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2,
+                      ptrdiff_t stride, int h);
 
 av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 {
@@ -34,6 +36,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 
     if (have_neon(cpu_flags)) {
         c->pix_abs[0][0] = ff_pix_abs16_neon;
+        c->pix_abs[0][1] = ff_pix_abs16_x2_neon;
         c->pix_abs[0][3] = ff_pix_abs16_xy2_neon;
 
         c->sad[0] = ff_pix_abs16_neon;
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index a7937bd8be..c2fd94f4b3 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -203,3 +203,137 @@ function ff_pix_abs16_xy2_neon, export=1
         fmov            w0, s0                      // copy result to general purpose register
         ret
 endfunc
+
+function ff_pix_abs16_x2_neon, export=1
+        // x0           unused
+        // x1           uint8_t *pix1
+        // x2           uint8_t *pix2
+        // x3           ptrdiff_t stride
+        // x4           int h
+
+        // preserve the callee-saved d8-d11 (low halves of v8-v11)
+        stp             d10, d11, [sp, #-0x10]!
+        stp             d8, d9, [sp, #-0x10]!
+
+        // initialize buffers
+        movi            d18, #0
+        movi            v20.8h, #1
+        add             x5, x2, #1 // pix2 + 1
+        cmp             w4, #4
+        b.lt            2f
+
+// make 4 iterations at once
+1:
+        // v0 - pix1
+        // v1 - pix2
+        // v2 - pix2 + 1
+        ld1             {v0.16b}, [x1], x3
+        ld1             {v1.16b}, [x2], x3
+        ld1             {v2.16b}, [x5], x3
+
+        ld1             {v3.16b}, [x1], x3
+        ld1             {v4.16b}, [x2], x3
+        ld1             {v5.16b}, [x5], x3
+
+        ld1             {v6.16b}, [x1], x3
+        ld1             {v7.16b}, [x2], x3
+        ld1             {v8.16b}, [x5], x3
+
+        ld1             {v9.16b}, [x1], x3
+        ld1             {v10.16b}, [x2], x3
+        ld1             {v11.16b}, [x5], x3
+
+        // abs(pix1[0] - avg2(pix2[0], pix2[1]))
+        // avg2(a,b) = (((a) + (b) + 1) >> 1)
+        // abs(x) = (x < 0 ? -x : x)
+
+        // pix2[0] + pix2[1]
+        uaddl           v30.8h, v1.8b, v2.8b
+        uaddl2          v29.8h, v1.16b, v2.16b
+        // add one to each element
+        add             v30.8h, v30.8h, v20.8h
+        add             v29.8h, v29.8h, v20.8h
+        // divide by 2, narrow width and store in v30
+        uqshrn          v30.8b, v30.8h, #1
+        uqshrn2         v30.16b, v29.8h, #1
+
+        // abs(pix1[0] - avg2(pix2[0], pix2[1]))
+        uabd            v16.16b, v0.16b, v30.16b
+        uaddlv          h16, v16.16b
+
+        // 2nd iteration
+        uaddl           v28.8h, v4.8b, v5.8b
+        uaddl2          v27.8h, v4.16b, v5.16b
+        add             v28.8h, v28.8h, v20.8h
+        add             v27.8h, v27.8h, v20.8h
+
+        uqshrn          v28.8b, v28.8h, #1
+        uqshrn2         v28.16b, v27.8h, #1
+
+        uabd            v17.16b, v3.16b, v28.16b
+        uaddlv          h17, v17.16b
+
+        // 3rd iteration
+        uaddl           v26.8h, v7.8b, v8.8b
+        uaddl2          v25.8h, v7.16b, v8.16b
+        add             v26.8h, v26.8h, v20.8h
+        add             v25.8h, v25.8h, v20.8h
+
+        uqshrn          v26.8b, v26.8h, #1
+        uqshrn2         v26.16b, v25.8h, #1
+
+        uabd            v19.16b, v6.16b, v26.16b
+        uaddlv          h19, v19.16b
+
+        // 4th iteration
+        uaddl           v24.8h, v10.8b, v11.8b
+        uaddl2          v23.8h, v10.16b, v11.16b
+        add             v24.8h, v24.8h, v20.8h
+        add             v23.8h, v23.8h, v20.8h
+
+        uqshrn          v24.8b, v24.8h, #1
+        uqshrn2         v24.16b, v23.8h, #1
+
+        uabd            v21.16b, v9.16b, v24.16b
+        uaddlv          h21, v21.16b
+
+        sub             w4, w4, #4
+
+        // accumulate the result in d18
+        add             d18, d18, d16
+        add             d18, d18, d17
+        add             d18, d18, d19
+        add             d18, d18, d21
+
+        cmp             w4, #4
+        b.ge            1b
+        cbz             w4, 3f
+
+// iterate by one
+2:
+        ld1             {v0.16b}, [x1], x3
+        ld1             {v1.16b}, [x2], x3
+        ld1             {v2.16b}, [x5], x3
+
+        uaddl           v30.8h, v1.8b, v2.8b
+        uaddl2          v29.8h, v1.16b, v2.16b
+        add             v30.8h, v30.8h, v20.8h
+        add             v29.8h, v29.8h, v20.8h
+
+        uqshrn          v30.8b, v30.8h, #1
+        uqshrn2         v30.16b, v29.8h, #1
+
+        uabd            v28.16b, v0.16b, v30.16b
+        uaddlv          h28, v28.16b
+
+        add             d18, d18, d28
+        subs            w4, w4, #1
+        b.ne            2b
+
+3:
+        fmov            w0, s18
+        ldp             d8, d9, [sp], 0x10
+        ldp             d10, d11, [sp], 0x10
+
+        ret
+endfunc
-- 
2.34.1

* Re: [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation Hubert Mazur
@ 2022-07-11 12:22   ` Hubert Mazur
  2022-07-11 19:59     ` Swinney, Jonathan
  2022-07-11 21:21   ` Martin Storsjö
  2022-07-12  9:15   ` [FFmpeg-devel] [PATCH] " Hubert Mazur
  2 siblings, 1 reply; 10+ messages in thread
From: Hubert Mazur @ 2022-07-11 12:22 UTC (permalink / raw)
  To: ffmpeg-devel
  Cc: Marcin Wojtas, Martin Storsjö,
	Swinney, Jonathan, Pop, Sebastian, Grzegorz Bernacki

Hi, do you have any feedback regarding the patch?

Regards,
Hubert

* Re: [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-07-11 12:22   ` Hubert Mazur
@ 2022-07-11 19:59     ` Swinney, Jonathan
  0 siblings, 0 replies; 10+ messages in thread
From: Swinney, Jonathan @ 2022-07-11 19:59 UTC (permalink / raw)
  To: Hubert Mazur, ffmpeg-devel
  Cc: Martin Storsjö, Marcin Wojtas, Pop, Sebastian, Grzegorz Bernacki

> +        // accumulate the result in d18
> +        add             d18, d18, d16
> +        add             d18, d18, d17
> +        add             d18, d18, d19
> +        add             d18, d18, d21

Did you experiment with distributing these instructions to each of the iteration blocks? It might be marginally faster since you could reduce the data dependencies in adjacent instructions.

-- 
Jonathan Swinney

* Re: [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function Hubert Mazur
@ 2022-07-11 20:58   ` Martin Storsjö
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Storsjö @ 2022-07-11 20:58 UTC (permalink / raw)
  To: Hubert Mazur; +Cc: mw, gjb, jswinney, spop, ffmpeg-devel

On Wed, 29 Jun 2022, Hubert Mazur wrote:

> Assign c->sad[0] callback with already existing neon implementation
> of pix_abs16 function.
>
> Signed-off-by: Hubert Mazur <hum@semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
> index 9fb63e9973..bec9148a1a 100644
> --- a/libavcodec/aarch64/me_cmp_init_aarch64.c
> +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
> @@ -35,5 +35,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
>     if (have_neon(cpu_flags)) {
>         c->pix_abs[0][0] = ff_pix_abs16_neon;
>         c->pix_abs[0][3] = ff_pix_abs16_xy2_neon;
> +
> +        c->sad[0] = ff_pix_abs16_neon;
>     }
> }
> -- 
> 2.34.1

LGTM, although I wouldn't use the word "callback" for these. I'll push 
this with a reworded commit message.

// Martin

* Re: [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation Hubert Mazur
  2022-07-11 12:22   ` Hubert Mazur
@ 2022-07-11 21:21   ` Martin Storsjö
  2022-07-12  9:15   ` [FFmpeg-devel] [PATCH] " Hubert Mazur
  2 siblings, 0 replies; 10+ messages in thread
From: Martin Storsjö @ 2022-07-11 21:21 UTC (permalink / raw)
  To: Hubert Mazur; +Cc: mw, gjb, jswinney, spop, ffmpeg-devel

On Wed, 29 Jun 2022, Hubert Mazur wrote:

> Provide neon implementation for pix_abs16_x2 function.
>
> Performance tests of implementation are below.
> - pix_abs_0_1_c: 291.9
> - pix_abs_0_1_neon: 73.7
>
> Benchmarks and tests run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum@semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c |   3 +
> libavcodec/aarch64/me_cmp_neon.S         | 134 +++++++++++++++++++++++
> 2 files changed, 137 insertions(+)
>
> diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
> index a7937bd8be..c2fd94f4b3 100644
> --- a/libavcodec/aarch64/me_cmp_neon.S
> +++ b/libavcodec/aarch64/me_cmp_neon.S
> @@ -203,3 +203,137 @@ function ff_pix_abs16_xy2_neon, export=1
>         fmov            w0, s0                      // copy result to general purpose register
>         ret
> endfunc
> +
> +function ff_pix_abs16_x2_neon, export=1
> +        // x0           unused
> +        // x1           uint8_t *pix1
> +        // x2           uint8_t *pix2
> +        // x3           ptrdiff_t stride
> +        // x4           int h

As this is an 'int', it would be w4, not x4.

> +
> +        // preserve value of v8-v12 registers
> +        stp             d10, d11, [sp, #-0x10]!
> +        stp             d8, d9, [sp, #-0x10]!
> +

Yes, if possible, avoid using v8-v15. Also if you still do need to back up 
registers, don't update sp in each write; something like this is 
preferred:

     stp d8,  d9,  [sp, #-0x20]!
     stp d10, d11, [sp, #0x10]


> +        // initialize buffers
> +        movi            d18, #0
> +        movi            v20.8h, #1
> +        add             x5, x2, #1 // pix2 + 1
> +        cmp             w4, #4
> +        b.lt            2f

Do the cmp earlier, e.g. before the first movi, to avoid having b.lt 
needing to wait for the result of the cmp.

> +
> +// make 4 iterations at once
> +1:
> +        // v0 - pix1
> +        // v1 - pix2
> +        // v2 - pix2 + 1
> +        ld1             {v0.16b}, [x1], x3
> +        ld1             {v1.16b}, [x2], x3
> +        ld1             {v2.16b}, [x5], x3
> +
> +        ld1             {v3.16b}, [x1], x3
> +        ld1             {v4.16b}, [x2], x3
> +        ld1             {v5.16b}, [x5], x3
> +
> +        ld1             {v6.16b}, [x1], x3
> +        ld1             {v7.16b}, [x2], x3
> +        ld1             {v8.16b}, [x5], x3
> +
> +        ld1             {v9.16b}, [x1], x3
> +        ld1             {v10.16b}, [x2], x3
> +        ld1             {v11.16b}, [x5], x3

I guess this goes for the existing ff_pix_abs16_xy2_neon too, but I think 
it could be more efficient to start doing e.g. the first few steps of the 
first iteration after loading the data for the second iteration.

> +
> +        // abs(pix1[0] - avg2(pix2[0], pix2[1]))
> +        // avg2(a,b) = (((a) + (b) + 1) >> 1)
> +        // abs(x) = (x < 0 ? -x : x)
> +
> +        // pix2[0] + pix2[1]
> +        uaddl           v30.8h, v1.8b, v2.8b
> +        uaddl2          v29.8h, v1.16b, v2.16b
> +        // add one to each element
> +        add             v30.8h, v30.8h, v20.8h
> +        add             v29.8h, v29.8h, v20.8h
> +        // divide by 2, narrow width and store in v30
> +        uqshrn          v30.8b, v30.8h, #1
> +        uqshrn2         v30.16b, v29.8h, #1

Instead of add+uqshrn, you can do uqrshrn, where the 'r' stands for
rounding, which implicitly adds the 1 before right shifting. But for this
particular case, there's an even simpler alternative; you can do urhadd,
which does an unsigned rounding halving add, which avoids the whole
widening/narrowing here. Thus these 6 instructions could just be
"urhadd v30.16b, v1.16b, v2.16b".
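
A scalar sketch of why the six-instruction widen/add/narrow sequence can collapse into a single unsigned rounding halving add (URHADD): per lane both compute (a + b + 1) >> 1, and since the widened value never exceeds (255 + 255 + 1) >> 1 = 255, the saturating narrow never clips. The helper names below are hypothetical models of single vector lanes, not FFmpeg functions or compiler intrinsics; count_mismatches compares them over all 65536 input pairs.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the original sequence: uaddl (8-bit + 8-bit -> 16-bit),
 * add #1 (the v20.8h constant), uqshrn #1 (shift right by 1 with a
 * saturating narrow to 8 bits). uqrshrn #1 folds the +1 into the shift
 * and yields the same value. */
static uint8_t avg_widened(uint8_t a, uint8_t b)
{
    uint16_t s = (uint16_t)(a + b + 1);  /* at most 255 + 255 + 1 = 511 */
    uint16_t r = (uint16_t)(s >> 1);     /* at most 255 */
    return r > 255 ? 255 : (uint8_t)r;   /* saturation never triggers */
}

/* One lane of urhadd, written with the overflow-free identity
 * (a + b + 1) >> 1 == (a >> 1) + (b >> 1) + ((a | b) & 1). */
static uint8_t avg_urhadd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + ((a | b) & 1));
}

/* Number of input pairs on which the two lane models disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_widened((uint8_t)a, (uint8_t)b) !=
                avg_urhadd((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```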


> +
> +        // abs(pix1[0] - avg2(pix2[0], pix2[1]))
> +        uabd            v16.16b, v0.16b, v30.16b
> +        uaddlv          h16, v16.16b

In general, avoid doing the horizontal adds (uaddlv here) too early.

Here, I think it would be better to just accumulate things in a regular 
vector (e.g. "uaddw v18.8h, v18.8h, v16.8b", "uaddw2 v19.8h, v19.8h, 
v16.16b"), then finally add v18.8h and v19.8h into each other in the end, 
and just do one single addv.

Also then you can fuse the accumulation into the absolute operation, so 
you should be able to make do with just uabal + uabal2.
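
To make the accumulation idea concrete, here is a scalar model (hypothetical, not the NEON code itself) that keeps sixteen 16-bit per-column accumulators, mirroring the lanes of two .8h registers fed by uabal/uabal2, and performs one horizontal reduction only after the row loop. Each lane grows by at most 255 per row, so after the two halves are combined a lane holds at most 2 * 255 * h; the sketch asserts h <= 128 to keep that within 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model: urhadd + uabal/uabal2 per row, one reduction at the end. */
static int sad16_x2_lanes(const uint8_t *pix1, const uint8_t *pix2,
                          ptrdiff_t stride, int h)
{
    /* lo[i] models lane i of the first .8h accumulator (columns 0..7),
     * hi[i] models lane i of the second one (columns 8..15). */
    uint16_t lo[8] = {0}, hi[8] = {0};

    /* After combining halves a lane holds up to 2 * 255 * h <= 65280. */
    assert(h > 0 && h <= 128);

    for (int y = 0; y < h; y++) {
        for (int i = 0; i < 8; i++) {
            int avg_lo = (pix2[i] + pix2[i + 1] + 1) >> 1;     /* urhadd lane */
            int avg_hi = (pix2[i + 8] + pix2[i + 9] + 1) >> 1; /* urhadd lane */
            lo[i] = (uint16_t)(lo[i] + abs(pix1[i] - avg_lo));     /* uabal  */
            hi[i] = (uint16_t)(hi[i] + abs(pix1[i + 8] - avg_hi)); /* uabal2 */
        }
        pix1 += stride;
        pix2 += stride;
    }

    /* Final reduction: add the two accumulators lane-wise in 16 bits,
     * then sum the 8 lanes once. */
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (uint16_t)(lo[i] + hi[i]);
    return sum;
}
```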

Then finally, when you have the calculation for each iteration simplified
as suggested, it becomes a tight sequence of 3 instructions, each of which
relies on the result of the previous one. It's then better to interleave
the instructions from the 4 parallel iterations, e.g. like this:

     load (1st iteration)
     load (2nd iteration)
     rhadd (1st iteration)
     load (3rd iteration)
     rhadd (2nd iteration)
     uabal (1st iteration)
     uabal2 (1st iteration)
     load (4th iteration)
     rhadd (3rd iteration)
     uabal (2nd iteration)
     uabal2 (2nd iteration)
     rhadd (4th iteration)
     uabal (3rd iteration)
     uabal2 (3rd iteration)
     uabal (4th iteration)
     uabal2 (4th iteration)

That way, each instruction ends up at a near-ideal distance from the 
preceding instructions it depends on and the following instructions that 
depend on its output.

// Martin

* [FFmpeg-devel] [PATCH] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation Hubert Mazur
  2022-07-11 12:22   ` Hubert Mazur
  2022-07-11 21:21   ` Martin Storsjö
@ 2022-07-12  9:15   ` Hubert Mazur
  2022-07-12  9:15     ` Hubert Mazur
  2 siblings, 1 reply; 10+ messages in thread
From: Hubert Mazur @ 2022-07-12  9:15 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: gjb, upstream, jswinney, martin, mw, spop


Thanks for the feedback. I have updated the patch.
Performance has now improved to roughly a 7x speedup
over the C implementation.

Changes:
- Do not use v8-v15 registers.
- Use urhadd instruction.
- Reorder the instructions to increase performance.

// Hubert

* [FFmpeg-devel] [PATCH] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-07-12  9:15   ` [FFmpeg-devel] [PATCH] " Hubert Mazur
@ 2022-07-12  9:15     ` Hubert Mazur
  2022-07-13 20:29       ` Martin Storsjö
  0 siblings, 1 reply; 10+ messages in thread
From: Hubert Mazur @ 2022-07-12  9:15 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: gjb, upstream, jswinney, Hubert Mazur, martin, mw, spop

Provide neon implementation for pix_abs16_x2 function.

Performance results of the implementation are below.
 - pix_abs_0_1_c: 283.5
 - pix_abs_0_1_neon: 39.0

Benchmarks and tests were run with the checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
---
 libavcodec/aarch64/me_cmp_init_aarch64.c |  3 +
 libavcodec/aarch64/me_cmp_neon.S         | 75 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index bec9148a1a..136b008eb7 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -27,6 +27,8 @@ int ff_pix_abs16_neon(MpegEncContext *s, uint8_t *blk1, uint8_t *blk2,
                       ptrdiff_t stride, int h);
 int ff_pix_abs16_xy2_neon(MpegEncContext *s, uint8_t *blk1, uint8_t *blk2,
                       ptrdiff_t stride, int h);
+int ff_pix_abs16_x2_neon(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2,
+                      ptrdiff_t stride, int h);
 
 av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 {
@@ -34,6 +36,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 
     if (have_neon(cpu_flags)) {
         c->pix_abs[0][0] = ff_pix_abs16_neon;
+        c->pix_abs[0][1] = ff_pix_abs16_x2_neon;
         c->pix_abs[0][3] = ff_pix_abs16_xy2_neon;
 
         c->sad[0] = ff_pix_abs16_neon;
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index a7937bd8be..e49d049fc2 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -203,3 +203,78 @@ function ff_pix_abs16_xy2_neon, export=1
         fmov            w0, s0                      // copy result to general purpose register
         ret
 endfunc
+
+function ff_pix_abs16_x2_neon, export=1
+        // x0           unused
+        // x1           uint8_t *pix1
+        // x2           uint8_t *pix2
+        // x3           ptrdiff_t stride
+        // w4           int h
+
+        cmp             w4, #4
+        // initialize buffers
+        movi            d20, #0
+        add             x5, x2, #1 // pix2 + 1
+        b.lt            2f
+
+// make 4 iterations at once
+1:
+
+        // abs(pix1[0] - avg2(pix2[0], pix2[1]))
+        // avg2(a,b) = (((a) + (b) + 1) >> 1)
+        // abs(x) = (x < 0 ? -x : x)
+
+        ld1             {v1.16b}, [x2], x3
+        ld1             {v2.16b}, [x5], x3
+        urhadd          v30.16b, v1.16b, v2.16b
+        ld1             {v0.16b}, [x1], x3
+        uabdl           v16.8h, v0.8b, v30.8b
+        ld1             {v4.16b}, [x2], x3
+        uabdl2          v17.8h, v0.16b, v30.16b
+        ld1             {v5.16b}, [x5], x3
+        urhadd          v29.16b, v4.16b, v5.16b
+        ld1             {v3.16b}, [x1], x3
+        uabal           v16.8h, v3.8b, v29.8b
+        ld1             {v7.16b}, [x2], x3
+        uabal2          v17.8h, v3.16b, v29.16b
+        ld1             {v22.16b}, [x5], x3
+        urhadd          v28.16b, v7.16b, v22.16b
+        ld1             {v6.16b}, [x1], x3
+        uabal           v16.8h, v6.8b, v28.8b
+        ld1             {v24.16b}, [x2], x3
+        uabal2          v17.8h, v6.16b, v28.16b
+        ld1             {v25.16b}, [x5], x3
+        urhadd          v27.16b, v24.16b, v25.16b
+        ld1             {v23.16b}, [x1], x3
+        uabal           v16.8h, v23.8b, v27.8b
+        uabal2          v17.8h, v23.16b, v27.16b
+
+        sub             w4, w4, #4
+
+        add             v16.8h, v16.8h, v17.8h
+        uaddlv          s16, v16.8h
+        cmp             w4, #4
+        add             d20, d20, d16
+
+        b.ge            1b
+        cbz             w4, 3f
+
+// iterate by one
+2:
+        ld1             {v1.16b}, [x2], x3
+        ld1             {v2.16b}, [x5], x3
+        urhadd          v29.16b, v1.16b, v2.16b
+        ld1             {v0.16b}, [x1], x3
+        uabd            v28.16b, v0.16b, v29.16b
+
+        uaddlv          h28, v28.16b
+        subs            w4, w4, #1
+
+        add             d20, d20, d28
+        b.ne            2b
+
+3:
+        fmov            w0, s20
+
+        ret
+endfunc
-- 
2.34.1
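For readers comparing against the assembly, here is a plain C sketch of what ff_pix_abs16_x2_neon computes, written with the same control flow as the patch: a main loop handling four rows per iteration while h >= 4 (label 1) and a tail loop handling the remaining rows one at a time (label 2). The names pix_abs16_x2_ref, pix_abs16_x2_unrolled and row_abs16_x2 are hypothetical, the unused MpegEncContext argument is dropped, and this is not FFmpeg's own C code. Note that each pix2 row is read up to pix2[16], since the assembly loads 16 bytes from pix2 + 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* avg2() as quoted in the review: rounded average of two pixels. */
static inline int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* One 16-pixel row: sum of abs(pix1[x] - avg2(pix2[x], pix2[x + 1])).
 * Reads pix2[0..16], so each pix2 row needs 17 readable bytes. */
static int row_abs16_x2(const uint8_t *pix1, const uint8_t *pix2)
{
    int s = 0;
    for (int x = 0; x < 16; x++)
        s += abs(pix1[x] - avg2(pix2[x], pix2[x + 1]));
    return s;
}

/* Straightforward reference over h rows. */
static int pix_abs16_x2_ref(const uint8_t *pix1, const uint8_t *pix2,
                            ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++, pix1 += stride, pix2 += stride)
        sum += row_abs16_x2(pix1, pix2);
    return sum;
}

/* Same structure as the patch: 4 rows per iteration, then a 1-row tail. */
static int pix_abs16_x2_unrolled(const uint8_t *pix1, const uint8_t *pix2,
                                 ptrdiff_t stride, int h)
{
    int sum = 0;
    for (; h >= 4; h -= 4)
        for (int r = 0; r < 4; r++, pix1 += stride, pix2 += stride)
            sum += row_abs16_x2(pix1, pix2);
    for (; h > 0; h--, pix1 += stride, pix2 += stride)
        sum += row_abs16_x2(pix1, pix2);
    return sum;
}
```

Splitting the work into a 4-row body and a 1-row tail keeps the result identical for any h while letting the unrolled body hide load and arithmetic latency, which is what the instruction interleaving in the assembly exploits.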

* Re: [FFmpeg-devel] [PATCH] lavc/aarch64: Add pix_abs16_x2 neon implementation
  2022-07-12  9:15     ` Hubert Mazur
@ 2022-07-13 20:29       ` Martin Storsjö
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Storsjö @ 2022-07-13 20:29 UTC (permalink / raw)
  To: Hubert Mazur; +Cc: gjb, upstream, jswinney, ffmpeg-devel, mw, spop

On Tue, 12 Jul 2022, Hubert Mazur wrote:

> Provide neon implementation for pix_abs16_x2 function.
>
> Performance results of the implementation are below.
> - pix_abs_0_1_c: 283.5
> - pix_abs_0_1_neon: 39.0
>
> Benchmarks and tests were run with the checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum@semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c |  3 +
> libavcodec/aarch64/me_cmp_neon.S         | 75 ++++++++++++++++++++++++
> 2 files changed, 78 insertions(+)

Thanks, I think this looks good enough to me, thus pushed.

// Martin

end of thread, other threads:[~2022-07-13 20:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-29  8:24 [FFmpeg-devel] [PATCH 0/2] lavc/aarch64: Provide neon implementations Hubert Mazur
2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function Hubert Mazur
2022-07-11 20:58   ` Martin Storsjö
2022-06-29  8:24 ` [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation Hubert Mazur
2022-07-11 12:22   ` Hubert Mazur
2022-07-11 19:59     ` Swinney, Jonathan
2022-07-11 21:21   ` Martin Storsjö
2022-07-12  9:15   ` [FFmpeg-devel] [PATCH] " Hubert Mazur
2022-07-12  9:15     ` Hubert Mazur
2022-07-13 20:29       ` Martin Storsjö
