[FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

* [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs
@ 2024-05-01 22:39 Stone Chen
  2024-05-01 22:39 ` [FFmpeg-devel] [PATCH 2/3][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC Stone Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Stone Chen @ 2024-05-01 22:39 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Stone Chen

To prepare for adding AVX2 functions for different block widths, change VVCInterDSPContext to contain (*sad[6]) instead of (*sad). This also initializes every entry of the pointer array to the scalar function by default and updates the call sites to select the function by block width, using av_log2(block_w) - 2 as the index. There is no change in functionality.
---
 libavcodec/vvc/dsp.h            | 2 +-
 libavcodec/vvc/inter.c          | 4 ++--
 libavcodec/vvc/inter_template.c | 5 ++++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 9810ac314c..b06a3ef10e 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -86,7 +86,7 @@ typedef struct VVCInterDSPContext {
 
     void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
 
-    int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+    int (*sad[6])(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
         intptr_t mx, intptr_t my, int width);
 } VVCInterDSPContext;
diff --git a/libavcodec/vvc/inter.c b/libavcodec/vvc/inter.c
index 4a8d1d866a..a68f4f9452 100644
--- a/libavcodec/vvc/inter.c
+++ b/libavcodec/vvc/inter.c
@@ -742,7 +742,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
         fc->vvcdsp.inter.dmvr[!!my][!!mx](tmp[i], src, src_stride, pred_h, mx, my, pred_w);
     }
 
-    min_sad = fc->vvcdsp.inter.sad(tmp[L0], tmp[L1], dx, dy, block_w, block_h);
+    min_sad = fc->vvcdsp.inter.sad[av_log2(block_w) - 2](tmp[L0], tmp[L1], dx, dy, block_w, block_h);
     min_sad -= min_sad >> 2;
     sad[dy][dx] = min_sad;
 
@@ -752,7 +752,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
         for (dy = 0; dy < SAD_ARRAY_SIZE; dy++) {
             for (dx = 0; dx < SAD_ARRAY_SIZE; dx++) {
                 if (dx != sr_range || dy != sr_range) {
-                    sad[dy][dx] = fc->vvcdsp.inter.sad(lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
+                    sad[dy][dx] = fc->vvcdsp.inter.sad[av_log2(block_w) - 2](lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
                     if (sad[dy][dx] < min_sad) {
                         min_sad = sad[dy][dx];
                         min_dx = dx;
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index e2fbfd4fc0..545e8dd184 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -458,7 +458,10 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
     inter->apply_prof_uni_w     = FUNC(apply_prof_uni_w);
     inter->apply_bdof           = FUNC(apply_bdof);
     inter->prof_grad_filter     = FUNC(prof_grad_filter);
-    inter->sad                  = vvc_sad;
+
+    for (int i = 0; i < FF_ARRAY_ELEMS(inter->sad); i++) {
+        inter->sad[i]           = vvc_sad;
+    }
 }
 
 #undef FUNCS
-- 
2.44.0
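
As a rough illustration of the indexing used at the call sites in this patch (a sketch, not code from the patch): for power-of-two widths from 4 to 128, floor(log2(block_w)) - 2 gives indices 0 to 5, which matches the six entries of the array. The names ilog2, sad_index, sad_stub and sad_table_init are hypothetical; ilog2 is only a stand-in for FFmpeg's av_log2().

```c
#include <assert.h>
#include <stdint.h>

#define SAD_SLOTS 6  /* widths 4, 8, 16, 32, 64, 128 */

typedef int (*sad_fn)(const int16_t *src0, const int16_t *src1,
                      int dx, int dy, int block_w, int block_h);

/* Hypothetical stand-in for av_log2(): floor(log2(v)) for v > 0. */
static int ilog2(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Map a power-of-two block width to its slot in the function table. */
static int sad_index(int block_w)
{
    int idx = ilog2((unsigned)block_w) - 2;
    assert(idx >= 0 && idx < SAD_SLOTS);
    return idx;
}

/* Placeholder kernel (assumption): reports the width it was called with. */
static int sad_stub(const int16_t *src0, const int16_t *src1,
                    int dx, int dy, int block_w, int block_h)
{
    (void)src0; (void)src1; (void)dx; (void)dy; (void)block_h;
    return block_w;
}

/* Fill every slot with the same default, as the init loop in the patch does. */
static void sad_table_init(sad_fn table[SAD_SLOTS])
{
    for (int i = 0; i < SAD_SLOTS; i++)
        table[i] = sad_stub;
}
```

A caller would then write table[sad_index(block_w)](src0, src1, dx, dy, block_w, block_h); an architecture-specific init can later overwrite individual slots without touching the call sites.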

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* [FFmpeg-devel] [PATCH 2/3][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
  2024-05-01 22:39 [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs Stone Chen
@ 2024-05-01 22:39 ` Stone Chen
  2024-05-01 22:40 ` [FFmpeg-devel] [PATCH 3/3][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c Stone Chen
  2024-05-01 22:59 ` [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs Andreas Rheinhardt
  2 siblings, 0 replies; 5+ messages in thread
From: Stone Chen @ 2024-05-01 22:39 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Stone Chen

Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This applies to all video bit depths, but the values passed to the function are always 16-bit (even if the original video bit depth is 8). The AVX2 implementation computes each absolute difference as max(a, b) - min(a, b), using min/max/sub instructions.

Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 80.7 |
Chimera_8bit_1080P_1000_frames.vvc | 158.0 |
NovosobornayaSquare_1920x1080.bin | 159.7 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 146.3 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 82.7 |
Chimera_8bit_1080P_1000_frames.vvc | 167.0 |
NovosobornayaSquare_1920x1080.bin | 166.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 154.0 |
---
 libavcodec/x86/vvc/Makefile      |   3 +-
 libavcodec/x86/vvc/vvc_sad.asm   | 193 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |  15 +++
 3 files changed, 210 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/x86/vvc/vvc_sad.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index d1623bd46a..9eb5f65c7c 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -4,4 +4,5 @@ clean::
 OBJS-$(CONFIG_VVC_DECODER)             += x86/vvc/vvcdsp_init.o \
                                           x86/h26x/h2656dsp.o
 X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/vvc_mc.o       \
-                                          x86/h26x/h2656_inter.o
+                                          x86/h26x/h2656_inter.o \
+                                          x86/vvc/vvc_sad.o
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
new file mode 100644
index 0000000000..06be3a936a
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,193 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 128
+%define ROWS 2          ; DMVR SAD is only calculated on even rows to reduce complexity
+
+SECTION .text
+
+%macro MIN_MAX_SAD 3 ; tmp, a, b -> b = max(a, b) - min(a, b); tmp is clobbered
+    vpminuw           %1, %2, %3
+    vpmaxuw           %3, %2, %3
+    vpsubusw          %3, %3, %1
+%endmacro
+
+%macro HORIZ_ADD 3  ; xm0, xm1, m1
+    vextracti128      %1, %3, q0001  ;        3        2      1          0
+    vpaddd            %1, %2         ; xm0 (7 + 3) (6 + 2) (5 + 1)   (4 + 0)
+    vpshufd           %2, %1, q0032  ; xm1    -      -     (7 + 3)   (6 + 2)
+    vpaddd            %1, %1, %2     ; xm0    _      _     (5 1 7 3) (4 0 6 2)
+    vpshufd           %2, %1, q0001  ; xm1    _      _     (5 1 7 3) (5 1 7 3)
+    vpaddd            %1, %1, %2     ;                               (01234567)
+%endmacro
+
+%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
+    sub             %3, 2
+    sub             %4, 2
+
+    mov             %5, 2
+    mov             %6, 2
+
+    add             %5, %4
+    sub             %6, %4
+
+    imul            %5, MAX_PB_SIZE
+    imul            %6, MAX_PB_SIZE
+
+    add             %5, 2
+    add             %6, 2
+
+    add             %5, %3
+    sub             %6, %3
+
+    lea             %1, [%1 + %5 * 2]
+    lea             %2, [%2 + %6 * 2]
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX2_EXTERNAL
+
+INIT_YMM avx2
+
+cglobal vvc_sad_8, 6, 8, 13, src1, src2, dx, dy, block_w, block_h, off1, off2
+
+    INIT_OFFSET src1q, src2q, dxq, dyq, off1q, off2q
+    pxor               m3, m3
+
+    .loop_height:
+        movu              xm0, [src1q]
+        movu              xm1, [src2q]
+        MIN_MAX_SAD       xm2, xm0, xm1
+        vpmovzxwd          m1, xm1
+        vpaddd             m3, m1
+
+        movu              xm5, [src1q + MAX_PB_SIZE * ROWS * 2]
+        movu              xm6, [src2q + MAX_PB_SIZE * ROWS * 2]
+        MIN_MAX_SAD       xm7, xm5, xm6
+        vpmovzxwd          m6, xm6
+        vpaddd             m3, m6
+
+        movu              xm8, [src1q + MAX_PB_SIZE * 2 * ROWS * 2]
+        movu              xm9, [src2q + MAX_PB_SIZE * 2 * ROWS * 2]
+        MIN_MAX_SAD      xm10, xm8, xm9
+        vpmovzxwd          m9, xm9
+        vpaddd             m3, m9
+
+        movu             xm11, [src1q + MAX_PB_SIZE * 3 * ROWS * 2]
+        movu             xm12, [src2q + MAX_PB_SIZE * 3 * ROWS * 2]
+        MIN_MAX_SAD      xm10, xm11, xm12 ; reuse xm10 as scratch: xm13 is outside the 13 declared XMM regs
+        vpmovzxwd         m12, xm12
+
+        vpaddd             m3, m12
+
+        add             src1q, MAX_PB_SIZE * 4 * ROWS * 2 
+        add             src2q, MAX_PB_SIZE * 4 * ROWS * 2
+
+        sub          block_hd, 8
+        jg       .loop_height
+
+        HORIZ_ADD         xm0, xm3, m3
+        movd              eax, xm0
+    RET
+
+cglobal vvc_sad_16, 6, 8, 13, src1, src2, dx, dy, block_w, block_h, off1, off2
+    INIT_OFFSET src1q, src2q, dxq, dyq, off1q, off2q
+    pxor               m8, m8
+.load_pixels:
+        movu              xm0, [src1q]
+        movu              xm1, [src2q]
+        MIN_MAX_SAD       xm2, xm0, xm1
+        vpmovzxwd          m1, xm1
+        vpaddd             m8, m1
+
+        movu              xm5, [src1q + 16]
+        movu              xm6, [src2q + 16]
+        MIN_MAX_SAD       xm7, xm5, xm6
+        vpmovzxwd          m6, xm6
+        vpaddd             m8, m6
+
+        add             src1q, ROWS * MAX_PB_SIZE * 2 
+        add             src2q, ROWS * MAX_PB_SIZE * 2
+
+        sub          block_hd, 2
+        jg       .load_pixels
+
+        HORIZ_ADD          xm0, xm8, m8
+        movd               eax, xm0
+
+    RET
+
+cglobal vvc_sad_32_128, 6, 9, 13, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+    INIT_OFFSET src1q, src2q, dxq, dyq, off1q, off2q
+    pxor                 m3, m3
+
+.loop_height:
+    mov               off1q, src1q
+    mov               off2q, src2q
+    mov            row_idxd, block_wd
+    sar            row_idxd, 5
+
+    .loop_width:
+        movu              xm0, [src1q]
+        movu              xm1, [src2q]
+        MIN_MAX_SAD       xm2, xm0, xm1
+        vpmovzxwd          m1, xm1
+        vpaddd             m3, m1
+
+        movu              xm5, [src1q + 16]
+        movu              xm6, [src2q + 16]
+        MIN_MAX_SAD       xm7, xm5, xm6
+        vpmovzxwd          m6, xm6
+        vpaddd             m3, m6
+
+        movu              xm8, [src1q + 32]
+        movu              xm9, [src2q + 32]
+        MIN_MAX_SAD      xm10, xm8, xm9
+        vpmovzxwd          m9, xm9
+        vpaddd             m3, m9
+
+        movu             xm11, [src1q + 48]
+        movu             xm12, [src2q + 48]
+        MIN_MAX_SAD      xm10, xm11, xm12 ; reuse xm10 as scratch: xm13 is outside the 13 declared XMM regs
+        vpmovzxwd         m12, xm12
+        vpaddd             m3, m12
+
+        add             src1q, 64
+        add             src2q, 64
+        dec          row_idxd
+        jg        .loop_width
+
+    lea             src1q, [off1q + ROWS * MAX_PB_SIZE * 2] 
+    lea             src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+
+    sub          block_hd, 2
+    jg       .loop_height
+
+    HORIZ_ADD         xm0, xm3, m3
+    movd              eax, xm0
+
+    RET
+
+%endif
+%endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 985d750472..7760372176 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -252,6 +252,18 @@ AVG_FUNCS(16, 12, avx2)
     c->inter.avg    = bf(ff_vvc_avg, bd, opt);                       \
     c->inter.w_avg  = bf(ff_vvc_w_avg, bd, opt);                     \
 } while (0)
+
+int ff_vvc_sad_8_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+int ff_vvc_sad_16_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+int ff_vvc_sad_32_128_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+
+#define SAD_INIT() do {                                           \
+    c->inter.sad[1] = ff_vvc_sad_8_avx2;                          \
+    c->inter.sad[2] = ff_vvc_sad_16_avx2;                         \
+    c->inter.sad[3] = ff_vvc_sad_32_128_avx2;                     \
+    c->inter.sad[4] = ff_vvc_sad_32_128_avx2;                     \
+    c->inter.sad[5] = ff_vvc_sad_32_128_avx2;                     \
+} while (0)
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -265,6 +277,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
         }
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             MC_LINKS_AVX2(8);
+            SAD_INIT();
         }
     } else if (bd == 10) {
         if (EXTERNAL_SSE4(cpu_flags)) {
@@ -273,6 +286,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            SAD_INIT();
         }
     } else if (bd == 12) {
         if (EXTERNAL_SSE4(cpu_flags)) {
@@ -281,6 +295,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            SAD_INIT();
         }
     }
 
-- 
2.44.0
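
For readers who do not follow x86 assembly, here is a scalar sketch of what the functions in this patch appear to compute, derived from the INIT_OFFSET macro and the loops above. It is an illustrative model under stated assumptions, not the decoder's reference code: dmvr_sad_model is a hypothetical name, and the max/min form only matches the unsigned vpminuw/vpmaxuw sequence when both samples have the same sign.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PB_SIZE 128  /* row stride in samples, as %define'd in vvc_sad.asm */

/*
 * Scalar model of the DMVR SAD. src0/src1 point into buffers with a
 * MAX_PB_SIZE stride; dx and dy are search positions in [0, 4].
 */
static int dmvr_sad_model(const int16_t *src0, const int16_t *src1,
                          int dx, int dy, int block_w, int block_h)
{
    int sad = 0;

    assert(dx >= 0 && dx <= 4 && dy >= 0 && dy <= 4);

    /* Same arithmetic as INIT_OFFSET: opposite displacements around (2, 2). */
    dx -= 2;
    dy -= 2;
    src0 += (2 + dy) * MAX_PB_SIZE + 2 + dx;
    src1 += (2 - dy) * MAX_PB_SIZE + 2 - dx;

    for (int y = 0; y < block_h; y += 2) {       /* even rows only */
        for (int x = 0; x < block_w; x++) {
            int a  = src0[x], b = src1[x];
            int hi = a > b ? a : b;
            int lo = a > b ? b : a;
            sad += hi - lo;                      /* |a - b| as max - min */
        }
        src0 += 2 * MAX_PB_SIZE;                 /* skip the odd row */
        src1 += 2 * MAX_PB_SIZE;
    }
    return sad;
}
```

With dx = dy = 2 both pointers start at row 2, column 2 of their buffers; any other value moves the two blocks in opposite directions, so the two candidates are mirrored around the centre of the search window.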


* [FFmpeg-devel] [PATCH 3/3][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c
  2024-05-01 22:39 [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs Stone Chen
  2024-05-01 22:39 ` [FFmpeg-devel] [PATCH 2/3][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC Stone Chen
@ 2024-05-01 22:40 ` Stone Chen
  2024-05-01 22:59 ` [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs Andreas Rheinhardt
  2 siblings, 0 replies; 5+ messages in thread
From: Stone Chen @ 2024-05-01 22:40 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Stone Chen

Adds a checkasm test for the DMVR SAD AVX2 implementation.

Benchmarks (AMD 7940HS):
vvc_sad_8_16bpc_c: 112.5
vvc_sad_8_16bpc_avx2: 2.5
vvc_sad_16_16bpc_c: 232.5
vvc_sad_16_16bpc_avx2: 22.5
vvc_sad_32_16bpc_c: 912.5
vvc_sad_32_16bpc_avx2: 82.5
vvc_sad_64_16bpc_c: 3582.5
vvc_sad_64_16bpc_avx2: 392.5
vvc_sad_128_16bpc_c: 16702.5
vvc_sad_128_16bpc_avx2: 1912.5

Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 80.7 |
Chimera_8bit_1080P_1000_frames.vvc | 158.0 |
NovosobornayaSquare_1920x1080.bin | 159.7 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 146.3 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 82.7 |
Chimera_8bit_1080P_1000_frames.vvc | 167.0 |
NovosobornayaSquare_1920x1080.bin | 166.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 154.0 |
---
 tests/checkasm/vvc_mc.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 97f57cb401..77dd32fbbb 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -322,8 +322,46 @@ static void check_avg(void)
     report("avg");
 }
 
+static void check_vvc_sad(void)
+{
+    const int bit_depth = 10;
+    VVCDSPContext c;
+    LOCAL_ALIGNED_32(uint16_t, src0, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
+    LOCAL_ALIGNED_32(uint16_t, src1, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
+    declare_func(int, const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+
+    ff_vvc_dsp_init(&c, bit_depth);
+    memset(src0, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4 * sizeof(*src0));
+    memset(src1, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4 * sizeof(*src1));
+
+    randomize_pixels(src0, src1, MAX_CTU_SIZE * MAX_CTU_SIZE * 2);
+    for (int h = 8; h <= MAX_CTU_SIZE; h *= 2) {
+        for (int w = 8; w <= MAX_CTU_SIZE; w *= 2) {
+            for (int offy = 0; offy <= 4; offy++) {
+                for (int offx = 0; offx <= 4; offx++) {
+                    if (check_func(c.inter.sad[av_log2(w) - 2], "vvc_sad_%dx%d", w, h)) {
+                        int result0;
+                        int result1;
+
+                        result0 = call_ref(src0 + PIXEL_STRIDE * 2 + 2, src1 + PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+                        result1 = call_new(src0 + PIXEL_STRIDE * 2 + 2, src1 + PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+
+                        if (result1 != result0)
+                            fail();
+                        if (w == h && offx == 0 && offy == 0)
+                            bench_new(src0 + PIXEL_STRIDE * 2 + 2, src1 + PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+                    }
+                }
+            }
+        }
+    }
+
+    report("check_vvc_sad");
+}
+
 void checkasm_check_vvc_mc(void)
 {
+    check_vvc_sad();
     check_put_vvc_luma();
     check_put_vvc_luma_uni();
     check_put_vvc_chroma();
-- 
2.44.0
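
To put the checkasm figures in this message into perspective, the ratios can be worked out as below. This is only a small arithmetic sketch: the timing values are copied from the benchmark list above, the message does not state their unit (the ratio does not depend on it), and the names SadBench, sad_speedup and percent_gain are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark pair copied from the message: scalar C time vs AVX2 time. */
typedef struct {
    int    width;
    double c_time;
    double avx2_time;
} SadBench;

static const SadBench sad_bench[] = {
    {   8,   112.5,    2.5 },
    {  16,   232.5,   22.5 },
    {  32,   912.5,   82.5 },
    {  64,  3582.5,  392.5 },
    { 128, 16702.5, 1912.5 },
};

/* Ratio of scalar time to AVX2 time; independent of the timer's unit. */
static double sad_speedup(const SadBench *b)
{
    assert(b->avx2_time > 0.0);
    return b->c_time / b->avx2_time;
}

/* Relative change between two whole-stream figures, in percent. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, sad_speedup(&sad_bench[0]) is 45 for the 8-wide case and roughly 8.7 for the 128-wide case. Assuming the Before/After stream figures are throughput values where higher is better (the message does not say), percent_gain(80.7, 82.7) is about 2.5 for BQTerrace.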


* Re: [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs
  2024-05-01 22:39 [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs Stone Chen
  2024-05-01 22:39 ` [FFmpeg-devel] [PATCH 2/3][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC Stone Chen
  2024-05-01 22:40 ` [FFmpeg-devel] [PATCH 3/3][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c Stone Chen
@ 2024-05-01 22:59 ` Andreas Rheinhardt
  2024-05-06 17:02   ` Stone Chen
  2 siblings, 1 reply; 5+ messages in thread
From: Andreas Rheinhardt @ 2024-05-01 22:59 UTC (permalink / raw)
  To: ffmpeg-devel

Stone Chen:
> To prepare for adding AVX2 functions for different block widths, change VVCInterDSPContext to contain (*sad[6]) instead of (*sad). This also default initializes the pointer array with the scalar function and the calling sites to jump to the correct function based on block width. There's no change in functionality.
> ---
>  libavcodec/vvc/dsp.h            | 2 +-
>  libavcodec/vvc/inter.c          | 4 ++--
>  libavcodec/vvc/inter_template.c | 5 ++++-
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
> index 9810ac314c..b06a3ef10e 100644
> --- a/libavcodec/vvc/dsp.h
> +++ b/libavcodec/vvc/dsp.h
> @@ -86,7 +86,7 @@ typedef struct VVCInterDSPContext {
>  
>      void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
>  
> -    int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
> +    int (*sad[6])(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
>      void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
>          intptr_t mx, intptr_t my, int width);
>  } VVCInterDSPContext;
> diff --git a/libavcodec/vvc/inter.c b/libavcodec/vvc/inter.c
> index 4a8d1d866a..a68f4f9452 100644
> --- a/libavcodec/vvc/inter.c
> +++ b/libavcodec/vvc/inter.c
> @@ -742,7 +742,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
>          fc->vvcdsp.inter.dmvr[!!my][!!mx](tmp[i], src, src_stride, pred_h, mx, my, pred_w);
>      }
>  
> -    min_sad = fc->vvcdsp.inter.sad(tmp[L0], tmp[L1], dx, dy, block_w, block_h);
> +    min_sad = fc->vvcdsp.inter.sad[av_log2(block_w) - 2](tmp[L0], tmp[L1], dx, dy, block_w, block_h);
>      min_sad -= min_sad >> 2;
>      sad[dy][dx] = min_sad;
>  
> @@ -752,7 +752,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
>          for (dy = 0; dy < SAD_ARRAY_SIZE; dy++) {
>              for (dx = 0; dx < SAD_ARRAY_SIZE; dx++) {
>                  if (dx != sr_range || dy != sr_range) {
> -                    sad[dy][dx] = fc->vvcdsp.inter.sad(lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
> +                    sad[dy][dx] = fc->vvcdsp.inter.sad[av_log2(block_w) - 2](lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
>                      if (sad[dy][dx] < min_sad) {
>                          min_sad = sad[dy][dx];
>                          min_dx = dx;
> diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
> index e2fbfd4fc0..545e8dd184 100644
> --- a/libavcodec/vvc/inter_template.c
> +++ b/libavcodec/vvc/inter_template.c
> @@ -458,7 +458,10 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
>      inter->apply_prof_uni_w     = FUNC(apply_prof_uni_w);
>      inter->apply_bdof           = FUNC(apply_bdof);
>      inter->prof_grad_filter     = FUNC(prof_grad_filter);
> -    inter->sad                  = vvc_sad;
> +    
> +    for (int i = 0; i < FF_ARRAY_ELEMS(inter->sad); i++) {
> +        inter->sad[i]           = vvc_sad;
> +    }
>  }
>  
>  #undef FUNCS

Why is the jump depending upon block width not performed inside your
avx2 implementation?

- Andreas


* Re: [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs
  2024-05-01 22:59 ` [FFmpeg-devel] [PATCH 1/3][GSoC 2024] libavcodec/vvc: convert (*sad) to (*sad[6]) to prepare for AVX2 funcs Andreas Rheinhardt
@ 2024-05-06 17:02   ` Stone Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Stone Chen @ 2024-05-06 17:02 UTC (permalink / raw)
  To: FFmpeg development discussions and patches; +Cc: Nuo Mi

On Wed, May 1, 2024 at 6:59 PM Andreas Rheinhardt <
andreas.rheinhardt@outlook.com> wrote:

> Why is the jump depending upon block width not performed inside your
> avx2 implementation?
>
> - Andreas
>

Hi Andreas,

Sorry, I missed your email.

In hindsight, there's no particular reason, other than that it was the easiest
way (for me) to get jumps to different functions.
I guess I could compare against the block width and jump accordingly, or
alternatively figure out how to write a jump table in asm.

Would either of those methods be better, or did you have something different in mind?

Thanks for the feedback!
Stone
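
For reference, the alternative discussed in this reply, branching on the width inside a single entry point instead of indexing an array at the call site, could look roughly like this when written in C. It is only a sketch of the idea: sad_w8, sad_w16 and sad_w32_up are hypothetical placeholder kernels that return a tag, and a real version would do the branch (or a jump table) inside the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-width kernels; each returns a tag so the route is visible. */
static int sad_w8(const int16_t *src0, const int16_t *src1,
                  int dx, int dy, int block_w, int block_h)
{
    (void)src0; (void)src1; (void)dx; (void)dy; (void)block_w; (void)block_h;
    return 8;
}

static int sad_w16(const int16_t *src0, const int16_t *src1,
                   int dx, int dy, int block_w, int block_h)
{
    (void)src0; (void)src1; (void)dx; (void)dy; (void)block_w; (void)block_h;
    return 16;
}

static int sad_w32_up(const int16_t *src0, const int16_t *src1,
                      int dx, int dy, int block_w, int block_h)
{
    (void)src0; (void)src1; (void)dx; (void)dy; (void)block_w; (void)block_h;
    return 32;
}

/*
 * Single entry point: the DSP context would keep one (*sad) pointer and
 * the width decision happens here rather than at every call site.
 */
static int sad_dispatch(const int16_t *src0, const int16_t *src1,
                        int dx, int dy, int block_w, int block_h)
{
    switch (block_w) {
    case 8:
        return sad_w8(src0, src1, dx, dy, block_w, block_h);
    case 16:
        return sad_w16(src0, src1, dx, dy, block_w, block_h);
    default: /* 32, 64 and 128 share one kernel, as in SAD_INIT */
        return sad_w32_up(src0, src1, dx, dy, block_w, block_h);
    }
}
```

The trade-off is one extra branch per call in exchange for keeping the original single-pointer interface in VVCInterDSPContext.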


