From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id 3A2624944F
	for <ffmpegdev@gitmailbox.com>; Sat, 11 May 2024 14:35:40 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 8444068D477;
	Sat, 11 May 2024 17:35:38 +0300 (EEST)
Received: from mail-qk1-f178.google.com (mail-qk1-f178.google.com
 [209.85.222.178])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5679C68D3DD
 for <ffmpeg-devel@ffmpeg.org>; Sat, 11 May 2024 17:35:32 +0300 (EEST)
Received: by mail-qk1-f178.google.com with SMTP id
 af79cd13be357-792bdf626beso261133285a.1
 for <ffmpeg-devel@ffmpeg.org>; Sat, 11 May 2024 07:35:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1715438130; x=1716042930; darn=ffmpeg.org;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:from:to:cc:subject:date:message-id:reply-to;
 bh=32JHtdcqXpuBtiVqycAC/3hlurTtBXBhVKmsA9D/uXc=;
 b=KTIsczCymcRhjKPntHwj7lFFE/PnaMmH201f5RwyW66RsRrZof0r7WjW3A92ZyKJF1
 islMWUz70sqZiERieVP7wFsDZIRh6gn3bcgk2LmrIe1OHUI6cn2QNtzI3I1JRpEVTknl
 aYECCqtjRe7yCZJe0cl52HZ7XPAqa13U1XqASzMeltJrvquLiUFZZszG9m+NatboqZkP
 yZ2hYlze9jrc5JASA31ry+6FosyCpWkTaZ3tEcJN9Eo2S1xOD0gJs87Xwbb1QVgaxz5l
 QOdh11Ci2ToqcG0tx33/uJCQUEI+mmz+x+CAU8Jf6w32krjrWT1zX5mpMmNUThtFmkBL
 2O7g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1715438130; x=1716042930;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=32JHtdcqXpuBtiVqycAC/3hlurTtBXBhVKmsA9D/uXc=;
 b=VwGvJ5EyvK0Tt938mgYksFSWkCvJpwSL6oiHo9RGViZeuMtkZ0EdClsztWqU95EQbZ
 aC025gZFriFYOJA7o9wtfrc1+hOpMvgIS8+AFxkRCdowcgJKE6k2PitGOZWdbyKAUN2n
 bVeEw/wmKrGTo6mu3O2R36hJhVq0nnKIci3rCCS4Oee7PB/aQ043M95JIV39LoVSjVcO
 UJCXnGs3KXEOR4wyLjyKt56If2NxryIvD91TUMozoA06FwCSeCFACfTfqZ0rhDuFu4gZ
 zK5OxoBqZTM2DZHqeo8iVuasQ8h42DF+/DVMqIBZc+S9OHlEqkZpNRmZy/X41gfsXWmP
 FnIw==
X-Gm-Message-State: AOJu0YxTIHDZJ8IZC5XTS3yCZqfMatw2DJkUv25HmyCI8CJwqxzJJKbj
 iWPc9THDYa0XrnfhqynEEOmriDjSBZ+E1lFRhLw8bw6KjcXVE7/Nv6RsfZD5
X-Google-Smtp-Source: AGHT+IEVqtDmBuzQWZdOJaj1l/PbOkAZ1PZ1W97aRNeZR+av3xACd96xqEm5WEW++KW2A6KDWw3I/w==
X-Received: by 2002:a05:620a:12c8:b0:792:9d41:3636 with SMTP id
 af79cd13be357-792c6ccb14emr898050785a.26.1715438129806; 
 Sat, 11 May 2024 07:35:29 -0700 (PDT)
Received: from fedora.tailc94c2.ts.net ([96.67.5.125])
 by smtp.gmail.com with ESMTPSA id
 af79cd13be357-792bf33b16dsm280394285a.127.2024.05.11.07.35.29
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Sat, 11 May 2024 07:35:29 -0700 (PDT)
From: Stone Chen <chen.stonechen@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 11 May 2024 10:34:19 -0400
Message-ID: <20240511143445.18940-2-chen.stonechen@gmail.com>
X-Mailer: git-send-email 2.44.0
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 1/2][GSoC] libavcodec/x86/vvc: Add AVX2
 DMVR SAD functions for VVC
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Stone Chen <chen.stonechen@gmail.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20240511143445.18940-2-chen.stonechen@gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

Implement AVX2 SAD functions for DMVR (decoder-side motion vector
refinement). DMVR SAD is only calculated if w >= 8, h >= 8, and
w * h > 128. To reduce complexity, the SAD is computed on even rows
only. The same function serves all video bit depths, because the values
passed to it are always 16-bit, even when the source video is 8-bit.
The AVX2 implementation computes each absolute difference as
max(a, b) - min(a, b), using vpminuw, vpmaxuw, and vpsubusw.

Benchmarks (AMD 7940HS)

File                                      | Before | After
BQTerrace_1920x1080_60_10_420_22_RA.vvc   |   80.7 |  82.7
Chimera_8bit_1080P_1000_frames.vvc        |  158.0 | 167.0
NovosobornayaSquare_1920x1080.bin         |  159.7 | 166.3
RitualDance_1920x1080_60_10_420_37_RA.266 |  146.3 | 154.0
---
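For review purposes, the behaviour the assembly is meant to reproduce can
be sketched in scalar C. This is a minimal model written for illustration,
not the decoder's C fallback: sad_ref() and dmvr_sad_applies() are
hypothetical names, and the sketch assumes the 128-sample row stride
(MAX_PB_SIZE) and the unsigned 16-bit comparison that vvc_sad.asm uses.

```c
#include <stdint.h>

#define MAX_PB_SIZE 128  /* row stride in samples, as defined in vvc_sad.asm */

/* Hypothetical helper: the size condition quoted in the commit message. */
static int dmvr_sad_applies(int w, int h)
{
    return w >= 8 && h >= 8 && w * h > 128;
}

/* Hypothetical scalar model of ff_vvc_sad_avx2(). */
static int sad_ref(const int16_t *src1, const int16_t *src2,
                   int dx, int dy, int block_w, int block_h)
{
    int sad = 0;

    /* Same arithmetic as INIT_OFFSET: the two blocks are displaced in
     * opposite directions around the (2, 2) origin. */
    dx -= 2;
    dy -= 2;
    src1 += (2 + dy) * MAX_PB_SIZE + 2 + dx;
    src2 += (2 - dy) * MAX_PB_SIZE + 2 - dx;

    for (int y = 0; y < block_h; y += 2) {      /* even rows only */
        for (int x = 0; x < block_w; x++) {
            /* The asm compares samples as unsigned words, so do the same. */
            uint16_t a  = (uint16_t)src1[x];
            uint16_t b  = (uint16_t)src2[x];
            uint16_t lo = a < b ? a : b;        /* vpminuw */
            uint16_t hi = a < b ? b : a;        /* vpmaxuw */
            sad += hi - lo;                     /* vpsubusw, then widen and add */
        }
        src1 += 2 * MAX_PB_SIZE;                /* skip the odd row */
        src2 += 2 * MAX_PB_SIZE;
    }
    return sad;
}
```

Comparing sad_ref() against ff_vvc_sad_avx2() on random buffers for each
block size accepted by dmvr_sad_applies() is one way to check the two code
paths (width 8 and width 16 to 128) independently.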
 libavcodec/x86/vvc/Makefile      |   3 +-
 libavcodec/x86/vvc/vvc_sad.asm   | 155 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |   6 ++
 3 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/x86/vvc/vvc_sad.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index d1623bd46a..9eb5f65c7c 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -4,4 +4,5 @@ clean::
 OBJS-$(CONFIG_VVC_DECODER)             += x86/vvc/vvcdsp_init.o \
                                           x86/h26x/h2656dsp.o
 X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/vvc_mc.o       \
-                                          x86/h26x/h2656_inter.o
+                                          x86/h26x/h2656_inter.o \
+                                          x86/vvc/vvc_sad.o
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
new file mode 100644
index 0000000000..f1184c731c
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,155 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 128
+%define ROWS 2          ; DMVR SAD is only calculated on even rows to reduce complexity
+
+SECTION .text
+
+%macro MIN_MAX_SAD 3 ; tmp, a, b; |a - b| (unsigned words) is left in b
+    vpminuw           %1, %2, %3
+    vpmaxuw           %3, %2, %3
+    vpsubusw          %3, %3, %1
+%endmacro
+
+%macro HORIZ_ADD 3  ; xm0, xm1, m1
+    vextracti128      %1, %3, q0001  ;        3        2      1          0
+    vpaddd            %1, %2         ; xm0 (7 + 3) (6 + 2) (5 + 1)   (4 + 0)
+    vpshufd           %2, %1, q0032  ; xm1    -      -     (7 + 3)   (6 + 2)
+    vpaddd            %1, %1, %2     ; xm0    _      _     (5 1 7 3) (4 0 6 2)
+    vpshufd           %2, %1, q0001  ; xm1    _      _     (5 1 7 3) (5 1 7 3)
+    vpaddd            %1, %1, %2     ;                               (01234567)
+%endmacro
+
+%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
+    sub             %3, 2
+    sub             %4, 2
+
+    mov             %5, 2
+    mov             %6, 2
+
+    add             %5, %4
+    sub             %6, %4
+
+    imul            %5, 128
+    imul            %6, 128
+
+    add             %5, 2
+    add             %6, 2
+
+    add             %5, %3
+    sub             %6, %3
+
+    lea             %1, [%1 + %5 * 2]
+    lea             %2, [%2 + %6 * 2]
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX2_EXTERNAL
+
+INIT_YMM avx2
+
+cglobal vvc_sad, 6, 9, 14, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+    INIT_OFFSET src1q, src2q, dxq, dyq, off1q, off2q
+    pxor               m3, m3
+    pxor               m8, m8
+
+    cmp          block_wd, 16
+    jge    vvc_sad_16_128
+
+    vvc_sad_8:
+        .loop_height:
+        movu              xm0, [src1q]
+        movu              xm1, [src2q]
+        MIN_MAX_SAD       xm2, xm0, xm1
+        vpmovzxwd          m1, xm1
+        vpaddd             m3, m1
+
+        movu              xm5, [src1q + MAX_PB_SIZE * ROWS * 2]
+        movu              xm6, [src2q + MAX_PB_SIZE * ROWS * 2]
+        MIN_MAX_SAD       xm7, xm5, xm6
+        vpmovzxwd          m6, xm6
+        vpaddd             m3, m6
+
+        movu              xm8, [src1q + MAX_PB_SIZE * 2 * ROWS * 2]
+        movu              xm9, [src2q + MAX_PB_SIZE * 2 * ROWS * 2]
+        MIN_MAX_SAD      xm10, xm8, xm9
+        vpmovzxwd          m9, xm9
+        vpaddd             m3, m9
+
+        movu             xm11, [src1q + MAX_PB_SIZE * 3 * ROWS * 2]
+        movu             xm12, [src2q + MAX_PB_SIZE * 3 * ROWS * 2]
+        MIN_MAX_SAD      xm13, xm11, xm12
+        vpmovzxwd         m12, xm12
+
+        vpaddd             m3, m12
+
+        add             src1q, MAX_PB_SIZE * 4 * ROWS * 2
+        add             src2q, MAX_PB_SIZE * 4 * ROWS * 2
+
+        sub          block_hd, 8
+        jg       .loop_height
+
+        HORIZ_ADD         xm0, xm3, m3
+        movd              eax, xm0
+    RET
+
+    vvc_sad_16_128:
+        .loop_height:
+        mov               off1q, src1q
+        mov               off2q, src2q
+        mov            row_idxd, block_wd
+        sar            row_idxd, 4
+
+        .loop_width:
+            movu              xm0, [src1q]
+            movu              xm1, [src2q]
+            MIN_MAX_SAD       xm2, xm0, xm1
+            vpmovzxwd          m1, xm1
+            vpaddd             m3, m1
+
+            movu              xm5, [src1q + 16]
+            movu              xm6, [src2q + 16]
+            MIN_MAX_SAD       xm7, xm5, xm6
+            vpmovzxwd          m6, xm6
+            vpaddd             m3, m6
+
+            add             src1q, 32
+            add             src2q, 32
+            dec          row_idxd
+            jg        .loop_width
+
+        lea             src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
+        lea             src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+
+        sub          block_hd, 2
+        jg       .loop_height
+
+        HORIZ_ADD         xm0, xm3, m3
+        movd              eax, xm0
+
+    RET
+
+%endif
+%endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 985d750472..5d2cfe419f 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -252,6 +252,9 @@ AVG_FUNCS(16, 12, avx2)
     c->inter.avg    = bf(ff_vvc_avg, bd, opt);                       \
     c->inter.w_avg  = bf(ff_vvc_w_avg, bd, opt);                     \
 } while (0)
+
+int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -265,6 +268,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
         }
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             MC_LINKS_AVX2(8);
+            SAD_INIT();
         }
     } else if (bd == 10) {
         if (EXTERNAL_SSE4(cpu_flags)) {
@@ -273,6 +277,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            SAD_INIT();
         }
     } else if (bd == 12) {
         if (EXTERNAL_SSE4(cpu_flags)) {
@@ -281,6 +286,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            SAD_INIT();
         }
     }
 
-- 
2.44.0
