From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3548b9af-e48f-4bff-972b-3c309f6fdd12@gmail.com>
Date: Wed, 22 May 2024 19:54:16 -0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: ffmpeg-devel@ffmpeg.org
References: <20240522000039.34913-2-chen.stonechen@gmail.com>
Content-Language: en-US
From: James Almer
Subject: Re: [FFmpeg-devel] [PATCH v5 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"

On 5/22/2024 2:02 AM, Andreas Rheinhardt wrote:
> Stone Chen:
>> Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows.
>> This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.
>>
>> Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.
>>
>> Benchmarks ( AMD 7940HS )
>> Before:
>> BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 106.0 |
>> Chimera_8bit_1080P_1000_frames.vvc        | 204.3 |
>> NovosobornayaSquare_1920x1080.bin         | 197.3 |
>> RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |
>>
>> After:
>> BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 109.3 |
>> Chimera_8bit_1080P_1000_frames.vvc        | 216.0 |
>> NovosobornayaSquare_1920x1080.bin         | 204.0 |
>> RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |
>> ---
>>  libavcodec/vvc/dsp.c             |   2 +-
>>  libavcodec/vvc/dsp.h             |   2 +-
>>  libavcodec/x86/vvc/Makefile      |   3 +-
>>  libavcodec/x86/vvc/vvc_sad.asm   | 130 +++++++++++++++++++++++++++++++
>>  libavcodec/x86/vvc/vvcdsp_init.c |   6 ++
>>  5 files changed, 140 insertions(+), 3 deletions(-)
>>  create mode 100644 libavcodec/x86/vvc/vvc_sad.asm
>>
>> diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
>> index 0e68971b2c..aa6c916760 100644
>> --- a/libavcodec/x86/vvc/vvcdsp_init.c
>> +++ b/libavcodec/x86/vvc/vvcdsp_init.c
>> @@ -311,6 +311,9 @@ ALF_FUNCS(16, 12, avx2)
>>      c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2; \
>>      c->alf.classify = ff_vvc_alf_classify_##bd##_avx2; \
>>  } while (0)
>> +
>> +int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, intptr_t dx, intptr_t dy, int block_w, int block_h);
>> +#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
>
> You are adding an AVX2 function to an ARCH_X86_64 #if block. I expect
> this to lead to linking failures if AVX2 is disabled.

It's only a prototype, so there are no linking failures. And SAD_INIT()
is called in a block that requires both ARCH_X86_64 and
EXTERNAL_AVX2_FAST to be true.
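
To illustrate the point about prototypes, here is a minimal standalone sketch in C. It is not code from the patch: HAVE_FAST_SAD, DEMO_STRIDE, sad_fast(), sad_c(), InterDSP and inter_dsp_init() are all hypothetical names, and the preprocessor guard is only a stand-in for the ARCH_X86_64 and EXTERNAL_AVX2_FAST conditions. The fallback follows the even-rows description from the commit message, but its stride and its handling of dx/dy are placeholders. sad_fast() is declared and never defined; as long as the guard is 0, nothing references it and the program should link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the build-time conditions discussed above.
 * 0 models a build where the optimized function is not compiled at all. */
#ifndef HAVE_FAST_SAD
#define HAVE_FAST_SAD 0
#endif

/* Hypothetical row stride, in samples, for this demo only. */
#define DEMO_STRIDE 8

typedef int (*sad_fn)(const int16_t *src0, const int16_t *src1,
                      intptr_t dx, intptr_t dy, int block_w, int block_h);

/* Prototype only: sad_fast() is defined nowhere in this program.
 * A declaration that is never referenced produces no undefined symbol. */
int sad_fast(const int16_t *src0, const int16_t *src1,
             intptr_t dx, intptr_t dy, int block_w, int block_h);

/* Simplified portable fallback: sum of absolute differences on even rows.
 * Assumption: dx/dy simply offset src0; the real decoder layout differs. */
static int sad_c(const int16_t *src0, const int16_t *src1,
                 intptr_t dx, intptr_t dy, int block_w, int block_h)
{
    int sad = 0;

    src0 += dy * DEMO_STRIDE + dx;
    for (int y = 0; y < block_h; y += 2) {
        for (int x = 0; x < block_w; x++)
            sad += abs(src0[x] - src1[x]);
        src0 += 2 * DEMO_STRIDE;
        src1 += 2 * DEMO_STRIDE;
    }
    return sad;
}

typedef struct InterDSP {
    sad_fn sad;
} InterDSP;

void inter_dsp_init(InterDSP *c)
{
    assert(c != NULL);
    c->sad = sad_c;
#if HAVE_FAST_SAD
    /* The only reference to sad_fast(); removed when the guard is 0. */
    c->sad = sad_fast;
#endif
}
```

Building this with -DHAVE_FAST_SAD=1 and no definition of sad_fast() should fail at link time, which is the failure mode the review comment was worried about. It can only occur if the reference itself survives, not because of the declaration.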