From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 22 Dec 2023 09:12:31 -0300
Message-ID: <20231222121232.324-2-jamrial@gmail.com>
In-Reply-To: <20231222121232.324-1-jamrial@gmail.com>
References: <20231222121232.324-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/3 v2] x86/takdsp: add avx2 versions of all functions

On an Intel Core i7 12700k:

decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8

Signed-off-by: James Almer
---
No changes since last version

 libavcodec/x86/takdsp.asm    | 36 ++++++++++++++++++++++--------------
 libavcodec/x86/takdsp_init.c | 11 +++++++++++
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/takdsp.asm b/libavcodec/x86/takdsp.asm
index be8e1ab553..a5501cc285 100644
--- a/libavcodec/x86/takdsp.asm
+++ b/libavcodec/x86/takdsp.asm
@@ -28,7 +28,7 @@ pd_128: times 4 dd 128
 
 SECTION .text
 
-INIT_XMM sse2
+%macro TAK_DECORRELATE 0
 cglobal tak_decorrelate_ls, 3, 3, 2, p1, p2, length
     shl lengthd, 2
     add p1q, lengthq
@@ -73,10 +73,8 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     mova m1, [p2q+lengthq]
     mova m3, [p1q+lengthq+mmsize]
     mova m4, [p2q+lengthq+mmsize]
-    mova m2, m1
-    mova m5, m4
-    psrad m2, 1
-    psrad m5, 1
+    psrad m2, m1, 1
+    psrad m5, m4, 1
     psubd m0, m2
     psubd m3, m5
     paddd m1, m0
@@ -88,29 +86,39 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     add lengthq, mmsize*2
     jl .loop
     RET
+%endmacro
 
-INIT_XMM sse4
+INIT_XMM sse2
+TAK_DECORRELATE
+INIT_YMM avx2
+TAK_DECORRELATE
+
+%macro TAK_DECORRELATE_SF 0
 cglobal tak_decorrelate_sf, 3, 3, 5, p1, p2, length, dshift, dfactor
     shl lengthd, 2
     add p1q, lengthq
     add p2q, lengthq
     neg lengthq
 
-    movd m2, dshiftm
-    movd m3, dfactorm
-    pshufd m3, m3, 0
-    mova m4, [pd_128]
+    movd xm2, dshiftm
+    VPBROADCASTD m3, dfactorm
+    VBROADCASTI128 m4, [pd_128]
 
 .loop:
-    mova m0, [p1q+lengthq]
     mova m1, [p2q+lengthq]
-    psrad m1, m2
+    psrad m1, xm2
     pmulld m1, m3
     paddd m1, m4
     psrad m1, 8
-    pslld m1, m2
-    psubd m1, m0
+    pslld m1, xm2
+    psubd m1, [p1q+lengthq]
     mova [p1q+lengthq], m1
     add lengthq, mmsize
     jl .loop
     RET
+%endmacro
+
+INIT_XMM sse4
+TAK_DECORRELATE_SF
+INIT_YMM avx2
+TAK_DECORRELATE_SF
diff --git a/libavcodec/x86/takdsp_init.c b/libavcodec/x86/takdsp_init.c
index b2e6e639ee..c99a057b24 100644
--- a/libavcodec/x86/takdsp_init.c
+++ b/libavcodec/x86/takdsp_init.c
@@ -24,9 +24,13 @@
 #include "config.h"
 
 void ff_tak_decorrelate_ls_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_ls_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sr_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sr_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sm_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sm_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sf_sse4(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
+void ff_tak_decorrelate_sf_avx2(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
 
 av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
 {
@@ -42,5 +46,12 @@ av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->decorrelate_sf = ff_tak_decorrelate_sf_sse4;
     }
+
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        c->decorrelate_ls = ff_tak_decorrelate_ls_avx2;
+        c->decorrelate_sr = ff_tak_decorrelate_sr_avx2;
+        c->decorrelate_sm = ff_tak_decorrelate_sm_avx2;
+        c->decorrelate_sf = ff_tak_decorrelate_sf_avx2;
+    }
 #endif
 }
-- 
2.43.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
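
For readers following the tak_decorrelate_sf loop in the patch, here is a rough scalar model of what each lane appears to compute, written by following the instruction sequence in the diff one step at a time. It is only a sketch: the name decorrelate_sf_ref is hypothetical and not part of FFmpeg, and it assumes that right-shifting a negative int32_t is an arithmetic shift (as psrad is), which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of the tak_decorrelate_sf loop.
 * Not FFmpeg's reference code; derived from the asm in the patch above.
 * Assumes >> on negative int32_t is an arithmetic shift, like psrad. */
static void decorrelate_sf_ref(int32_t *p1, const int32_t *p2, int length,
                               int dshift, int dfactor)
{
    assert(dshift >= 0 && dshift < 32);   /* keep the C shifts well defined */

    for (int i = 0; i < length; i++) {
        int32_t b = p2[i] >> dshift;                       /* psrad  m1, xm2 */
        b = (int32_t)((uint32_t)b * (uint32_t)dfactor);    /* pmulld m1, m3  */
        b = (int32_t)((uint32_t)b + 128u);                 /* paddd  m1, m4  */
        b >>= 8;                                           /* psrad  m1, 8   */
        b = (int32_t)((uint32_t)b << dshift);              /* pslld  m1, xm2 */
        /* psubd m1, [p1q+lengthq]: the result is b - p1[i], stored to p1 */
        p1[i] = (int32_t)((uint32_t)b - (uint32_t)p1[i]);
    }
}
```

The multiply, add, left shift and subtract go through uint32_t so that wraparound matches the modular behaviour of the packed integer instructions without relying on signed overflow. The subtraction from memory in the last step mirrors the patch folding the old "mova m0, [p1q+lengthq]" load into psubd.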