From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 15:53:36 -0300
Message-ID: <20240512185336.60155-1-jamrial@gmail.com>
In-Reply-To: <20240512160657.2733-6-jamrial@gmail.com>
References: <20240512160657.2733-6-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 8/8 v2] x86/flacdsp: add an SSE4 version of wasted33

flac_wasted_33_c:    214.1
flac_wasted_33_sse4: 103.2

Signed-off-by: James Almer
---
Removed the AVX2 version, as the lane crossing in pmovsxdq removed pretty much
all of the speedup from processing twice the amount of data.

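For readers who do not read x86inc assembly, the per-sample operation of the
loop in this patch can be sketched in scalar C: each 32-bit residual is
sign-extended to 64 bits (pmovsxdq) and then shifted left by the wasted-bit
count (psllq). This is only an illustrative sketch and is not the
flac_wasted_33_c reference in FFmpeg. The name wasted33_ref is hypothetical,
and the argument order simply mirrors the prototype added to flacdsp_init.c.

```c
#include <stdint.h>

/* Hypothetical scalar model of the operation; not FFmpeg's own C version. */
static void wasted33_ref(int64_t *decoded, const int32_t *residual,
                         int wasted, int len)
{
    for (int i = 0; i < len; i++) {
        /* Sign-extend the 32-bit residual to 64 bits (what pmovsxdq does). */
        int64_t wide = residual[i];
        /* Shift through uint64_t so negative inputs avoid the undefined
         * behaviour of left-shifting a negative signed value; psllq is a
         * plain logical shift of each 64-bit lane. */
        decoded[i] = (int64_t)((uint64_t)wide << wasted);
    }
}
```

Note that the SSE4 loop consumes 8 residuals per iteration (mmsize * 2 bytes
of int32 input) and has no scalar tail, so it appears to rely on the buffers
tolerating len being rounded up to a multiple of 8. That is an inference from
the loop structure, not something stated in the patch.
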
 libavcodec/x86/flacdsp.asm    | 25 +++++++++++++++++++++++++
 libavcodec/x86/flacdsp_init.c |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/libavcodec/x86/flacdsp.asm b/libavcodec/x86/flacdsp.asm
index 21b2439bc0..15fcec4f08 100644
--- a/libavcodec/x86/flacdsp.asm
+++ b/libavcodec/x86/flacdsp.asm
@@ -113,6 +113,31 @@ ALIGN 16
     jl .loop
     RET
 
+INIT_XMM sse4
+cglobal flac_wasted_33, 4,4,5, decoded, residuals, wasted, len
+    shl         lend, 2
+    lea     decodedq, [decodedq+lenq*2]
+    add   residualsq, lenq
+    neg         lenq
+    movd          m4, wastedd
+ALIGN 16
+.loop:
+    pmovsxdq      m0, [residualsq+lenq+mmsize*0]
+    pmovsxdq      m1, [residualsq+lenq+mmsize/2]
+    pmovsxdq      m2, [residualsq+lenq+mmsize*1]
+    pmovsxdq      m3, [residualsq+lenq+mmsize*1+mmsize/2]
+    psllq         m0, m4
+    psllq         m1, m4
+    psllq         m2, m4
+    psllq         m3, m4
+    mova [decodedq+lenq*2+mmsize*0], m0
+    mova [decodedq+lenq*2+mmsize*1], m1
+    mova [decodedq+lenq*2+mmsize*2], m2
+    mova [decodedq+lenq*2+mmsize*3], m3
+    add         lenq, mmsize * 2
+    jl .loop
+    RET
+
 ;----------------------------------------------------------------------------------
 ;void ff_flac_decorrelate_[lrm]s_16_sse2(uint8_t **out, int32_t **in, int channels,
 ;                                        int len, int shift);
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index 67aa118760..fa993d3466 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -31,6 +31,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
                         int qlevel, int len);
 
 void ff_flac_wasted_32_sse2(int32_t *decoded, int wasted, int len);
+void ff_flac_wasted_33_sse4(int64_t *decoded, const int32_t *residual, int wasted, int len);
 
 #define DECORRELATE_FUNCS(fmt, opt)                                                      \
 void ff_flac_decorrelate_ls_##fmt##_##opt(uint8_t **out, int32_t **in, int channels,     \
@@ -100,6 +101,7 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->lpc16 = ff_flac_lpc_16_sse4;
         c->lpc32 = ff_flac_lpc_32_sse4;
+        c->wasted33 = ff_flac_wasted_33_sse4;
     }
     if (EXTERNAL_AVX(cpu_flags)) {
         if (fmt == AV_SAMPLE_FMT_S16) {
-- 
2.45.0