From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Jun 2024 22:23:43 -0300
Message-ID: <20240604012343.1771-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH] x86/aacencdsp: add SSE2 and AVX versions of quantize_bands

quant_bands_signed_sse2: 417.0
quant_bands_signed_avx: 202.0

Signed-off-by: James Almer
---
 libavcodec/aacenc.h             |  2 +-
 libavcodec/x86/aacencdsp.asm    | 27 ++++++++++++++++++++++++---
 libavcodec/x86/aacencdsp_init.c |  6 ++++++
 tests/checkasm/aacencdsp.c      |  4 ++--
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/libavcodec/aacenc.h b/libavcodec/aacenc.h
index d07960620e..ae15f91e06 100644
--- a/libavcodec/aacenc.h
+++ b/libavcodec/aacenc.h
@@ -242,7 +242,7 @@ typedef struct AACEncContext {
     enum RawDataBlockType cur_type;              ///< channel group type cur_channel belongs to
 
     AudioFrameQueue afq;
-    DECLARE_ALIGNED(16, int,   qcoefs)[96];      ///< quantized coefficients
+    DECLARE_ALIGNED(32, int,   qcoefs)[96];      ///< quantized coefficients
     DECLARE_ALIGNED(32, float, scoefs)[1024];    ///< scaled coefficients
 
     uint16_t quantize_band_cost_cache_generation;
diff --git a/libavcodec/x86/aacencdsp.asm b/libavcodec/x86/aacencdsp.asm
index 0d3ba4b89d..99be2d87f5 100644
--- a/libavcodec/x86/aacencdsp.asm
+++ b/libavcodec/x86/aacencdsp.asm
@@ -53,8 +53,19 @@ cglobal abs_pow34, 3, 3, 3, out, in, size
 ;                               int size, int is_signed, int maxval, const float Q34,
 ;                               const float rounding)
 ;*******************************************************************
-INIT_XMM sse2
+%macro AAC_QUANTIZE_BANDS 0
 cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, maxval, Q34, rounding
+%if mmsize == 32
+    vbroadcastss m0, Q34m
+    vbroadcastss m1, roundingm
+%if UNIX64 == 0
+    cvtsi2ss     xm3, dword maxvalm
+%else
+    cvtsi2ss     xm3, maxvald
+%endif
+    shufps       xm3, xm3, xm3, 0
+    vinsertf128  m3, m3, xm3, 1
+%else ; mmsize == 16
 %if UNIX64 == 0
     movss     m0, Q34m
     movss     m1, roundingm
@@ -65,9 +76,13 @@ cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, maxval, Q
     shufps    m0, m0, 0
     shufps    m1, m1, 0
     shufps    m3, m3, 0
+%endif
     shl       is_signedd, 31
-    movd      m4, is_signedd
-    shufps    m4, m4, 0
+    movd      xm4, is_signedd
+    shufps    xm4, xm4, xm4, 0
+%if mmsize == 32
+    vinsertf128 m4, m4, xm4, 1
+%endif
     shl       sized, 2
     add       inq, sizeq
     add       outq, sizeq
@@ -84,3 +99,9 @@ cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, maxval, Q
     add       sizeq, mmsize
     jl       .loop
     RET
+%endmacro
+
+INIT_XMM sse2
+AAC_QUANTIZE_BANDS
+INIT_YMM avx
+AAC_QUANTIZE_BANDS
diff --git a/libavcodec/x86/aacencdsp_init.c b/libavcodec/x86/aacencdsp_init.c
index e0d8dec4f8..cf17dbf91d 100644
--- a/libavcodec/x86/aacencdsp_init.c
+++ b/libavcodec/x86/aacencdsp_init.c
@@ -30,6 +30,9 @@ void ff_abs_pow34_sse(float *out, const float *in, const int size);
 void ff_aac_quantize_bands_sse2(int *out, const float *in, const float *scaled,
                                 int size, int is_signed, int maxval, const float Q34,
                                 const float rounding);
+void ff_aac_quantize_bands_avx(int *out, const float *in, const float *scaled,
+                               int size, int is_signed, int maxval, const float Q34,
+                               const float rounding);
 
 av_cold void ff_aacenc_dsp_init_x86(AACEncDSPContext *s)
 {
@@ -40,4 +43,7 @@ av_cold void ff_aacenc_dsp_init_x86(AACEncDSPContext *s)
 
     if (EXTERNAL_SSE2(cpu_flags))
         s->quant_bands = ff_aac_quantize_bands_sse2;
+
+    if (EXTERNAL_AVX_FAST(cpu_flags))
+        s->quant_bands = ff_aac_quantize_bands_avx;
 }
diff --git a/tests/checkasm/aacencdsp.c b/tests/checkasm/aacencdsp.c
index 791dd30320..5308a2ac03 100644
--- a/tests/checkasm/aacencdsp.c
+++ b/tests/checkasm/aacencdsp.c
@@ -81,8 +81,8 @@ static void test_quant_bands(AACEncDSPContext *s)
     for (int sign = 0; sign <= 1; sign++) {
         if (check_func(s->quant_bands, "quant_bands_%s",
                        sign ? "signed" : "unsigned")) {
-            LOCAL_ALIGNED_16(int, out, [BUF_SIZE]);
-            LOCAL_ALIGNED_16(int, out2, [BUF_SIZE]);
+            LOCAL_ALIGNED_32(int, out, [BUF_SIZE]);
+            LOCAL_ALIGNED_32(int, out2, [BUF_SIZE]);
 
             call_ref(out, in, scaled, BUF_SIZE, sign, maxval, q34, rounding);
             call_new(out2, in, scaled, BUF_SIZE, sign, maxval, q34, rounding);
-- 
2.45.1
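For readers comparing the SSE2 and AVX variants, the per-element work can be sketched in scalar C. This is only a sketch under stated assumptions: the loop body is not part of the hunks in this patch, so the behavior here (scale by Q34, add the rounding term, clamp to maxval, truncate, then copy the sign of in[i] when is_signed is set) is assumed rather than quoted. The name quantize_bands_sketch is hypothetical; the parameter list mirrors the prototypes declared in aacencdsp_init.c.

```c
#include <stddef.h>

/*
 * Scalar sketch of the quantization step (assumed semantics, see lead-in).
 * quantize_bands_sketch is a hypothetical name, not an FFmpeg symbol.
 */
void quantize_bands_sketch(int *out, const float *in, const float *scaled,
                           int size, int is_signed, int maxval,
                           const float Q34, const float rounding)
{
    const float limit = (float)maxval;

    for (int i = 0; i < size; i++) {
        /* Scale the pre-computed |x|^(3/4) value and add the rounding bias. */
        float v = scaled[i] * Q34 + rounding;

        /* Clamp to the codebook maximum, then truncate toward zero. */
        int q = (int)(v < limit ? v : limit);

        /* For signed codebooks, restore the sign of the original input. */
        if (is_signed && in[i] < 0.0f)
            q = -q;

        out[i] = q;
    }
}
```

A SIMD version of such a loop handles mmsize / 4 elements per iteration (4 with XMM, 8 with YMM registers), which is consistent with the patch raising the alignment of qcoefs and of the checkasm output buffers from 16 to 32 bytes.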