From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 22 Nov 2023 16:49:12 -0300
Message-ID: <20231122194913.9856-2-jamrial@gmail.com>
In-Reply-To: <20231122194913.9856-1-jamrial@gmail.com>
References: <20231122194913.9856-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.42.1
Subject: [FFmpeg-devel] [PATCH 2/3] x86/ac3dsp: add ff_float_to_fixed24_avx2()

Signed-off-by: James Almer
---
 libavcodec/ac3dsp.h          |  4 ++--
 libavcodec/ac3enc_template.c |  2 +-
 libavcodec/x86/ac3dsp.asm    | 28 ++++++++++++++++++++++++++--
 libavcodec/x86/ac3dsp_init.c |  4 ++++
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index a01bff3d11..25341f3396 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -47,9 +47,9 @@ typedef struct AC3DSPContext {
      * [-(1<<24),(1<<24)]
      *
      * @param dst destination array of int32_t.
-     *            constraints: 16-byte aligned
+     *            constraints: 32-byte aligned
      * @param src source array of float.
-     *            constraints: 16-byte aligned
+     *            constraints: 32-byte aligned
      * @param len number of elements to convert.
      *            constraints: multiple of 32 greater than zero
      */
diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index be4ecebc9c..a16faea681 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -112,7 +112,7 @@ static void apply_channel_coupling(AC3EncodeContext *s)
 {
     LOCAL_ALIGNED_16(CoefType, cpl_coords,      [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
 #if AC3ENC_FLOAT
-    LOCAL_ALIGNED_16(int32_t, fixed_cpl_coords, [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
+    LOCAL_ALIGNED_32(int32_t, fixed_cpl_coords, [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
 #else
     int32_t (*fixed_cpl_coords)[AC3_MAX_CHANNELS][16] = cpl_coords;
 #endif
diff --git a/libavcodec/x86/ac3dsp.asm b/libavcodec/x86/ac3dsp.asm
index 42c8310462..e31c58e1c1 100644
--- a/libavcodec/x86/ac3dsp.asm
+++ b/libavcodec/x86/ac3dsp.asm
@@ -21,10 +21,10 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
 
 ; 16777216.0f - used in ff_float_to_fixed24()
-pf_1_24: times 4 dd 0x4B800000
+pf_1_24: times 8 dd 0x4B800000
 
 ; used in ff_ac3_compute_mantissa_size()
 cextern ac3_bap_bits
@@ -128,6 +128,30 @@ cglobal float_to_fixed24, 3, 3, 9, dst, src, len
     jl .loop
     RET
 
+INIT_YMM avx2
+cglobal float_to_fixed24, 3, 3, 5, dst, src, len
+    movaps     m0, [pf_1_24]
+    shl      lenq, 2
+    add      srcq, lenq
+    add      dstq, lenq
+    neg      lenq
+.loop:
+    mulps      m1, m0, [srcq+lenq+mmsize*0]
+    mulps      m2, m0, [srcq+lenq+mmsize*1]
+    mulps      m3, m0, [srcq+lenq+mmsize*2]
+    mulps      m4, m0, [srcq+lenq+mmsize*3]
+    cvtps2dq   m1, m1
+    cvtps2dq   m2, m2
+    cvtps2dq   m3, m3
+    cvtps2dq   m4, m4
+    movdqa [dstq+lenq+mmsize*0], m1
+    movdqa [dstq+lenq+mmsize*1], m2
+    movdqa [dstq+lenq+mmsize*2], m3
+    movdqa [dstq+lenq+mmsize*3], m4
+    add      lenq, mmsize*4
+    jl .loop
+    RET
+
 ;------------------------------------------------------------------------------
 ; int ff_ac3_compute_mantissa_size(uint16_t mant_cnt[6][16])
 ;------------------------------------------------------------------------------
diff --git a/libavcodec/x86/ac3dsp_init.c b/libavcodec/x86/ac3dsp_init.c
index 43b3b4ac85..106121b5b9 100644
--- a/libavcodec/x86/ac3dsp_init.c
+++ b/libavcodec/x86/ac3dsp_init.c
@@ -27,6 +27,7 @@ void ff_ac3_exponent_min_sse2  (uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 
 
 void ff_float_to_fixed24_sse2 (int32_t *dst, const float *src, unsigned int len);
+void ff_float_to_fixed24_avx2 (int32_t *dst, const float *src, unsigned int len);
 
 int ff_ac3_compute_mantissa_size_sse2(uint16_t mant_cnt[6][16]);
 
@@ -48,6 +49,9 @@ av_cold void ff_ac3dsp_init_x86(AC3DSPContext *c)
         if (!(cpu_flags & AV_CPU_FLAG_ATOM))
             c->extract_exponents = ff_ac3_extract_exponents_ssse3;
     }
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        c->float_to_fixed24 = ff_float_to_fixed24_avx2;
+    }
 }
 
 #define DOWNMIX_FUNC_OPT(ch, opt)                                       \
-- 
2.42.1