From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id B436E442EA for ; Mon, 5 Sep 2022 11:19:38 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 3332E68B82B; Mon, 5 Sep 2022 14:19:36 +0300 (EEST) Received: from mail-ot1-f49.google.com (mail-ot1-f49.google.com [209.85.210.49]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 2FDCA68B325 for ; Mon, 5 Sep 2022 14:19:30 +0300 (EEST) Received: by mail-ot1-f49.google.com with SMTP id d18-20020a9d72d2000000b0063934f06268so5987856otk.0 for ; Mon, 05 Sep 2022 04:19:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date; bh=Ss+UruAX9CYK7whJLPg/OYeu7PWToRMJIoYy+l9F1z8=; b=XXXxtU2f2IHJj+wy4CyCx4D0h1ZL/DtCdrfkv5knn8k8PotNmWoQZ9wcr7Uej1wftN Ajdb9FJ+IfSMA+KfIvPotFMfabk1U1NCL5HT+ZU8Kem03R4HCAWNM6yc7K9uk1oSkuvj TeQF9gzFj6KynjAnLVu1N1knPuFly77PljX6uoe69Be5belu0NFG9WP00PGmkezRyXmK eFjDdVuW74gZf8WZga7slg4r4cwz+nUL/sBsDC4Aal1oh3p+IBoVgyHBLcLitKCB6NKH TNDjO+OdEo78W1U75RDEiaBRX6kRCQ+NjSYapfcnDvCdm+lyFkP1OhfyfrsZ6NQznbL9 D8EQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date; bh=Ss+UruAX9CYK7whJLPg/OYeu7PWToRMJIoYy+l9F1z8=; b=e53WUfxO0gwd+zmuRmLHPRrvcDZYC3VwzoD6MbIFTsutW1pZPgttPhqk8PhUXQYkrD csH6HgxOFdNRHwABcXXrLK41xpZ0sXXlWuY2q3Lbt53YYS+RogZv5YzzaXn566BGs3Eo /1PoSwCHS3QwGruNshnz20vd6pfDV4eIaq2R9v9VPdksr5x8xbM9FjhunIC8/6LMD2op 6P6Q4/GmJeKgo3mbeKavpnsOpA66Zl7kyAe46MMC3qCP380MrgEDuXqce9MwXaIvlimY oaDHKGnYCTiCaINyKZFnV5TYN4CR7ach0C4/55Lt4vHxmGa8jWb8Q/EyUaRG0VQbd9ve Xh1g== X-Gm-Message-State: ACgBeo1QOUV0vUbUWla03l8DoltlYO+pVAfBHgfghHRWmhaHUvYDrr++ 0SYfjq+33zQYC80rMSVAlcvozDmJVbU= X-Google-Smtp-Source: AA6agR5BM2M6PnFwgJMum2vd14lAGNgz0Ambk70Carq9RZ1kZhRl2uH9AOoYXxzlVIWYOcm+1T2E1Q== X-Received: by 2002:a9d:479a:0:b0:637:16d8:1695 with SMTP id b26-20020a9d479a000000b0063716d81695mr19200261otf.182.1662376768296; Mon, 05 Sep 2022 04:19:28 -0700 (PDT) Received: from [192.168.0.11] ([191.97.187.183]) by smtp.gmail.com with ESMTPSA id d8-20020a05680805c800b00344e3751fc4sm3976955oij.36.2022.09.05.04.19.27 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 05 Sep 2022 04:19:27 -0700 (PDT) Message-ID: <672b8fd2-42d8-889d-40dd-16fb758f3e0c@gmail.com> Date: Mon, 5 Sep 2022 08:19:29 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0 Content-Language: en-US To: ffmpeg-devel@ffmpeg.org References: From: James Almer In-Reply-To: Subject: Re: [FFmpeg-devel] [PATCH] avcodec: flac x86 asm fix X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: On 9/3/2022 6:45 PM, Paul B Mahol wrote: > Patch attached. [...] > From 0f220489eb6402c6be3bc1d897c95fa9bc10431c Mon Sep 17 00:00:00 2001 > From: Paul B Mahol > Date: Sat, 3 Sep 2022 23:41:38 +0200 > Subject: [PATCH] avcodec/x86/flacdsp: fix bug in decorrelation > > Fixes #9297 > > Signed-off-by: Paul B Mahol > --- > libavcodec/x86/flacdsp.asm | 23 +++++++++++++++----- > libavcodec/x86/flacdsp_init.c | 41 ++++++++++++++++++++++------------- > 2 files changed, 44 insertions(+), 20 deletions(-) > > diff --git a/libavcodec/x86/flacdsp.asm b/libavcodec/x86/flacdsp.asm > index 7138611526..6d755f4972 100644 > --- a/libavcodec/x86/flacdsp.asm > +++ b/libavcodec/x86/flacdsp.asm > @@ -23,6 +23,10 @@ > > %include "libavutil/x86/x86util.asm" > > +SECTION_RODATA > + > +vector: db 0,1,4,5,8,9,12,13,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,1,4,5,8,9,12,13, > + > SECTION .text > > %macro PMACSDQL 5 > @@ -89,6 +93,9 @@ LPC_32 sse4 > ;---------------------------------------------------------------------------------- > %macro FLAC_DECORRELATE_16 3-4 > cglobal flac_decorrelate_%1_16, 2, 4, 4, out, in0, in1, len > +%ifidn %1, indep2 > + VBROADCASTI128 m2, [vector] You're loading 16 bytes, yet vector is 32 bytes. Are you sure this is ok? The upper 16 bytes look different, which makes me think you meant to use them. > +%endif > %if ARCH_X86_32 > mov lend, lenm > %endif > @@ -112,11 +119,17 @@ align 16 > %endif > %ifnidn %1, indep2 > p%4d m2, m0, m1 > + packssdw m%2, m%2 > + packssdw m%3, m%3 > + punpcklwd m%2, m%3 > + psllw m%2, m3 > +%else > + pslld m%2, m3 > + pslld m%3, m3 > + pshufb m%2, m%2, m2 > + pshufb m%3, m%3, m2 > + punpcklwd m%2, m%3 > %endif > - packssdw m%2, m%2 > - packssdw m%3, m%3 > - punpcklwd m%2, m%3 > - psllw m%2, m3 > mova [outq + lenq], m%2 > add lenq, 16 > jl .loop > @@ -292,7 +305,7 @@ align 16 > REP_RET > %endmacro > > -INIT_XMM sse2 > +INIT_XMM ssse3 Why are you turning all the unrelated indep functions into ssse3 when you only changed the FLAC_DECORRELATE_16 macro to fix the stereo 16bit indep path? And did you try using FLAC_DECORRELATE_INDEP for it instead? I did mention below i purposely used a different macro for that specific scenario (probably to save some instructions), so the fix could have been using the proper indep macro. > FLAC_DECORRELATE_16 indep2, 0, 1 ; Reuse stereo 16bits macro > FLAC_DECORRELATE_INDEP 32, 2, 3, d > FLAC_DECORRELATE_INDEP 16, 4, 3, w > diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c > index 2deaf3117f..48e3e7c55c 100644 > --- a/libavcodec/x86/flacdsp_init.c > +++ b/libavcodec/x86/flacdsp_init.c > @@ -34,7 +34,9 @@ void ff_flac_decorrelate_ls_##fmt##_##opt(uint8_t **out, int32_t **in, int chann > void ff_flac_decorrelate_rs_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \ > int len, int shift); \ > void ff_flac_decorrelate_ms_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \ > - int len, int shift); \ > + int len, int shift); > + > +#define DECORRELATE_IFUNCS(fmt, opt) \ > void ff_flac_decorrelate_indep2_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \ > int len, int shift); \ > void ff_flac_decorrelate_indep4_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \ > @@ -48,6 +50,10 @@ DECORRELATE_FUNCS(16, sse2); > DECORRELATE_FUNCS(16, avx); > DECORRELATE_FUNCS(32, sse2); > DECORRELATE_FUNCS(32, avx); > +DECORRELATE_IFUNCS(16, ssse3); > +DECORRELATE_IFUNCS(16, avx); > +DECORRELATE_IFUNCS(32, ssse3); > +DECORRELATE_IFUNCS(32, avx); > > av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int channels) > { > @@ -55,30 +61,35 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int > int cpu_flags = av_get_cpu_flags(); > > if (EXTERNAL_SSE2(cpu_flags)) { > + if (fmt == AV_SAMPLE_FMT_S16) { > + c->decorrelate[1] = ff_flac_decorrelate_ls_16_sse2; > + c->decorrelate[2] = ff_flac_decorrelate_rs_16_sse2; > + c->decorrelate[3] = ff_flac_decorrelate_ms_16_sse2; > + } else if (fmt == AV_SAMPLE_FMT_S32) { > + c->decorrelate[1] = ff_flac_decorrelate_ls_32_sse2; > + c->decorrelate[2] = ff_flac_decorrelate_rs_32_sse2; > + c->decorrelate[3] = ff_flac_decorrelate_ms_32_sse2; > + } > + } > + if (EXTERNAL_SSSE3(cpu_flags)) { > if (fmt == AV_SAMPLE_FMT_S16) { > if (channels == 2) > - c->decorrelate[0] = ff_flac_decorrelate_indep2_16_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep2_16_ssse3; > else if (channels == 4) > - c->decorrelate[0] = ff_flac_decorrelate_indep4_16_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep4_16_ssse3; > else if (channels == 6) > - c->decorrelate[0] = ff_flac_decorrelate_indep6_16_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep6_16_ssse3; > else if (ARCH_X86_64 && channels == 8) > - c->decorrelate[0] = ff_flac_decorrelate_indep8_16_sse2; > - c->decorrelate[1] = ff_flac_decorrelate_ls_16_sse2; > - c->decorrelate[2] = ff_flac_decorrelate_rs_16_sse2; > - c->decorrelate[3] = ff_flac_decorrelate_ms_16_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep8_16_ssse3; > } else if (fmt == AV_SAMPLE_FMT_S32) { > if (channels == 2) > - c->decorrelate[0] = ff_flac_decorrelate_indep2_32_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep2_32_ssse3; > else if (channels == 4) > - c->decorrelate[0] = ff_flac_decorrelate_indep4_32_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep4_32_ssse3; > else if (channels == 6) > - c->decorrelate[0] = ff_flac_decorrelate_indep6_32_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep6_32_ssse3; > else if (ARCH_X86_64 && channels == 8) > - c->decorrelate[0] = ff_flac_decorrelate_indep8_32_sse2; > - c->decorrelate[1] = ff_flac_decorrelate_ls_32_sse2; > - c->decorrelate[2] = ff_flac_decorrelate_rs_32_sse2; > - c->decorrelate[3] = ff_flac_decorrelate_ms_32_sse2; > + c->decorrelate[0] = ff_flac_decorrelate_indep8_32_ssse3; > } > } > if (EXTERNAL_SSE4(cpu_flags)) { > -- > 2.37.2 > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".