From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 12 Sep 2022 15:39:26 -0300
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/x86/audiodsp: add scalarproduct avx2

> From 55eb5a18b4bf029f52f9d9108a750c576ba780ee Mon Sep 17 00:00:00 2001
> From: Paul B Mahol
> Date: Mon, 12 Sep 2022 18:53:31 +0200
> Subject: [PATCH] avcodec/x86/audiodsp: add scalarproduct avx2
>
> Signed-off-by: Paul B Mahol
> ---
>  libavcodec/x86/audiodsp.asm    | 24 ++++++++++++++++++++++++
>  libavcodec/x86/audiodsp_init.c |  6 ++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/libavcodec/x86/audiodsp.asm b/libavcodec/x86/audiodsp.asm
> index b604b0443c..55051f6aa7 100644
> --- a/libavcodec/x86/audiodsp.asm
> +++ b/libavcodec/x86/audiodsp.asm
> @@ -44,6 +44,30 @@ cglobal scalarproduct_int16, 3,3,3, v1, v2, order
>      movd eax, m2
>      RET
>
> +INIT_YMM avx2
> +cglobal scalarproduct_int16, 3,4,3, v1, v2, order, offset
> +    xor offsetq, offsetq
> +    add orderd, orderd
> +    pxor m1, m1
> +    cmp orderd, 32

This parameter needs to be a multiple of 16. What will happen below if
it's, for example, 48? Are both buffers padded enough to handle 16 bytes
of overread?

> +    jl .l16
> +.loop:
> +    movu m0, [v1q + offsetq]
> +    pmaddwd m0, [v2q + offsetq]
> +    paddd m1, m0
> +    add offsetq, mmsize
> +    cmp offsetq, orderq

You should use the neg trick from the sse2 version so you can remove the
cmp from this loop.

> +    jl .loop
> +    HADDD m1, m0
> +    movd eax, xm1
> +    RET
> +.l16:
> +    movu xm0, [v1q + offsetq]
> +    pmaddwd xm0, [v2q + offsetq]
> +    paddd xm1, xm0
> +    HADDD xm1, xm0
> +    movd eax, xm1
> +    RET
>
>  ;-----------------------------------------------------------------------------
>  ; void ff_vector_clip_int32(int32_t *dst, const int32_t *src, int32_t min,
> diff --git a/libavcodec/x86/audiodsp_init.c b/libavcodec/x86/audiodsp_init.c
> index aa5e43e570..77d5948442 100644
> --- a/libavcodec/x86/audiodsp_init.c
> +++ b/libavcodec/x86/audiodsp_init.c
> @@ -24,6 +24,9 @@
>  #include "libavutil/x86/cpu.h"
>  #include "libavcodec/audiodsp.h"
>
> +int32_t ff_scalarproduct_int16_avx2(const int16_t *v1, const int16_t *v2,
> +                                    int order);
> +
>  int32_t ff_scalarproduct_int16_sse2(const int16_t *v1, const int16_t *v2,
>                                      int order);
>
> @@ -53,4 +56,7 @@ av_cold void ff_audiodsp_init_x86(AudioDSPContext *c)
>
>      if (EXTERNAL_SSE4(cpu_flags))
>          c->vector_clip_int32 = ff_vector_clip_int32_sse4;
> +
> +    if (EXTERNAL_AVX2(cpu_flags))
> +        c->scalarproduct_int16 = ff_scalarproduct_int16_avx2;
>  }
> --
> 2.37.2
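The neg trick suggested for `.loop` can be illustrated in C. This is a hypothetical sketch, not FFmpeg code: `scalarproduct_ref` and `scalarproduct_neg` are made-up names, `order` is assumed to count int16 elements, and `scalarproduct_neg` assumes it is a multiple of 16 (one 32-byte ymm register per step). C cannot express reusing the flags set by `add`, so the sketch only shows the indexing scheme that makes the `cmp` unnecessary in asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain reference: sum of v1[i] * v2[i]. Accumulating in uint32_t makes
 * overflow wrap modulo 2^32, like the paddd accumulation in the asm. */
int32_t scalarproduct_ref(const int16_t *v1, const int16_t *v2, int order)
{
    uint32_t res = 0;
    for (int i = 0; i < order; i++)
        res += (uint32_t)(v1[i] * v2[i]);
    /* Converting a value above INT32_MAX back to int32_t is
     * implementation-defined; mainstream compilers wrap. */
    return (int32_t)res;
}

/* Neg-trick indexing: point at the end of both buffers and run a negative
 * offset up to zero. In asm, the add that advances the offset already sets
 * the flags, so the loop branch can test them without a separate cmp.
 * 16 elements are consumed per step, mirroring one ymm load, so order is
 * assumed to be a multiple of 16. */
int32_t scalarproduct_neg(const int16_t *v1, const int16_t *v2, int order)
{
    assert(order % 16 == 0);
    const int16_t *end1 = v1 + order;
    const int16_t *end2 = v2 + order;
    uint32_t acc = 0;

    for (ptrdiff_t off = -(ptrdiff_t)order; off < 0; off += 16)
        for (int k = 0; k < 16; k++)
            acc += (uint32_t)(end1[off + k] * end2[off + k]);
    return (int32_t)acc;
}
```

Both functions compute the same sum for a valid `order`; the difference that matters for the asm is that the loop exits when the offset reaches zero, so the exit condition falls out of the `add` itself.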
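To make the overread question concrete, a small C model of how far `.loop` reads may help. It is a hypothetical sketch (`bytes_loaded` and `overread` are made-up names), assuming `len_bytes` is `order` after `add orderd, orderd` has turned it into a byte count, and `vec_bytes` is `mmsize`, i.e. 32 for a ymm register.

```c
#include <assert.h>

/* Hypothetical model of the bounds of the patch's .loop: each pass loads
 * vec_bytes bytes from each buffer at the current offset, then
 * "add offsetq, mmsize / cmp offsetq, orderq / jl .loop" repeats it while
 * offset < len_bytes. As in the asm, the body runs at least once; the
 * patch only reaches .loop when len_bytes >= 32.
 * Returns how many bytes were loaded from each buffer. */
long bytes_loaded(long len_bytes, long vec_bytes)
{
    assert(vec_bytes > 0 && len_bytes >= vec_bytes);
    long offset = 0;
    do {
        offset += vec_bytes; /* one movu + pmaddwd per pass */
    } while (offset < len_bytes);
    return offset; /* end of the last load */
}

/* Bytes read past the end of each buffer; zero exactly when len_bytes
 * is a multiple of vec_bytes. */
long overread(long len_bytes, long vec_bytes)
{
    return bytes_loaded(len_bytes, vec_bytes) - len_bytes;
}
```

Under these assumptions, a 48-byte length (24 int16 elements) gives `overread(48, 32) == 16`, so each buffer would need 16 bytes of padding, while any multiple of 32 bytes (16 elements) reads nothing past the end.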