From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 2 Jun 2024 23:39:26 -0300
Subject: Re: [FFmpeg-devel] [PATCH] x86/float_dsp: add SSE2 and AVX versions of scalarproduct_double
In-Reply-To: <20240531194708.6146-1-jamrial@gmail.com>

On 5/31/2024 4:47 PM, James Almer wrote:
> Signed-off-by: James Almer
> ---
>  libavutil/x86/float_dsp.asm    | 52 ++++++++++++++++++++++++++++++++++
>  libavutil/x86/float_dsp_init.c |  5 ++++
>  2 files changed, 57 insertions(+)
>
> diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
> index e84ba52566..e9816cdf02 100644
> --- a/libavutil/x86/float_dsp.asm
> +++ b/libavutil/x86/float_dsp.asm
> @@ -567,6 +567,58 @@ cglobal scalarproduct_float, 3,5,8, v1, v2, size, len, offset
>  %endif
>      RET
>
> +;---------------------------------------------------------------------------------
> +; double scalarproduct_double(const double *v1, const double *v2, size_t len)
> +;---------------------------------------------------------------------------------
> +%macro SCALARPRODUCT_DOUBLE 0
> +cglobal scalarproduct_double, 3,3,8, v1, v2, offset
> +    shl offsetq, 3
> +    add v1q, offsetq
> +    add v2q, offsetq
> +    neg offsetq
> +    xorpd m0, m0
> +    xorpd m1, m1
> +    xorpd m2, m2
> +    xorpd m3, m3
> +align 16
> +.loop:
> +    movapd m4, [v1q+offsetq+mmsize*0]
> +    movapd m5, [v1q+offsetq+mmsize*1]
> +    movapd m6, [v1q+offsetq+mmsize*2]
> +    movapd m7, [v1q+offsetq+mmsize*3]
> +    mulpd m4, [v2q+offsetq+mmsize*0]
> +    mulpd m5, [v2q+offsetq+mmsize*1]
> +    mulpd m6, [v2q+offsetq+mmsize*2]
> +    mulpd m7, [v2q+offsetq+mmsize*3]
> +    addpd m0, m4
> +    addpd m1, m5
> +    addpd m2, m6
> +    addpd m3, m7
> +    add offsetq, mmsize*4
> +    jl .loop
> +    addpd m0, m1
> +    addpd m2, m3
> +    addpd m0, m2
> +%if mmsize == 32
> +    vextractf128 xm1, m0, 1
> +    addpd xm0, xm1
> +%endif
> +    movhlps xm1, xm0
> +    addpd xm0, xm1
> +%if ARCH_X86_64 == 0
> +    movsd r0m, xm0
> +    fld qword r0m
> +%endif
> +    RET
> +%endmacro
> +
> +INIT_XMM sse2
> +SCALARPRODUCT_DOUBLE
> +%if HAVE_AVX_EXTERNAL
> +INIT_YMM avx
> +SCALARPRODUCT_DOUBLE
> +%endif
> +
>  ;-----------------------------------------------------------------------------
>  ; void ff_butterflies_float(float *src0, float *src1, int len);
>  ;-----------------------------------------------------------------------------
> diff --git a/libavutil/x86/float_dsp_init.c b/libavutil/x86/float_dsp_init.c
> index 093bce9b94..6cf0b4a277 100644
> --- a/libavutil/x86/float_dsp_init.c
> +++ b/libavutil/x86/float_dsp_init.c
> @@ -73,6 +73,9 @@ void ff_vector_fmul_reverse_avx2(float *dst, const float *src0,
>  float ff_scalarproduct_float_sse(const float *v1, const float *v2, int order);
>  float ff_scalarproduct_float_fma3(const float *v1, const float *v2, int order);
>
> +double ff_scalarproduct_double_sse2(const double *v1, const double *v2, size_t order);
> +double ff_scalarproduct_double_avx(const double *v1, const double *v2, size_t order);
> +
>  void ff_butterflies_float_sse(float *restrict src0, float *restrict src1, int len);
>
>  av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
> @@ -93,6 +96,7 @@ av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
>          fdsp->vector_dmul = ff_vector_dmul_sse2;
>          fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_sse2;
>          fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_sse2;
> +        fdsp->scalarproduct_double = ff_scalarproduct_double_sse2;
>      }
>      if (EXTERNAL_AVX_FAST(cpu_flags)) {
>          fdsp->vector_fmul = ff_vector_fmul_avx;
> @@ -102,6 +106,7 @@ av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
>          fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_avx;
>          fdsp->vector_fmul_add = ff_vector_fmul_add_avx;
>          fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_avx;
> +        fdsp->scalarproduct_double = ff_scalarproduct_double_avx;
>      }
>      if (EXTERNAL_AVX2_FAST(cpu_flags)) {
>          fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_avx2;

Will apply.
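For readers less familiar with x86inc, a plain C model of the SSE2 kernel quoted above may help. This is an illustrative sketch only, not code from the patch or from FFmpeg. The function names scalarproduct_double_ref and scalarproduct_double_sse2_model are hypothetical, offsets are counted in elements rather than bytes, and the model assumes len is a nonzero multiple of 8 (four 2-lane registers consumed per iteration). That assumption is read off the loop shape: the add/jl pair makes the loop bottom-tested, so the body runs at least once, and there is no tail handling.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference: plain left-to-right accumulation. */
double scalarproduct_double_ref(const double *v1, const double *v2, size_t len)
{
    double sum = 0.0;
    for (size_t i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}

/*
 * Hypothetical scalar model of the SSE2 version: four accumulators (m0..m3),
 * two double lanes each, so 8 elements are consumed per iteration.
 * Assumes len is a nonzero multiple of 8, since there is no tail loop.
 */
double scalarproduct_double_sse2_model(const double *v1, const double *v2, size_t len)
{
    double acc[4][2] = { { 0.0 } };          /* xorpd m0..m3 */
    /* -len in elements; the asm scales to bytes with shl and negates
     * after advancing the pointers. */
    ptrdiff_t offset = -(ptrdiff_t)len;

    assert(len > 0 && len % 8 == 0);

    v1 += len;                               /* add v1q, offsetq */
    v2 += len;                               /* add v2q, offsetq */

    do {
        for (int r = 0; r < 4; r++) {        /* [..+mmsize*r] */
            for (int lane = 0; lane < 2; lane++) {
                ptrdiff_t i = offset + 2 * r + lane;
                acc[r][lane] += v1[i] * v2[i];   /* mulpd, then addpd */
            }
        }
        offset += 8;                         /* add offsetq, mmsize*4 */
    } while (offset < 0);                    /* jl .loop */

    /* addpd m0, m1 / addpd m2, m3 / addpd m0, m2, done per lane */
    double lo = (acc[0][0] + acc[1][0]) + (acc[2][0] + acc[3][0]);
    double hi = (acc[0][1] + acc[1][1]) + (acc[2][1] + acc[3][1]);

    /* movhlps + addpd: fold the high lane into the low one */
    return lo + hi;
}
```

Both functions should agree exactly when every intermediate is representable (small integers, for instance). With general data they can differ in the last bits, because the vector kernel reassociates the sum into eight interleaved partial sums. The AVX version follows the same pattern with four lanes per register (16 elements per iteration) and an extra vextractf128/addpd step that folds the upper 128 bits before the movhlps reduction.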