From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 31 May 2024 16:47:08 -0300
Message-ID: <20240531194708.6146-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.1
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] x86/float_dsp: add SSE2 and AVX versions of scalarproduct_double
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Signed-off-by: James Almer
---
 libavutil/x86/float_dsp.asm    | 52 ++++++++++++++++++++++++++++++++++
 libavutil/x86/float_dsp_init.c |  5 ++++
 2 files changed, 57 insertions(+)

diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
index e84ba52566..e9816cdf02 100644
--- a/libavutil/x86/float_dsp.asm
+++ b/libavutil/x86/float_dsp.asm
@@ -567,6 +567,58 @@ cglobal scalarproduct_float, 3,5,8, v1, v2, size, len, offset
 %endif
     RET
 
+;---------------------------------------------------------------------------------
+; double scalarproduct_double(const double *v1, const double *v2, size_t len)
+;---------------------------------------------------------------------------------
+%macro SCALARPRODUCT_DOUBLE 0
+cglobal scalarproduct_double, 3,3,8, v1, v2, offset
+    shl offsetq, 3
+    add v1q, offsetq
+    add v2q, offsetq
+    neg offsetq
+    xorpd m0, m0
+    xorpd m1, m1
+    xorpd m2, m2
+    xorpd m3, m3
+align 16
+.loop:
+    movapd m4, [v1q+offsetq+mmsize*0]
+    movapd m5, [v1q+offsetq+mmsize*1]
+    movapd m6, [v1q+offsetq+mmsize*2]
+    movapd m7, [v1q+offsetq+mmsize*3]
+    mulpd m4, [v2q+offsetq+mmsize*0]
+    mulpd m5, [v2q+offsetq+mmsize*1]
+    mulpd m6, [v2q+offsetq+mmsize*2]
+    mulpd m7, [v2q+offsetq+mmsize*3]
+    addpd m0, m4
+    addpd m1, m5
+    addpd m2, m6
+    addpd m3, m7
+    add offsetq, mmsize*4
+    jl .loop
+    addpd m0, m1
+    addpd m2, m3
+    addpd m0, m2
+%if mmsize == 32
+    vextractf128 xm1, m0, 1
+    addpd xm0, xm1
+%endif
+    movhlps xm1, xm0
+    addpd xm0, xm1
+%if ARCH_X86_64 == 0
+    movsd r0m, xm0
+    fld qword r0m
+%endif
+    RET
+%endmacro
+
+INIT_XMM sse2
+SCALARPRODUCT_DOUBLE
+%if HAVE_AVX_EXTERNAL
+INIT_YMM avx
+SCALARPRODUCT_DOUBLE
+%endif
+
 ;-----------------------------------------------------------------------------
 ; void ff_butterflies_float(float *src0, float *src1, int len);
 ;-----------------------------------------------------------------------------
diff --git a/libavutil/x86/float_dsp_init.c b/libavutil/x86/float_dsp_init.c
index 093bce9b94..6cf0b4a277 100644
--- a/libavutil/x86/float_dsp_init.c
+++ b/libavutil/x86/float_dsp_init.c
@@ -73,6 +73,9 @@ void ff_vector_fmul_reverse_avx2(float *dst, const float *src0,
 float ff_scalarproduct_float_sse(const float *v1, const float *v2, int order);
 float ff_scalarproduct_float_fma3(const float *v1, const float *v2, int order);
 
+double ff_scalarproduct_double_sse2(const double *v1, const double *v2, size_t order);
+double ff_scalarproduct_double_avx(const double *v1, const double *v2, size_t order);
+
 void ff_butterflies_float_sse(float *restrict src0, float *restrict src1, int len);
 
 av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
@@ -93,6 +96,7 @@ av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
         fdsp->vector_dmul = ff_vector_dmul_sse2;
         fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_sse2;
         fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_sse2;
+        fdsp->scalarproduct_double = ff_scalarproduct_double_sse2;
     }
     if (EXTERNAL_AVX_FAST(cpu_flags)) {
         fdsp->vector_fmul = ff_vector_fmul_avx;
@@ -102,6 +106,7 @@ av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
         fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_avx;
         fdsp->vector_fmul_add = ff_vector_fmul_add_avx;
         fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_avx;
+        fdsp->scalarproduct_double = ff_scalarproduct_double_avx;
     }
     if (EXTERNAL_AVX2_FAST(cpu_flags)) {
         fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_avx2;
-- 
2.45.1