From mboxrd@z Thu Jan 1 00:00:00 1970
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 12 Jan 2026 13:51:05 +0800
Message-ID: <20260112055109.282-1-lpageo@163.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] avfilter/x86/vf_nlmeans: add AVX2 safe ssd integral image
List-Id: FFmpeg development discussions and patches
From: Andy Wu via ffmpeg-devel
Cc: Andy Wu
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Add an AVX2 implementation of compute_safe_ssd_integral_image used by vf_nlmeans.

checkasm: vf_nlmeans
bench: (x86_64, Linux)
  ssd_integral_image 1.93x
bench: (x86_64, Windows/MSVC)
  ssd_integral_image 1.71x

Signed-off-by: Andy Wu
---
 libavfilter/x86/vf_nlmeans.asm    | 114 ++++++++++++++++++++++++++++++
 libavfilter/x86/vf_nlmeans_init.c |   9 ++-
 2 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
index 8f57801035..c61593b916 100644
--- a/libavfilter/x86/vf_nlmeans.asm
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -37,6 +37,120 @@ ending_lut: dd -1, -1, -1, -1, -1, -1, -1, -1,\
 
 SECTION .text
 
+; void ff_compute_safe_ssd_integral_image(uint32_t *dst, ptrdiff_t dst_linesize_32,
+;                                         const uint8_t *s1, ptrdiff_t linesize1,
+;                                         const uint8_t *s2, ptrdiff_t linesize2,
+;                                         int w, int h);
+;
+; Assumptions (see C version):
+; - w is multiple of 16 and w >= 16
+; - h >= 1
+; - dst[-1] and dst_top[-1] are readable
+
+INIT_YMM avx2
+cglobal compute_safe_ssd_integral_image, 8, 14, 6, 0, dst, dst_lz, s1, ls1, s2, ls2, w, h, dst_top, dst_stride, x, carry, tmp
+    mov          wd, dword wm
+    mov          hd, dword hm
+    movsxd       wq, wd
+
+    mov          dst_strideq, dst_lzq
+    shl          dst_strideq, 2
+    mov          dst_topq, dstq
+    sub          dst_topq, dst_strideq
+
+.yloop:
+    xor          xq, xq
+    mov          carryd, [dstq - 4]
+
+.xloop:
+    ; ---- process 8 pixels ----
+    pmovzxbd     m0, [s1q + xq]
+    pmovzxbd     m1, [s2q + xq]
+    psubd        m0, m1
+    pmulld       m0, m0
+
+    movu         m1, [dst_topq + xq*4]
+    movu         m2, [dst_topq + xq*4 - 4]
+    psubd        m1, m2
+    paddd        m0, m1
+
+    mova         m5, m0
+    pslldq       m5, 4
+    paddd        m0, m5
+    mova         m5, m0
+    pslldq       m5, 8
+    paddd        m0, m5
+    mova         m5, m0
+    pslldq       m5, 16
+    paddd        m0, m5
+
+    vextracti128 xm5, m0, 0
+    pshufd       xm5, xm5, 0xff
+    pxor         m4, m4
+    vinserti128  m4, m4, xm5, 1
+    paddd        m0, m4
+
+    movd         xm5, carryd
+    vpbroadcastd m4, xm5
+    paddd        m0, m4
+
+    movu         [dstq + xq*4], m0
+
+    vextracti128 xm5, m0, 1
+    pshufd       xm5, xm5, 0xff
+    movd         carryd, xm5
+
+    add          xq, 8
+
+    ; ---- process 8 pixels ----
+    pmovzxbd     m0, [s1q + xq]
+    pmovzxbd     m1, [s2q + xq]
+    psubd        m0, m1
+    pmulld       m0, m0
+
+    movu         m1, [dst_topq + xq*4]
+    movu         m2, [dst_topq + xq*4 - 4]
+    psubd        m1, m2
+    paddd        m0, m1
+
+    mova         m5, m0
+    pslldq       m5, 4
+    paddd        m0, m5
+    mova         m5, m0
+    pslldq       m5, 8
+    paddd        m0, m5
+    mova         m5, m0
+    pslldq       m5, 16
+    paddd        m0, m5
+
+    vextracti128 xm5, m0, 0
+    pshufd       xm5, xm5, 0xff
+    pxor         m4, m4
+    vinserti128  m4, m4, xm5, 1
+    paddd        m0, m4
+
+    movd         xm5, carryd
+    vpbroadcastd m4, xm5
+    paddd        m0, m4
+
+    movu         [dstq + xq*4], m0
+
+    vextracti128 xm5, m0, 1
+    pshufd       xm5, xm5, 0xff
+    movd         carryd, xm5
+
+    add          xq, 8
+    cmp          xq, wq
+    jl .xloop
+
+    add          s1q, ls1q
+    add          s2q, ls2q
+    add          dstq, dst_strideq
+    add          dst_topq, dst_strideq
+    dec          hd
+    jg .yloop
+    RET
+
 ; void ff_compute_weights_line(const uint32_t *const iia,
 ;                              const uint32_t *const iib,
 ;                              const uint32_t *const iid,
diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
index 0adb2c7e8a..5bfdc7e028 100644
--- a/libavfilter/x86/vf_nlmeans_init.c
+++ b/libavfilter/x86/vf_nlmeans_init.c
@@ -20,6 +20,11 @@
 #include "libavutil/x86/cpu.h"
 #include "libavfilter/vf_nlmeans.h"
 
+void ff_compute_safe_ssd_integral_image_avx2(uint32_t *dst, ptrdiff_t dst_linesize_32,
+                                             const uint8_t *s1, ptrdiff_t linesize1,
+                                             const uint8_t *s2, ptrdiff_t linesize2,
+                                             int w, int h);
+
 void ff_compute_weights_line_avx2(const uint32_t *const iia,
                                   const uint32_t *const iib,
                                   const uint32_t *const iid,
@@ -36,7 +41,9 @@ av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
 #if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_AVX2_FAST(cpu_flags))
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        dsp->compute_safe_ssd_integral_image = ff_compute_safe_ssd_integral_image_avx2;
         dsp->compute_weights_line = ff_compute_weights_line_avx2;
+    }
 #endif
 }
-- 
2.43.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
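
For readers following the assembly, here is a rough scalar model in C of what
the patch appears to compute. It is a sketch, not the FFmpeg C reference: the
function names ssd_integral_scalar and ssd_integral_blocked are hypothetical,
and the blocked variant only imitates the 8-pixel step (prefix sum inside each
4-element lane, low-lane total added to the high lane, then the running carry).
The model leaves out the patch's 16-byte pslldq, since a per-lane byte shift of
16 yields zero and adds nothing to the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar recurrence (hypothetical helper):
 * dst[x] = dst[x-1] + dst_top[x] - dst_top[x-1] + (s1[x] - s2[x])^2
 * Requires a readable row above dst and a readable column to its left. */
void ssd_integral_scalar(uint32_t *dst, ptrdiff_t dst_linesize_32,
                         const uint8_t *s1, ptrdiff_t linesize1,
                         const uint8_t *s2, ptrdiff_t linesize2,
                         int w, int h)
{
    const uint32_t *dst_top = dst - dst_linesize_32;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const int d = s1[x] - s2[x];
            dst[x] = dst[x - 1] + dst_top[x] - dst_top[x - 1] + (uint32_t)(d * d);
        }
        s1 += linesize1;
        s2 += linesize2;
        dst += dst_linesize_32;
        dst_top += dst_linesize_32;
    }
}

/* Model of the 8-pixel SIMD step (hypothetical helper, plain C, no intrinsics). */
void ssd_integral_blocked(uint32_t *dst, ptrdiff_t dst_linesize_32,
                          const uint8_t *s1, ptrdiff_t linesize1,
                          const uint8_t *s2, ptrdiff_t linesize2,
                          int w, int h)
{
    const uint32_t *dst_top = dst - dst_linesize_32;
    assert(w > 0 && w % 8 == 0 && h >= 1);
    for (int y = 0; y < h; y++) {
        uint32_t carry = dst[-1];              /* value left of the row start */
        for (int x = 0; x < w; x += 8) {
            uint32_t v[8];
            for (int i = 0; i < 8; i++) {
                const int d = s1[x + i] - s2[x + i];
                v[i] = (uint32_t)(d * d) + dst_top[x + i] - dst_top[x + i - 1];
            }
            /* Prefix sum inside each 4-element lane: shift by 1, then by 2.
             * Descending i makes each add read the not-yet-updated neighbour. */
            for (int lane = 0; lane < 8; lane += 4) {
                for (int i = 3; i >= 1; i--) v[lane + i] += v[lane + i - 1];
                for (int i = 3; i >= 2; i--) v[lane + i] += v[lane + i - 2];
            }
            /* Cross-lane fix-up: add the low lane's total to the high lane. */
            const uint32_t low_total = v[3];
            for (int i = 4; i < 8; i++) v[i] += low_total;
            /* Add the running carry and store; last element is the next carry. */
            for (int i = 0; i < 8; i++) dst[x + i] = v[i] + carry;
            carry = dst[x + 7];
        }
        s1 += linesize1;
        s2 += linesize2;
        dst += dst_linesize_32;
        dst_top += dst_linesize_32;
    }
}
```

To try it, allocate a zeroed uint32_t buffer with one spare row above and some
spare columns to the left, point dst at the first real element, and compare the
two functions on the same input. With a constant difference of 2 between s1 and
s2 and zero borders, element (x, y) should come out as 4 * (x + 1) * (y + 1).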