From: Andy Wu via ffmpeg-devel
To: ffmpeg-devel@ffmpeg.org
Cc: lpageo@163.com
Reply-To: FFmpeg development discussions and patches
Date: Tue, 3 Feb 2026 01:35:57 +0800
Message-ID: <20260202173557.24-1-lpageo@163.com>
Subject: [FFmpeg-devel] [PATCH v2] avfilter/x86/vf_nlmeans: add AVX2 safe ssd integral image

Add an AVX2 implementation of compute_safe_ssd_integral_image used by vf_nlmeans.

checkasm: vf_nlmeans (x86_64, Windows/MSVC)
checkasm: vf_nlmeans (x86_64, Linux/WSL)

bench: (x86_64, Windows/MSVC)
ssd_integral_image 1.94x
bench: (x86_64, Linux/WSL)
ssd_integral_image 1.60x

Signed-off-by: Andy Wu
---
v2: wrap duplicated 8-pixel block in a macro
v2: update bench numbers (Linux/WSL and Windows/MSVC)

 libavfilter/x86/vf_nlmeans.asm    | 81 +++++++++++++++++++++++++++++++
 libavfilter/x86/vf_nlmeans_init.c |  9 +++-
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
index 8f57801035..90cbdabe86 100644
--- a/libavfilter/x86/vf_nlmeans.asm
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -37,6 +37,87 @@ ending_lut: dd -1, -1, -1, -1, -1, -1, -1, -1,\
 
 SECTION .text
 
+%macro PROCESS_8_SSD_INTEGRAL 0
+    pmovzxbd       m0, [s1q + xq]
+    pmovzxbd       m1, [s2q + xq]
+    psubd          m0, m1
+    pmulld         m0, m0
+
+    movu           m1, [dst_topq + xq*4]
+    movu           m2, [dst_topq + xq*4 - 4]
+    psubd          m1, m2
+    paddd          m0, m1
+
+    mova           m5, m0
+    pslldq         m5, 4
+    paddd          m0, m5
+    mova           m5, m0
+    pslldq         m5, 8
+    paddd          m0, m5
+    mova           m5, m0
+    pslldq         m5, 16
+    paddd          m0, m5
+
+    vextracti128   xm5, m0, 0
+    pshufd         xm5, xm5, 0xff
+    pxor           m4, m4
+    vinserti128    m4, m4, xm5, 1
+    paddd          m0, m4
+
+    movd           xm5, carryd
+    vpbroadcastd   m4, xm5
+    paddd          m0, m4
+
+    movu           [dstq + xq*4], m0
+
+    vextracti128   xm5, m0, 1
+    pshufd         xm5, xm5, 0xff
+    movd           carryd, xm5
+
+    add            xq, 8
+%endmacro
+
+; void ff_compute_safe_ssd_integral_image(uint32_t *dst, ptrdiff_t dst_linesize_32,
+;                                         const uint8_t *s1, ptrdiff_t linesize1,
+;                                         const uint8_t *s2, ptrdiff_t linesize2,
+;                                         int w, int h);
+;
+; Assumptions (see C version):
+; - w is multiple of 16 and w >= 16
+; - h >= 1
+; - dst[-1] and dst_top[-1] are readable
+
+INIT_YMM avx2
+cglobal compute_safe_ssd_integral_image, 8, 14, 6, 0, dst, dst_lz, s1, ls1, s2, ls2, w, h, dst_top, dst_stride, x, carry, tmp
+    mov            wd, dword wm
+    mov            hd, dword hm
+    movsxd         wq, wd
+
+    mov            dst_strideq, dst_lzq
+    shl            dst_strideq, 2
+    mov            dst_topq, dstq
+    sub            dst_topq, dst_strideq
+
+.yloop:
+    xor            xq, xq
+    mov            carryd, [dstq - 4]
+
+.xloop:
+    ; ---- process 8 pixels ----
+    PROCESS_8_SSD_INTEGRAL
+    ; ---- process 8 pixels ----
+    PROCESS_8_SSD_INTEGRAL
+    cmp            xq, wq
+    jl             .xloop
+
+    add            s1q, ls1q
+    add            s2q, ls2q
+    add            dstq, dst_strideq
+    add            dst_topq, dst_strideq
+    dec            hd
+    jg             .yloop
+    RET
+
 ; void ff_compute_weights_line(const uint32_t *const iia,
 ;                              const uint32_t *const iib,
 ;                              const uint32_t *const iid,
diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
index 0adb2c7e8a..5bfdc7e028 100644
--- a/libavfilter/x86/vf_nlmeans_init.c
+++ b/libavfilter/x86/vf_nlmeans_init.c
@@ -20,6 +20,11 @@
 #include "libavutil/x86/cpu.h"
 #include "libavfilter/vf_nlmeans.h"
 
+void ff_compute_safe_ssd_integral_image_avx2(uint32_t *dst, ptrdiff_t dst_linesize_32,
+                                             const uint8_t *s1, ptrdiff_t linesize1,
+                                             const uint8_t *s2, ptrdiff_t linesize2,
+                                             int w, int h);
+
 void ff_compute_weights_line_avx2(const uint32_t *const iia,
                                   const uint32_t *const iib,
                                   const uint32_t *const iid,
@@ -36,7 +41,9 @@ av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
 #if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_AVX2_FAST(cpu_flags))
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        dsp->compute_safe_ssd_integral_image = ff_compute_safe_ssd_integral_image_avx2;
         dsp->compute_weights_line = ff_compute_weights_line_avx2;
+    }
 #endif
 }
--
2.45.1.windows.1
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
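
Reader's note, not part of the patch: the per-row recurrence that PROCESS_8_SSD_INTEGRAL appears to vectorize can be sketched in scalar C as below. This is only an illustration read off the assembly, not the C version in libavfilter/vf_nlmeans.c. The name ssd_integral_ref is hypothetical, and the sketch assumes the same preconditions the asm comment lists (dst[-1] and dst_top[-1] readable).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, written for illustration only.
 * Per row it evaluates
 *   dst[x] = dst[x-1] + (s1[x] - s2[x])^2 + dst_top[x] - dst_top[x-1]
 * where dst_top is the row above dst. The running value starts at
 * dst[-1], which mirrors the "carry" register in the assembly.
 * Assumption: dst[-1] and the whole row above dst (including its
 * [-1] element) are readable and already initialized by the caller. */
static void ssd_integral_ref(uint32_t *dst, ptrdiff_t dst_linesize_32,
                             const uint8_t *s1, ptrdiff_t linesize1,
                             const uint8_t *s2, ptrdiff_t linesize2,
                             int w, int h)
{
    const uint32_t *dst_top = dst - dst_linesize_32;

    for (int y = 0; y < h; y++) {
        uint32_t carry = dst[-1];               /* value left of the row */

        for (int x = 0; x < w; x++) {
            const int d = s1[x] - s2[x];        /* pixel difference */

            /* unsigned arithmetic wraps modulo 2^32, like paddd/psubd */
            carry += (uint32_t)(d * d) + dst_top[x] - dst_top[x - 1];
            dst[x] = carry;
        }

        s1      += linesize1;
        s2      += linesize2;
        dst     += dst_linesize_32;             /* stride in 32-bit units */
        dst_top += dst_linesize_32;
    }
}
```

With a zeroed top row and left column, each dst[x] ends up as the sum of squared differences over the rectangle from the origin to (x, y). A comparison of this loop against the AVX2 output on random input is one way to sanity-check the in-lane prefix sum and the carry propagated between 8-pixel blocks.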