From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 26 Jan 2026 01:27:30 -0000
Message-ID: <176939085085.25.8307423681209595508@4457048688e7>
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PR] avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function (PR #21579)
List-Id: FFmpeg development discussions and patches
From: mkver via ffmpeg-devel
Cc: mkver
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #21579 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21579
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21579.patch

>>From 86e553bdda774c17c30b87192b198eddae9dd2ef Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Sun, 25 Jan 2026 23:23:36 +0100
Subject: [PATCH 1/4] avcodec/hevc/dsp_template: Optimize impossible branches away

Saves 1856B of .text here.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/hevc/dsp_template.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/hevc/dsp_template.c b/libavcodec/hevc/dsp_template.c
index 573cf9ee1e..f703f6d071 100644
--- a/libavcodec/hevc/dsp_template.c
+++ b/libavcodec/hevc/dsp_template.c
@@ -132,7 +132,7 @@ static void FUNC(dequant)(int16_t *coeffs, int16_t log2_size)
     int x, y;
     int size = 1 << log2_size;
 
-    if (shift > 0) {
+    if (BIT_DEPTH <= 9 || shift > 0) {
         int offset = 1 << (shift - 1);
         for (y = 0; y < size; y++) {
             for (x = 0; x < size; x++) {
@@ -140,7 +140,7 @@ static void FUNC(dequant)(int16_t *coeffs, int16_t log2_size)
                 coeffs++;
             }
         }
-    } else if (shift < 0) {
+    } else if (BIT_DEPTH > 10 && shift < 0) {
         for (y = 0; y < size; y++) {
             for (x = 0; x < size; x++) {
                 *coeffs = *(uint16_t*)coeffs << -shift;
-- 
2.52.0

>>From 2e5ae4f840dea1a8cd3c2907d5a007616e7ed27b Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Sun, 25 Jan 2026 23:32:14 +0100
Subject: [PATCH 2/4] avcodec/hevc/dsp: Add alignment for dequant

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/hevc/dsp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/hevc/dsp.h b/libavcodec/hevc/dsp.h
index a63586c3a2..b884cd36be 100644
--- a/libavcodec/hevc/dsp.h
+++ b/libavcodec/hevc/dsp.h
@@ -50,7 +50,7 @@ typedef struct HEVCDSPContext {
 
     void (*add_residual[4])(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
 
-    void (*dequant)(int16_t *coeffs, int16_t log2_size);
+    void (*dequant)(int16_t *coeffs /* align 32 */, int16_t log2_size);
 
     void (*transform_rdpcm)(int16_t *coeffs, int16_t log2_size, int mode);
 
-- 
2.52.0

>>From 5edc6a6274f1592c3d2de62f9782f4e3b93d1842 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Mon, 26 Jan 2026 02:03:32 +0100
Subject: [PATCH 3/4] avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function

hevc_dequant_4x4_8_c (GCC):       20.2 ( 1.00x)
hevc_dequant_4x4_8_c (Clang):     21.7 ( 1.00x)
hevc_dequant_4x4_8_ssse3:          5.8 ( 3.51x)
hevc_dequant_8x8_8_c (GCC):       32.9 ( 1.00x)
hevc_dequant_8x8_8_c (Clang):     78.7 ( 1.00x)
hevc_dequant_8x8_8_ssse3:          6.8 ( 4.83x)
hevc_dequant_16x16_8_c (GCC):    105.1 ( 1.00x)
hevc_dequant_16x16_8_c (Clang):  151.1 ( 1.00x)
hevc_dequant_16x16_8_ssse3:       19.3 ( 5.45x)
hevc_dequant_32x32_8_c (GCC):    415.7 ( 1.00x)
hevc_dequant_32x32_8_c (Clang):  602.3 ( 1.00x)
hevc_dequant_32x32_8_ssse3:       78.2 ( 5.32x)

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hevc/Makefile    |  1 +
 libavcodec/x86/hevc/dequant.asm | 60 +++++++++++++++++++++++++++++++++
 libavcodec/x86/hevc/dsp_init.c  |  3 ++
 3 files changed, 64 insertions(+)
 create mode 100644 libavcodec/x86/hevc/dequant.asm

diff --git a/libavcodec/x86/hevc/Makefile b/libavcodec/x86/hevc/Makefile
index 74418a322c..d09c613a19 100644
--- a/libavcodec/x86/hevc/Makefile
+++ b/libavcodec/x86/hevc/Makefile
@@ -4,6 +4,7 @@ clean::
 X86ASM-OBJS-$(CONFIG_HEVC_DECODER) += x86/hevc/dsp_init.o \
                                       x86/hevc/add_res.o \
                                       x86/hevc/deblock.o \
+                                      x86/hevc/dequant.o \
                                       x86/hevc/idct.o \
                                       x86/hevc/mc.o \
                                       x86/hevc/sao.o \
diff --git a/libavcodec/x86/hevc/dequant.asm b/libavcodec/x86/hevc/dequant.asm
new file mode 100644
index 0000000000..f0453c940b
--- /dev/null
+++ b/libavcodec/x86/hevc/dequant.asm
@@ -0,0 +1,60 @@
+;*****************************************************************************
+;* SSSE3-optimized HEVC dequant code
+;*****************************************************************************
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION .text
+
+INIT_XMM ssse3
+; void ff_hevc_dequant_8_ssse3(int16_t *coeffs, int16_t log2_size)
+cglobal hevc_dequant_8, 2, 3+UNIX64, 3
+
+; coeffs, log2_size (in ecx), tmp/size
+%if WIN64
+    DECLARE_REG_TMP 1,0,2
+    ; r0 is the shift register (ecx) on win64
+    xchg          r0, r1
+%elif ARCH_X86_64
+    DECLARE_REG_TMP 0,3,1
+    ; r3 is ecx
+    mov          t1d, r1d
+%else
+    ; r1 is ecx
+    DECLARE_REG_TMP 0,1,2
+%endif
+
+    mov          t2d, 256
+    shl          t2d, t1b
+    movd          m0, t2d
+    add          t1d, t1d
+    SPLATW        m0, m0
+    mov          t2d, 1
+    shl          t2d, t1b
+.loop:
+    mova          m1, [t0]
+    mova          m2, [t0+mmsize]
+    pmulhrsw      m1, m0
+    pmulhrsw      m2, m0
+    mova        [t0], m1
+    mova [t0+mmsize], m2
+    add           t0, 2*mmsize
+    sub          t2d, mmsize
+    jg .loop
+    RET
diff --git a/libavcodec/x86/hevc/dsp_init.c b/libavcodec/x86/hevc/dsp_init.c
index 5b2b10f33a..bd967eac67 100644
--- a/libavcodec/x86/hevc/dsp_init.c
+++ b/libavcodec/x86/hevc/dsp_init.c
@@ -30,6 +30,8 @@
 #include "libavcodec/x86/hevc/dsp.h"
 #include "libavcodec/x86/h26x/h2656dsp.h"
 
+void ff_hevc_dequant_8_ssse3(int16_t *coeffs, int16_t log2_size);
+
 #define LFC_FUNC(DIR, DEPTH, OPT) \
 void ff_hevc_ ## DIR ## _loop_filter_chroma_ ## DEPTH ## _ ## OPT(uint8_t *pix, ptrdiff_t stride, const int *tc, const uint8_t *no_p, const uint8_t *no_q);
 
@@ -847,6 +849,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_ssse3;
             c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_ssse3;
 #endif
+            c->dequant = ff_hevc_dequant_8_ssse3;
             SAO_EDGE_INIT(8, ssse3);
         }
 #if HAVE_SSE4_EXTERNAL && ARCH_X86_64
-- 
2.52.0

>>From 3fbdf06a6d681a86578bca2812fd052c639f35f9 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Mon, 26 Jan 2026 02:16:47 +0100
Subject: [PATCH 4/4] tests/checkasm/hevc_dequant: Only init buffer when needed

Signed-off-by: Andreas Rheinhardt
---
 tests/checkasm/hevc_dequant.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/checkasm/hevc_dequant.c b/tests/checkasm/hevc_dequant.c
index 20e322994a..5036662666 100644
--- a/tests/checkasm/hevc_dequant.c
+++ b/tests/checkasm/hevc_dequant.c
@@ -48,11 +48,11 @@ static void check_dequant(HEVCDSPContext *h, int bit_depth)
         int size = block_size * block_size;
         declare_func(void, int16_t *coeffs, int16_t log2_size);
 
-        randomize_buffers(coeffs0, size);
-        memcpy(coeffs1, coeffs0, sizeof(*coeffs0) * size);
-
         if (check_func(h->dequant, "hevc_dequant_%dx%d_%d",
                        block_size, block_size, bit_depth)) {
+            randomize_buffers(coeffs0, size);
+            memcpy(coeffs1, coeffs0, sizeof(*coeffs0) * size);
+
             call_ref(coeffs0, i);
             call_new(coeffs1, i);
             if (memcmp(coeffs0, coeffs1, sizeof(*coeffs0) * size))
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
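
Note on the arithmetic in patch 3/4: the SSSE3 loop multiplies every coefficient by 256 << log2_size with pmulhrsw instead of doing an explicit add-and-shift. The following is only a minimal scalar sketch of why that can match the C template's rounding shift for 8-bit content, not part of the series. It assumes shift = 15 - BIT_DEPTH - log2_size (that definition is not visible in the diff context above) and an arithmetic right shift for negative values; pmulhrsw_lane, dequant_ref and dequant_models_agree are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of PMULHRSW:
 * ((a * b >> 14) + 1) >> 1, i.e. a rounded high multiply. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)((((int32_t)a * b >> 14) + 1) >> 1);
}

/* Rounding shift modelled on the template's "shift > 0" branch.
 * ASSUMPTION: shift = 15 - BIT_DEPTH - log2_size with BIT_DEPTH == 8. */
static int16_t dequant_ref(int16_t c, int log2_size)
{
    int shift  = 15 - 8 - log2_size;
    int offset = 1 << (shift - 1);
    return (int16_t)((c + offset) >> shift);
}

/* Exhaustively compare both models for log2_size 2..5 (4x4 to 32x32).
 * Returns 1 if they agree for every int16_t input, 0 otherwise. */
static int dequant_models_agree(void)
{
    for (int log2_size = 2; log2_size <= 5; log2_size++) {
        /* Same multiplier the asm builds with "mov t2d, 256; shl t2d, t1b". */
        int16_t mul = (int16_t)(256 << log2_size);
        for (int c = INT16_MIN; c <= INT16_MAX; c++)
            if (pmulhrsw_lane((int16_t)c, mul) !=
                dequant_ref((int16_t)c, log2_size))
                return 0;
    }
    return 1;
}
```

Calling dequant_models_agree() from a small test program is enough to check the claim. The multiplier stays at or below 8192 for log2_size <= 5, so it fits a signed 16-bit lane and the only PMULHRSW overflow case (-32768 * -32768) cannot occur.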