From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 25 Feb 2026 17:06:14 -0000
Message-ID: <177203917565.25.17597080578994453438@29965ddac10e>
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PR] swscale/ops: add ability to exclude planes from SWS_OP_DITHER (PR #22283)
List-Id: FFmpeg development discussions and patches
From: Niklas Haas via ffmpeg-devel
Cc: Niklas Haas
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #22283 opened by Niklas Haas (haasn)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22283
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22283.patch

Helps optimize in some future use cases, also IMHO simplifies the x86
implementation.


>>From b89dca6d76987a563f1a9db895c72780c99969e9 Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 25 Feb 2026 16:46:04 +0100
Subject: [PATCH 1/6] swscale/ops: allow excluding components from SWS_OP_DITHER

We often need to dither only a subset of the components. Previously this
was not possible, but we can just use the special value -1 for this.

The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.

Additionally, it somewhat simplifies implementations.
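
For readers skimming the series, the -1 convention described above can be sketched roughly as follows. This is an illustration only, not the libswscale code: `DitherSketch`, `dither_pixel` and the fixed 4x4 matrix size are hypothetical, and only the signed `y_offset[4]` field mirrors the patch.

```c
#include <stdint.h>

#define SIZE_LOG2 2                /* hypothetical: 4x4 dither matrix */
#define SIZE      (1 << SIZE_LOG2)

/* Hypothetical stand-in for SwsDitherOp; only y_offset mirrors the patch */
typedef struct DitherSketch {
    const float *matrix;  /* SIZE x SIZE weights, row-major */
    int8_t y_offset[4];   /* row offset per component, or -1 for ignored */
} DitherSketch;

/* Add the dither weight for pixel (x, y) to every enabled component */
static void dither_pixel(const DitherSketch *d, int x, int y, float px[4])
{
    for (int i = 0; i < 4; i++) {
        if (d->y_offset[i] < 0)
            continue; /* excluded component passes through untouched */
        const int row = (y + d->y_offset[i]) & (SIZE - 1);
        px[i] += d->matrix[row * SIZE + (x & (SIZE - 1))];
    }
}
```

Because the field is signed 8-bit, any valid offset has to fit in 0..INT8_MAX, which is presumably why the series adds an assertion to that effect in format.c.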

Signed-off-by: Niklas Haas
---
 libswscale/format.c        | 4 +++-
 libswscale/ops.c           | 6 ++++--
 libswscale/ops.h           | 2 +-
 libswscale/ops_optimizer.c | 5 +++--
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/libswscale/format.c b/libswscale/format.c
index e0434a2024..d53acdbcdc 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -1229,8 +1229,10 @@ static int fmt_dither(SwsContext *ctx, SwsOpList *ops,
         /* Brute-forced offsets; minimizes quantization error across a 16x16
          * bayer dither pattern for standard RGBA and YUVA pixel formats */
         const int offsets_16x16[4] = {0, 3, 2, 5};
-        for (int i = 0; i < 4; i++)
+        for (int i = 0; i < 4; i++) {
+            av_assert0(offsets_16x16[i] <= INT8_MAX);
             dither.y_offset[i] = offsets_16x16[i];
+        }
 
         if (src.desc->nb_components < 3 && bpc >= 8) {
             /**
diff --git a/libswscale/ops.c b/libswscale/ops.c
index 2889e95d12..311a17fe43 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -181,8 +181,10 @@ void ff_sws_apply_op_q(const SwsOp *op, AVRational x[4])
         return;
     case SWS_OP_DITHER:
         av_assert1(!ff_sws_pixel_type_is_int(op->type));
-        for (int i = 0; i < 4; i++)
-            x[i] = x[i].den ? av_add_q(x[i], av_make_q(1, 2)) : x[i];
+        for (int i = 0; i < 4; i++) {
+            if (op->dither.y_offset[i] >= 0 && x[i].den)
+                x[i] = av_add_q(x[i], av_make_q(1, 2));
+        }
         return;
     case SWS_OP_MIN:
         for (int i = 0; i < 4; i++)
diff --git a/libswscale/ops.h b/libswscale/ops.h
index 7b79fdc69d..d1576b9325 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -139,7 +139,7 @@ typedef struct SwsConvertOp {
 typedef struct SwsDitherOp {
     AVRational *matrix; /* tightly packed dither matrix (refstruct) */
     int size_log2; /* size (in bits) of the dither matrix */
-    uint8_t y_offset[4]; /* row offset for each component */
+    int8_t y_offset[4]; /* row offset for each component, or -1 for ignored */
 } SwsDitherOp;
 
 typedef struct SwsLinearOp {
diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 45ad3d4490..203dff1ac2 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -518,8 +518,9 @@ retry:
 
         case SWS_OP_DITHER:
             for (int i = 0; i < 4; i++) {
-                noop &= (prev->comps.flags[i] & SWS_COMP_EXACT) ||
-                        next->comps.unused[i];
+                if (next->comps.unused[i] || op->dither.y_offset[i] < 0)
+                    continue;
+                noop &= !!(prev->comps.flags[i] & SWS_COMP_EXACT);
             }
 
             if (noop) {
-- 
2.52.0


>>From 357667a25e35ef43beb84bc6cb4bd13ade19edb2 Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 25 Feb 2026 16:52:09 +0100
Subject: [PATCH 2/6] swscale/ops_backend: implement support for optional dither indices

If you place the branch inside the loop, gcc at least reverts back to scalar
code, so better to just split up and guard the entire loop.
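
The two loop shapes being compared can be sketched as below. This is a simplified illustration under assumed names (`BLOCK` stands in for SWS_BLOCK_SIZE; `dither_branch_inside` and `dither_guarded` are hypothetical), not the template code itself. Both variants compute the same result; the claim that the second one vectorizes better is the compiler behaviour reported above, which this sketch does not measure.

```c
#include <stdint.h>

#define BLOCK 8 /* hypothetical stand-in for SWS_BLOCK_SIZE */

/* Variant A: the offset test sits inside the hot loop and is evaluated
 * once per element and component */
static void dither_branch_inside(float comp[4][BLOCK],
                                 const float row[4][BLOCK],
                                 const int8_t offset[4])
{
    for (int i = 0; i < BLOCK; i++) {
        for (int c = 0; c < 4; c++) {
            if (offset[c] >= 0)
                comp[c][i] += row[c][i];
        }
    }
}

/* Variant B: one test per component guards an entire branch-free loop,
 * which is the shape the DITHER_COMP macro expands to */
static void dither_guarded(float comp[4][BLOCK],
                           const float row[4][BLOCK],
                           const int8_t offset[4])
{
    for (int c = 0; c < 4; c++) {
        if (offset[c] < 0)
            continue;
        for (int i = 0; i < BLOCK; i++)
            comp[c][i] += row[c][i];
    }
}
```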

Signed-off-by: Niklas Haas
---
 libswscale/ops_chain.h      |  1 +
 libswscale/ops_tmpl_float.c | 25 +++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/libswscale/ops_chain.h b/libswscale/ops_chain.h
index 2f5a31793e..ec55e87998 100644
--- a/libswscale/ops_chain.h
+++ b/libswscale/ops_chain.h
@@ -44,6 +44,7 @@ typedef union SwsOpPriv {
 
     /* Common types */
     void *ptr;
+    int8_t i8[16];
     uint8_t u8[16];
     uint16_t u16[8];
     uint32_t u32[4];
diff --git a/libswscale/ops_tmpl_float.c b/libswscale/ops_tmpl_float.c
index 10749d5f7d..2f1d249168 100644
--- a/libswscale/ops_tmpl_float.c
+++ b/libswscale/ops_tmpl_float.c
@@ -57,7 +57,7 @@ DECL_SETUP(setup_dither)
         return AVERROR(ENOMEM);
 
     static_assert(sizeof(out->ptr) <= sizeof(uint8_t[8]), ">8 byte pointers not supported");
-    uint8_t *offset = &out->u8[8];
+    int8_t *offset = &out->i8[8];
     for (int i = 0; i < 4; i++)
         offset[i] = op->dither.y_offset[i];
 
@@ -74,25 +74,26 @@ DECL_SETUP(setup_dither)
 DECL_FUNC(dither, const int size_log2)
 {
     const pixel_t *restrict matrix = impl->priv.ptr;
-    const uint8_t *offset = &impl->priv.u8[8];
+    const int8_t *restrict offset = &impl->priv.i8[8];
     const int mask = (1 << size_log2) - 1;
     const int y_line = iter->y;
-    const int row0 = (y_line + offset[0]) & mask;
-    const int row1 = (y_line + offset[1]) & mask;
-    const int row2 = (y_line + offset[2]) & mask;
-    const int row3 = (y_line + offset[3]) & mask;
     const int size = 1 << size_log2;
     const int width = FFMAX(size, SWS_BLOCK_SIZE);
     const int base = iter->x & ~(SWS_BLOCK_SIZE - 1) & (size - 1);
 
-    SWS_LOOP
-    for (int i = 0; i < SWS_BLOCK_SIZE; i++) {
-        x[i] += size_log2 ? matrix[row0 * width + base + i] : (pixel_t) 0.5;
-        y[i] += size_log2 ? matrix[row1 * width + base + i] : (pixel_t) 0.5;
-        z[i] += size_log2 ? matrix[row2 * width + base + i] : (pixel_t) 0.5;
-        w[i] += size_log2 ? matrix[row3 * width + base + i] : (pixel_t) 0.5;
+#define DITHER_COMP(VAR, IDX) \
+    if (offset[IDX] >= 0) { \
+        const int row = (y_line + offset[IDX]) & mask; \
+        SWS_LOOP \
+        for (int i = 0; i < SWS_BLOCK_SIZE; i++) \
+            VAR[i] += size_log2 ? matrix[row * width + base + i] : (pixel_t) 0.5; \
     }
 
+    DITHER_COMP(x, 0)
+    DITHER_COMP(y, 1)
+    DITHER_COMP(z, 2)
+    DITHER_COMP(w, 3)
+
     CONTINUE(block_t, x, y, z, w);
 }
 
-- 
2.52.0


>>From 9616c55a96868f9c5f99e0582b00c87f5f63e987 Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 25 Feb 2026 15:08:34 +0100
Subject: [PATCH 3/6] swscale/x86/ops: split off dither0 special case

I want to rewrite the dither kernel a bit, and this special case is a bit
too annoying and gets in the way.

Signed-off-by: Niklas Haas
---
 libswscale/x86/ops_float.asm | 45 +++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/libswscale/x86/ops_float.asm b/libswscale/x86/ops_float.asm
index 2863085a8e..625cf81553 100644
--- a/libswscale/x86/ops_float.asm
+++ b/libswscale/x86/ops_float.asm
@@ -193,37 +193,45 @@ IF W, mulps mw2, m8
 %endif
 %endmacro
 
+%macro dither0 0
+op dither0
+        ; constant offset for all channels
+        vbroadcastss m8, [implq + SwsOpImpl.priv]
+        LOAD_CONT tmp0q
+IF X,   addps mx, m8
+IF Y,   addps my, m8
+IF Z,   addps mz, m8
+IF W,   addps mw, m8
+IF X,   addps mx2, m8
+IF Y,   addps my2, m8
+IF Z,   addps mz2, m8
+IF W,   addps mw2, m8
+        CONTINUE tmp0q
+%endmacro
+
 %macro dither 1 ; size_log2
 op dither%1
         %define DX m8
         %define DY m9
         %define DZ m10
         %define DW m11
-        %define DX2 DX
-        %define DY2 DY
-        %define DZ2 DZ
-        %define DW2 DW
-%if %1 == 0
-        ; constant offset for all channels
-        vbroadcastss DX, [implq + SwsOpImpl.priv]
-        %define DY DX
-        %define DZ DX
-        %define DW DX
-%else
-        ; load all four channels with custom offset
-        ;
-        ; note that for 2x2, we would only need to look at the sign of `y`, but
-        ; this special case is ignored for simplicity reasons (and because
-        ; the current upstream format code never generates matrices that small)
     %if (4 << %1) > mmsize
         %define DX2 m12
         %define DY2 m13
         %define DZ2 m14
         %define DW2 m15
+    %else
+        %define DX2 DX
+        %define DY2 DY
+        %define DZ2 DZ
+        %define DW2 DW
     %endif
         ; dither matrix is stored indirectly at the private data address
         mov tmp1q, [implq + SwsOpImpl.priv]
-        ; add y offset
+        ; add y offset. note that for 2x2, we would only need to look at the
+        ; sign of `y`, but this special case is ignored for simplicity reasons
+        ; (and because the current upstream format code never generates matrices
+        ; that small)
         mov tmp0d, yd
         and tmp0d, (1 << %1) - 1
         shl tmp0d, %1 + 2 ; * sizeof(float)
@@ -239,7 +247,6 @@ IF X, load_dither_row %1, 0, tmp1q, DX, DX2
 IF Y,   load_dither_row %1, 1, tmp1q, DY, DY2
 IF Z,   load_dither_row %1, 2, tmp1q, DZ, DZ2
 IF W,   load_dither_row %1, 3, tmp1q, DW, DW2
-%endif
         LOAD_CONT tmp0q
 IF X,   addps mx, DX
 IF Y,   addps my, DY
@@ -253,7 +260,7 @@ IF W, addps mw2, DW2
 %endmacro
 
 %macro dither_fns 0
-        dither 0
+        dither0
         dither 1
         dither 2
         dither 3
-- 
2.52.0


>>From 5d559a85672e64d60e8f0366ffb220150a7ac3db Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 25 Feb 2026 15:19:32 +0100
Subject: [PATCH 4/6] swscale/x86/ops: don't preload dither weights

This doesn't actually gain any performance but makes the code needlessly
complicated. Just directly add the indirect address as needed.

Signed-off-by: Niklas Haas
---
 libswscale/x86/ops_float.asm | 64 ++++++++++++------------------------
 1 file changed, 21 insertions(+), 43 deletions(-)

diff --git a/libswscale/x86/ops_float.asm b/libswscale/x86/ops_float.asm
index 625cf81553..78f35a9785 100644
--- a/libswscale/x86/ops_float.asm
+++ b/libswscale/x86/ops_float.asm
@@ -179,20 +179,6 @@ IF W, mulps mw2, m8
         CONTINUE tmp0q
 %endmacro
 
-%macro load_dither_row 5 ; size_log2, comp_idx, addr, out, out2
-        mov tmp0w, [implq + SwsOpImpl.priv + (4 + %2) * 2] ; priv.u16[4 + i]
-%if %1 == 1
-        vbroadcastsd %4, [%3 + tmp0q]
-%elif %1 == 2
-        VBROADCASTI128 %4, [%3 + tmp0q]
-%else
-        mova %4, [%3 + tmp0q]
-    %if (4 << %1) > mmsize
-        mova %5, [%3 + tmp0q + mmsize]
-    %endif
-%endif
-%endmacro
-
 %macro dither0 0
 op dither0
         ; constant offset for all channels
@@ -209,23 +195,24 @@ IF W, addps mw2, m8
         CONTINUE tmp0q
 %endmacro
 
+%macro dither_row 5 ; size_log2, comp_idx, matrix, out, out2
+        mov tmp0w, [implq + SwsOpImpl.priv + (4 + %2) * 2] ; priv.u16[4 + i]
+%if %1 == 1
+        vbroadcastsd m8, [%3 + tmp0q]
+        addps %4, m8
+        addps %5, m8
+%elif %1 == 2
+        VBROADCASTI128 m8, [%3 + tmp0q]
+        addps %4, m8
+        addps %5, m8
+%else
+        addps %4, [%3 + tmp0q]
+        addps %5, [%3 + tmp0q + mmsize * ((4 << %1) > mmsize)]
+%endif
+%endmacro
+
 %macro dither 1 ; size_log2
 op dither%1
-        %define DX m8
-        %define DY m9
-        %define DZ m10
-        %define DW m11
-    %if (4 << %1) > mmsize
-        %define DX2 m12
-        %define DY2 m13
-        %define DZ2 m14
-        %define DW2 m15
-    %else
-        %define DX2 DX
-        %define DY2 DY
-        %define DZ2 DZ
-        %define DW2 DW
-    %endif
         ; dither matrix is stored indirectly at the private data address
         mov tmp1q, [implq + SwsOpImpl.priv]
         ; add y offset. note that for 2x2, we would only need to look at the
@@ -243,20 +230,11 @@ op dither%1
         and tmp0d, (4 << %1) - 1
         add tmp1q, tmp0q
 %endif
-IF X,   load_dither_row %1, 0, tmp1q, DX, DX2
-IF Y,   load_dither_row %1, 1, tmp1q, DY, DY2
-IF Z,   load_dither_row %1, 2, tmp1q, DZ, DZ2
-IF W,   load_dither_row %1, 3, tmp1q, DW, DW2
-        LOAD_CONT tmp0q
-IF X,   addps mx, DX
-IF Y,   addps my, DY
-IF Z,   addps mz, DZ
-IF W,   addps mw, DW
-IF X,   addps mx2, DX2
-IF Y,   addps my2, DY2
-IF Z,   addps mz2, DZ2
-IF W,   addps mw2, DW2
-        CONTINUE tmp0q
+        dither_row %1, 0, tmp1q, mx, mx2
+        dither_row %1, 1, tmp1q, my, my2
+        dither_row %1, 2, tmp1q, mz, mz2
+        dither_row %1, 3, tmp1q, mw, mw2
+        CONTINUE
 %endmacro
 
 %macro dither_fns 0
-- 
2.52.0


>>From bbf013d09c6ab775944618ca6d447d0f4ac1ff14 Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 25 Feb 2026 17:00:07 +0100
Subject: [PATCH 5/6] swscale/x86/ops: add support for optional dither indices

Instead of defining multiple patterns for the dither ops, just define a
single generic function that branches internally. The branch is well-predicted
and ridiculously cheap. At least on my end, within margin of error.
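
The setup-side encoding that the internal branch relies on can be sketched in plain C as follows. The helper names `encode_row_offsets` and `row_enabled` are hypothetical and the code is a simplified model, not the x86 setup function: each enabled component gets a non-negative byte offset into the matrix, each excluded one gets -1, and the kernel only has to look at the sign.

```c
#include <stdint.h>

/* Hypothetical helper: translate per-component row offsets into signed
 * 16-bit byte offsets, using -1 to mark components that skip dithering.
 * Returns the largest wrapped row offset among the enabled components,
 * which determines how many extra rows must be readable past the matrix. */
static int encode_row_offsets(const int8_t y_offset[4], int size_log2,
                              int stride, int16_t out[4])
{
    const int mask = (1 << size_log2) - 1;
    int max_offset = 0;
    for (int i = 0; i < 4; i++) {
        if (y_offset[i] < 0) {
            out[i] = -1; /* excluded component */
            continue;
        }
        const int row = y_offset[i] & mask;
        if (row > max_offset)
            max_offset = row;
        out[i] = (int16_t) (row * stride);
    }
    return max_offset;
}

/* Kernel-side check, analogous to testing the sign bit of the offset */
static int row_enabled(int16_t byte_offset)
{
    return byte_offset >= 0;
}
```

Note that a byte offset of 0 is a valid, enabled row, so the check has to be on the sign rather than on zero.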

Signed-off-by: Niklas Haas
---
 libswscale/ops_chain.h       |  1 +
 libswscale/x86/ops.c         | 51 ++++++++++++++++++------------------
 libswscale/x86/ops_float.asm |  8 ++++--
 3 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/libswscale/ops_chain.h b/libswscale/ops_chain.h
index ec55e87998..3b791f3394 100644
--- a/libswscale/ops_chain.h
+++ b/libswscale/ops_chain.h
@@ -47,6 +47,7 @@ typedef union SwsOpPriv {
     int8_t i8[16];
     uint8_t u8[16];
     uint16_t u16[8];
+    int16_t i16[8];
     uint32_t u32[4];
     float f32[4];
 } SwsOpPriv;
diff --git a/libswscale/x86/ops.c b/libswscale/x86/ops.c
index 44dbe05b35..d82a637b1b 100644
--- a/libswscale/x86/ops.c
+++ b/libswscale/x86/ops.c
@@ -194,10 +194,11 @@ static int setup_dither(const SwsOp *op, SwsOpPriv *out)
     }
 
     const int size = 1 << op->dither.size_log2;
+    const int8_t *off = op->dither.y_offset;
     int max_offset = 0;
     for (int i = 0; i < 4; i++) {
-        const int offset = op->dither.y_offset[i] & (size - 1);
-        max_offset = FFMAX(max_offset, offset);
+        if (off[i] >= 0)
+            max_offset = FFMAX(max_offset, off[i] & (size - 1));
     }
 
     /* Allocate extra rows to allow over-reading for row offsets. Note that
@@ -216,17 +217,17 @@ static int setup_dither(const SwsOp *op, SwsOpPriv *out)
     memcpy(&matrix[size * size], matrix, max_offset * stride);
 
     /* Store relative pointer offset to each row inside extra space */
-    static_assert(sizeof(out->ptr) <= sizeof(uint16_t[4]), ">8 byte pointers not supported");
-    assert(max_offset * stride <= UINT16_MAX);
-    uint16_t *offset = &out->u16[4];
+    static_assert(sizeof(out->ptr) <= sizeof(int16_t[4]), ">8 byte pointers not supported");
+    assert(max_offset * stride <= INT16_MAX);
+    int16_t *off_out = &out->i16[4];
     for (int i = 0; i < 4; i++)
-        offset[i] = (op->dither.y_offset[i] & (size - 1)) * stride;
+        off_out[i] = off[i] >= 0 ? (off[i] & (size - 1)) * stride : -1;
 
     return 0;
 }
 
-#define DECL_DITHER(EXT, SIZE) \
-    DECL_COMMON_PATTERNS(F32, dither##SIZE##EXT, \
+#define DECL_DITHER(DECL_MACRO, EXT, SIZE) \
+    DECL_MACRO(F32, dither##SIZE##EXT, \
         .op = SWS_OP_DITHER, \
         .setup = setup_dither, \
         .free = (SIZE) ? av_free : NULL, \
@@ -442,15 +443,15 @@ static const SwsOpTable ops16##EXT = {
     DECL_EXPAND(EXT, U8, U32) \
     DECL_MIN_MAX(EXT) \
     DECL_SCALE(EXT) \
-    DECL_DITHER(EXT, 0) \
-    DECL_DITHER(EXT, 1) \
-    DECL_DITHER(EXT, 2) \
-    DECL_DITHER(EXT, 3) \
-    DECL_DITHER(EXT, 4) \
-    DECL_DITHER(EXT, 5) \
-    DECL_DITHER(EXT, 6) \
-    DECL_DITHER(EXT, 7) \
-    DECL_DITHER(EXT, 8) \
+    DECL_DITHER(DECL_COMMON_PATTERNS, EXT, 0) \
+    DECL_DITHER(DECL_ASM, EXT, 1) \
+    DECL_DITHER(DECL_ASM, EXT, 2) \
+    DECL_DITHER(DECL_ASM, EXT, 3) \
+    DECL_DITHER(DECL_ASM, EXT, 4) \
+    DECL_DITHER(DECL_ASM, EXT, 5) \
+    DECL_DITHER(DECL_ASM, EXT, 6) \
+    DECL_DITHER(DECL_ASM, EXT, 7) \
+    DECL_DITHER(DECL_ASM, EXT, 8) \
     DECL_LINEAR(EXT, luma, SWS_MASK_LUMA) \
     DECL_LINEAR(EXT, alpha, SWS_MASK_ALPHA) \
     DECL_LINEAR(EXT, lumalpha, SWS_MASK_LUMA | SWS_MASK_ALPHA) \
@@ -494,14 +495,14 @@ static const SwsOpTable ops32##EXT = {
         REF_COMMON_PATTERNS(max##EXT), \
         REF_COMMON_PATTERNS(scale##EXT), \
         REF_COMMON_PATTERNS(dither0##EXT), \
-        REF_COMMON_PATTERNS(dither1##EXT), \
-        REF_COMMON_PATTERNS(dither2##EXT), \
-        REF_COMMON_PATTERNS(dither3##EXT), \
-        REF_COMMON_PATTERNS(dither4##EXT), \
-        REF_COMMON_PATTERNS(dither5##EXT), \
-        REF_COMMON_PATTERNS(dither6##EXT), \
-        REF_COMMON_PATTERNS(dither7##EXT), \
-        REF_COMMON_PATTERNS(dither8##EXT), \
+        &op_dither1##EXT, \
+        &op_dither2##EXT, \
+        &op_dither3##EXT, \
+        &op_dither4##EXT, \
+        &op_dither5##EXT, \
+        &op_dither6##EXT, \
+        &op_dither7##EXT, \
+        &op_dither8##EXT, \
         &op_luma##EXT, \
         &op_alpha##EXT, \
         &op_lumalpha##EXT, \
diff --git a/libswscale/x86/ops_float.asm b/libswscale/x86/ops_float.asm
index 78f35a9785..c9dc408a9b 100644
--- a/libswscale/x86/ops_float.asm
+++ b/libswscale/x86/ops_float.asm
@@ -197,6 +197,9 @@ IF W, addps mw2, m8
 
 %macro dither_row 5 ; size_log2, comp_idx, matrix, out, out2
         mov tmp0w, [implq + SwsOpImpl.priv + (4 + %2) * 2] ; priv.u16[4 + i]
+        ; test is tmp0w < 0
+        test tmp0w, tmp0w
+        js .skip%2
 %if %1 == 1
         vbroadcastsd m8, [%3 + tmp0q]
         addps %4, m8
@@ -209,6 +212,7 @@ IF W, addps mw2, m8
         addps %4, [%3 + tmp0q]
         addps %5, [%3 + tmp0q + mmsize * ((4 << %1) > mmsize)]
 %endif
+.skip%2:
 %endmacro
 
 %macro dither 1 ; size_log2
@@ -238,7 +242,7 @@ op dither%1
 %endmacro
 
 %macro dither_fns 0
-        dither0
+        decl_common_patterns dither0
         dither 1
         dither 2
         dither 3
@@ -364,5 +368,5 @@ decl_common_patterns conv32fto8
 decl_common_patterns conv32fto16
 decl_common_patterns min_max
 decl_common_patterns scale
-decl_common_patterns dither_fns
+dither_fns
 linear_fns
-- 
2.52.0


>>From e303a503097986917524c5620a4c61a7597a242f Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 25 Feb 2026 17:29:43 +0100
Subject: [PATCH 6/6] swscale/ops_optimizer: eliminate unnecessary dither indices

Generates a lot of incremental diffs due to things like ignored alpha planes
or chroma planes that are not actually modified. e.g.

 bgr24 -> gbrap10be:
   [ u8 XXXX -> +++X] SWS_OP_READ       : 3 elem(s) packed >> 0
   [ u8 ...X -> +++X] SWS_OP_CONVERT    : u8 -> f32
   [f32 ...X -> ...X] SWS_OP_SCALE      : * 341/85
 - [f32 ...X -> ...X] SWS_OP_DITHER     : 16x16 matrix + {2 3 0 5}
 + [f32 ...X -> ...X] SWS_OP_DITHER     : 16x16 matrix + {2 3 0 -1}
   [f32 ...X -> ...X] SWS_OP_MIN        : x <= {1023 1023 1023 1023}
   [f32 ...X -> +++X] SWS_OP_CONVERT    : f32 -> u16
   [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
   [u16 ...X -> zzzX] SWS_OP_SWIZZLE    : 1023
   [u16 ...X -> zzz+] SWS_OP_CLEAR      : {_ _ _ 65283}
   [u16 .... -> zzz+] SWS_OP_WRITE      : 4 elem(s) planar >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas
---
 libswscale/ops_optimizer.c  | 7 ++++++-
 tests/ref/fate/sws-ops-list | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 203dff1ac2..9cddb3d7f5 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -520,7 +520,12 @@ retry:
             for (int i = 0; i < 4; i++) {
                 if (next->comps.unused[i] || op->dither.y_offset[i] < 0)
                     continue;
-                noop &= !!(prev->comps.flags[i] & SWS_COMP_EXACT);
+                if (prev->comps.flags[i] & SWS_COMP_EXACT) {
+                    op->dither.y_offset[i] = -1; /* unnecessary dither */
+                    goto retry;
+                } else {
+                    noop = false;
+                }
             }
 
             if (noop) {
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index 9f85a106ed..13049a0c14 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-86c2335d1adad97dda299cbc6234ac57
+a312bd79cadff3e2e02fd14ae7e54e26
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org