From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 1A9134F42B
	for <ffmpegdev@gitmailbox.com>; Wed, 25 Feb 2026 17:07:17 +0000 (UTC)
Authentication-Results: ffbox; dkim=fail (body hash mismatch (got 
   b'i5O33qPMRN01oGhRMfK8QCQr2I1FA8nHveuig+XZYyA=', expected 
   b'BO1+6jH0lqFBI3cBuzwc+2rD2/pqxbE7fqS6vhORj5M=')) header.d=ffmpeg.org 
   header.i=@ffmpeg.org header.a=rsa-sha256
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ffmpeg.org;
 i=@ffmpeg.org; q=dns/txt; s=mail; t=1772039196; h=mime-version : to :
 date : message-id : reply-to : subject : list-id : list-archive :
 list-archive : list-help : list-owner : list-post : list-subscribe :
 list-unsubscribe : from : cc : content-type :
 content-transfer-encoding : from;
 bh=i5O33qPMRN01oGhRMfK8QCQr2I1FA8nHveuig+XZYyA=;
 b=dKpBj9d1Ga54giZ9/MYBINQSn0ipODF0gSteL0KMPL/UebYdjZHJMATFaVEk2bDMxpd4j
 3T3C3R/FMx8ycEy/UElL6fk11keavbvqsKDSXcLTSabZywXCgjRPRMxSMnhpxeSugte02s9
 QnbaLs4EI6XvlzH+Bfhk7x/tH5LQygDLpviBL0ZWVCXch69qaVYZ15ckfMjT6Po1mW0zXw5
 eE8XiAsGYhmK/bARCtcXDu+ByH0kp7Z002MwH39sEvQRv5g8sfl44LRFuK6HTbwIdI0D74R
 xu08WVKUQpyrK3iJlFhml9WtfrwnkbMmgKz+ZaRX+FXJCDX2Dztn6tjEpAeA==
Received: from [172.18.0.3] (unknown [172.18.0.3])
	by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id 03B88691CA6;
	Wed, 25 Feb 2026 19:06:36 +0200 (EET)
ARC-Seal: i=1; cv=none; a=rsa-sha256; d=ffmpeg.org; s=arc; t=1772039184;
 b=K9PbdBs+hi4KW37QyGucv410NEAu7F+JcS6eCbXyLX434TNyWUPLokJWKFJL5jSsT5cmA
 wYu1V52hlzdONC0IW0Ihc2fHEYd2VA4qH0mePTXRXHh2IidLYVkT8xT4x2Vr2s4D76Zev3k
 4sUdXLOg26lpbJqJ1iUxgc52hcce+xgljGDBH7sV+8KEmARX2OKLbvC7L5fW1EvCnFkG2fB
 RKfrxgf9Oc0hI86T/kqdzAbdnWRGBP6SUltih2Nb/lYQG5h2jiqVwUSelVxUFIpIFdDoluN
 lAubP0tG4NN0PX5WoWGWyIBU1RAGjvwAlW6lZsjyXafyPWCprXaOCxm2JUnA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
 d=ffmpeg.org; s=arc; t=1772039184; h=from : sender : reply-to :
 subject : date : message-id : to : cc : mime-version : content-type :
 content-transfer-encoding : content-id : content-description :
 resent-date : resent-from : resent-sender : resent-to : resent-cc :
 resent-message-id : in-reply-to : references : list-id : list-help :
 list-unsubscribe : list-subscribe : list-post : list-owner :
 list-archive; bh=JnLBC2g9ikccvrHHzWI8RZEHNEpmlHYyLblNUUpc2uo=;
 b=FGQ6no3CDVixloGeVB43vBmRai9Ea39oRjFXbT6pW/m3Y8EuRi+nnNf9Kvwu5Ut6dnP+N
 I7LHAWcYLtDmgHoc3ze84LyepgEnV50JxZZvr57nJwHVA64GAO8bSGMsoVOzuD8HY69sqiK
 t8qfjOWttFOLyRY9uD35g3BgJrUEemIEqYNz/etIPV/Xih/yklyjgVrgvzU7wNnRYJ15uUa
 +xuKpmxxiZGkHwHcSMiSGGrp81Z211zhgy30vCyGGRI3HISALlsFy3rYHMhJf2pXqhWYN4n
 lqIJXT92wWJJrUKUuLfCL/QT1wUfihIUlialf26SEIKC3iLPoOzinTwyBX9A==
ARC-Authentication-Results: i=1; ffmpeg.org;
 dkim=pass header.d=ffmpeg.org header.i=@ffmpeg.org;
 arc=none;
 dmarc=pass header.from=ffmpeg.org policy.dmarc=quarantine
Authentication-Results: ffmpeg.org;
 dkim=pass header.d=ffmpeg.org header.i=@ffmpeg.org;
 arc=none (Message is not ARC signed);
 dmarc=pass (Used From Domain Record) header.from=ffmpeg.org
 policy.dmarc=quarantine
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ffmpeg.org;
 i=@ffmpeg.org; q=dns/txt; s=mail; t=1772039175; h=content-type :
 mime-version : content-transfer-encoding : from : to : reply-to :
 subject : date : from;
 bh=BO1+6jH0lqFBI3cBuzwc+2rD2/pqxbE7fqS6vhORj5M=;
 b=Z9RI2E1iF5Barc17ifacs6uaUwlgpzN19nUP+lZ1PEbLDxEdWBYuRGtPgFJl4Lxih4AwN
 a0k+6740QcLSqoslBKaAvPYAeTEf0433Hk8RIoq5bzlm6QCHaZhXezeACcRAtS9JS/4XVlH
 /FHv/YZKn2fk4vyas+5kOmBF1m13c2wKMXwrTMqOCdN9E5IrQofwgK4h42KVFC/kq0ANQ+h
 7Z1CXg8tp9sNUMACG1AT1RUzjesysTQgz3TNvQkeRLDMPmohEmkf5J/IIdK0SHNfrp0pDkc
 IMNx6yFbZicYKRCPNKrTqmfzQLxlZ+l+ebz6/48/PSl9Jp1MWY8eG7Va0T9A==
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 25 Feb 2026 17:06:14 -0000
Message-ID: <177203917565.25.17597080578994453438@29965ddac10e>
Message-ID-Hash: BAG26VUDB6C2ALSCRJU2RPXDPEIHDR2G
X-Message-ID-Hash: BAG26VUDB6C2ALSCRJU2RPXDPEIHDR2G
X-MailFrom: code@ffmpeg.org
X-Mailman-Rule-Hits: nonmember-moderation
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; header-match-ffmpeg-devel.ffmpeg.org-0;
 header-match-ffmpeg-devel.ffmpeg.org-1;
 header-match-ffmpeg-devel.ffmpeg.org-2;
 header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: [FFmpeg-devel] [PR] swscale/ops: add ability to exclude planes from
 SWS_OP_DITHER (PR #22283)
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
Archived-At: 
 <https://lists.ffmpeg.org/archives/list/ffmpeg-devel@ffmpeg.org/message/BAG26VUDB6C2ALSCRJU2RPXDPEIHDR2G/>
Archived-At: 
 <https://lists.ffmpeg.org/lore/ffmpeg-devel/177203917565.25.17597080578994453438@29965ddac10e/>
List-Archive: 
 <https://lists.ffmpeg.org/archives/list/ffmpeg-devel@ffmpeg.org/>
List-Archive: <https://lists.ffmpeg.org/lore/ffmpeg-devel/>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Owner: <mailto:ffmpeg-devel-owner@ffmpeg.org>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Subscribe: <mailto:ffmpeg-devel-join@ffmpeg.org>
List-Unsubscribe: <mailto:ffmpeg-devel-leave@ffmpeg.org>
From: Niklas Haas via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Cc: Niklas Haas <code@ffmpeg.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Archived-At: <https://master.gitmailbox.com/ffmpegdev/177203917565.25.17597080578994453438@29965ddac10e/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

PR #22283 opened by Niklas Haas (haasn)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22283
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22283.patch

This helps optimization in some future use cases and, IMHO, also simplifies the x86 implementation.


>>From b89dca6d76987a563f1a9db895c72780c99969e9 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 25 Feb 2026 16:46:04 +0100
Subject: [PATCH 1/6] swscale/ops: allow excluding components from
 SWS_OP_DITHER

We often need to dither only a subset of the components. Previously this
was not possible; now a y_offset of -1 marks a component as excluded
from dithering.

The main motivation is that "unnecessary" dither ops would otherwise
frequently prevent plane splitting, since e.g. a copied alpha plane has
to come along for the ride through the whole F32/dither pipeline.

Additionally, it somewhat simplifies implementations.

Signed-off-by: Niklas Haas <git@haasn.dev>
---
 libswscale/format.c        | 4 +++-
 libswscale/ops.c           | 6 ++++--
 libswscale/ops.h           | 2 +-
 libswscale/ops_optimizer.c | 5 +++--
 4 files changed, 11 insertions(+), 6 deletions(-)
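
A minimal sketch of the sentinel convention, not part of the patch.
DitherSketch, sketch_set_offset() and sketch_dither_mask() are
hypothetical names used only for illustration; the real field lives in
SwsDitherOp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SwsDitherOp; only the relevant field. */
typedef struct DitherSketch {
    int8_t y_offset[4]; /* row offset per component, or -1 to skip it */
} DitherSketch;

/* Hypothetical helper: store a row offset, checking that it fits in
 * int8_t and cannot be mistaken for the negative "ignored" sentinel. */
static void sketch_set_offset(DitherSketch *d, int comp, int offset)
{
    assert(offset >= 0 && offset <= INT8_MAX);
    d->y_offset[comp] = (int8_t) offset;
}

/* Returns a 4-bit mask of the components that will be dithered. */
static unsigned sketch_dither_mask(const DitherSketch *d)
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        if (d->y_offset[i] >= 0)
            mask |= 1u << i;
    }
    return mask;
}
```

Any consumer only has to test the sign of y_offset[i], which is the
same check the hunks below add.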

diff --git a/libswscale/format.c b/libswscale/format.c
index e0434a2024..d53acdbcdc 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -1229,8 +1229,10 @@ static int fmt_dither(SwsContext *ctx, SwsOpList *ops,
         /* Brute-forced offsets; minimizes quantization error across a 16x16
          * bayer dither pattern for standard RGBA and YUVA pixel formats */
         const int offsets_16x16[4] = {0, 3, 2, 5};
-        for (int i = 0; i < 4; i++)
+        for (int i = 0; i < 4; i++) {
+            av_assert0(offsets_16x16[i] <= INT8_MAX);
             dither.y_offset[i] = offsets_16x16[i];
+        }
 
         if (src.desc->nb_components < 3 && bpc >= 8) {
             /**
diff --git a/libswscale/ops.c b/libswscale/ops.c
index 2889e95d12..311a17fe43 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -181,8 +181,10 @@ void ff_sws_apply_op_q(const SwsOp *op, AVRational x[4])
         return;
     case SWS_OP_DITHER:
         av_assert1(!ff_sws_pixel_type_is_int(op->type));
-        for (int i = 0; i < 4; i++)
-            x[i] = x[i].den ? av_add_q(x[i], av_make_q(1, 2)) : x[i];
+        for (int i = 0; i < 4; i++) {
+            if (op->dither.y_offset[i] >= 0 && x[i].den)
+                x[i] = av_add_q(x[i], av_make_q(1, 2));
+        }
         return;
     case SWS_OP_MIN:
         for (int i = 0; i < 4; i++)
diff --git a/libswscale/ops.h b/libswscale/ops.h
index 7b79fdc69d..d1576b9325 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -139,7 +139,7 @@ typedef struct SwsConvertOp {
 typedef struct SwsDitherOp {
     AVRational *matrix; /* tightly packed dither matrix (refstruct) */
     int size_log2; /* size (in bits) of the dither matrix */
-    uint8_t y_offset[4]; /* row offset for each component */
+    int8_t y_offset[4]; /* row offset for each component, or -1 for ignored */
 } SwsDitherOp;
 
 typedef struct SwsLinearOp {
diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 45ad3d4490..203dff1ac2 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -518,8 +518,9 @@ retry:
 
         case SWS_OP_DITHER:
             for (int i = 0; i < 4; i++) {
-                noop &= (prev->comps.flags[i] & SWS_COMP_EXACT) ||
-                        next->comps.unused[i];
+                if (next->comps.unused[i] || op->dither.y_offset[i] < 0)
+                    continue;
+                noop &= !!(prev->comps.flags[i] & SWS_COMP_EXACT);
             }
 
             if (noop) {
-- 
2.52.0


>>From 357667a25e35ef43beb84bc6cb4bd13ade19edb2 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 25 Feb 2026 16:52:09 +0100
Subject: [PATCH 2/6] swscale/ops_backend: implement support for optional
 dither indices

If the branch is placed inside the loop, gcc at least falls back to
scalar code, so it is better to split the loop up per component and
guard each loop as a whole.

Signed-off-by: Niklas Haas <git@haasn.dev>
---
 libswscale/ops_chain.h      |  1 +
 libswscale/ops_tmpl_float.c | 25 +++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)
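
A rough standalone sketch of the "guard the whole loop" shape, not part
of the patch. SKETCH_BLOCK and sketch_dither_comp() are hypothetical,
and the column index simply wraps around here instead of relying on the
padded matrix layout the real backend uses.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLOCK 8 /* hypothetical; stands in for SWS_BLOCK_SIZE */

/* Add one row of a square dither matrix with (1 << size_log2) columns
 * to a block of samples, or leave the block untouched if the offset is
 * negative. The sign check sits outside the loop, so the loop body
 * itself stays branch-free. */
static void sketch_dither_comp(float *restrict v,
                               const float *restrict matrix,
                               int size_log2, int y, int8_t offset)
{
    assert(size_log2 >= 0 && size_log2 < 16);
    if (offset < 0)
        return;

    const int size = 1 << size_log2;
    const int row  = (y + offset) & (size - 1);
    for (int i = 0; i < SKETCH_BLOCK; i++)
        v[i] += matrix[row * size + (i & (size - 1))];
}
```

Calling this once per component gives four independent loops, each of
which is either skipped entirely or run without any per-sample branch.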

diff --git a/libswscale/ops_chain.h b/libswscale/ops_chain.h
index 2f5a31793e..ec55e87998 100644
--- a/libswscale/ops_chain.h
+++ b/libswscale/ops_chain.h
@@ -44,6 +44,7 @@ typedef union SwsOpPriv {
 
     /* Common types */
     void *ptr;
+    int8_t    i8[16];
     uint8_t   u8[16];
     uint16_t u16[8];
     uint32_t u32[4];
diff --git a/libswscale/ops_tmpl_float.c b/libswscale/ops_tmpl_float.c
index 10749d5f7d..2f1d249168 100644
--- a/libswscale/ops_tmpl_float.c
+++ b/libswscale/ops_tmpl_float.c
@@ -57,7 +57,7 @@ DECL_SETUP(setup_dither)
         return AVERROR(ENOMEM);
 
     static_assert(sizeof(out->ptr) <= sizeof(uint8_t[8]), ">8 byte pointers not supported");
-    uint8_t *offset = &out->u8[8];
+    int8_t *offset = &out->i8[8];
     for (int i = 0; i < 4; i++)
         offset[i] = op->dither.y_offset[i];
 
@@ -74,25 +74,26 @@ DECL_SETUP(setup_dither)
 DECL_FUNC(dither, const int size_log2)
 {
     const pixel_t *restrict matrix = impl->priv.ptr;
-    const uint8_t *offset = &impl->priv.u8[8];
+    const int8_t *restrict offset = &impl->priv.i8[8];
     const int mask = (1 << size_log2) - 1;
     const int y_line = iter->y;
-    const int row0 = (y_line + offset[0]) & mask;
-    const int row1 = (y_line + offset[1]) & mask;
-    const int row2 = (y_line + offset[2]) & mask;
-    const int row3 = (y_line + offset[3]) & mask;
     const int size = 1 << size_log2;
     const int width = FFMAX(size, SWS_BLOCK_SIZE);
     const int base = iter->x & ~(SWS_BLOCK_SIZE - 1) & (size - 1);
 
-    SWS_LOOP
-    for (int i = 0; i < SWS_BLOCK_SIZE; i++) {
-        x[i] += size_log2 ? matrix[row0 * width + base + i] : (pixel_t) 0.5;
-        y[i] += size_log2 ? matrix[row1 * width + base + i] : (pixel_t) 0.5;
-        z[i] += size_log2 ? matrix[row2 * width + base + i] : (pixel_t) 0.5;
-        w[i] += size_log2 ? matrix[row3 * width + base + i] : (pixel_t) 0.5;
+#define DITHER_COMP(VAR, IDX)                                                            \
+    if (offset[IDX] >= 0) {                                                              \
+        const int row = (y_line + offset[IDX]) & mask;                                   \
+        SWS_LOOP                                                                         \
+        for (int i = 0; i < SWS_BLOCK_SIZE; i++)                                         \
+            VAR[i] += size_log2 ? matrix[row * width + base + i] : (pixel_t) 0.5;        \
     }
 
+    DITHER_COMP(x, 0)
+    DITHER_COMP(y, 1)
+    DITHER_COMP(z, 2)
+    DITHER_COMP(w, 3)
+
     CONTINUE(block_t, x, y, z, w);
 }
 
-- 
2.52.0


>>From 9616c55a96868f9c5f99e0582b00c87f5f63e987 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 25 Feb 2026 15:08:34 +0100
Subject: [PATCH 3/6] swscale/x86/ops: split off dither0 special case

I want to rewrite the dither kernel a bit, and this special case gets in
the way.

Signed-off-by: Niklas Haas <git@haasn.dev>
---
 libswscale/x86/ops_float.asm | 45 +++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/libswscale/x86/ops_float.asm b/libswscale/x86/ops_float.asm
index 2863085a8e..625cf81553 100644
--- a/libswscale/x86/ops_float.asm
+++ b/libswscale/x86/ops_float.asm
@@ -193,37 +193,45 @@ IF W,   mulps mw2, m8
 %endif
 %endmacro
 
+%macro dither0 0
+op dither0
+        ; constant offset for all channels
+        vbroadcastss m8, [implq + SwsOpImpl.priv]
+        LOAD_CONT tmp0q
+IF X,   addps mx, m8
+IF Y,   addps my, m8
+IF Z,   addps mz, m8
+IF W,   addps mw, m8
+IF X,   addps mx2, m8
+IF Y,   addps my2, m8
+IF Z,   addps mz2, m8
+IF W,   addps mw2, m8
+        CONTINUE tmp0q
+%endmacro
+
 %macro dither 1 ; size_log2
 op dither%1
         %define DX  m8
         %define DY  m9
         %define DZ  m10
         %define DW  m11
-        %define DX2 DX
-        %define DY2 DY
-        %define DZ2 DZ
-        %define DW2 DW
-%if %1 == 0
-        ; constant offset for all channels
-        vbroadcastss DX, [implq + SwsOpImpl.priv]
-        %define DY DX
-        %define DZ DX
-        %define DW DX
-%else
-        ; load all four channels with custom offset
-        ;
-        ; note that for 2x2, we would only need to look at the sign of `y`, but
-        ; this special case is ignored for simplicity reasons (and because
-        ; the current upstream format code never generates matrices that small)
     %if (4 << %1) > mmsize
         %define DX2 m12
         %define DY2 m13
         %define DZ2 m14
         %define DW2 m15
+    %else
+        %define DX2 DX
+        %define DY2 DY
+        %define DZ2 DZ
+        %define DW2 DW
     %endif
         ; dither matrix is stored indirectly at the private data address
         mov tmp1q, [implq + SwsOpImpl.priv]
-        ; add y offset
+        ; add y offset. note that for 2x2, we would only need to look at the
+        ; sign of `y`, but this special case is ignored for simplicity reasons
+        ; (and because the current upstream format code never generates matrices
+        ; that small)
         mov tmp0d, yd
         and tmp0d, (1 << %1) - 1
         shl tmp0d, %1 + 2 ; * sizeof(float)
@@ -239,7 +247,6 @@ IF X,   load_dither_row %1, 0, tmp1q, DX, DX2
 IF Y,   load_dither_row %1, 1, tmp1q, DY, DY2
 IF Z,   load_dither_row %1, 2, tmp1q, DZ, DZ2
 IF W,   load_dither_row %1, 3, tmp1q, DW, DW2
-%endif
         LOAD_CONT tmp0q
 IF X,   addps mx, DX
 IF Y,   addps my, DY
@@ -253,7 +260,7 @@ IF W,   addps mw2, DW2
 %endmacro
 
 %macro dither_fns 0
-        dither 0
+        dither0
         dither 1
         dither 2
         dither 3
-- 
2.52.0


>>From 5d559a85672e64d60e8f0366ffb220150a7ac3db Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 25 Feb 2026 15:19:32 +0100
Subject: [PATCH 4/6] swscale/x86/ops: don't preload dither weights

Preloading doesn't actually gain any performance but makes the code
needlessly complicated. Just add directly from the indirect address as
needed.

Signed-off-by: Niklas Haas <git@haasn.dev>
---
 libswscale/x86/ops_float.asm | 64 ++++++++++++------------------------
 1 file changed, 21 insertions(+), 43 deletions(-)

diff --git a/libswscale/x86/ops_float.asm b/libswscale/x86/ops_float.asm
index 625cf81553..78f35a9785 100644
--- a/libswscale/x86/ops_float.asm
+++ b/libswscale/x86/ops_float.asm
@@ -179,20 +179,6 @@ IF W,   mulps mw2, m8
         CONTINUE tmp0q
 %endmacro
 
-%macro load_dither_row 5 ; size_log2, comp_idx, addr, out, out2
-        mov tmp0w, [implq + SwsOpImpl.priv + (4 + %2) * 2] ; priv.u16[4 + i]
-%if %1 == 1
-        vbroadcastsd   %4, [%3 + tmp0q]
-%elif %1 == 2
-        VBROADCASTI128 %4, [%3 + tmp0q]
-%else
-        mova %4, [%3 + tmp0q]
-    %if (4 << %1) > mmsize
-        mova %5, [%3 + tmp0q + mmsize]
-    %endif
-%endif
-%endmacro
-
 %macro dither0 0
 op dither0
         ; constant offset for all channels
@@ -209,23 +195,24 @@ IF W,   addps mw2, m8
         CONTINUE tmp0q
 %endmacro
 
+%macro dither_row 5 ; size_log2, comp_idx, matrix, out, out2
+        mov tmp0w, [implq + SwsOpImpl.priv + (4 + %2) * 2] ; priv.u16[4 + i]
+%if %1 == 1
+        vbroadcastsd m8, [%3 + tmp0q]
+        addps %4, m8
+        addps %5, m8
+%elif %1 == 2
+        VBROADCASTI128 m8, [%3 + tmp0q]
+        addps %4, m8
+        addps %5, m8
+%else
+        addps %4, [%3 + tmp0q]
+        addps %5, [%3 + tmp0q + mmsize * ((4 << %1) > mmsize)]
+%endif
+%endmacro
+
 %macro dither 1 ; size_log2
 op dither%1
-        %define DX  m8
-        %define DY  m9
-        %define DZ  m10
-        %define DW  m11
-    %if (4 << %1) > mmsize
-        %define DX2 m12
-        %define DY2 m13
-        %define DZ2 m14
-        %define DW2 m15
-    %else
-        %define DX2 DX
-        %define DY2 DY
-        %define DZ2 DZ
-        %define DW2 DW
-    %endif
         ; dither matrix is stored indirectly at the private data address
         mov tmp1q, [implq + SwsOpImpl.priv]
         ; add y offset. note that for 2x2, we would only need to look at the
@@ -243,20 +230,11 @@ op dither%1
         and tmp0d, (4 << %1) - 1
         add tmp1q, tmp0q
     %endif
-IF X,   load_dither_row %1, 0, tmp1q, DX, DX2
-IF Y,   load_dither_row %1, 1, tmp1q, DY, DY2
-IF Z,   load_dither_row %1, 2, tmp1q, DZ, DZ2
-IF W,   load_dither_row %1, 3, tmp1q, DW, DW2
-        LOAD_CONT tmp0q
-IF X,   addps mx, DX
-IF Y,   addps my, DY
-IF Z,   addps mz, DZ
-IF W,   addps mw, DW
-IF X,   addps mx2, DX2
-IF Y,   addps my2, DY2
-IF Z,   addps mz2, DZ2
-IF W,   addps mw2, DW2
-        CONTINUE tmp0q
+        dither_row %1, 0, tmp1q, mx, mx2
+        dither_row %1, 1, tmp1q, my, my2
+        dither_row %1, 2, tmp1q, mz, mz2
+        dither_row %1, 3, tmp1q, mw, mw2
+        CONTINUE
 %endmacro
 
 %macro dither_fns 0
-- 
2.52.0


>>From bbf013d09c6ab775944618ca6d447d0f4ac1ff14 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 25 Feb 2026 17:00:07 +0100
Subject: [PATCH 5/6] swscale/x86/ops: add support for optional dither indices

Instead of defining multiple patterns for the dither ops, just define a
single generic function that branches internally. The branch is
well-predicted and very cheap; at least on my end, the difference is
within margin of error.

Signed-off-by: Niklas Haas <git@haasn.dev>
---
 libswscale/ops_chain.h       |  1 +
 libswscale/x86/ops.c         | 51 ++++++++++++++++++------------------
 libswscale/x86/ops_float.asm |  8 ++++--
 3 files changed, 33 insertions(+), 27 deletions(-)
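
A small sketch of the offset packing idea, not part of the patch.
sketch_pack_offsets() is a hypothetical helper; `stride` is assumed to
be the size of one matrix row in bytes, as in setup_dither().

```c
#include <assert.h>
#include <stdint.h>

/* Convert per-component row offsets into byte offsets relative to the
 * current row, keeping -1 as "skip this component". Returns the largest
 * row offset in use, i.e. how many extra rows would be over-read. */
static int sketch_pack_offsets(const int8_t y_offset[4], int size_log2,
                               int stride, int16_t out[4])
{
    const int size = 1 << size_log2;
    int max_offset = 0;

    for (int i = 0; i < 4; i++) {
        if (y_offset[i] < 0) {
            out[i] = -1; /* sign bit set: the kernel skips this one */
            continue;
        }
        const int row = y_offset[i] & (size - 1);
        assert(row * stride <= INT16_MAX);
        out[i] = (int16_t) (row * stride);
        if (row > max_offset)
            max_offset = row;
    }
    return max_offset;
}
```

Because every valid byte offset is non-negative, a single sign test on
the 16-bit value is enough to tell the two cases apart.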

diff --git a/libswscale/ops_chain.h b/libswscale/ops_chain.h
index ec55e87998..3b791f3394 100644
--- a/libswscale/ops_chain.h
+++ b/libswscale/ops_chain.h
@@ -47,6 +47,7 @@ typedef union SwsOpPriv {
     int8_t    i8[16];
     uint8_t   u8[16];
     uint16_t u16[8];
+    int16_t  i16[8];
     uint32_t u32[4];
     float    f32[4];
 } SwsOpPriv;
diff --git a/libswscale/x86/ops.c b/libswscale/x86/ops.c
index 44dbe05b35..d82a637b1b 100644
--- a/libswscale/x86/ops.c
+++ b/libswscale/x86/ops.c
@@ -194,10 +194,11 @@ static int setup_dither(const SwsOp *op, SwsOpPriv *out)
     }
 
     const int size = 1 << op->dither.size_log2;
+    const int8_t *off = op->dither.y_offset;
     int max_offset = 0;
     for (int i = 0; i < 4; i++) {
-        const int offset = op->dither.y_offset[i] & (size - 1);
-        max_offset = FFMAX(max_offset, offset);
+        if (off[i] >= 0)
+            max_offset = FFMAX(max_offset, off[i] & (size - 1));
     }
 
     /* Allocate extra rows to allow over-reading for row offsets. Note that
@@ -216,17 +217,17 @@ static int setup_dither(const SwsOp *op, SwsOpPriv *out)
     memcpy(&matrix[size * size], matrix, max_offset * stride);
 
     /* Store relative pointer offset to each row inside extra space */
-    static_assert(sizeof(out->ptr) <= sizeof(uint16_t[4]), ">8 byte pointers not supported");
-    assert(max_offset * stride <= UINT16_MAX);
-    uint16_t *offset = &out->u16[4];
+    static_assert(sizeof(out->ptr) <= sizeof(int16_t[4]), ">8 byte pointers not supported");
+    assert(max_offset * stride <= INT16_MAX);
+    int16_t *off_out = &out->i16[4];
     for (int i = 0; i < 4; i++)
-        offset[i] = (op->dither.y_offset[i] & (size - 1)) * stride;
+        off_out[i] = off[i] >= 0 ? (off[i] & (size - 1)) * stride : -1;
 
     return 0;
 }
 
-#define DECL_DITHER(EXT, SIZE)                                                  \
-    DECL_COMMON_PATTERNS(F32, dither##SIZE##EXT,                                \
+#define DECL_DITHER(DECL_MACRO, EXT, SIZE)                                      \
+    DECL_MACRO(F32, dither##SIZE##EXT,                                          \
         .op    = SWS_OP_DITHER,                                                 \
         .setup = setup_dither,                                                  \
         .free  = (SIZE) ? av_free : NULL,                                       \
@@ -442,15 +443,15 @@ static const SwsOpTable ops16##EXT = {
     DECL_EXPAND(EXT,   U8, U32)                                                 \
     DECL_MIN_MAX(EXT)                                                           \
     DECL_SCALE(EXT)                                                             \
-    DECL_DITHER(EXT, 0)                                                         \
-    DECL_DITHER(EXT, 1)                                                         \
-    DECL_DITHER(EXT, 2)                                                         \
-    DECL_DITHER(EXT, 3)                                                         \
-    DECL_DITHER(EXT, 4)                                                         \
-    DECL_DITHER(EXT, 5)                                                         \
-    DECL_DITHER(EXT, 6)                                                         \
-    DECL_DITHER(EXT, 7)                                                         \
-    DECL_DITHER(EXT, 8)                                                         \
+    DECL_DITHER(DECL_COMMON_PATTERNS, EXT, 0)                                   \
+    DECL_DITHER(DECL_ASM, EXT, 1)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 2)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 3)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 4)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 5)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 6)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 7)                                               \
+    DECL_DITHER(DECL_ASM, EXT, 8)                                               \
     DECL_LINEAR(EXT, luma,      SWS_MASK_LUMA)                                  \
     DECL_LINEAR(EXT, alpha,     SWS_MASK_ALPHA)                                 \
     DECL_LINEAR(EXT, lumalpha,  SWS_MASK_LUMA | SWS_MASK_ALPHA)                 \
@@ -494,14 +495,14 @@ static const SwsOpTable ops32##EXT = {
         REF_COMMON_PATTERNS(max##EXT),                                          \
         REF_COMMON_PATTERNS(scale##EXT),                                        \
         REF_COMMON_PATTERNS(dither0##EXT),                                      \
-        REF_COMMON_PATTERNS(dither1##EXT),                                      \
-        REF_COMMON_PATTERNS(dither2##EXT),                                      \
-        REF_COMMON_PATTERNS(dither3##EXT),                                      \
-        REF_COMMON_PATTERNS(dither4##EXT),                                      \
-        REF_COMMON_PATTERNS(dither5##EXT),                                      \
-        REF_COMMON_PATTERNS(dither6##EXT),                                      \
-        REF_COMMON_PATTERNS(dither7##EXT),                                      \
-        REF_COMMON_PATTERNS(dither8##EXT),                                      \
+        &op_dither1##EXT,                                                       \
+        &op_dither2##EXT,                                                       \
+        &op_dither3##EXT,                                                       \
+        &op_dither4##EXT,                                                       \
+        &op_dither5##EXT,                                                       \
+        &op_dither6##EXT,                                                       \
+        &op_dither7##EXT,                                                       \
+        &op_dither8##EXT,                                                       \
         &op_luma##EXT,                                                          \
         &op_alpha##EXT,                                                         \
         &op_lumalpha##EXT,                                                      \
diff --git a/libswscale/x86/ops_float.asm b/libswscale/x86/ops_float.asm
index 78f35a9785..c9dc408a9b 100644
--- a/libswscale/x86/ops_float.asm
+++ b/libswscale/x86/ops_float.asm
@@ -197,6 +197,9 @@ IF W,   addps mw2, m8
 
 %macro dither_row 5 ; size_log2, comp_idx, matrix, out, out2
         mov tmp0w, [implq + SwsOpImpl.priv + (4 + %2) * 2] ; priv.u16[4 + i]
+        ; test if tmp0w < 0
+        test tmp0w, tmp0w
+        js .skip%2
 %if %1 == 1
         vbroadcastsd m8, [%3 + tmp0q]
         addps %4, m8
@@ -209,6 +212,7 @@ IF W,   addps mw2, m8
         addps %4, [%3 + tmp0q]
         addps %5, [%3 + tmp0q + mmsize * ((4 << %1) > mmsize)]
 %endif
+.skip%2:
 %endmacro
 
 %macro dither 1 ; size_log2
@@ -238,7 +242,7 @@ op dither%1
 %endmacro
 
 %macro dither_fns 0
-        dither0
+        decl_common_patterns dither0
         dither 1
         dither 2
         dither 3
@@ -364,5 +368,5 @@ decl_common_patterns conv32fto8
 decl_common_patterns conv32fto16
 decl_common_patterns min_max
 decl_common_patterns scale
-decl_common_patterns dither_fns
+dither_fns
 linear_fns
-- 
2.52.0


>>From e303a503097986917524c5620a4c61a7597a242f Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 25 Feb 2026 17:29:43 +0100
Subject: [PATCH 6/6] swscale/ops_optimizer: eliminate unnecessary dither
 indices

This generates a lot of incremental diffs in the op lists, due to things
like ignored alpha planes or chroma planes that are not actually
modified.

e.g.

 bgr24 -> gbrap10be:
   [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
   [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> f32
   [f32 ...X -> ...X] SWS_OP_SCALE        : * 341/85
-  [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix + {2 3 0 5}
+  [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix + {2 3 0 -1}
   [f32 ...X -> ...X] SWS_OP_MIN          : x <= {1023 1023 1023 1023}
   [f32 ...X -> +++X] SWS_OP_CONVERT      : f32 -> u16
   [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
   [u16 ...X -> zzzX] SWS_OP_SWIZZLE      : 1023
   [u16 ...X -> zzz+] SWS_OP_CLEAR        : {_ _ _ 65283}
   [u16 .... -> zzz+] SWS_OP_WRITE        : 4 elem(s) planar >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas <git@haasn.dev>
---
 libswscale/ops_optimizer.c  | 7 ++++++-
 tests/ref/fate/sws-ops-list | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)
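
A simplified sketch of the pruning rule, not part of the patch.
sketch_prune_dither() and its bool arrays are hypothetical stand-ins
for the prev/next component flags, and it does a single pass instead of
restarting the optimizer with `goto retry`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* exact[i]:  the input of component i is already exact.
 * unused[i]: nothing downstream reads component i.
 * Exact components are switched off in place. Returns true if no
 * component is left that needs dithering, so the op could be dropped. */
static bool sketch_prune_dither(int8_t y_offset[4], const bool exact[4],
                                const bool unused[4])
{
    assert(y_offset && exact && unused);

    bool noop = true;
    for (int i = 0; i < 4; i++) {
        if (unused[i] || y_offset[i] < 0)
            continue;
        if (exact[i])
            y_offset[i] = -1; /* unnecessary dither */
        else
            noop = false;
    }
    return noop;
}
```

With offsets {2 3 0 5} and only the last component exact, this leaves
{2 3 0 -1} and reports that the op is still needed.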

diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 203dff1ac2..9cddb3d7f5 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -520,7 +520,12 @@ retry:
             for (int i = 0; i < 4; i++) {
                 if (next->comps.unused[i] || op->dither.y_offset[i] < 0)
                     continue;
-                noop &= !!(prev->comps.flags[i] & SWS_COMP_EXACT);
+                if (prev->comps.flags[i] & SWS_COMP_EXACT) {
+                    op->dither.y_offset[i] = -1; /* unnecessary dither */
+                    goto retry;
+                } else {
+                    noop = false;
+                }
             }
 
             if (noop) {
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index 9f85a106ed..13049a0c14 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-86c2335d1adad97dda299cbc6234ac57
+a312bd79cadff3e2e02fd14ae7e54e26
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org