* [FFmpeg-devel] [PATCH] swscale/ops: generate correct value range in ff_sws_decode_pixfmt(), remove hacks from optimizer (PR #21264)
@ 2025-12-22 14:48 Niklas Haas via ffmpeg-devel
0 siblings, 0 replies; only message in thread
From: Niklas Haas via ffmpeg-devel @ 2025-12-22 14:48 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Niklas Haas
PR #21264 opened by Niklas Haas (haasn)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21264
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21264.patch
Depends on #21263. Supersedes #21208.
This is an IMHO cleaner reimplementation of the same underlying principle of letting the format code attach metadata to the op list. Instead of a `SWS_OP_ASSUME` pseudo-op, this version instead attaches it directly to the `SwsComps` field of `SWS_OP_READ`, and changes the semantics of `ff_sws_op_list_update_comps()` to respect that information rather than nuking it every time.
Crucially, this version ensures that all available information survives and persists all the way to the backend; allowing those to make smarter decisions about what to do. (The current backends do not care, but @ramiro wants it for his asmjit)
Also fixes a number of bugs en passent.
>From d93ed0e7ada69c32541a0688d50527cb62a68637 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 15:31:01 +0100
Subject: [PATCH 01/10] swscale/ops_internal: fix ff_sws_pack_op_decode()
This function was assuming that the bits are MSB-aligned, but they are
LSB-aligned in both practice (and in the actual backend).
Also update the documentation of SwsPackOp to make this clearer.
Fixes an incorrect omission of a clamp after decoding e.g. rgb4, since
the max value range was incorrectly determined as 0 as a result of unpacking
the MSB bits instead of the LSB bits:
bgr4 -> gray:
[ u8 XXXX -> +XXX] SWS_OP_READ : 1 elem(s) packed >> 1
[ u8 .XXX -> +++X] SWS_OP_UNPACK : {1 2 1 0}
[ u8 ...X -> +++X] SWS_OP_SWIZZLE : 2103
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> .++X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
+ [f32 .XXX -> .++X] SWS_OP_MIN : x <= {255 _ _ _}
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
---
libswscale/ops.h | 4 ++++
libswscale/ops_internal.h | 4 +++-
tests/ref/fate/sws-ops-list | 2 +-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/libswscale/ops.h b/libswscale/ops.h
index 6392d0ffdf..6fc7e60a02 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -109,6 +109,10 @@ typedef struct SwsReadWriteOp {
} SwsReadWriteOp;
typedef struct SwsPackOp {
+ /**
+ * Packed bits are assumed to be LSB-aligned within the underlying
+ * integer type; i.e. (msb) 0 ... X Y Z W (lsb).
+ */
uint8_t pattern[4]; /* bit depth pattern, from MSB to LSB */
} SwsPackOp;
diff --git a/libswscale/ops_internal.h b/libswscale/ops_internal.h
index 0071f78558..ba4c9b39da 100644
--- a/libswscale/ops_internal.h
+++ b/libswscale/ops_internal.h
@@ -39,7 +39,9 @@ static inline AVRational ff_sws_pixel_expand(SwsPixelType from, SwsPixelType to)
static inline void ff_sws_pack_op_decode(const SwsOp *op, uint64_t mask[4], int shift[4])
{
- const int size = ff_sws_pixel_type_size(op->type) * 8;
+ int size = 0;
+ for (int i = 0; i < 4; i++)
+ size += op->pack.pattern[i];
for (int i = 0; i < 4; i++) {
const int bits = op->pack.pattern[i];
mask[i] = (UINT64_C(1) << bits) - 1;
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index b49f944794..a7d6149d8b 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-e910ff7ceaeb64bfdbac3f652b67403f
+ef1dd10af970984495f6008e43d0fe1b
--
2.49.1
>From dc2f27b3f68c12e8ce3dd8315d98ad4ecdf307d0 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 14:07:45 +0100
Subject: [PATCH 02/10] swscale/ops: move ff_sws_op_list_update_comps() to
ops.c
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly suported part of the user-facing API (rather than just an
internal-use field).
The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
---
libswscale/ops.c | 239 +++++++++++++++++++++++++++++++++++++
libswscale/ops_optimizer.c | 239 -------------------------------------
2 files changed, 239 insertions(+), 239 deletions(-)
diff --git a/libswscale/ops.c b/libswscale/ops.c
index f84416feed..5b08b743bf 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -211,6 +211,245 @@ void ff_sws_apply_op_q(const SwsOp *op, AVRational x[4])
av_unreachable("Invalid operation type!");
}
+/* merge_comp_flags() forms a monoid with flags_identity as the null element */
+static const unsigned flags_identity = SWS_COMP_ZERO | SWS_COMP_EXACT;
+static unsigned merge_comp_flags(unsigned a, unsigned b)
+{
+ const unsigned flags_or = SWS_COMP_GARBAGE;
+ const unsigned flags_and = SWS_COMP_ZERO | SWS_COMP_EXACT;
+ return ((a & b) & flags_and) | ((a | b) & flags_or);
+}
+
+/* Infer + propagate known information about components */
+void ff_sws_op_list_update_comps(SwsOpList *ops)
+{
+ SwsComps next = { .unused = {true, true, true, true} };
+ SwsComps prev = { .flags = {
+ SWS_COMP_GARBAGE, SWS_COMP_GARBAGE, SWS_COMP_GARBAGE, SWS_COMP_GARBAGE,
+ }};
+
+ /* Forwards pass, propagates knowledge about the incoming pixel values */
+ for (int n = 0; n < ops->num_ops; n++) {
+ SwsOp *op = &ops->ops[n];
+
+ /* Prefill min/max values automatically; may have to be fixed in
+ * special cases */
+ memcpy(op->comps.min, prev.min, sizeof(prev.min));
+ memcpy(op->comps.max, prev.max, sizeof(prev.max));
+
+ if (op->op != SWS_OP_SWAP_BYTES) {
+ ff_sws_apply_op_q(op, op->comps.min);
+ ff_sws_apply_op_q(op, op->comps.max);
+ }
+
+ switch (op->op) {
+ case SWS_OP_READ:
+ for (int i = 0; i < op->rw.elems; i++) {
+ if (ff_sws_pixel_type_is_int(op->type)) {
+ int bits = 8 * ff_sws_pixel_type_size(op->type) >> op->rw.frac;
+ if (!op->rw.packed && ops->src.desc) {
+ /* Use legal value range from pixdesc if available;
+ * we don't need to do this for packed formats because
+ * non-byte-aligned packed formats will necessarily go
+ * through SWS_OP_UNPACK anyway */
+ for (int c = 0; c < 4; c++) {
+ if (ops->src.desc->comp[c].plane == i) {
+ bits = ops->src.desc->comp[c].depth;
+ break;
+ }
+ }
+ }
+
+ op->comps.flags[i] = SWS_COMP_EXACT;
+ op->comps.min[i] = Q(0);
+ op->comps.max[i] = Q((1ULL << bits) - 1);
+ }
+ }
+ for (int i = op->rw.elems; i < 4; i++)
+ op->comps.flags[i] = prev.flags[i];
+ break;
+ case SWS_OP_WRITE:
+ for (int i = 0; i < op->rw.elems; i++)
+ av_assert1(!(prev.flags[i] & SWS_COMP_GARBAGE));
+ /* fall through */
+ case SWS_OP_SWAP_BYTES:
+ case SWS_OP_LSHIFT:
+ case SWS_OP_RSHIFT:
+ case SWS_OP_MIN:
+ case SWS_OP_MAX:
+ /* Linearly propagate flags per component */
+ for (int i = 0; i < 4; i++)
+ op->comps.flags[i] = prev.flags[i];
+ break;
+ case SWS_OP_DITHER:
+ /* Strip zero flag because of the nonzero dithering offset */
+ for (int i = 0; i < 4; i++)
+ op->comps.flags[i] = prev.flags[i] & ~SWS_COMP_ZERO;
+ break;
+ case SWS_OP_UNPACK:
+ for (int i = 0; i < 4; i++) {
+ if (op->pack.pattern[i])
+ op->comps.flags[i] = prev.flags[0];
+ else
+ op->comps.flags[i] = SWS_COMP_GARBAGE;
+ }
+ break;
+ case SWS_OP_PACK: {
+ unsigned flags = flags_identity;
+ for (int i = 0; i < 4; i++) {
+ if (op->pack.pattern[i])
+ flags = merge_comp_flags(flags, prev.flags[i]);
+ if (i > 0) /* clear remaining comps for sanity */
+ op->comps.flags[i] = SWS_COMP_GARBAGE;
+ }
+ op->comps.flags[0] = flags;
+ break;
+ }
+ case SWS_OP_CLEAR:
+ for (int i = 0; i < 4; i++) {
+ if (op->c.q4[i].den) {
+ if (op->c.q4[i].num == 0) {
+ op->comps.flags[i] = SWS_COMP_ZERO | SWS_COMP_EXACT;
+ } else if (op->c.q4[i].den == 1) {
+ op->comps.flags[i] = SWS_COMP_EXACT;
+ }
+ } else {
+ op->comps.flags[i] = prev.flags[i];
+ }
+ }
+ break;
+ case SWS_OP_SWIZZLE:
+ for (int i = 0; i < 4; i++)
+ op->comps.flags[i] = prev.flags[op->swizzle.in[i]];
+ break;
+ case SWS_OP_CONVERT:
+ for (int i = 0; i < 4; i++) {
+ op->comps.flags[i] = prev.flags[i];
+ if (ff_sws_pixel_type_is_int(op->convert.to))
+ op->comps.flags[i] |= SWS_COMP_EXACT;
+ }
+ break;
+ case SWS_OP_LINEAR:
+ for (int i = 0; i < 4; i++) {
+ unsigned flags = flags_identity;
+ AVRational min = Q(0), max = Q(0);
+ for (int j = 0; j < 4; j++) {
+ const AVRational k = op->lin.m[i][j];
+ AVRational mink = av_mul_q(prev.min[j], k);
+ AVRational maxk = av_mul_q(prev.max[j], k);
+ if (k.num) {
+ flags = merge_comp_flags(flags, prev.flags[j]);
+ if (k.den != 1) /* fractional coefficient */
+ flags &= ~SWS_COMP_EXACT;
+ if (k.num < 0)
+ FFSWAP(AVRational, mink, maxk);
+ min = av_add_q(min, mink);
+ max = av_add_q(max, maxk);
+ }
+ }
+ if (op->lin.m[i][4].num) { /* nonzero offset */
+ flags &= ~SWS_COMP_ZERO;
+ if (op->lin.m[i][4].den != 1) /* fractional offset */
+ flags &= ~SWS_COMP_EXACT;
+ min = av_add_q(min, op->lin.m[i][4]);
+ max = av_add_q(max, op->lin.m[i][4]);
+ }
+ op->comps.flags[i] = flags;
+ op->comps.min[i] = min;
+ op->comps.max[i] = max;
+ }
+ break;
+ case SWS_OP_SCALE:
+ for (int i = 0; i < 4; i++) {
+ op->comps.flags[i] = prev.flags[i];
+ if (op->c.q.den != 1) /* fractional scale */
+ op->comps.flags[i] &= ~SWS_COMP_EXACT;
+ if (op->c.q.num < 0)
+ FFSWAP(AVRational, op->comps.min[i], op->comps.max[i]);
+ }
+ break;
+
+ case SWS_OP_INVALID:
+ case SWS_OP_TYPE_NB:
+ av_unreachable("Invalid operation type!");
+ }
+
+ prev = op->comps;
+ }
+
+ /* Backwards pass, solves for component dependencies */
+ for (int n = ops->num_ops - 1; n >= 0; n--) {
+ SwsOp *op = &ops->ops[n];
+
+ switch (op->op) {
+ case SWS_OP_READ:
+ case SWS_OP_WRITE:
+ for (int i = 0; i < op->rw.elems; i++)
+ op->comps.unused[i] = op->op == SWS_OP_READ;
+ for (int i = op->rw.elems; i < 4; i++)
+ op->comps.unused[i] = next.unused[i];
+ break;
+ case SWS_OP_SWAP_BYTES:
+ case SWS_OP_LSHIFT:
+ case SWS_OP_RSHIFT:
+ case SWS_OP_CONVERT:
+ case SWS_OP_DITHER:
+ case SWS_OP_MIN:
+ case SWS_OP_MAX:
+ case SWS_OP_SCALE:
+ for (int i = 0; i < 4; i++)
+ op->comps.unused[i] = next.unused[i];
+ break;
+ case SWS_OP_UNPACK: {
+ bool unused = true;
+ for (int i = 0; i < 4; i++) {
+ if (op->pack.pattern[i])
+ unused &= next.unused[i];
+ op->comps.unused[i] = i > 0;
+ }
+ op->comps.unused[0] = unused;
+ break;
+ }
+ case SWS_OP_PACK:
+ for (int i = 0; i < 4; i++) {
+ if (op->pack.pattern[i])
+ op->comps.unused[i] = next.unused[0];
+ else
+ op->comps.unused[i] = true;
+ }
+ break;
+ case SWS_OP_CLEAR:
+ for (int i = 0; i < 4; i++) {
+ if (op->c.q4[i].den)
+ op->comps.unused[i] = true;
+ else
+ op->comps.unused[i] = next.unused[i];
+ }
+ break;
+ case SWS_OP_SWIZZLE: {
+ bool unused[4] = { true, true, true, true };
+ for (int i = 0; i < 4; i++)
+ unused[op->swizzle.in[i]] &= next.unused[i];
+ for (int i = 0; i < 4; i++)
+ op->comps.unused[i] = unused[i];
+ break;
+ }
+ case SWS_OP_LINEAR:
+ for (int j = 0; j < 4; j++) {
+ bool unused = true;
+ for (int i = 0; i < 4; i++) {
+ if (op->lin.m[i][j].num)
+ unused &= next.unused[i];
+ }
+ op->comps.unused[j] = unused;
+ }
+ break;
+ }
+
+ next = op->comps;
+ }
+}
+
static void op_uninit(SwsOp *op)
{
switch (op->op) {
diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 9d668fee74..11ee40e268 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -146,245 +146,6 @@ static bool op_commute_swizzle(SwsOp *op, SwsOp *next)
return false;
}
-/* merge_comp_flags() forms a monoid with flags_identity as the null element */
-static const unsigned flags_identity = SWS_COMP_ZERO | SWS_COMP_EXACT;
-static unsigned merge_comp_flags(unsigned a, unsigned b)
-{
- const unsigned flags_or = SWS_COMP_GARBAGE;
- const unsigned flags_and = SWS_COMP_ZERO | SWS_COMP_EXACT;
- return ((a & b) & flags_and) | ((a | b) & flags_or);
-}
-
-/* Infer + propagate known information about components */
-void ff_sws_op_list_update_comps(SwsOpList *ops)
-{
- SwsComps next = { .unused = {true, true, true, true} };
- SwsComps prev = { .flags = {
- SWS_COMP_GARBAGE, SWS_COMP_GARBAGE, SWS_COMP_GARBAGE, SWS_COMP_GARBAGE,
- }};
-
- /* Forwards pass, propagates knowledge about the incoming pixel values */
- for (int n = 0; n < ops->num_ops; n++) {
- SwsOp *op = &ops->ops[n];
-
- /* Prefill min/max values automatically; may have to be fixed in
- * special cases */
- memcpy(op->comps.min, prev.min, sizeof(prev.min));
- memcpy(op->comps.max, prev.max, sizeof(prev.max));
-
- if (op->op != SWS_OP_SWAP_BYTES) {
- ff_sws_apply_op_q(op, op->comps.min);
- ff_sws_apply_op_q(op, op->comps.max);
- }
-
- switch (op->op) {
- case SWS_OP_READ:
- for (int i = 0; i < op->rw.elems; i++) {
- if (ff_sws_pixel_type_is_int(op->type)) {
- int bits = 8 * ff_sws_pixel_type_size(op->type) >> op->rw.frac;
- if (!op->rw.packed && ops->src.desc) {
- /* Use legal value range from pixdesc if available;
- * we don't need to do this for packed formats because
- * non-byte-aligned packed formats will necessarily go
- * through SWS_OP_UNPACK anyway */
- for (int c = 0; c < 4; c++) {
- if (ops->src.desc->comp[c].plane == i) {
- bits = ops->src.desc->comp[c].depth;
- break;
- }
- }
- }
-
- op->comps.flags[i] = SWS_COMP_EXACT;
- op->comps.min[i] = Q(0);
- op->comps.max[i] = Q((1ULL << bits) - 1);
- }
- }
- for (int i = op->rw.elems; i < 4; i++)
- op->comps.flags[i] = prev.flags[i];
- break;
- case SWS_OP_WRITE:
- for (int i = 0; i < op->rw.elems; i++)
- av_assert1(!(prev.flags[i] & SWS_COMP_GARBAGE));
- /* fall through */
- case SWS_OP_SWAP_BYTES:
- case SWS_OP_LSHIFT:
- case SWS_OP_RSHIFT:
- case SWS_OP_MIN:
- case SWS_OP_MAX:
- /* Linearly propagate flags per component */
- for (int i = 0; i < 4; i++)
- op->comps.flags[i] = prev.flags[i];
- break;
- case SWS_OP_DITHER:
- /* Strip zero flag because of the nonzero dithering offset */
- for (int i = 0; i < 4; i++)
- op->comps.flags[i] = prev.flags[i] & ~SWS_COMP_ZERO;
- break;
- case SWS_OP_UNPACK:
- for (int i = 0; i < 4; i++) {
- if (op->pack.pattern[i])
- op->comps.flags[i] = prev.flags[0];
- else
- op->comps.flags[i] = SWS_COMP_GARBAGE;
- }
- break;
- case SWS_OP_PACK: {
- unsigned flags = flags_identity;
- for (int i = 0; i < 4; i++) {
- if (op->pack.pattern[i])
- flags = merge_comp_flags(flags, prev.flags[i]);
- if (i > 0) /* clear remaining comps for sanity */
- op->comps.flags[i] = SWS_COMP_GARBAGE;
- }
- op->comps.flags[0] = flags;
- break;
- }
- case SWS_OP_CLEAR:
- for (int i = 0; i < 4; i++) {
- if (op->c.q4[i].den) {
- if (op->c.q4[i].num == 0) {
- op->comps.flags[i] = SWS_COMP_ZERO | SWS_COMP_EXACT;
- } else if (op->c.q4[i].den == 1) {
- op->comps.flags[i] = SWS_COMP_EXACT;
- }
- } else {
- op->comps.flags[i] = prev.flags[i];
- }
- }
- break;
- case SWS_OP_SWIZZLE:
- for (int i = 0; i < 4; i++)
- op->comps.flags[i] = prev.flags[op->swizzle.in[i]];
- break;
- case SWS_OP_CONVERT:
- for (int i = 0; i < 4; i++) {
- op->comps.flags[i] = prev.flags[i];
- if (ff_sws_pixel_type_is_int(op->convert.to))
- op->comps.flags[i] |= SWS_COMP_EXACT;
- }
- break;
- case SWS_OP_LINEAR:
- for (int i = 0; i < 4; i++) {
- unsigned flags = flags_identity;
- AVRational min = Q(0), max = Q(0);
- for (int j = 0; j < 4; j++) {
- const AVRational k = op->lin.m[i][j];
- AVRational mink = av_mul_q(prev.min[j], k);
- AVRational maxk = av_mul_q(prev.max[j], k);
- if (k.num) {
- flags = merge_comp_flags(flags, prev.flags[j]);
- if (k.den != 1) /* fractional coefficient */
- flags &= ~SWS_COMP_EXACT;
- if (k.num < 0)
- FFSWAP(AVRational, mink, maxk);
- min = av_add_q(min, mink);
- max = av_add_q(max, maxk);
- }
- }
- if (op->lin.m[i][4].num) { /* nonzero offset */
- flags &= ~SWS_COMP_ZERO;
- if (op->lin.m[i][4].den != 1) /* fractional offset */
- flags &= ~SWS_COMP_EXACT;
- min = av_add_q(min, op->lin.m[i][4]);
- max = av_add_q(max, op->lin.m[i][4]);
- }
- op->comps.flags[i] = flags;
- op->comps.min[i] = min;
- op->comps.max[i] = max;
- }
- break;
- case SWS_OP_SCALE:
- for (int i = 0; i < 4; i++) {
- op->comps.flags[i] = prev.flags[i];
- if (op->c.q.den != 1) /* fractional scale */
- op->comps.flags[i] &= ~SWS_COMP_EXACT;
- if (op->c.q.num < 0)
- FFSWAP(AVRational, op->comps.min[i], op->comps.max[i]);
- }
- break;
-
- case SWS_OP_INVALID:
- case SWS_OP_TYPE_NB:
- av_unreachable("Invalid operation type!");
- }
-
- prev = op->comps;
- }
-
- /* Backwards pass, solves for component dependencies */
- for (int n = ops->num_ops - 1; n >= 0; n--) {
- SwsOp *op = &ops->ops[n];
-
- switch (op->op) {
- case SWS_OP_READ:
- case SWS_OP_WRITE:
- for (int i = 0; i < op->rw.elems; i++)
- op->comps.unused[i] = op->op == SWS_OP_READ;
- for (int i = op->rw.elems; i < 4; i++)
- op->comps.unused[i] = next.unused[i];
- break;
- case SWS_OP_SWAP_BYTES:
- case SWS_OP_LSHIFT:
- case SWS_OP_RSHIFT:
- case SWS_OP_CONVERT:
- case SWS_OP_DITHER:
- case SWS_OP_MIN:
- case SWS_OP_MAX:
- case SWS_OP_SCALE:
- for (int i = 0; i < 4; i++)
- op->comps.unused[i] = next.unused[i];
- break;
- case SWS_OP_UNPACK: {
- bool unused = true;
- for (int i = 0; i < 4; i++) {
- if (op->pack.pattern[i])
- unused &= next.unused[i];
- op->comps.unused[i] = i > 0;
- }
- op->comps.unused[0] = unused;
- break;
- }
- case SWS_OP_PACK:
- for (int i = 0; i < 4; i++) {
- if (op->pack.pattern[i])
- op->comps.unused[i] = next.unused[0];
- else
- op->comps.unused[i] = true;
- }
- break;
- case SWS_OP_CLEAR:
- for (int i = 0; i < 4; i++) {
- if (op->c.q4[i].den)
- op->comps.unused[i] = true;
- else
- op->comps.unused[i] = next.unused[i];
- }
- break;
- case SWS_OP_SWIZZLE: {
- bool unused[4] = { true, true, true, true };
- for (int i = 0; i < 4; i++)
- unused[op->swizzle.in[i]] &= next.unused[i];
- for (int i = 0; i < 4; i++)
- op->comps.unused[i] = unused[i];
- break;
- }
- case SWS_OP_LINEAR:
- for (int j = 0; j < 4; j++) {
- bool unused = true;
- for (int i = 0; i < 4; i++) {
- if (op->lin.m[i][j].num)
- unused &= next.unused[i];
- }
- op->comps.unused[j] = unused;
- }
- break;
- }
-
- next = op->comps;
- }
-}
-
/* returns log2(x) only if x is a power of two, or 0 otherwise */
static int exact_log2(const int x)
{
--
2.49.1
>From 58fc6803333a13fea92efefe6ecd0529de389c5c Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 14:16:50 +0100
Subject: [PATCH 03/10] swscale/ops_optimizer: don't strip SwsComps from
SWS_OP_READ
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:
- It ignored the effects of SWS_OP_RSHIFT, such as for p010 and related
MSB-aligned formats. (This is actually a bug)
- It adds a needless dependency on the "purely informative" src/dst fields
inside SwsOpList.
- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
the existing hack of simply ignoring SWAP_BYTES on the value range is not
a very good solution here.
Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.
I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.
This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explictly provided one.
---
libswscale/ops.c | 30 ++++++++++++++++++------------
libswscale/ops.h | 17 ++++++++++++++++-
2 files changed, 34 insertions(+), 13 deletions(-)
diff --git a/libswscale/ops.c b/libswscale/ops.c
index 5b08b743bf..f2e9767956 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -232,14 +232,15 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
for (int n = 0; n < ops->num_ops; n++) {
SwsOp *op = &ops->ops[n];
- /* Prefill min/max values automatically; may have to be fixed in
- * special cases */
- memcpy(op->comps.min, prev.min, sizeof(prev.min));
- memcpy(op->comps.max, prev.max, sizeof(prev.max));
-
- if (op->op != SWS_OP_SWAP_BYTES) {
- ff_sws_apply_op_q(op, op->comps.min);
- ff_sws_apply_op_q(op, op->comps.max);
+ if (op->op != SWS_OP_READ) {
+ /* Prefill min/max values automatically; may have to be fixed in
+ * special cases */
+ memcpy(op->comps.min, prev.min, sizeof(prev.min));
+ memcpy(op->comps.max, prev.max, sizeof(prev.max));
+ if (op->op != SWS_OP_SWAP_BYTES) {
+ ff_sws_apply_op_q(op, op->comps.min);
+ ff_sws_apply_op_q(op, op->comps.max);
+ }
}
switch (op->op) {
@@ -260,13 +261,18 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
}
}
- op->comps.flags[i] = SWS_COMP_EXACT;
- op->comps.min[i] = Q(0);
- op->comps.max[i] = Q((1ULL << bits) - 1);
+ op->comps.flags[i] |= SWS_COMP_EXACT;
+ op->comps.min[i] = av_max_q(Q(0), op->comps.min[i]);
+ op->comps.max[i] = av_min_q(Q((1ULL << bits) - 1), op->comps.max[i]);
}
}
- for (int i = op->rw.elems; i < 4; i++)
+
+ /* Explicitly strip flags and min/max range data for unread comps */
+ for (int i = op->rw.elems; i < 4; i++) {
op->comps.flags[i] = prev.flags[i];
+ op->comps.min[i] = prev.min[i];
+ op->comps.max[i] = prev.max[i];
+ }
break;
case SWS_OP_WRITE:
for (int i = 0; i < op->rw.elems; i++)
diff --git a/libswscale/ops.h b/libswscale/ops.h
index 6fc7e60a02..f49bbcc601 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -195,7 +195,22 @@ typedef struct SwsOp {
SwsConst c;
};
- /* For use internal use inside ff_sws_*() functions */
+ /**
+ * Metadata about the operation's input/output components.
+ *
+ * For SWS_OP_READ, this is informative; and lets the optimizer know
+ * additional information about the value range and/or pixel data to expect.
+ * The default value of {0} is safe to pass in the case that no additional
+ * information is known. Extra components past the SWS_OP_READ boundary are
+ * always marked as unused/garbage.
+ *
+ * For every other operation, this metadata is discarded and regenerated
+ * automatically by `ff_sws_op_list_update_comps()`.
+ *
+ * Note that backends will only ever see optimized operation lists, so
+ * they may rely on the presence and accuracy of this metadata for all
+ * operations.
+ */
SwsComps comps;
} SwsOp;
--
2.49.1
>From 0adebde9a439f1ad2442c0079f11be22cd3922ac Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 14:29:52 +0100
Subject: [PATCH 04/10] swscale/format: add pixel range metadata to SWS_OP_READ
---
libswscale/format.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/libswscale/format.c b/libswscale/format.c
index 9e1121f484..589ae71b71 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -896,16 +896,29 @@ int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
SwsReadWriteOp rw_op;
SwsSwizzleOp swizzle;
SwsPackOp unpack;
+ SwsComps comps = {0};
int shift;
RET(fmt_analyze(fmt, &rw_op, &unpack, &swizzle, &shift,
&pixel_type, &raw_type));
+ /* Generate value range information for simple unpacked formats */
+ if (!(desc->flags & AV_PIX_FMT_FLAG_FLOAT) && !unpack.pattern[0]) {
+ for (int c = 0; c < desc->nb_components; c++) {
+ const int bits = desc->comp[c].depth + shift;
+ const int idx = swizzle.in[c];
+ comps.min[idx] = Q0;
+ if (bits < 32) /* FIXME: AVRational is limited to INT_MAX */
+ comps.max[idx] = Q((1ULL << bits) - 1);
+ }
+ }
+
/* TODO: handle subsampled or semipacked input formats */
RET(ff_sws_op_list_append(ops, &(SwsOp) {
- .op = SWS_OP_READ,
- .type = raw_type,
- .rw = rw_op,
+ .op = SWS_OP_READ,
+ .type = raw_type,
+ .rw = rw_op,
+ .comps = comps,
}));
if ((desc->flags & AV_PIX_FMT_FLAG_BE) != NATIVE_ENDIAN_FLAG) {
--
2.49.1
>From cb41401eb8a4e62fc6b8029cb861564bfeab67f7 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 14:33:47 +0100
Subject: [PATCH 05/10] swscale/optimizer: remove broken value range assumption
hack
This information is now pre-filled automatically for SWS_OP_READ when
relevant.
yuv444p10msbbe -> rgb24:
[u16 XXXX -> +++X] SWS_OP_READ : 3 elem(s) planar >> 0
[u16 ...X -> +++X] SWS_OP_SWAP_BYTES
[u16 ...X -> +++X] SWS_OP_RSHIFT : >> 6
[u16 ...X -> +++X] SWS_OP_CONVERT : u16 -> f32
[f32 ...X -> ...X] SWS_OP_LINEAR : matrix3+off3 [...]
[f32 ...X -> ...X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
[f32 ...X -> ...X] SWS_OP_MAX : {0 0 0 0} <= x
+ [f32 ...X -> ...X] SWS_OP_MIN : x <= {255 255 255 _}
[f32 ...X -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 ...X -> +++X] SWS_OP_WRITE : 3 elem(s) packed >> 0
(X = unused, + = exact, 0 = zero)
(This clamp is needed and was incorrectly optimized away before, because the
`SWS_OP_RSHIFT` incorrectly distorted the value range assertion)
---
libswscale/ops.c | 13 -------------
tests/ref/fate/sws-ops-list | 2 +-
2 files changed, 1 insertion(+), 14 deletions(-)
diff --git a/libswscale/ops.c b/libswscale/ops.c
index f2e9767956..25bce3c59f 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -248,19 +248,6 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
for (int i = 0; i < op->rw.elems; i++) {
if (ff_sws_pixel_type_is_int(op->type)) {
int bits = 8 * ff_sws_pixel_type_size(op->type) >> op->rw.frac;
- if (!op->rw.packed && ops->src.desc) {
- /* Use legal value range from pixdesc if available;
- * we don't need to do this for packed formats because
- * non-byte-aligned packed formats will necessarily go
- * through SWS_OP_UNPACK anyway */
- for (int c = 0; c < 4; c++) {
- if (ops->src.desc->comp[c].plane == i) {
- bits = ops->src.desc->comp[c].depth;
- break;
- }
- }
- }
-
op->comps.flags[i] |= SWS_COMP_EXACT;
op->comps.min[i] = av_max_q(Q(0), op->comps.min[i]);
op->comps.max[i] = av_min_q(Q((1ULL << bits) - 1), op->comps.max[i]);
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index a7d6149d8b..1ec4e5af79 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-ef1dd10af970984495f6008e43d0fe1b
+08b3e768ae2f518b73897ab5bad459e9
--
2.49.1
>From c70c1101751938df21b6f91fe76e9e115e86f94a Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 13:51:54 +0100
Subject: [PATCH 06/10] swscale/ops: add SWS_COMP_SWAPPED
This flag keeps track of whether a pixel is currently byte-swapped or
not. Not needed by current backends, but informative and useful for
catching potential endianness errors.
Updates a lot of FATE tests with a cosmetic diff like this:
rgb24 -> gray16be:
[ u8 XXXX -> +++X] SWS_OP_READ : 3 elem(s) packed >> 0
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u16
- [u16 .XXX -> +++X] SWS_OP_SWAP_BYTES
- [u16 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
+ [u16 .XXX -> zzzX] SWS_OP_SWAP_BYTES
+ [u16 .XXX -> zzzX] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
(The choice of `z` to represent swapped integers is arbitrary, but I think
it's visually evocative and distinct from the other symbols)
---
libswscale/format.c | 8 +++++++-
libswscale/ops.c | 7 ++++++-
libswscale/ops.h | 1 +
tests/ref/fate/sws-ops-list | 2 +-
4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/libswscale/format.c b/libswscale/format.c
index 589ae71b71..90c9404518 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -913,6 +913,12 @@ int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
}
}
+ const int swapped = (desc->flags & AV_PIX_FMT_FLAG_BE) != NATIVE_ENDIAN_FLAG;
+ if (swapped) {
+ for (int i = 0; i < 4; i++)
+ comps.flags[i] |= SWS_COMP_SWAPPED;
+ }
+
/* TODO: handle subsampled or semipacked input formats */
RET(ff_sws_op_list_append(ops, &(SwsOp) {
.op = SWS_OP_READ,
@@ -921,7 +927,7 @@ int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
.comps = comps,
}));
- if ((desc->flags & AV_PIX_FMT_FLAG_BE) != NATIVE_ENDIAN_FLAG) {
+ if (swapped) {
RET(ff_sws_op_list_append(ops, &(SwsOp) {
.op = SWS_OP_SWAP_BYTES,
.type = raw_type,
diff --git a/libswscale/ops.c b/libswscale/ops.c
index 25bce3c59f..2bf4550444 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -261,11 +261,14 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
op->comps.max[i] = prev.max[i];
}
break;
+ case SWS_OP_SWAP_BYTES:
+ for (int i = 0; i < 4; i++)
+ op->comps.flags[i] = prev.flags[i] ^ SWS_COMP_SWAPPED;
+ break;
case SWS_OP_WRITE:
for (int i = 0; i < op->rw.elems; i++)
av_assert1(!(prev.flags[i] & SWS_COMP_GARBAGE));
/* fall through */
- case SWS_OP_SWAP_BYTES:
case SWS_OP_LSHIFT:
case SWS_OP_RSHIFT:
case SWS_OP_MIN:
@@ -607,6 +610,8 @@ static char describe_comp_flags(unsigned flags)
return 'X';
else if (flags & SWS_COMP_ZERO)
return '0';
+ else if (flags & SWS_COMP_SWAPPED)
+ return 'z';
else if (flags & SWS_COMP_EXACT)
return '+';
else
diff --git a/libswscale/ops.h b/libswscale/ops.h
index f49bbcc601..058fe9c77c 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -73,6 +73,7 @@ enum SwsCompFlags {
SWS_COMP_GARBAGE = 1 << 0, /* contents are undefined / garbage data */
SWS_COMP_EXACT = 1 << 1, /* value is an exact integer */
SWS_COMP_ZERO = 1 << 2, /* known to be a constant zero */
+ SWS_COMP_SWAPPED = 1 << 3, /* byte order is swapped */
};
typedef union SwsConst {
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index 1ec4e5af79..5b5797e962 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-08b3e768ae2f518b73897ab5bad459e9
+7c45512e9589aad5843e6ad3e430267b
--
2.49.1
>From 3c19f5b093dda38bc17cdecc57ceb9bbfc41d2b4 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 14:58:42 +0100
Subject: [PATCH 07/10] swscale/ops: use switch/case for updating SwsComps
ranges
A bit more readable and makes it clear what the special cases are.
---
libswscale/ops.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/libswscale/ops.c b/libswscale/ops.c
index 2bf4550444..b703956c5c 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -232,15 +232,17 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
for (int n = 0; n < ops->num_ops; n++) {
SwsOp *op = &ops->ops[n];
- if (op->op != SWS_OP_READ) {
- /* Prefill min/max values automatically; may have to be fixed in
- * special cases */
+ switch (op->op) {
+ case SWS_OP_READ:
+ case SWS_OP_LINEAR:
+ case SWS_OP_SWAP_BYTES:
+ break; /* special cases, handled below */
+ default:
memcpy(op->comps.min, prev.min, sizeof(prev.min));
memcpy(op->comps.max, prev.max, sizeof(prev.max));
- if (op->op != SWS_OP_SWAP_BYTES) {
- ff_sws_apply_op_q(op, op->comps.min);
- ff_sws_apply_op_q(op, op->comps.max);
- }
+ ff_sws_apply_op_q(op, op->comps.min);
+ ff_sws_apply_op_q(op, op->comps.max);
+ break;
}
switch (op->op) {
@@ -262,8 +264,11 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
}
break;
case SWS_OP_SWAP_BYTES:
- for (int i = 0; i < 4; i++)
+ for (int i = 0; i < 4; i++) {
op->comps.flags[i] = prev.flags[i] ^ SWS_COMP_SWAPPED;
+ op->comps.min[i] = prev.min[i];
+ op->comps.max[i] = prev.max[i];
+ }
break;
case SWS_OP_WRITE:
for (int i = 0; i < op->rw.elems; i++)
--
2.49.1
>From 8c5b178f14a61c386612de4cbfd7f6d63c3bc2ae Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 15:01:46 +0100
Subject: [PATCH 08/10] swscale/ops: explicitly reset value range after
SWS_OP_UNPACK
The current logic implicitly pulled the new value range out of SWS_OP_ASSUME,
but this was quite ill-formed and not very robust. In particular, it only
worked because of the implicit assumption that the value range was always
set to 0b1111...111.
This actually poses a serious problem for 32-bit packed formats, whose
value range actually does not fit into AVRational. In the past, it only
worked because the value would implicitly overflow to -1, which SWS_OP_UNPACK
would then correctly extract the bits out from again.
In general, it's cleaner (and sufficient) to just explicitly reset the value
range on SWS_OP_UNPACK again.
---
libswscale/ops.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/libswscale/ops.c b/libswscale/ops.c
index b703956c5c..4be43ed7c1 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -236,6 +236,7 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
case SWS_OP_READ:
case SWS_OP_LINEAR:
case SWS_OP_SWAP_BYTES:
+ case SWS_OP_UNPACK:
break; /* special cases, handled below */
default:
memcpy(op->comps.min, prev.min, sizeof(prev.min));
@@ -289,9 +290,13 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
break;
case SWS_OP_UNPACK:
for (int i = 0; i < 4; i++) {
- if (op->pack.pattern[i])
+ const int pattern = op->pack.pattern[i];
+ if (pattern) {
+ av_assert1(pattern < 32);
op->comps.flags[i] = prev.flags[0];
- else
+ op->comps.min[i] = Q(0);
+ op->comps.max[i] = Q((1L << pattern) - 1);
+ } else
op->comps.flags[i] = SWS_COMP_GARBAGE;
}
break;
--
2.49.1
>From 099a620f3d3845d1d2b01b297fa46fcb53cf20cd Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Dec 2025 15:42:08 +0100
Subject: [PATCH 09/10] swscale/ops: don't overflow AVRational range
AVRational64 is not quite ready, so avoid UB here for now.
---
libswscale/ops.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/libswscale/ops.c b/libswscale/ops.c
index 4be43ed7c1..8a4a4a26c0 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -250,10 +250,12 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
case SWS_OP_READ:
for (int i = 0; i < op->rw.elems; i++) {
if (ff_sws_pixel_type_is_int(op->type)) {
- int bits = 8 * ff_sws_pixel_type_size(op->type) >> op->rw.frac;
+ const int bits = 8 * ff_sws_pixel_type_size(op->type) >> op->rw.frac;
+ const uint64_t range = (1ULL << bits) - 1;
op->comps.flags[i] |= SWS_COMP_EXACT;
op->comps.min[i] = av_max_q(Q(0), op->comps.min[i]);
- op->comps.max[i] = av_min_q(Q((1ULL << bits) - 1), op->comps.max[i]);
+ if (range <= INT_MAX) /* FIXME: AVRational is limited to INT_MAX */
+ op->comps.max[i] = av_min_q(Q(range), op->comps.max[i]);
}
}
--
2.49.1
>From be834ed1159148bc241c27eac9ca45df9ecb3df4 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 15 Dec 2025 18:34:47 +0100
Subject: [PATCH 10/10] Revert "swscale/ops: clarify SwsOpList.src/dst
semantics"
This reverts commit c94e8afe5d195fb08c441fbe3f8c2295081fcf8b.
These are now actually purely informational.
---
libswscale/ops.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libswscale/ops.h b/libswscale/ops.h
index 058fe9c77c..9e5e55f5ac 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -232,8 +232,8 @@ typedef struct SwsOpList {
SwsOp *ops;
int num_ops;
- /* Metadata associated with this operation list */
- SwsFormat src, dst; /* if set; may inform the optimizer about e.g value ranges */
+ /* Purely informative metadata associated with this operation list */
+ SwsFormat src, dst;
} SwsOpList;
SwsOpList *ff_sws_op_list_alloc(void);
--
2.49.1
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2025-12-22 14:49 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-22 14:48 [FFmpeg-devel] [PATCH] swscale/ops: generate correct value range in ff_sws_decode_pixfmt(), remove hacks from optimizer (PR #21264) Niklas Haas via ffmpeg-devel
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git