[FFmpeg-devel] [PR] WIP: libavformat/movenc: Uses dynamic buffers for fragmented chunks (PR #21613)

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

* [FFmpeg-devel] [PR] WIP: libavformat/movenc: Uses dynamic buffers for fragmented chunks (PR #21613)
@ 2026-01-30 23:22 anthonybajoua via ffmpeg-devel
From: anthonybajoua via ffmpeg-devel @ 2026-01-30 23:22 UTC
  To: ffmpeg-devel; +Cc: anthonybajoua

PR #21613 opened by anthonybajoua
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21613
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21613.patch

# Problem:
Muxing hybrid/fragmented MP4 in chunks triggers frequent [realloc](https://pubs.opengroup.org/onlinepubs/7908799/xsh/realloc.html) calls.


[This was already addressed for MKV](https://github.com/FFmpeg/FFmpeg/commit/4d97b2ad2fa6d851c70fd982ab300e4fd559f1d0), but the hybrid/fragmented MP4 path still lacks the same fix.


# Solution:
Use `dyn_buf` dynamic buffers to manage memory for fragmented and hybrid MP4 output.
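
To illustrate why a growable buffer reduces `realloc` traffic, here is a minimal standalone sketch. It is not the code in this PR and not FFmpeg's implementation (FFmpeg's own dynamic buffers are reached through `avio_open_dyn_buf()` / `avio_close_dyn_buf()`). The `GrowBuf` type, the helper names, the 64-byte starting capacity and the doubling policy are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, for illustration only. */
typedef struct GrowBuf {
    unsigned char *data;
    size_t size;       /* bytes currently in use */
    size_t capacity;   /* bytes currently allocated */
    unsigned reallocs; /* how many times realloc() was called */
} GrowBuf;

/* Append len bytes from src, doubling the capacity when space runs out.
 * Returns 0 on success, -1 on overflow or allocation failure. */
static int growbuf_append(GrowBuf *b, const void *src, size_t len)
{
    assert(b && (src || len == 0));

    if (len > SIZE_MAX - b->size)
        return -1;

    size_t needed = b->size + len;
    if (needed > b->capacity) {
        size_t new_cap = b->capacity ? b->capacity : 64;
        while (new_cap < needed) {
            if (new_cap > SIZE_MAX / 2) { /* doubling would overflow */
                new_cap = needed;
                break;
            }
            new_cap *= 2;
        }
        unsigned char *p = realloc(b->data, new_cap);
        if (!p)
            return -1; /* the old buffer is still valid and owned by b */
        b->data     = p;
        b->capacity = new_cap;
        b->reallocs++;
    }

    if (len)
        memcpy(b->data + b->size, src, len);
    b->size = needed;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void growbuf_free(GrowBuf *b)
{
    free(b->data);
    *b = (GrowBuf){0};
}
```

With this policy, appending many small packets costs a number of reallocations that grows with the logarithm of the total size, instead of one reallocation per append when the buffer is resized to the exact fit each time.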


## Automated:
```
./configure
make
make fate-rsync SAMPLES=fate-suite
make fate SAMPLES=fate-suite
```

## Manual:


```
curl https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4 -o bbb.mp4

./ffmpeg -i bbb.mp4 -c copy -movflags hybrid_fragmented+delay_moov out.mp4

./ffprobe out.mp4
```


From 8e17afa45687996752388301d227c5eaa07903f9 Mon Sep 17 00:00:00 2001
From: Anthony Bajoua <anthonybajoua@meta.com>
Date: Mon, 17 Nov 2025 22:45:53 -0800
Subject: [PATCH 001/304] libavformat/mov: Fixes individual track duration on
 fragmented files

---
 libavformat/mov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index eab9f79577..721ffdcca0 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -6183,7 +6183,8 @@ static int mov_read_sidx(MOVContext *c, AVIOContext *pb, MOVAtom atom)
             }
         }
 
-        c->frag_index.complete = 1;
+        if (offadd == 0)
+            c->frag_index.complete = 1;
     }
 
     return 0;
-- 
2.52.0


From f427cc1e7796389ade72e391b4ca970db700fe17 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 17 Nov 2025 19:57:31 +0100
Subject: [PATCH 002/304] fate: add more configure flags to fate config
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 tests/fate.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/fate.sh b/tests/fate.sh
index 4081e865ae..2d6313820f 100755
--- a/tests/fate.sh
+++ b/tests/fate.sh
@@ -55,13 +55,17 @@ configure()(
         ${cross_prefix:+--cross-prefix="$cross_prefix"}                 \
         ${as:+--as="$as"}                                               \
         ${cc:+--cc="$cc"}                                               \
+        ${cxx:+--cxx="$cxx"}                                            \
         ${ld:+--ld="$ld"}                                               \
+        ${nm:+--nm="$nm"}                                               \
         ${target_os:+--target-os="$target_os"}                          \
         ${sysroot:+--sysroot="$sysroot"}                                \
         ${target_exec:+--target-exec="$target_exec"}                    \
         ${target_path:+--target-path="$target_path"}                    \
         ${target_samples:+--target-samples="$target_samples"}           \
         ${extra_cflags:+--extra-cflags="$extra_cflags"}                 \
+        ${extra_cxxflags:+--extra-cxxflags="$extra_cxxflags"}           \
+        ${extra_objcflags:+--extra-objcflags="$extra_objcflags"}        \
         ${extra_ldflags:+--extra-ldflags="$extra_ldflags"}              \
         ${extra_libs:+--extra-libs="$extra_libs"}                       \
         ${extra_conf}
-- 
2.52.0


From 5f0c87e116079694bdaee9d39f7da4f84310be42 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 17 Nov 2025 21:20:16 +0100
Subject: [PATCH 003/304] configure: filter out -guard:signret from armasm
 flags
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

While cl.exe supports -guard:signret, armasm64 complains about
unknown flag. Note that -guard:ehcont is accepted by armasm64.

Fixes:
error A2029: unknown command-line argument or argument value -guard:signret

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 configure | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure b/configure
index 441bdc9373..e2caf3b24c 100755
--- a/configure
+++ b/configure
@@ -4968,6 +4968,7 @@ armasm_flags(){
             # Filter out MSVC cl.exe options from cflags that shouldn't
             # be passed to gas-preprocessor
             -M[TD]*)                                            ;;
+            -guard:signret)                                     ;;
             *)                  echo $flag                      ;;
         esac
    done
-- 
2.52.0


From 23776b859bbb5ac3f8a506c6a3aec0b7e7d5e406 Mon Sep 17 00:00:00 2001
From: Gyan Doshi <ffmpeg@gyani.pro>
Date: Sun, 16 Nov 2025 10:24:38 +0530
Subject: [PATCH 004/304] doc/fate: document setting of session-wide env
 variables

---
 doc/fate.texi | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/doc/fate.texi b/doc/fate.texi
index 57268d44b4..12c884047c 100644
--- a/doc/fate.texi
+++ b/doc/fate.texi
@@ -65,7 +65,7 @@ make fate
 See @ref{makefile variables} for a list of arguments that can be added.
 
 If you did not set the samples path during configuration, or if you wish to
-override it just before starting FATE, you can do so in one of two ways.
+override it just before starting FATE, you can do so in one of three ways.
 
 Either by setting a make variable:
 
@@ -74,7 +74,15 @@ make fate-rsync SAMPLES=/path/to/fate-suite
 make fate       SAMPLES=/path/to/fate-suite
 @end example
 
-or by prepending an environment variable:
+or by setting an environment variable for the current session:
+
+@example
+export FATE_SAMPLES=/path/to/fate-suite
+make fate-rsync
+make fate
+@end example
+
+or in isolation for a single command by prepending it:
 
 @example
 FATE_SAMPLES=/path/to/fate-suite make fate-rsync
-- 
2.52.0


From 2fa531355ab6d9a33af6fbb131f112ca6ca94b88 Mon Sep 17 00:00:00 2001
From: Carl Hetherington via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Date: Mon, 3 Nov 2025 20:38:57 +0000
Subject: [PATCH 005/304] avfilter/f_ebur128: Fix incorrect ebur128 peak
 calculation.

Since 3b26b782eeded9b9ab7fac013cd1a83a30d68206 it would only look at the
first channel.

Signed-off-by: Carl Hetherington <cth@carlh.net>
Reviewed-by: Niklas Haas <ffmpeg@haasn.xyz>
---
 libavfilter/f_ebur128.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
index a352f3831f..84d8e44035 100644
--- a/libavfilter/f_ebur128.c
+++ b/libavfilter/f_ebur128.c
@@ -657,7 +657,7 @@ double ff_ebur128_find_peak_c(double *restrict ch_peaks, const int nb_channels,
     for (int ch = 0; ch < nb_channels; ch++) {
         double ch_peak = ch_peaks[ch];
         for (int i = 0; i < nb_samples; i++) {
-            const double sample = fabs(samples[i * nb_channels]);
+            const double sample = fabs(samples[i * nb_channels + ch]);
             ch_peak = FFMAX(ch_peak, sample);
         }
         maxpeak = FFMAX(maxpeak, ch_peak);
-- 
2.52.0


From 9a068a4b5b91b9917e5c6ffc83327c6fb15fcfb3 Mon Sep 17 00:00:00 2001
From: Marvin Scholz <epirat07@gmail.com>
Date: Tue, 18 Nov 2025 15:17:05 +0100
Subject: [PATCH 006/304] .forgejo/CODEOWNERS: add myself to VideoToolbox and
 Icecast

---
 .forgejo/CODEOWNERS | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.forgejo/CODEOWNERS b/.forgejo/CODEOWNERS
index d8ffa27426..45f501a4c3 100644
--- a/.forgejo/CODEOWNERS
+++ b/.forgejo/CODEOWNERS
@@ -62,6 +62,7 @@ libavcodec/smpte_436m.* @programmerjake
 libavcodec/svq1.* @pross
 libavcodec/svq3.* @pross
 libavcodec/.*vc2.* @lynne
+libavcodec/videotoolbox.* @ePirat
 libavcodec/vp3.* @pross
 libavcodec/vp4.* @pross
 libavcodec/vp5.* @pross
@@ -134,6 +135,7 @@ libavformat/.*exif.* @Traneptora
 libavformat/filmstrip.* @pross
 libavformat/frm.* @pross
 libavformat/iamf.* @jamrial
+libavformat/icecast.c @ePirat
 libavformat/ico.* @pross
 libavformat/iff.* @pross
 libavformat/.*jpegxl.* @Traneptora
@@ -165,6 +167,7 @@ libavutil/film_grain.* @haasn
 libavutil/dovi_meta.* @haasn
 libavutil/hwcontext_oh.* @quink
 libavutil/hwcontext_mediacodec.* @quink
+libavutil/hwcontext_videotoolbox.* @ePirat
 libavutil/iamf.* @jamrial
 libavutil/integer.* @michaelni
 libavutil/lfg.* @michaelni
-- 
2.52.0


From a5ef8780a2ea6ca752e3b75cdbc121308aa7d8b6 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 16:18:16 +0100
Subject: [PATCH 007/304] avcodec/x86/mpegvideoenc: Remove check for MMX

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideoenc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavcodec/x86/mpegvideoenc.c b/libavcodec/x86/mpegvideoenc.c
index eac9947590..bb1d2cc319 100644
--- a/libavcodec/x86/mpegvideoenc.c
+++ b/libavcodec/x86/mpegvideoenc.c
@@ -123,16 +123,14 @@ av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
     const int dct_algo = s->c.avctx->dct_algo;
 
     if (dct_algo == FF_DCT_AUTO || dct_algo == FF_DCT_MMX) {
-#if HAVE_MMX_INLINE
-        int cpu_flags = av_get_cpu_flags();
 #if HAVE_SSE2_INLINE
+        int cpu_flags = av_get_cpu_flags();
         if (INLINE_SSE2(cpu_flags)) {
 #if HAVE_6REGS
             s->dct_quantize = dct_quantize_sse2;
 #endif
             s->denoise_dct  = denoise_dct_sse2;
         }
-#endif
 #if HAVE_6REGS && HAVE_SSSE3_INLINE
         if (INLINE_SSSE3(cpu_flags))
             s->dct_quantize = dct_quantize_ssse3;
-- 
2.52.0


From a3191f2f90098427f8ca33106a285a2c5afc0f80 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 16:46:18 +0100
Subject: [PATCH 008/304] avcodec/x86/mpegvideoenc: Reduce number of registers
 used

Avoids a push+pop on x64 Windows.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideoenc.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/libavcodec/x86/mpegvideoenc.c b/libavcodec/x86/mpegvideoenc.c
index bb1d2cc319..2ca05f69ea 100644
--- a/libavcodec/x86/mpegvideoenc.c
+++ b/libavcodec/x86/mpegvideoenc.c
@@ -68,7 +68,7 @@ static void denoise_dct_sse2(MPVEncContext *const s, int16_t block[])
     s->dct_count[intra]++;
 
     __asm__ volatile(
-        "pxor %%xmm7, %%xmm7                    \n\t"
+        "pxor %%xmm6, %%xmm6                    \n\t"
         "1:                                     \n\t"
         "pxor %%xmm0, %%xmm0                    \n\t"
         "pxor %%xmm1, %%xmm1                    \n\t"
@@ -90,18 +90,18 @@ static void denoise_dct_sse2(MPVEncContext *const s, int16_t block[])
         "psubw %%xmm1, %%xmm3                   \n\t"
         "movdqa %%xmm2, (%0)                    \n\t"
         "movdqa %%xmm3, 16(%0)                  \n\t"
-        "movdqa %%xmm4, %%xmm6                  \n\t"
+        "movdqa %%xmm4, %%xmm2                  \n\t"
         "movdqa %%xmm5, %%xmm0                  \n\t"
-        "punpcklwd %%xmm7, %%xmm4               \n\t"
-        "punpckhwd %%xmm7, %%xmm6               \n\t"
-        "punpcklwd %%xmm7, %%xmm5               \n\t"
-        "punpckhwd %%xmm7, %%xmm0               \n\t"
+        "punpcklwd %%xmm6, %%xmm4               \n\t"
+        "punpckhwd %%xmm6, %%xmm2               \n\t"
+        "punpcklwd %%xmm6, %%xmm5               \n\t"
+        "punpckhwd %%xmm6, %%xmm0               \n\t"
         "paddd (%1), %%xmm4                     \n\t"
-        "paddd 16(%1), %%xmm6                   \n\t"
+        "paddd 16(%1), %%xmm2                   \n\t"
         "paddd 32(%1), %%xmm5                   \n\t"
         "paddd 48(%1), %%xmm0                   \n\t"
         "movdqa %%xmm4, (%1)                    \n\t"
-        "movdqa %%xmm6, 16(%1)                  \n\t"
+        "movdqa %%xmm2, 16(%1)                  \n\t"
         "movdqa %%xmm5, 32(%1)                  \n\t"
         "movdqa %%xmm0, 48(%1)                  \n\t"
         "add $32, %0                            \n\t"
@@ -112,7 +112,7 @@ static void denoise_dct_sse2(MPVEncContext *const s, int16_t block[])
         : "+r" (block), "+r" (sum), "+r" (offset)
         : "r"(block+64)
           XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3",
-                            "%xmm4", "%xmm5", "%xmm6", "%xmm7")
+                            "%xmm4", "%xmm5", "%xmm6")
     );
 }
 #endif /* HAVE_SSE2_INLINE */
-- 
2.52.0


From dc52acd691827a5f929ed0ee2864bf3316109d96 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 17:32:29 +0100
Subject: [PATCH 009/304] avcodec/x86/mpegvideoenc: Port denoise_dct_sse2 to
 external assembly

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideoenc.c      | 59 ++++--------------------------
 libavcodec/x86/mpegvideoencdsp.asm | 46 +++++++++++++++++++++++
 2 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/libavcodec/x86/mpegvideoenc.c b/libavcodec/x86/mpegvideoenc.c
index 2ca05f69ea..e5665ac781 100644
--- a/libavcodec/x86/mpegvideoenc.c
+++ b/libavcodec/x86/mpegvideoenc.c
@@ -57,8 +57,10 @@ DECLARE_ALIGNED(16, static const uint16_t, inv_zigzag_direct16)[64] = {
 
 #endif /* HAVE_6REGS */
 
-#if HAVE_INLINE_ASM
-#if HAVE_SSE2_INLINE
+#if HAVE_SSE2_EXTERNAL
+void ff_mpv_denoise_dct_sse2(int16_t block[64], int dct_error_sum[64],
+                             const uint16_t dct_offset[64]);
+
 static void denoise_dct_sse2(MPVEncContext *const s, int16_t block[])
 {
     const int intra = s->c.mb_intra;
@@ -67,56 +69,9 @@ static void denoise_dct_sse2(MPVEncContext *const s, int16_t block[])
 
     s->dct_count[intra]++;
 
-    __asm__ volatile(
-        "pxor %%xmm6, %%xmm6                    \n\t"
-        "1:                                     \n\t"
-        "pxor %%xmm0, %%xmm0                    \n\t"
-        "pxor %%xmm1, %%xmm1                    \n\t"
-        "movdqa (%0), %%xmm2                    \n\t"
-        "movdqa 16(%0), %%xmm3                  \n\t"
-        "pcmpgtw %%xmm2, %%xmm0                 \n\t"
-        "pcmpgtw %%xmm3, %%xmm1                 \n\t"
-        "pxor %%xmm0, %%xmm2                    \n\t"
-        "pxor %%xmm1, %%xmm3                    \n\t"
-        "psubw %%xmm0, %%xmm2                   \n\t"
-        "psubw %%xmm1, %%xmm3                   \n\t"
-        "movdqa %%xmm2, %%xmm4                  \n\t"
-        "movdqa %%xmm3, %%xmm5                  \n\t"
-        "psubusw (%2), %%xmm2                   \n\t"
-        "psubusw 16(%2), %%xmm3                 \n\t"
-        "pxor %%xmm0, %%xmm2                    \n\t"
-        "pxor %%xmm1, %%xmm3                    \n\t"
-        "psubw %%xmm0, %%xmm2                   \n\t"
-        "psubw %%xmm1, %%xmm3                   \n\t"
-        "movdqa %%xmm2, (%0)                    \n\t"
-        "movdqa %%xmm3, 16(%0)                  \n\t"
-        "movdqa %%xmm4, %%xmm2                  \n\t"
-        "movdqa %%xmm5, %%xmm0                  \n\t"
-        "punpcklwd %%xmm6, %%xmm4               \n\t"
-        "punpckhwd %%xmm6, %%xmm2               \n\t"
-        "punpcklwd %%xmm6, %%xmm5               \n\t"
-        "punpckhwd %%xmm6, %%xmm0               \n\t"
-        "paddd (%1), %%xmm4                     \n\t"
-        "paddd 16(%1), %%xmm2                   \n\t"
-        "paddd 32(%1), %%xmm5                   \n\t"
-        "paddd 48(%1), %%xmm0                   \n\t"
-        "movdqa %%xmm4, (%1)                    \n\t"
-        "movdqa %%xmm2, 16(%1)                  \n\t"
-        "movdqa %%xmm5, 32(%1)                  \n\t"
-        "movdqa %%xmm0, 48(%1)                  \n\t"
-        "add $32, %0                            \n\t"
-        "add $64, %1                            \n\t"
-        "add $32, %2                            \n\t"
-        "cmp %3, %0                             \n\t"
-            " jb 1b                             \n\t"
-        : "+r" (block), "+r" (sum), "+r" (offset)
-        : "r"(block+64)
-          XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3",
-                            "%xmm4", "%xmm5", "%xmm6")
-    );
+    ff_mpv_denoise_dct_sse2(block, sum, offset);
 }
-#endif /* HAVE_SSE2_INLINE */
-#endif /* HAVE_INLINE_ASM */
+#endif /* HAVE_SSE2_EXTERNAL */
 
 av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
 {
@@ -129,7 +84,9 @@ av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
 #if HAVE_6REGS
             s->dct_quantize = dct_quantize_sse2;
 #endif
+#if HAVE_SSE2_EXTERNAL
             s->denoise_dct  = denoise_dct_sse2;
+#endif
         }
 #if HAVE_6REGS && HAVE_SSSE3_INLINE
         if (INLINE_SSSE3(cpu_flags))
diff --git a/libavcodec/x86/mpegvideoencdsp.asm b/libavcodec/x86/mpegvideoencdsp.asm
index d12646ae54..0e86a5304c 100644
--- a/libavcodec/x86/mpegvideoencdsp.asm
+++ b/libavcodec/x86/mpegvideoencdsp.asm
@@ -24,6 +24,52 @@
 %include "libavutil/x86/x86util.asm"
 
 SECTION .text
+
+INIT_XMM sse2
+cglobal mpv_denoise_dct, 3, 4, 7, block, sum, offset
+    pxor            m6, m6
+    lea             r3, [sumq+256]
+.loop:
+    mova            m2, [blockq]
+    mova            m3, [blockq+16]
+    mova            m0, m6
+    mova            m1, m6
+    pcmpgtw         m0, m2
+    pcmpgtw         m1, m3
+    pxor            m2, m0
+    pxor            m3, m1
+    psubw           m2, m0
+    psubw           m3, m1
+    psubusw         m4, m2, [offsetq]
+    psubusw         m5, m3, [offsetq+16]
+    pxor            m4, m0
+    pxor            m5, m1
+    add        offsetq, 32
+    psubw           m4, m0
+    psubw           m5, m1
+    mova      [blockq], m4
+    mova   [blockq+16], m5
+    mova            m0, m2
+    mova            m1, m3
+    add         blockq, 32
+    punpcklwd       m0, m6
+    punpckhwd       m2, m6
+    punpcklwd       m1, m6
+    punpckhwd       m3, m6
+    paddd           m0, [sumq]
+    paddd           m2, [sumq+16]
+    paddd           m1, [sumq+32]
+    paddd           m3, [sumq+48]
+    mova        [sumq], m0
+    mova     [sumq+16], m2
+    mova     [sumq+32], m1
+    mova     [sumq+48], m3
+    add           sumq, 64
+    cmp           sumq, r3
+    jb           .loop
+    RET
+
+
 ; int ff_pix_sum16(const uint8_t *pix, ptrdiff_t line_size)
 ; %1 = number of loops
 ; %2 = number of GPRs used
-- 
2.52.0


From dd1a6f715daee74050cb0359f61338aca73326c2 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 18:24:18 +0100
Subject: [PATCH 010/304] avcodec/mpegvideo_enc: Port denoise_dct to
 MpegvideoEncDSPContext

It is very simple to remove the MPVEncContext from it.
Notice that this also fixes a bug in x86/mpegvideoenc.c: It only
used the SSE2 version of denoise_dct when dct_algo was auto or mmx
(and it was therefore unused during FATE).

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/mips/Makefile                      |  3 +-
 libavcodec/mips/mpegvideo_mips.h              |  3 +-
 libavcodec/mips/mpegvideoenc_init_mips.c      | 33 ----------------
 libavcodec/mips/mpegvideoencdsp_init_mips.c   |  5 +++
 ...egvideoenc_mmi.c => mpegvideoencdsp_mmi.c} |  7 +---
 libavcodec/mpegvideo_enc.c                    | 38 +++++--------------
 libavcodec/mpegvideoenc.h                     |  2 -
 libavcodec/mpegvideoencdsp.c                  | 25 ++++++++++++
 libavcodec/mpegvideoencdsp.h                  |  3 ++
 libavcodec/x86/mpegvideoenc.c                 | 19 ----------
 libavcodec/x86/mpegvideoenc_template.c        |  7 +++-
 libavcodec/x86/mpegvideoencdsp_init.c         |  3 ++
 12 files changed, 53 insertions(+), 95 deletions(-)
 delete mode 100644 libavcodec/mips/mpegvideoenc_init_mips.c
 rename libavcodec/mips/{mpegvideoenc_mmi.c => mpegvideoencdsp_mmi.c} (95%)

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 4bbc2f00ea..1d777293d0 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -54,7 +54,6 @@ OBJS-$(CONFIG_BLOCKDSP)                   += mips/blockdsp_init_mips.o
 OBJS-$(CONFIG_PIXBLOCKDSP)                += mips/pixblockdsp_init_mips.o
 OBJS-$(CONFIG_IDCTDSP)                    += mips/idctdsp_init_mips.o
 OBJS-$(CONFIG_MPEGVIDEO)                  += mips/mpegvideo_init_mips.o
-OBJS-$(CONFIG_MPEGVIDEOENC)               += mips/mpegvideoenc_init_mips.o
 OBJS-$(CONFIG_MPEGVIDEOENCDSP)            += mips/mpegvideoencdsp_init_mips.o
 OBJS-$(CONFIG_ME_CMP)                     += mips/me_cmp_init_mips.o
 OBJS-$(CONFIG_MPEG4_DECODER)              += mips/xvididct_init_mips.o
@@ -100,7 +99,7 @@ MMI-OBJS-$(CONFIG_H264DSP)                += mips/h264dsp_mmi.o
 MMI-OBJS-$(CONFIG_H264CHROMA)             += mips/h264chroma_mmi.o
 MMI-OBJS-$(CONFIG_H264PRED)               += mips/h264pred_mmi.o
 MMI-OBJS-$(CONFIG_MPEGVIDEO)              += mips/mpegvideo_mmi.o
-MMI-OBJS-$(CONFIG_MPEGVIDEOENC)           += mips/mpegvideoenc_mmi.o
+MMI-OBJS-$(CONFIG_MPEGVIDEOENCDSP)        += mips/mpegvideoenc_mmi.o
 MMI-OBJS-$(CONFIG_IDCTDSP)                += mips/idctdsp_mmi.o           \
                                              mips/simple_idct_mmi.o
 MMI-OBJS-$(CONFIG_MPEG4_DECODER)          += mips/xvid_idct_mmi.o
diff --git a/libavcodec/mips/mpegvideo_mips.h b/libavcodec/mips/mpegvideo_mips.h
index 72ffed6985..2a9ea4006e 100644
--- a/libavcodec/mips/mpegvideo_mips.h
+++ b/libavcodec/mips/mpegvideo_mips.h
@@ -22,7 +22,6 @@
 #define AVCODEC_MIPS_MPEGVIDEO_MIPS_H
 
 #include "libavcodec/mpegvideo.h"
-#include "libavcodec/mpegvideoenc.h"
 
 void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block,
         int n, int qscale);
@@ -34,6 +33,6 @@ void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block,
         int n, int qscale);
 void ff_dct_unquantize_mpeg2_intra_mmi(MpegEncContext *s, int16_t *block,
         int n, int qscale);
-void ff_denoise_dct_mmi(MPVEncContext *s, int16_t *block);
+void ff_denoise_dct_mmi(int16_t block[64], int sum[64], const uint16_t offset[64]);
 
 #endif /* AVCODEC_MIPS_MPEGVIDEO_MIPS_H */
diff --git a/libavcodec/mips/mpegvideoenc_init_mips.c b/libavcodec/mips/mpegvideoenc_init_mips.c
deleted file mode 100644
index 7831973eb8..0000000000
--- a/libavcodec/mips/mpegvideoenc_init_mips.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/*
- * Copyright (c) 2015 Manojkumar Bhosale (Manojkumar.Bhosale@imgtec.com)
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/attributes.h"
-#include "libavutil/mips/cpu.h"
-#include "libavcodec/mpegvideoenc.h"
-#include "mpegvideo_mips.h"
-
-av_cold void ff_mpvenc_dct_init_mips(MPVEncContext *s)
-{
-    int cpu_flags = av_get_cpu_flags();
-
-    if (have_mmi(cpu_flags)) {
-        s->denoise_dct = ff_denoise_dct_mmi;
-    }
-}
diff --git a/libavcodec/mips/mpegvideoencdsp_init_mips.c b/libavcodec/mips/mpegvideoencdsp_init_mips.c
index 24a17b91db..df916282a2 100644
--- a/libavcodec/mips/mpegvideoencdsp_init_mips.c
+++ b/libavcodec/mips/mpegvideoencdsp_init_mips.c
@@ -23,12 +23,17 @@
 #include "libavcodec/bit_depth_template.c"
 #include "libavcodec/mpegvideoencdsp.h"
 #include "h263dsp_mips.h"
+#include "mpegvideo_mips.h"
 
 av_cold void ff_mpegvideoencdsp_init_mips(MpegvideoEncDSPContext *c,
                                           AVCodecContext *avctx)
 {
     int cpu_flags = av_get_cpu_flags();
 
+    if (have_mmi(cpu_flags)) {
+        c->denoise_dct = ff_denoise_dct_mmi;
+    }
+
     if (have_msa(cpu_flags)) {
 #if BIT_DEPTH == 8
         c->pix_sum = ff_pix_sum_msa;
diff --git a/libavcodec/mips/mpegvideoenc_mmi.c b/libavcodec/mips/mpegvideoencdsp_mmi.c
similarity index 95%
rename from libavcodec/mips/mpegvideoenc_mmi.c
rename to libavcodec/mips/mpegvideoencdsp_mmi.c
index 085be3b0ec..2239a05978 100644
--- a/libavcodec/mips/mpegvideoenc_mmi.c
+++ b/libavcodec/mips/mpegvideoencdsp_mmi.c
@@ -25,17 +25,12 @@
 #include "mpegvideo_mips.h"
 #include "libavutil/mips/mmiutils.h"
 
-void ff_denoise_dct_mmi(MPVEncContext *s, int16_t *block)
+void ff_denoise_dct_mmi(int16_t block[64], int sum[64], const uint16_t offset[64])
 {
-    const int intra = s->c.mb_intra;
-    int *sum = s->dct_error_sum[intra];
-    uint16_t *offset = s->dct_offset[intra];
     double ftmp[8];
     mips_reg addr[1];
     DECLARE_VAR_ALL64;
 
-    s->dct_count[intra]++;
-
     __asm__ volatile(
         "pxor       %[ftmp0],   %[ftmp0],       %[ftmp0]                \n\t"
         "1:                                                             \n\t"
diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index ce0ee4bb68..9e83026b51 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -86,7 +86,6 @@
 static int encode_picture(MPVMainEncContext *const s, const AVPacket *pkt);
 static int dct_quantize_refine(MPVEncContext *const s, int16_t *block, int16_t *weight, int16_t *orig, int n, int qscale);
 static int sse_mb(MPVEncContext *const s);
-static void denoise_dct_c(MPVEncContext *const s, int16_t *block);
 static int dct_quantize_c(MPVEncContext *const s,
                           int16_t *block, int n,
                           int qscale, int *overflow);
@@ -300,11 +299,8 @@ static av_cold void mpv_encode_defaults(MPVMainEncContext *const m)
 av_cold void ff_dct_encode_init(MPVEncContext *const s)
 {
     s->dct_quantize = dct_quantize_c;
-    s->denoise_dct  = denoise_dct_c;
 
-#if ARCH_MIPS
-    ff_mpvenc_dct_init_mips(s);
-#elif ARCH_X86
+#if ARCH_X86
     ff_dct_encode_init_x86(s);
 #endif
 
@@ -3955,29 +3951,14 @@ static int encode_picture(MPVMainEncContext *const m, const AVPacket *pkt)
     return 0;
 }
 
-static void denoise_dct_c(MPVEncContext *const s, int16_t *block)
+static inline void denoise_dct(MPVEncContext *const s, int16_t block[])
 {
+    if (!s->dct_error_sum)
+        return;
+
     const int intra = s->c.mb_intra;
-    int i;
-
     s->dct_count[intra]++;
-
-    for(i=0; i<64; i++){
-        int level= block[i];
-
-        if(level){
-            if(level>0){
-                s->dct_error_sum[intra][i] += level;
-                level -= s->dct_offset[intra][i];
-                if(level<0) level=0;
-            }else{
-                s->dct_error_sum[intra][i] -= level;
-                level += s->dct_offset[intra][i];
-                if(level>0) level=0;
-            }
-            block[i]= level;
-        }
-    }
+    s->mpvencdsp.denoise_dct(block, s->dct_error_sum[intra], s->dct_offset[intra]);
 }
 
 static int dct_quantize_trellis_c(MPVEncContext *const s,
@@ -4009,8 +3990,8 @@ static int dct_quantize_trellis_c(MPVEncContext *const s,
 
     s->fdsp.fdct(block);
 
-    if(s->dct_error_sum)
-        s->denoise_dct(s, block);
+    denoise_dct(s, block);
+
     qmul= qscale*16;
     qadd= ((qscale-1)|1)*8;
 
@@ -4678,8 +4659,7 @@ static int dct_quantize_c(MPVEncContext *const s,
 
     s->fdsp.fdct(block);
 
-    if(s->dct_error_sum)
-        s->denoise_dct(s, block);
+    denoise_dct(s, block);
 
     if (s->c.mb_intra) {
         scantable = s->c.intra_scantable.scantable;
diff --git a/libavcodec/mpegvideoenc.h b/libavcodec/mpegvideoenc.h
index ee115c3611..131908c10a 100644
--- a/libavcodec/mpegvideoenc.h
+++ b/libavcodec/mpegvideoenc.h
@@ -123,7 +123,6 @@ typedef struct MPVEncContext {
     uint16_t (*q_inter_matrix16)[2][64];
 
     /* noise reduction */
-    void (*denoise_dct)(struct MPVEncContext *s, int16_t *block);
     int (*dct_error_sum)[64];
     int dct_count[2];
     uint16_t (*dct_offset)[64];
@@ -397,7 +396,6 @@ int ff_mpv_reallocate_putbitbuffer(MPVEncContext *s, size_t threshold, size_t si
 void ff_write_quant_matrix(PutBitContext *pb, uint16_t *matrix);
 
 void ff_dct_encode_init(MPVEncContext *s);
-void ff_mpvenc_dct_init_mips(MPVEncContext *s);
 void ff_dct_encode_init_x86(MPVEncContext *s);
 
 void ff_convert_matrix(MPVEncContext *s, int (*qmat)[64], uint16_t (*qmat16)[2][64],
diff --git a/libavcodec/mpegvideoencdsp.c b/libavcodec/mpegvideoencdsp.c
index b4fd2af915..3b4a57d58a 100644
--- a/libavcodec/mpegvideoencdsp.c
+++ b/libavcodec/mpegvideoencdsp.c
@@ -28,6 +28,29 @@
 #include "mathops.h"
 #include "mpegvideoencdsp.h"
 
+static void denoise_dct_c(int16_t block[64], int dct_error_sum[64],
+                          const uint16_t dct_offset[64])
+{
+    for (int i = 0; i < 64; ++i) {
+        int level = block[i];
+
+        if (level) {
+            if (level > 0) {
+                dct_error_sum[i] += level;
+                level -= dct_offset[i];
+                if (level < 0)
+                    level = 0;
+            } else {
+                dct_error_sum[i] -= level;
+                level += dct_offset[i];
+                if (level > 0)
+                    level = 0;
+            }
+            block[i] = level;
+        }
+    }
+}
+
 static int try_8x8basis_c(const int16_t rem[64], const int16_t weight[64],
                           const int16_t basis[64], int scale)
 {
@@ -253,6 +276,8 @@ static void shrink88(uint8_t *dst, ptrdiff_t dst_wrap,
 av_cold void ff_mpegvideoencdsp_init(MpegvideoEncDSPContext *c,
                                      AVCodecContext *avctx)
 {
+    c->denoise_dct  = denoise_dct_c;
+
     c->try_8x8basis = try_8x8basis_c;
     c->add_8x8basis = add_8x8basis_c;
 
diff --git a/libavcodec/mpegvideoencdsp.h b/libavcodec/mpegvideoencdsp.h
index 6ec665677b..989503f25f 100644
--- a/libavcodec/mpegvideoencdsp.h
+++ b/libavcodec/mpegvideoencdsp.h
@@ -30,6 +30,9 @@
 #define EDGE_BOTTOM 2
 
 typedef struct MpegvideoEncDSPContext {
+    void (*denoise_dct)(int16_t block[64], int dct_error_sum[64],
+                        const uint16_t dct_offset[64]);
+
     int (*try_8x8basis)(const int16_t rem[64], const int16_t weight[64],
                         const int16_t basis[64], int scale);
     void (*add_8x8basis)(int16_t rem[64], const int16_t basis[64], int scale);
diff --git a/libavcodec/x86/mpegvideoenc.c b/libavcodec/x86/mpegvideoenc.c
index e5665ac781..c667dcd2a2 100644
--- a/libavcodec/x86/mpegvideoenc.c
+++ b/libavcodec/x86/mpegvideoenc.c
@@ -57,22 +57,6 @@ DECLARE_ALIGNED(16, static const uint16_t, inv_zigzag_direct16)[64] = {
 
 #endif /* HAVE_6REGS */
 
-#if HAVE_SSE2_EXTERNAL
-void ff_mpv_denoise_dct_sse2(int16_t block[64], int dct_error_sum[64],
-                             const uint16_t dct_offset[64]);
-
-static void denoise_dct_sse2(MPVEncContext *const s, int16_t block[])
-{
-    const int intra = s->c.mb_intra;
-    int *sum= s->dct_error_sum[intra];
-    uint16_t *offset= s->dct_offset[intra];
-
-    s->dct_count[intra]++;
-
-    ff_mpv_denoise_dct_sse2(block, sum, offset);
-}
-#endif /* HAVE_SSE2_EXTERNAL */
-
 av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
 {
     const int dct_algo = s->c.avctx->dct_algo;
@@ -83,9 +67,6 @@ av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
         if (INLINE_SSE2(cpu_flags)) {
 #if HAVE_6REGS
             s->dct_quantize = dct_quantize_sse2;
-#endif
-#if HAVE_SSE2_EXTERNAL
-            s->denoise_dct  = denoise_dct_sse2;
 #endif
         }
 #if HAVE_6REGS && HAVE_SSSE3_INLINE
diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c
index f0b95c1621..14e993de2b 100644
--- a/libavcodec/x86/mpegvideoenc_template.c
+++ b/libavcodec/x86/mpegvideoenc_template.c
@@ -76,8 +76,11 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
     //s->fdct (block);
     ff_fdct_sse2(block); // cannot be anything else ...
 
-    if(s->dct_error_sum)
-        s->denoise_dct(s, block);
+    if (s->dct_error_sum) {
+        const int intra = s->c.mb_intra;
+        s->dct_count[intra]++;
+        s->mpvencdsp.denoise_dct(block, s->dct_error_sum[intra], s->dct_offset[intra]);
+    }
 
     if (s->c.mb_intra) {
         int dummy;
diff --git a/libavcodec/x86/mpegvideoencdsp_init.c b/libavcodec/x86/mpegvideoencdsp_init.c
index bf5b722016..f6169b5399 100644
--- a/libavcodec/x86/mpegvideoencdsp_init.c
+++ b/libavcodec/x86/mpegvideoencdsp_init.c
@@ -27,6 +27,8 @@
 #include "libavcodec/avcodec.h"
 #include "libavcodec/mpegvideoencdsp.h"
 
+void ff_mpv_denoise_dct_sse2(int16_t block[64], int dct_error_sum[64],
+                             const uint16_t dct_offset[64]);
 int ff_pix_sum16_sse2(const uint8_t *pix, ptrdiff_t line_size);
 int ff_pix_sum16_xop(const uint8_t *pix, ptrdiff_t line_size);
 int ff_pix_norm1_sse2(const uint8_t *pix, ptrdiff_t line_size);
@@ -209,6 +211,7 @@ av_cold void ff_mpegvideoencdsp_init_x86(MpegvideoEncDSPContext *c,
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
+        c->denoise_dct = ff_mpv_denoise_dct_sse2;
         c->pix_sum     = ff_pix_sum16_sse2;
         c->pix_norm1   = ff_pix_norm1_sse2;
     }
-- 
2.52.0


From 030471dd4edc6d7923d9ae1bccc03d26a26e2f15 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 19:06:14 +0100
Subject: [PATCH 011/304] tests/checkasm/mpegvideoencdsp: Test denoise_dct

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/checkasm/mpegvideoencdsp.c | 33 ++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tests/checkasm/mpegvideoencdsp.c b/tests/checkasm/mpegvideoencdsp.c
index a4a4fa6f5c..955cd9f5b7 100644
--- a/tests/checkasm/mpegvideoencdsp.c
+++ b/tests/checkasm/mpegvideoencdsp.c
@@ -37,6 +37,37 @@
             buf[j] = rnd() % (max - min + 1) + min;      \
     } while (0)
 
+static void check_denoise_dct(MpegvideoEncDSPContext *c)
+{
+    declare_func(void, int16_t block[64], int dct_error_sum[64],
+                       const uint16_t dct_offset[64]);
+
+    if (check_func(c->denoise_dct, "denoise_dct")) {
+        DECLARE_ALIGNED(16, int16_t, block_ref)[64];
+        DECLARE_ALIGNED(16, int16_t, block_new)[64];
+        DECLARE_ALIGNED(16, int, dct_error_sum_ref)[64];
+        DECLARE_ALIGNED(16, int, dct_error_sum_new)[64];
+        DECLARE_ALIGNED(16, uint16_t, dct_offset)[64];
+
+        for (size_t i = 0; i < FF_ARRAY_ELEMS(block_ref); ++i) {
+            unsigned random = rnd();
+            block_ref[i] = random & (1 << 16) ? random : 0;
+        }
+        randomize_buffers(dct_offset, sizeof(dct_offset));
+        randomize_buffer_clipped(dct_error_sum_ref, 0, (1 << 24) - 1);
+        memcpy(block_new, block_ref, sizeof(block_new));
+        memcpy(dct_error_sum_new, dct_error_sum_ref, sizeof(dct_error_sum_ref));
+
+        call_ref(block_ref, dct_error_sum_ref, dct_offset);
+        call_new(block_new, dct_error_sum_new, dct_offset);
+        if (memcmp(block_ref, block_new, sizeof(block_ref)) ||
+            memcmp(dct_error_sum_new, dct_error_sum_ref, sizeof(dct_error_sum_new)))
+            fail();
+
+        bench_new(block_new, dct_error_sum_new, dct_offset);
+    }
+}
+
 static void check_add_8x8basis(MpegvideoEncDSPContext *c)
 {
     declare_func(void, int16_t rem[64], const int16_t basis[64], int scale);
@@ -166,6 +197,8 @@ void checkasm_check_mpegvideoencdsp(void)
 
     ff_mpegvideoencdsp_init(&c, &avctx);
 
+    check_denoise_dct(&c);
+    report("denoise_dct");
     check_pix_sum(&c);
     report("pix_sum");
     check_pix_norm1(&c);
-- 
2.52.0


From a0650dce295d22ccfed8fb0dd68d4578d4553a3e Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 19:44:02 +0100
Subject: [PATCH 012/304] avcodec/x86/mpegvideoencdsp: Port
 add_8x8basis_ssse3() to ASM

Both GCC and Clang completely unroll the unlikely loop at -O3,
leading to codesize bloat; their code is also suboptimal, as they
don't make use of pmulhrsw (even with -mssse3). This commit
therefore ports the whole function to external assembly. The new
function occupies 176B here vs 1406B for GCC.

Benchmarks for a testcase with huge qscale (notice that the C version
is unrolled just like the unlikely loop in the SSSE3 version):
add_8x8basis_c:                                         43.4 ( 1.00x)
add_8x8basis_ssse3 (old):                               43.6 ( 1.00x)
add_8x8basis_ssse3 (new):                               11.9 ( 3.63x)
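
For readers unfamiliar with pmulhrsw, the following scalar sketch models the
non-huge-scale path of the new function and compares it with the arithmetic of
the removed C fallback. It is an illustration only, not code from this patch:
the helper names are hypothetical, the BASIS_SHIFT/RECON_SHIFT values are
assumed to be 16 and 6, and right shifts of negative values are assumed to be
arithmetic (as on all mainstream compilers).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of FFmpeg's BASIS_SHIFT and RECON_SHIFT; only their
 * difference (10) matters for this sketch. */
#define BASIS_SHIFT 16
#define RECON_SHIFT  6

/* Scalar model of one 16-bit lane of pmulhrsw: (a * b + 2^14) >> 15. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + (1 << 14)) >> 15);
}

/* Scalar reference using the same expression as the removed C fallback. */
static void add_8x8basis_ref(int16_t rem[64], const int16_t basis[64], int scale)
{
    for (int i = 0; i < 64; i++)
        rem[i] += (basis[i] * scale + (1 << (BASIS_SHIFT - RECON_SHIFT - 1)))
                  >> (BASIS_SHIFT - RECON_SHIFT);
}

/* Hypothetical model of the fast path: valid only while scale << 5 still
 * fits in a signed 16-bit lane, i.e. for scale in [-1024, 1023]. */
static void add_8x8basis_model(int16_t rem[64], const int16_t basis[64], int scale)
{
    assert(scale >= -1024 && scale <= 1023);
    const int16_t s = (int16_t)(scale * 32);                     /* psllw m0, 5 */
    for (int i = 0; i < 64; i++)
        rem[i] = (int16_t)(rem[i] + pmulhrsw_lane(basis[i], s)); /* paddw */
}
```

Pre-shifting scale by 5 turns the rounding shift by 15 performed by pmulhrsw
into the rounding shift by 10 of the reference expression, which is why a
single multiply per lane suffices on this path.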

Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideoencdsp.asm    | 52 +++++++++++++++++++++++++++
 libavcodec/x86/mpegvideoencdsp_init.c | 46 ++++--------------------
 2 files changed, 59 insertions(+), 39 deletions(-)

diff --git a/libavcodec/x86/mpegvideoencdsp.asm b/libavcodec/x86/mpegvideoencdsp.asm
index 0e86a5304c..300f98b438 100644
--- a/libavcodec/x86/mpegvideoencdsp.asm
+++ b/libavcodec/x86/mpegvideoencdsp.asm
@@ -25,6 +25,58 @@
 
 SECTION .text
 
+; void ff_add_8x8basis_ssse3(int16_t rem[64], const int16_t basis[64], int scale)
+INIT_XMM ssse3
+cglobal add_8x8basis, 3, 3+ARCH_X86_64, 4, rem, basis, scale
+    movd            m0, scaled
+    add         scaled, 1024
+    add         basisq, 128
+    add           remq, 128
+%if ARCH_X86_64
+%define OFF r3q
+    mov            r3q, -128
+    cmp         scaled, 2047
+%else
+%define OFF r2q
+    cmp         scaled, 2047
+    mov            r2q, -128
+%endif
+    ja     .huge_scale
+
+    punpcklwd       m0, m0
+    pshufd          m0, m0, 0x0
+    psllw           m0, 5
+.loop1:
+    mova            m1, [basisq+OFF]
+    mova            m2, [basisq+OFF+16]
+    pmulhrsw        m1, m0
+    pmulhrsw        m2, m0
+    paddw           m1, [remq+OFF]
+    paddw           m2, [remq+OFF+16]
+    mova    [remq+OFF], m1
+    mova [remq+OFF+16], m2
+    add            OFF, 32
+    js          .loop1
+    RET
+
+.huge_scale:
+    pslld           m0, 6
+    punpcklwd       m0, m0
+    pshufd          m1, m0, 0x55
+    psrlw           m0, 1
+    pshufd          m0, m0, 0x0
+.loop2:
+    mova            m2, [basisq+OFF]
+    pmulhrsw        m3, m2, m0
+    pmullw          m2, m1
+    paddw           m2, m3
+    paddw           m2, [remq+OFF]
+    mova    [remq+OFF], m2
+    add            OFF, 16
+    js          .loop2
+    RET
+
+
 INIT_XMM sse2
 cglobal mpv_denoise_dct, 3, 4, 7, block, sum, offset
     pxor            m6, m6
diff --git a/libavcodec/x86/mpegvideoencdsp_init.c b/libavcodec/x86/mpegvideoencdsp_init.c
index f6169b5399..220c75785a 100644
--- a/libavcodec/x86/mpegvideoencdsp_init.c
+++ b/libavcodec/x86/mpegvideoencdsp_init.c
@@ -32,6 +32,7 @@ void ff_mpv_denoise_dct_sse2(int16_t block[64], int dct_error_sum[64],
 int ff_pix_sum16_sse2(const uint8_t *pix, ptrdiff_t line_size);
 int ff_pix_sum16_xop(const uint8_t *pix, ptrdiff_t line_size);
 int ff_pix_norm1_sse2(const uint8_t *pix, ptrdiff_t line_size);
+void ff_add_8x8basis_ssse3(int16_t rem[64], const int16_t basis[64], int scale);
 
 #if HAVE_INLINE_ASM
 #if HAVE_SSSE3_INLINE
@@ -83,41 +84,6 @@ static int try_8x8basis_ssse3(const int16_t rem[64], const int16_t weight[64], c
     );
     return i;
 }
-
-static void add_8x8basis_ssse3(int16_t rem[64], const int16_t basis[64], int scale)
-{
-    x86_reg i=0;
-
-    if (FFABS(scale) < 1024) {
-        scale *= 1 << (16 + SCALE_OFFSET - BASIS_SHIFT + RECON_SHIFT);
-        __asm__ volatile(
-                "movd                %3, %%xmm2     \n\t"
-                "punpcklwd       %%xmm2, %%xmm2     \n\t"
-                "pshufd      $0, %%xmm2, %%xmm2     \n\t"
-                ".p2align 4                         \n\t"
-                "1:                                 \n\t"
-                "movdqa        (%1, %0), %%xmm0     \n\t"
-                "movdqa      16(%1, %0), %%xmm1     \n\t"
-                "pmulhrsw        %%xmm2, %%xmm0     \n\t"
-                "pmulhrsw        %%xmm2, %%xmm1     \n\t"
-                "paddw         (%2, %0), %%xmm0     \n\t"
-                "paddw       16(%2, %0), %%xmm1     \n\t"
-                "movdqa          %%xmm0, (%2, %0)   \n\t"
-                "movdqa          %%xmm1, 16(%2, %0) \n\t"
-                "add                $32, %0         \n\t"
-                "cmp               $128, %0         \n\t" // FIXME optimize & bench
-                " jb                 1b             \n\t"
-                : "+r" (i)
-                : "r"(basis), "r"(rem), "g"(scale)
-                XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2")
-        );
-    } else {
-        for (i=0; i<8*8; i++) {
-            rem[i] += (basis[i]*scale + (1<<(BASIS_SHIFT - RECON_SHIFT-1)))>>(BASIS_SHIFT - RECON_SHIFT);
-        }
-    }
-}
-
 #endif /* HAVE_SSSE3_INLINE */
 
 /* Draw the edges of width 'w' of an image of size width, height */
@@ -227,15 +193,17 @@ av_cold void ff_mpegvideoencdsp_init_x86(MpegvideoEncDSPContext *c,
             c->draw_edges = draw_edges_mmx;
         }
     }
+#endif /* HAVE_INLINE_ASM */
 
+    if (X86_SSSE3(cpu_flags)) {
 #if HAVE_SSSE3_INLINE
-    if (INLINE_SSSE3(cpu_flags)) {
         if (!(avctx->flags & AV_CODEC_FLAG_BITEXACT)) {
             c->try_8x8basis = try_8x8basis_ssse3;
         }
-        c->add_8x8basis = add_8x8basis_ssse3;
-    }
 #endif /* HAVE_SSSE3_INLINE */
+#if HAVE_SSSE3_EXTERNAL
+        c->add_8x8basis = ff_add_8x8basis_ssse3;
+#endif
+    }
 
-#endif /* HAVE_INLINE_ASM */
 }
-- 
2.52.0


From 67255670b7b746ec6a9226764562740bf374e46b Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 15 Nov 2025 19:56:23 +0100
Subject: [PATCH 013/304] avcodec/x86/mpegvideoenc_template: Avoid touching
 nonvolatile register

xmm7 is nonvolatile on x64 Windows.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideoenc_template.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c
index 14e993de2b..b5417f6d32 100644
--- a/libavcodec/x86/mpegvideoenc_template.c
+++ b/libavcodec/x86/mpegvideoenc_template.c
@@ -117,7 +117,7 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
         __asm__ volatile(
             "movd %%"FF_REG_a", %%xmm3          \n\t" // last_non_zero_p1
             SPREADW("%%xmm3")
-            "pxor  %%xmm7, %%xmm7               \n\t" // 0
+            "pxor  %%xmm2, %%xmm2               \n\t" // 0
             "pxor  %%xmm4, %%xmm4               \n\t" // 0
             "movdqa  (%2), %%xmm5               \n\t" // qmat[0]
             "pxor  %%xmm6, %%xmm6               \n\t"
@@ -132,9 +132,9 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "por     %%xmm0, %%xmm4             \n\t"
             RESTORE_SIGN("%%xmm1", "%%xmm0")          // out=((ABS(block[i])*qmat[0] - bias[0]*qmat[0])>>16)*sign(block[i])
             "movdqa  %%xmm0, (%5, %%"FF_REG_a") \n\t"
-            "pcmpeqw %%xmm7, %%xmm0             \n\t" // out==0 ? 0xFF : 0x00
+            "pcmpeqw %%xmm2, %%xmm0             \n\t" // out==0 ? 0xFF : 0x00
             "movdqa  (%4, %%"FF_REG_a"), %%xmm1 \n\t"
-            "movdqa  %%xmm7, (%1, %%"FF_REG_a") \n\t" // 0
+            "movdqa  %%xmm2, (%1, %%"FF_REG_a") \n\t" // 0
             "pandn   %%xmm1, %%xmm0             \n\t"
             "pmaxsw  %%xmm0, %%xmm3             \n\t"
             "add        $16, %%"FF_REG_a"       \n\t"
@@ -146,13 +146,13 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             : "r" (block+64), "r" (qmat), "r" (bias),
               "r" (inv_zigzag_direct16 + 64), "r" (temp_block + 64)
               XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3",
-                                "%xmm4", "%xmm5", "%xmm6", "%xmm7")
+                                "%xmm4", "%xmm5", "%xmm6")
         );
     }else{ // FMT_H263
         __asm__ volatile(
             "movd %%"FF_REG_a", %%xmm3          \n\t" // last_non_zero_p1
             SPREADW("%%xmm3")
-            "pxor %%xmm7, %%xmm7                \n\t" // 0
+            "pxor %%xmm2, %%xmm2                \n\t" // 0
             "pxor %%xmm4, %%xmm4                \n\t" // 0
             "mov $-128, %%"FF_REG_a"            \n\t"
             ".p2align 4                         \n\t"
@@ -166,9 +166,9 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "por     %%xmm0, %%xmm4             \n\t"
             RESTORE_SIGN("%%xmm1", "%%xmm0")          // out=((ABS(block[i])*qmat[0] - bias[0]*qmat[0])>>16)*sign(block[i])
             "movdqa  %%xmm0, (%5, %%"FF_REG_a") \n\t"
-            "pcmpeqw %%xmm7, %%xmm0             \n\t" // out==0 ? 0xFF : 0x00
+            "pcmpeqw %%xmm2, %%xmm0             \n\t" // out==0 ? 0xFF : 0x00
             "movdqa  (%4, %%"FF_REG_a"), %%xmm1 \n\t"
-            "movdqa  %%xmm7, (%1, %%"FF_REG_a") \n\t" // 0
+            "movdqa  %%xmm2, (%1, %%"FF_REG_a") \n\t" // 0
             "pandn   %%xmm1, %%xmm0             \n\t"
             "pmaxsw  %%xmm0, %%xmm3             \n\t"
             "add        $16, %%"FF_REG_a"       \n\t"
@@ -180,7 +180,7 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             : "r" (block+64), "r" (qmat+64), "r" (bias+64),
               "r" (inv_zigzag_direct16 + 64), "r" (temp_block + 64)
               XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3",
-                                "%xmm4", "%xmm5", "%xmm6", "%xmm7")
+                                "%xmm4", "%xmm5", "%xmm6")
         );
     }
     __asm__ volatile(
-- 
2.52.0


From 37e726ccade0543b04a215779cd37706bbe59e38 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 16 Nov 2025 11:10:07 +0100
Subject: [PATCH 014/304] avcodec/x86/mpegvideoenc_template: Reduce number of
 registers used

qmat and bias always have a constant offset, so one can use one register
to address both of them. This allows removing the check for HAVE_6REGS
(untested on a system where HAVE_6REGS is false).
Also avoid FF_REG_a while at it.
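
The constant offset can be pictured with the sketch below. It is only an
illustration under assumptions: QuantPair is a hypothetical stand-in for one
q_*_matrix16[qscale] entry (row 0 is qmat, row 1 is bias, 64 uint16_t each),
and bias_from_qmat is not a function in FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one q_*_matrix16[qscale] entry:
 * row 0 holds qmat, row 1 holds bias. */
typedef uint16_t QuantPair[2][64];

/* Recover the bias pointer from qmat alone: the two rows are contiguous,
 * so bias begins 64 elements (128 bytes) after qmat. */
static const uint16_t *bias_from_qmat(const uint16_t *qmat)
{
    assert(qmat != NULL);
    return qmat + 64;
}
```

This 128-byte distance is what the 128(%2) displacement in the inline
assembly relies on, so the separate bias operand is no longer needed.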

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideoenc.c          |  8 +-------
 libavcodec/x86/mpegvideoenc_template.c | 21 +++++++++------------
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/libavcodec/x86/mpegvideoenc.c b/libavcodec/x86/mpegvideoenc.c
index c667dcd2a2..24dd049200 100644
--- a/libavcodec/x86/mpegvideoenc.c
+++ b/libavcodec/x86/mpegvideoenc.c
@@ -39,8 +39,6 @@ DECLARE_ALIGNED(16, static const uint16_t, inv_zigzag_direct16)[64] = {
     36, 37, 49, 50, 58, 59, 63, 64,
 };
 
-#if HAVE_6REGS
-
 #if HAVE_SSE2_INLINE
 #define COMPILE_TEMPLATE_SSSE3  0
 #define RENAME(a)      a ## _sse2
@@ -55,8 +53,6 @@ DECLARE_ALIGNED(16, static const uint16_t, inv_zigzag_direct16)[64] = {
 #include "mpegvideoenc_template.c"
 #endif /* HAVE_SSSE3_INLINE */
 
-#endif /* HAVE_6REGS */
-
 av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
 {
     const int dct_algo = s->c.avctx->dct_algo;
@@ -65,11 +61,9 @@ av_cold void ff_dct_encode_init_x86(MPVEncContext *const s)
 #if HAVE_SSE2_INLINE
         int cpu_flags = av_get_cpu_flags();
         if (INLINE_SSE2(cpu_flags)) {
-#if HAVE_6REGS
             s->dct_quantize = dct_quantize_sse2;
-#endif
         }
-#if HAVE_6REGS && HAVE_SSSE3_INLINE
+#if HAVE_SSSE3_INLINE
         if (INLINE_SSSE3(cpu_flags))
             s->dct_quantize = dct_quantize_ssse3;
 #endif
diff --git a/libavcodec/x86/mpegvideoenc_template.c b/libavcodec/x86/mpegvideoenc_template.c
index b5417f6d32..e6ce791347 100644
--- a/libavcodec/x86/mpegvideoenc_template.c
+++ b/libavcodec/x86/mpegvideoenc_template.c
@@ -70,7 +70,7 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
 {
     x86_reg last_non_zero_p1;
     int level=0, q; //=0 is because gcc says uninitialized ...
-    const uint16_t *qmat, *bias;
+    const uint16_t *qmat;
     LOCAL_ALIGNED_16(int16_t, temp_block, [64]);
 
     //s->fdct (block);
@@ -86,11 +86,9 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
         int dummy;
         if (n < 4){
             q = s->c.y_dc_scale;
-            bias = s->q_intra_matrix16[qscale][1];
             qmat = s->q_intra_matrix16[qscale][0];
         }else{
             q = s->c.c_dc_scale;
-            bias = s->q_chroma_intra_matrix16[qscale][1];
             qmat = s->q_chroma_intra_matrix16[qscale][0];
         }
         /* note: block[0] is assumed to be positive */
@@ -109,7 +107,6 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
         last_non_zero_p1 = 1;
     } else {
         last_non_zero_p1 = 0;
-        bias = s->q_inter_matrix16[qscale][1];
         qmat = s->q_inter_matrix16[qscale][0];
     }
 
@@ -121,7 +118,7 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "pxor  %%xmm4, %%xmm4               \n\t" // 0
             "movdqa  (%2), %%xmm5               \n\t" // qmat[0]
             "pxor  %%xmm6, %%xmm6               \n\t"
-            "psubw   (%3), %%xmm6               \n\t" // -bias[0]
+            "psubw 128(%2), %%xmm6              \n\t" // -bias[0]
             "mov $-128, %%"FF_REG_a"            \n\t"
             ".p2align 4                         \n\t"
             "1:                                 \n\t"
@@ -131,9 +128,9 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "pmulhw  %%xmm5, %%xmm0             \n\t" // (ABS(block[i])*qmat[0] - bias[0]*qmat[0])>>16
             "por     %%xmm0, %%xmm4             \n\t"
             RESTORE_SIGN("%%xmm1", "%%xmm0")          // out=((ABS(block[i])*qmat[0] - bias[0]*qmat[0])>>16)*sign(block[i])
-            "movdqa  %%xmm0, (%5, %%"FF_REG_a") \n\t"
+            "movdqa  %%xmm0, (%4, %0)           \n\t"
             "pcmpeqw %%xmm2, %%xmm0             \n\t" // out==0 ? 0xFF : 0x00
-            "movdqa  (%4, %%"FF_REG_a"), %%xmm1 \n\t"
+            "movdqa  (%3, %0), %%xmm1           \n\t"
             "movdqa  %%xmm2, (%1, %%"FF_REG_a") \n\t" // 0
             "pandn   %%xmm1, %%xmm0             \n\t"
             "pmaxsw  %%xmm0, %%xmm3             \n\t"
@@ -143,7 +140,7 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "movd %%xmm3, %%"FF_REG_a"          \n\t"
             "movzbl %%al, %%eax                 \n\t" // last_non_zero_p1
             : "+a" (last_non_zero_p1)
-            : "r" (block+64), "r" (qmat), "r" (bias),
+            : "r" (block+64), "r" (qmat),
               "r" (inv_zigzag_direct16 + 64), "r" (temp_block + 64)
               XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3",
                                 "%xmm4", "%xmm5", "%xmm6")
@@ -159,15 +156,15 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "1:                                 \n\t"
             "movdqa  (%1, %%"FF_REG_a"), %%xmm0 \n\t" // block[i]
             SAVE_SIGN("%%xmm1", "%%xmm0")             // ABS(block[i])
-            "movdqa  (%3, %%"FF_REG_a"), %%xmm6 \n\t" // bias[0]
+            "movdqa  128(%2, %0), %%xmm6        \n\t" // bias[i]
             "paddusw %%xmm6, %%xmm0             \n\t" // ABS(block[i]) + bias[0]
             "movdqa  (%2, %%"FF_REG_a"), %%xmm5 \n\t" // qmat[i]
             "pmulhw  %%xmm5, %%xmm0             \n\t" // (ABS(block[i])*qmat[0] + bias[0]*qmat[0])>>16
             "por     %%xmm0, %%xmm4             \n\t"
             RESTORE_SIGN("%%xmm1", "%%xmm0")          // out=((ABS(block[i])*qmat[0] - bias[0]*qmat[0])>>16)*sign(block[i])
-            "movdqa  %%xmm0, (%5, %%"FF_REG_a") \n\t"
+            "movdqa  %%xmm0, (%4, %0)           \n\t"
             "pcmpeqw %%xmm2, %%xmm0             \n\t" // out==0 ? 0xFF : 0x00
-            "movdqa  (%4, %%"FF_REG_a"), %%xmm1 \n\t"
+            "movdqa  (%3, %0), %%xmm1           \n\t"
             "movdqa  %%xmm2, (%1, %%"FF_REG_a") \n\t" // 0
             "pandn   %%xmm1, %%xmm0             \n\t"
             "pmaxsw  %%xmm0, %%xmm3             \n\t"
@@ -177,7 +174,7 @@ static int RENAME(dct_quantize)(MPVEncContext *const s,
             "movd %%xmm3, %%"FF_REG_a"          \n\t"
             "movzbl %%al, %%eax                 \n\t" // last_non_zero_p1
             : "+a" (last_non_zero_p1)
-            : "r" (block+64), "r" (qmat+64), "r" (bias+64),
+            : "r" (block+64), "r" (qmat+64),
               "r" (inv_zigzag_direct16 + 64), "r" (temp_block + 64)
               XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3",
                                 "%xmm4", "%xmm5", "%xmm6")
-- 
2.52.0


From 523697823f4323c932957c3a10d18e083edfd45d Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 16 Nov 2025 12:10:22 +0100
Subject: [PATCH 015/304] avutil/x86/asm: Remove wrong comment, rename
 FF_REG_sp

Before FFmpeg commit 531b0a316b24f00965cd8a88efdbea2c6d63147f,
FFmpeg used REG_SP as macro for the stack pointer, yet this
clashed with a REG_SP define in Solaris system headers, so it
was changed to REG_sp and a comment was added for this.

Libav fixed it by adding an FF_ prefix to the macros in
1e9c5bf4c136fe9e010cc8a7e7270bba0d1bf45e. FFmpeg switched
to using these prefixes in 9eb3da2f9942cf1b1148d242bccfc383f666feb6,
using FF_REG_sp instead of Libav's FF_REG_SP. In said commit
the comment was changed to claim that Solaris system headers
define FF_REG_SP, but this is (most likely) wrong.

This commit removes the wrong comment and renames the (actually unused)
macro to FF_REG_SP to make it consistent with FF_REG_BP.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavutil/x86/asm.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavutil/x86/asm.h b/libavutil/x86/asm.h
index 9bff42d628..f06ea25035 100644
--- a/libavutil/x86/asm.h
+++ b/libavutil/x86/asm.h
@@ -38,8 +38,7 @@ typedef struct ymm_reg { uint64_t a, b, c, d; } ymm_reg;
 #    define FF_PTR_SIZE "8"
 typedef int64_t x86_reg;
 
-/* FF_REG_SP is defined in Solaris sys headers, so use FF_REG_sp */
-#    define FF_REG_sp "rsp"
+#    define FF_REG_SP "rsp"
 #    define FF_REG_BP "rbp"
 #    define FF_REGBP   rbp
 #    define FF_REGa    rax
@@ -60,7 +59,7 @@ typedef int64_t x86_reg;
 #    define FF_PTR_SIZE "4"
 typedef int32_t x86_reg;
 
-#    define FF_REG_sp "esp"
+#    define FF_REG_SP "esp"
 #    define FF_REG_BP "ebp"
 #    define FF_REGBP   ebp
 #    define FF_REGa    eax
-- 
2.52.0


From e06474013d7e59e622bdb9d9801ea1637f777a0b Mon Sep 17 00:00:00 2001
From: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
Date: Sun, 16 Nov 2025 11:14:34 +0100
Subject: [PATCH 016/304] avfilter/vf_frei0r: fix time when input is realigned

av_frame_copy doesn't copy the input's PTS property, which resulted
in the frei0r filter always receiving the same static time.

Example that has a static distortion without patch:

ffmpeg -filter_complex "testsrc2=s=328x240:d=5,frei0r=distort0r" out.mp4
---
 libavfilter/vf_frei0r.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libavfilter/vf_frei0r.c b/libavfilter/vf_frei0r.c
index 50d81d220f..cbd236faab 100644
--- a/libavfilter/vf_frei0r.c
+++ b/libavfilter/vf_frei0r.c
@@ -375,6 +375,10 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
         if (!in2)
             goto fail;
         av_frame_copy(in2, in);
+        if (av_frame_copy_props(in2, in) < 0) {
+            av_frame_free(&in2);
+            goto fail;
+        }
         av_frame_free(&in);
         in = in2;
     }
-- 
2.52.0


From b09304c24cb57884eb7de10f40903788a03e13bc Mon Sep 17 00:00:00 2001
From: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
Date: Sun, 16 Nov 2025 11:15:05 +0100
Subject: [PATCH 017/304] fate/filter-video: add frei0r test where input is
 realigned

An installation of frei0r-plugins is required to run the tests,
which is usually separate from the build headers. Some systems
have it packaged (e.g. apt install frei0r-plugins). An upstream
release extracted to FREI0R_PATH also works.

The distort0r filter requires dimensions to be divisible by 8.
---
 tests/fate/filter-video.mak                   |  3 ++-
 tests/ref/fate/filter-frei0r-filter-unaligned | 10 ++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 tests/ref/fate/filter-frei0r-filter-unaligned

diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index cd5903c960..3fe7f10476 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -717,8 +717,9 @@ $(FATE_FILTER_VSYNTH-yes): SRC = $(TARGET_PATH)/tests/vsynth1/%02d.pgm
 
 FATE_FFMPEG += $(FATE_FILTER_VSYNTH-yes)
 
-FATE_FILTER_FREI0R-$(call FILTERFRAMECRC, TESTSRC2, FREI0R_FILTER) = fate-filter-frei0r-filter
+FATE_FILTER_FREI0R-$(call FILTERFRAMECRC, TESTSRC2, FREI0R_FILTER) = fate-filter-frei0r-filter fate-filter-frei0r-filter-unaligned
 fate-filter-frei0r-filter: CMD = framecrc -lavfi "testsrc2=r=1:d=5,frei0r=enable=gte(n\,3):filter_name=distort0r"
+fate-filter-frei0r-filter-unaligned: CMD = framecrc -lavfi "testsrc2=s=328x240:r=1:d=5,frei0r=filter_name=distort0r"
 FATE_FFMPEG += $(FATE_FILTER_FREI0R-yes)
 
 #
diff --git a/tests/ref/fate/filter-frei0r-filter-unaligned b/tests/ref/fate/filter-frei0r-filter-unaligned
new file mode 100644
index 0000000000..c3cffc69f1
--- /dev/null
+++ b/tests/ref/fate/filter-frei0r-filter-unaligned
@@ -0,0 +1,10 @@
+#tb 0: 1/1
+#media_type 0: video
+#codec_id 0: rawvideo
+#dimensions 0: 328x240
+#sar 0: 1/1
+0,          0,          0,        1,   314880, 0x7b9cad8f
+0,          1,          1,        1,   314880, 0x0184436f
+0,          2,          2,        1,   314880, 0x7e3f2776
+0,          3,          3,        1,   314880, 0x0dc5e915
+0,          4,          4,        1,   314880, 0xcf9c76ef
-- 
2.52.0


From 6b43f099ecf618b00b66784407aa1715a111db91 Mon Sep 17 00:00:00 2001
From: Hendi <hendi48@freenet.de>
Date: Sun, 2 Nov 2025 23:11:02 +0100
Subject: [PATCH 018/304] avformat/dashdec: Fix urls with special characters in
 manifest

This was especially a problem with ampersands, which occur
frequently as part of query parameters.
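
As a rough illustration of the escaping step, the sketch below uses a
hypothetical helper, escape_xml_text, as a simplified stand-in for
xmlEncodeSpecialChars. It is not libxml2's implementation and may not cover
the same set of characters; it only shows why a raw ampersand has to become
an entity before the string is stored as node content.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT libxml2's implementation: returns a newly
 * allocated copy of src with &, <, > and " replaced by entities, or
 * NULL on allocation failure. The caller frees the result. */
static char *escape_xml_text(const char *src)
{
    assert(src != NULL);
    /* Worst case: every byte expands to the 6-byte "&quot;". */
    char *out = malloc(strlen(src) * 6 + 1);
    char *p = out;
    if (!out)
        return NULL;
    for (; *src; src++) {
        const char *rep = NULL;
        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        }
        if (rep) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);   /* copy the entity text */
            p += n;
        } else {
            *p++ = *src;         /* ordinary byte, copy unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

With this helper, a URL such as "seg.m4s?a=1&b=2" becomes
"seg.m4s?a=1&amp;b=2", which reads back as the original URL when the node
content is parsed.
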
---
 libavformat/dashdec.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/libavformat/dashdec.c b/libavformat/dashdec.c
index 1f59d3a41c..500d8ca518 100644
--- a/libavformat/dashdec.c
+++ b/libavformat/dashdec.c
@@ -780,7 +780,13 @@ static int resolve_content_path(AVFormatContext *s, const char *url, int *max_ur
     }
     root_url = (av_strcasecmp(baseurl, "")) ? baseurl : path;
     if (node) {
-        xmlNodeSetContent(node, root_url);
+        xmlChar *escaped = xmlEncodeSpecialChars(NULL, root_url);
+        if (!escaped) {
+            updated = AVERROR(ENOMEM);
+            goto end;
+        }
+        xmlNodeSetContent(node, escaped);
+        xmlFree(escaped);
         updated = 1;
     }
 
@@ -814,9 +820,15 @@ static int resolve_content_path(AVFormatContext *s, const char *url, int *max_ur
                 memset(p + 1, 0, strlen(p));
             }
             av_strlcat(tmp_str, text + start, tmp_max_url_size);
-            xmlNodeSetContent(baseurl_nodes[i], tmp_str);
-            updated = 1;
             xmlFree(text);
+            xmlChar* escaped = xmlEncodeSpecialChars(NULL, tmp_str);
+            if (!escaped) {
+                updated = AVERROR(ENOMEM);
+                goto end;
+            }
+            xmlNodeSetContent(baseurl_nodes[i], escaped);
+            updated = 1;
+            xmlFree(escaped);
         }
     }
 
-- 
2.52.0


From 45f077df0ea2f59007a6cf367c9f2f9b908b1147 Mon Sep 17 00:00:00 2001
From: Harshitha <harshitha@multicorewareinc.com>
Date: Fri, 14 Nov 2025 01:45:44 -0800
Subject: [PATCH 019/304] doc/encoders: Document MediaFoundation encoders

---
 doc/encoders.texi | 91 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/doc/encoders.texi b/doc/encoders.texi
index b24f98946a..56e2f9008a 100644
--- a/doc/encoders.texi
+++ b/doc/encoders.texi
@@ -3414,11 +3414,100 @@ Maximum quantization parameter for B frame.
 @section MediaFoundation
 
 This provides wrappers to encoders (both audio and video) in the
-MediaFoundation framework. It can access both SW and HW encoders.
+MediaFoundation framework. It supports both software and hardware encoders
+through the following codecs:
+
+@itemize
+@item h264_mf
+@item hevc_mf
+@item av1_mf
+@end itemize
+
 Video encoders can take input in either of nv12 or yuv420p form
 (some encoders support both, some support only either - in practice,
 nv12 is the safer choice, especially among HW encoders).
 
+Hardware-accelerated encoding is supported via D3D11, including hardware
+scaling capabilities through the scale_d3d11 filter.
+
+To list all available options for the MediaFoundation encoders, use:
+@command{ffmpeg -h encoder=h264_mf}
+
+@subsection Options
+
+@table @option
+@item rate_control
+Select rate control mode. Available modes:
+
+@table @samp
+@item default
+Default mode
+@item cbr
+CBR mode
+@item pc_vbr
+Peak constrained VBR mode
+@item u_vbr
+Unconstrained VBR mode
+@item quality
+Quality mode
+@item ld_vbr
+Low delay VBR mode (requires Windows 8+)
+@item g_vbr
+Global VBR mode (requires Windows 8+)
+@item gld_vbr
+Global low delay VBR mode (requires Windows 8+)
+@end table
+
+@item scenario
+Select usage scenario. Available scenarios:
+
+@table @samp
+@item default
+Default scenario
+@item display_remoting
+Display remoting scenario
+@item video_conference
+Video conference scenario
+@item archive
+Archive scenario
+@item live_streaming
+Live streaming scenario
+@item camera_record
+Camera record scenario
+@item display_remoting_with_feature_map
+Display remoting with feature map scenario
+@end table
+
+@item quality
+Set encoding quality (0-100). -1 means default quality.
+
+@item hw_encoding
+Force hardware encoding (0-1). Default is 0 (disabled).
+
+@end table
+
+@subsection Examples
+
+Hardware encoding:
+@example
+ffmpeg -i input.mp4 -c:v h264_mf -hw_encoding 1 output.mp4
+@end example
+
+Hardware-accelerated decoding with hardware encoding:
+@example
+ffmpeg -hwaccel d3d11va -i input.mp4 -c:v h264_mf -hw_encoding 1 output.mp4
+@end example
+
+Hardware-accelerated decoding and encoding with scaling:
+@example
+ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mp4 -c:v h264_mf -hw_encoding 1 -vf scale_d3d11=1920:1080 output.mp4
+@end example
+
+Hardware decoding and encoding with quality setting:
+@example
+ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mp4 -c:v hevc_mf -hw_encoding 1 -quality 80 output.mp4
+@end example
+
 @section Microsoft RLE
 
 Microsoft RLE aka MSRLE encoder.
-- 
2.52.0


From 1bdb84f370ffca2cba6e6b6de6d1d1d8120b2815 Mon Sep 17 00:00:00 2001
From: GyanD <ffmpeg@gyani.pro>
Date: Sat, 15 Nov 2025 08:26:46 +0000
Subject: [PATCH 020/304] doc/encoders: minor mediafoundation encoders updates

---
 doc/encoders.texi | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/doc/encoders.texi b/doc/encoders.texi
index 56e2f9008a..f29086df2d 100644
--- a/doc/encoders.texi
+++ b/doc/encoders.texi
@@ -3413,9 +3413,8 @@ Maximum quantization parameter for B frame.
 
 @section MediaFoundation
 
-This provides wrappers to encoders (both audio and video) in the
-MediaFoundation framework. It supports both software and hardware encoders
-through the following codecs:
+The following wrappers for encoders in the MediaFoundation framework are
+available:
 
 @itemize
 @item h264_mf
@@ -3423,15 +3422,17 @@ through the following codecs:
 @item av1_mf
 @end itemize
 
+These support both software and hardware encoding.
+
 Video encoders can take input in either of nv12 or yuv420p form
 (some encoders support both, some support only either - in practice,
 nv12 is the safer choice, especially among HW encoders).
 
-Hardware-accelerated encoding is supported via D3D11, including hardware
+Hardware-accelerated encoding requires D3D11, including hardware
 scaling capabilities through the scale_d3d11 filter.
 
 To list all available options for the MediaFoundation encoders, use:
-@command{ffmpeg -h encoder=h264_mf}
+@command{ffmpeg -h encoder=<encoder>} e.g. @command{ffmpeg -h encoder=h264_mf}
 
 @subsection Options
 
@@ -3498,14 +3499,9 @@ Hardware-accelerated decoding with hardware encoding:
 ffmpeg -hwaccel d3d11va -i input.mp4 -c:v h264_mf -hw_encoding 1 output.mp4
 @end example
 
-Hardware-accelerated decoding and encoding with scaling:
+Hardware-accelerated decoding, HW scaling and encoding with quality setting:
 @example
-ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mp4 -c:v h264_mf -hw_encoding 1 -vf scale_d3d11=1920:1080 output.mp4
-@end example
-
-Hardware decoding and encoding with quality setting:
-@example
-ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mp4 -c:v hevc_mf -hw_encoding 1 -quality 80 output.mp4
+ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mp4 -vf scale_d3d11=1920:1080 -c:v hevc_mf -hw_encoding 1 -quality 80 output.mp4
 @end example
 
 @section Microsoft RLE
-- 
2.52.0


From 02b8d26be0c5e75b3174e057634e0230e55f3af2 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Tue, 18 Nov 2025 12:46:13 +0800
Subject: [PATCH 021/304] avcodec/videotoolboxenc: improve Lock/Unlock
 BaseAddress error handling

1. Return an error when CVPixelBufferLockBaseAddress fails instead of
   continuing.
2. Remove the redundant "Error: " prefix from the error messages.
---
 libavcodec/videotoolboxenc.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/libavcodec/videotoolboxenc.c b/libavcodec/videotoolboxenc.c
index 729072c0b9..c9c4014a6b 100644
--- a/libavcodec/videotoolboxenc.c
+++ b/libavcodec/videotoolboxenc.c
@@ -2414,12 +2414,8 @@ static int copy_avframe_to_pixel_buffer(AVCodecContext   *avctx,
 
     status = CVPixelBufferLockBaseAddress(cv_img, 0);
     if (status) {
-        av_log(
-            avctx,
-            AV_LOG_ERROR,
-            "Error: Could not lock base address of CVPixelBuffer: %d.\n",
-            status
-        );
+        av_log(avctx, AV_LOG_ERROR, "Could not lock base address of CVPixelBuffer: %d.\n", status);
+        return AVERROR_EXTERNAL;
     }
 
     if (CVPixelBufferIsPlanar(cv_img)) {
@@ -2481,7 +2477,7 @@ static int copy_avframe_to_pixel_buffer(AVCodecContext   *avctx,
 
     status = CVPixelBufferUnlockBaseAddress(cv_img, 0);
     if (status) {
-        av_log(avctx, AV_LOG_ERROR, "Error: Could not unlock CVPixelBuffer base address: %d.\n", status);
+        av_log(avctx, AV_LOG_ERROR, "Could not unlock CVPixelBuffer base address: %d.\n", status);
         return AVERROR_EXTERNAL;
     }
 
-- 
2.52.0


From eaf66e51ebac01def8f3a329298df04f2ea0ea26 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Tue, 18 Nov 2025 11:02:59 +0800
Subject: [PATCH 022/304] avcodec/videotoolboxenc: fix crash with negative
 linesize
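
A minimal sketch of why a negative linesize is hazardous in a hand-rolled
plane copy, and of a copy loop that tolerates it. This is an illustration
under stated assumptions, not the code used by the patch (which delegates
to av_image_copy2): copy_plane and stride_abs are hypothetical names, and
the idea that the crash came from a stride being used as a memcpy size is
an inference from the removed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magnitude of a stride that may be negative (bottom-up layout). */
static size_t stride_abs(ptrdiff_t s)
{
    return s < 0 ? (size_t)-s : (size_t)s;
}

/* Copy `height` rows of `bytewidth` bytes each. The strides are only used
 * to locate each row, never as a copy size, so a negative stride cannot
 * turn into a huge size_t length. */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                       const uint8_t *src, ptrdiff_t src_linesize,
                       size_t bytewidth, int height)
{
    assert(stride_abs(dst_linesize) >= bytewidth);
    assert(stride_abs(src_linesize) >= bytewidth);
    for (int y = 0; y < height; y++)
        memcpy(dst + y * dst_linesize, src + y * src_linesize, bytewidth);
}
```

With a bottom-up source, the caller passes a pointer to the last row in
memory together with a negative stride, and the rows come out in top-down
order in the destination.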

---
 libavcodec/videotoolboxenc.c | 196 ++++++-----------------------------
 1 file changed, 31 insertions(+), 165 deletions(-)

diff --git a/libavcodec/videotoolboxenc.c b/libavcodec/videotoolboxenc.c
index c9c4014a6b..ef32c81278 100644
--- a/libavcodec/videotoolboxenc.c
+++ b/libavcodec/videotoolboxenc.c
@@ -28,6 +28,7 @@
 #include "libavutil/opt.h"
 #include "libavutil/avassert.h"
 #include "libavutil/avstring.h"
+#include "libavutil/imgutils.h"
 #include "libavcodec/avcodec.h"
 #include "libavutil/pixdesc.h"
 #include "libavutil/hwcontext_videotoolbox.h"
@@ -2328,89 +2329,20 @@ static int vtenc_cm_to_avpacket(
     return 0;
 }
 
-/*
- * contiguous_buf_size is 0 if not contiguous, and the size of the buffer
- * containing all planes if so.
- */
-static int get_cv_pixel_info(
-    AVCodecContext *avctx,
-    const AVFrame  *frame,
-    int            *color,
-    int            *plane_count,
-    size_t         *widths,
-    size_t         *heights,
-    size_t         *strides,
-    size_t         *contiguous_buf_size)
-{
-    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(avctx->pix_fmt);
-    VTEncContext *vtctx = avctx->priv_data;
-    int av_format       = frame->format;
-    int av_color_range  = avctx->color_range;
-    int i;
-    int range_guessed;
-    int status;
-
-    if (!desc)
-        return AVERROR(EINVAL);
-
-    status = get_cv_pixel_format(avctx, av_format, av_color_range, color, &range_guessed);
-    if (status)
-        return status;
-
-    if (range_guessed) {
-        if (!vtctx->warned_color_range) {
-            vtctx->warned_color_range = true;
-            av_log(avctx,
-                   AV_LOG_WARNING,
-                   "Color range not set for %s. Using MPEG range.\n",
-                   av_get_pix_fmt_name(av_format));
-        }
-    }
-
-    *plane_count = av_pix_fmt_count_planes(avctx->pix_fmt);
-
-    for (i = 0; i < desc->nb_components; i++) {
-        int p = desc->comp[i].plane;
-        bool hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA);
-        bool isAlpha = hasAlpha && (p + 1 == *plane_count);
-        bool isChroma = (p != 0) && !isAlpha;
-        int shiftw = isChroma ? desc->log2_chroma_w : 0;
-        int shifth = isChroma ? desc->log2_chroma_h : 0;
-        widths[p]  = (avctx->width  + ((1 << shiftw) >> 1)) >> shiftw;
-        heights[p] = (avctx->height + ((1 << shifth) >> 1)) >> shifth;
-        strides[p] = frame->linesize[p];
-    }
-
-    *contiguous_buf_size = 0;
-    for (i = 0; i < *plane_count; i++) {
-        if (i < *plane_count - 1 &&
-            frame->data[i] + strides[i] * heights[i] != frame->data[i + 1]) {
-            *contiguous_buf_size = 0;
-            break;
-        }
-
-        *contiguous_buf_size += strides[i] * heights[i];
-    }
-
-    return 0;
-}
-
-//Not used on OSX - frame is never copied.
 static int copy_avframe_to_pixel_buffer(AVCodecContext   *avctx,
                                         const AVFrame    *frame,
-                                        CVPixelBufferRef cv_img,
-                                        const size_t     *plane_strides,
-                                        const size_t     *plane_rows)
+                                        CVPixelBufferRef cv_img)
 {
-    int i, j;
-    size_t plane_count;
     int status;
-    int rows;
-    int src_stride;
-    int dst_stride;
-    uint8_t *src_addr;
-    uint8_t *dst_addr;
-    size_t copy_bytes;
+
+    int num_planes = av_pix_fmt_count_planes(frame->format);
+    size_t num_cv_plane = CVPixelBufferIsPlanar(cv_img) ?
+                          CVPixelBufferGetPlaneCount(cv_img) : 1;
+    if (num_planes != num_cv_plane) {
+        av_log(avctx, AV_LOG_ERROR,
+               "Different number of planes in AVFrame and CVPixelBuffer.\n");
+        return AVERROR_BUG;
+    }
 
     status = CVPixelBufferLockBaseAddress(cv_img, 0);
     if (status) {
@@ -2418,62 +2350,14 @@ static int copy_avframe_to_pixel_buffer(AVCodecContext   *avctx,
         return AVERROR_EXTERNAL;
     }
 
-    if (CVPixelBufferIsPlanar(cv_img)) {
-        plane_count = CVPixelBufferGetPlaneCount(cv_img);
-        for (i = 0; frame->data[i]; i++) {
-            if (i == plane_count) {
-                CVPixelBufferUnlockBaseAddress(cv_img, 0);
-                av_log(avctx,
-                    AV_LOG_ERROR,
-                    "Error: different number of planes in AVFrame and CVPixelBuffer.\n"
-                );
-
-                return AVERROR_EXTERNAL;
-            }
-
-            dst_addr = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(cv_img, i);
-            src_addr = (uint8_t*)frame->data[i];
-            dst_stride = CVPixelBufferGetBytesPerRowOfPlane(cv_img, i);
-            src_stride = plane_strides[i];
-            rows = plane_rows[i];
-
-            if (dst_stride == src_stride) {
-                memcpy(dst_addr, src_addr, src_stride * rows);
-            } else {
-                copy_bytes = dst_stride < src_stride ? dst_stride : src_stride;
-
-                for (j = 0; j < rows; j++) {
-                    memcpy(dst_addr + j * dst_stride, src_addr + j * src_stride, copy_bytes);
-                }
-            }
-        }
-    } else {
-        if (frame->data[1]) {
-            CVPixelBufferUnlockBaseAddress(cv_img, 0);
-            av_log(avctx,
-                AV_LOG_ERROR,
-                "Error: different number of planes in AVFrame and non-planar CVPixelBuffer.\n"
-            );
-
-            return AVERROR_EXTERNAL;
-        }
-
-        dst_addr = (uint8_t*)CVPixelBufferGetBaseAddress(cv_img);
-        src_addr = (uint8_t*)frame->data[0];
-        dst_stride = CVPixelBufferGetBytesPerRow(cv_img);
-        src_stride = plane_strides[0];
-        rows = plane_rows[0];
-
-        if (dst_stride == src_stride) {
-            memcpy(dst_addr, src_addr, src_stride * rows);
-        } else {
-            copy_bytes = dst_stride < src_stride ? dst_stride : src_stride;
-
-            for (j = 0; j < rows; j++) {
-                memcpy(dst_addr + j * dst_stride, src_addr + j * src_stride, copy_bytes);
-            }
-        }
+    int dst_stride[4] = {0};
+    uint8_t *dst_addr[4] = {0};
+    for (int i = 0; i < num_planes; i++) {
+        dst_addr[i] = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(cv_img, i);
+        dst_stride[i] = CVPixelBufferGetBytesPerRowOfPlane(cv_img, i);
     }
+    av_image_copy2(dst_addr, dst_stride, frame->data, frame->linesize,
+                   frame->format, frame->width, frame->height);
 
     status = CVPixelBufferUnlockBaseAddress(cv_img, 0);
     if (status) {
@@ -2489,13 +2373,7 @@ static int create_cv_pixel_buffer(AVCodecContext   *avctx,
                                   CVPixelBufferRef *cv_img,
                                   BufNode          *node)
 {
-    int plane_count;
-    int color;
-    size_t widths [AV_NUM_DATA_POINTERS];
-    size_t heights[AV_NUM_DATA_POINTERS];
-    size_t strides[AV_NUM_DATA_POINTERS];
     int status;
-    size_t contiguous_buf_size;
     CVPixelBufferPoolRef pix_buf_pool;
     VTEncContext* vtctx = avctx->priv_data;
 
@@ -2515,33 +2393,21 @@ static int create_cv_pixel_buffer(AVCodecContext   *avctx,
         return 0;
     }
 
-    memset(widths,  0, sizeof(widths));
-    memset(heights, 0, sizeof(heights));
-    memset(strides, 0, sizeof(strides));
-
-    status = get_cv_pixel_info(
-        avctx,
-        frame,
-        &color,
-        &plane_count,
-        widths,
-        heights,
-        strides,
-        &contiguous_buf_size
-    );
-
+    int range_guessed;
+    status = get_cv_pixel_format(avctx, frame->format, avctx->color_range,
+                                 &(int) {0}, &range_guessed);
     if (status) {
-        av_log(
-            avctx,
-            AV_LOG_ERROR,
-            "Error: Cannot convert format %d color_range %d: %d\n",
-            frame->format,
-            frame->color_range,
-            status
-        );
-
+        av_log(avctx, AV_LOG_ERROR, "Cannot convert format %d color_range %d: %d\n",
+            frame->format, frame->color_range, status);
         return status;
     }
+    if (range_guessed) {
+        if (!vtctx->warned_color_range) {
+            vtctx->warned_color_range = true;
+            av_log(avctx, AV_LOG_WARNING, "Color range not set for %s. Using MPEG range.\n",
+                   av_get_pix_fmt_name(frame->format));
+        }
+    }
 
     pix_buf_pool = VTCompressionSessionGetPixelBufferPool(vtctx->session);
     if (!pix_buf_pool) {
@@ -2578,7 +2444,7 @@ static int create_cv_pixel_buffer(AVCodecContext   *avctx,
         return AVERROR_EXTERNAL;
     }
 
-    status = copy_avframe_to_pixel_buffer(avctx, frame, *cv_img, strides, heights);
+    status = copy_avframe_to_pixel_buffer(avctx, frame, *cv_img);
     if (status) {
         CFRelease(*cv_img);
         *cv_img = NULL;
-- 
2.52.0


From 484f014e42a3dcabc80e440cca9d2d19934723d4 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Tue, 18 Nov 2025 15:23:16 +0800
Subject: [PATCH 023/304] avcodec/videotoolboxenc: reorder and cleanup headers

---
 libavcodec/videotoolboxenc.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/libavcodec/videotoolboxenc.c b/libavcodec/videotoolboxenc.c
index ef32c81278..47367bb68e 100644
--- a/libavcodec/videotoolboxenc.c
+++ b/libavcodec/videotoolboxenc.c
@@ -18,29 +18,28 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-#include <VideoToolbox/VideoToolbox.h>
-#include <CoreVideo/CoreVideo.h>
-#include <CoreMedia/CoreMedia.h>
-#include <TargetConditionals.h>
 #include <Availability.h>
-#include "avcodec.h"
+#include <CoreMedia/CoreMedia.h>
+#include <CoreVideo/CoreVideo.h>
+#include <dlfcn.h>
+#include <pthread.h>
+#include <TargetConditionals.h>
+#include <VideoToolbox/VideoToolbox.h>
+
+#include "libavutil/avassert.h"
+#include "libavutil/imgutils.h"
 #include "libavutil/mem.h"
 #include "libavutil/opt.h"
-#include "libavutil/avassert.h"
-#include "libavutil/avstring.h"
-#include "libavutil/imgutils.h"
-#include "libavcodec/avcodec.h"
 #include "libavutil/pixdesc.h"
 #include "libavutil/hwcontext_videotoolbox.h"
-#include "codec_internal.h"
-#include "internal.h"
-#include <pthread.h>
+
 #include "atsc_a53.h"
+#include "codec_internal.h"
 #include "encode.h"
 #include "h264.h"
 #include "h264_sei.h"
 #include "hwconfig.h"
-#include <dlfcn.h>
+#include "internal.h"
 
 #if !HAVE_KCMVIDEOCODECTYPE_HEVC
 enum { kCMVideoCodecType_HEVC = 'hvc1' };
-- 
2.52.0


From 77cfc7e57865e50ab708ead428ad71c7f3aa77d8 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Tue, 18 Nov 2025 19:17:08 +0800
Subject: [PATCH 024/304] avcodec/videotoolboxenc: remove redundant "Error: "
 in error message

---
 libavcodec/videotoolboxenc.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libavcodec/videotoolboxenc.c b/libavcodec/videotoolboxenc.c
index 47367bb68e..63e93b9d10 100644
--- a/libavcodec/videotoolboxenc.c
+++ b/libavcodec/videotoolboxenc.c
@@ -667,7 +667,7 @@ static int copy_param_sets(
 
         next_offset = offset + sizeof(start_code) + ps_size;
         if (dst_size < next_offset) {
-            av_log(avctx, AV_LOG_ERROR, "Error: buffer too small for parameter sets.\n");
+            av_log(avctx, AV_LOG_ERROR, "Buffer too small for parameter sets.\n");
             return AVERROR_BUFFER_TOO_SMALL;
         }
 
@@ -1205,7 +1205,7 @@ static int vtenc_create_encoder(AVCodecContext   *avctx,
                                             session);
 
     if (status || !vtctx->session) {
-        av_log(avctx, AV_LOG_ERROR, "Error: cannot create compression session: %d\n", status);
+        av_log(avctx, AV_LOG_ERROR, "Cannot create compression session: %d\n", status);
 
 #if !TARGET_OS_IPHONE
         if (!vtctx->allow_sw) {
@@ -1241,7 +1241,7 @@ static int vtenc_create_encoder(AVCodecContext   *avctx,
         return status;
 
     if (avctx->flags & AV_CODEC_FLAG_QSCALE && !vtenc_qscale_enabled()) {
-        av_log(avctx, AV_LOG_ERROR, "Error: -q:v qscale not available for encoder. Use -b:v bitrate instead.\n");
+        av_log(avctx, AV_LOG_ERROR, "-q:v qscale not available for encoder. Use -b:v bitrate instead.\n");
         return AVERROR_EXTERNAL;
     }
 
@@ -1269,7 +1269,7 @@ static int vtenc_create_encoder(AVCodecContext   *avctx,
                                           compat_keys.kVTCompressionPropertyKey_ConstantBitRate,
                                           bit_rate_num);
             if (status == kVTPropertyNotSupportedErr) {
-                av_log(avctx, AV_LOG_ERROR, "Error: -constant_bit_rate true is not supported by the encoder.\n");
+                av_log(avctx, AV_LOG_ERROR, "-constant_bit_rate true is not supported by the encoder.\n");
                 return AVERROR_EXTERNAL;
             }
         } else {
@@ -1625,7 +1625,7 @@ static int vtenc_create_encoder(AVCodecContext   *avctx,
 
     status = VTCompressionSessionPrepareToEncodeFrames(vtctx->session);
     if (status) {
-        av_log(avctx, AV_LOG_ERROR, "Error: cannot prepare encoder: %d\n", status);
+        av_log(avctx, AV_LOG_ERROR, "Cannot prepare encoder: %d\n", status);
         return AVERROR_EXTERNAL;
     }
 
@@ -1644,7 +1644,7 @@ static int vtenc_configure_encoder(AVCodecContext *avctx)
 
     codec_type = get_cm_codec_type(avctx, vtctx->profile, vtctx->alpha_quality);
     if (!codec_type) {
-        av_log(avctx, AV_LOG_ERROR, "Error: no mapping for AVCodecID %d\n", avctx->codec_id);
+        av_log(avctx, AV_LOG_ERROR, "No mapping for AVCodecID %d\n", avctx->codec_id);
         return AVERROR(EINVAL);
     }
 
@@ -2513,7 +2513,7 @@ static int vtenc_send_frame(AVCodecContext *avctx,
     );
 
     if (status) {
-        av_log(avctx, AV_LOG_ERROR, "Error: cannot encode frame: %d\n", status);
+        av_log(avctx, AV_LOG_ERROR, "Cannot encode frame: %d\n", status);
         status = AVERROR_EXTERNAL;
         // Not necessary, just in case new code put after here
         goto out;
-- 
2.52.0


From 5c74ac5ca227936ce144846ead5b3c6ef1f7d2ed Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Fri, 14 Nov 2025 16:23:10 +0800
Subject: [PATCH 025/304] avfilter/vf_drawtext: fix incorrect text length

According to the HarfBuzz documentation, hb_buffer_add_utf8 expects the
text length in bytes, not in Unicode characters:
hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));

Fix issue #20906.
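
The difference between the two lengths can be sketched as follows. This is
a standalone illustration that does not call HarfBuzz; utf8_bytes and
utf8_codepoints are hypothetical helper names, and the input is assumed to
be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes: the quantity a byte-oriented API expects. */
static size_t utf8_bytes(const char *s)
{
    assert(s);
    return strlen(s);
}

/* Number of code points: count every byte that is not a UTF-8
 * continuation byte (10xxxxxx). */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s);
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For pure ASCII text both counts agree, which is why passing a character
count only misbehaves once multi-byte characters appear.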
---
 libavfilter/vf_drawtext.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/libavfilter/vf_drawtext.c b/libavfilter/vf_drawtext.c
index 1d23805b54..c0400b8272 100644
--- a/libavfilter/vf_drawtext.c
+++ b/libavfilter/vf_drawtext.c
@@ -1396,7 +1396,6 @@ static int measure_text(AVFilterContext *ctx, TextMetrics *metrics)
     DrawTextContext *s = ctx->priv;
     char *text = s->expanded_text.str;
     char *textdup = NULL, *start = NULL;
-    int num_chars = 0;
     int width64 = 0, w64 = 0;
     int cur_min_y64 = 0, first_max_y64 = -32000;
     int first_min_x64 = 32000, last_max_x64 = -32000;
@@ -1459,7 +1458,7 @@ continue_on_failed2:
             TextLine *cur_line = &s->lines[line_count];
             HarfbuzzData *hb = &cur_line->hb_data;
             cur_line->cluster_offset = line_offset;
-            ret = shape_text_hb(s, hb, start, num_chars);
+            ret = shape_text_hb(s, hb, start, p - start);
             if (ret != 0) {
                 goto done;
             }
@@ -1517,14 +1516,12 @@ continue_on_failed2:
             if (w64 > width64) {
                 width64 = w64;
             }
-            num_chars = -1;
             start = p;
             ++line_count;
             line_offset = i + 1;
         }
 
         if (code == 0) break;
-        ++num_chars;
     }
 
     metrics->line_height64 = s->face->size->metrics.height;
-- 
2.52.0


From 2345b1f4db2fce9aace719de6c0fe55744aa0056 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Fri, 14 Nov 2025 16:53:07 +0800
Subject: [PATCH 026/304] avfilter/vf_drawtext: fix call GET_UTF8 with invalid
 argument

GET_UTF8(val, GET_BYTE, ERROR) assigns the result of GET_BYTE to val,
which has type uint32_t. GET_BYTE must therefore yield an unsigned
value; otherwise val= (GET_BYTE) sign-extends any byte >= 0x80 and
GET_UTF8 takes the error path.

This bug happened to cancel out the one where hb_buffer_add_utf8 was
called with an incorrect length, so drawtext worked correctly on x86
and macOS ARM, where char is signed. On Linux and Android ARM, where
char is unsigned by default, GET_UTF8 returns the correct value, which
exposed issue #20906.
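
The sign extension can be reproduced in isolation. The sketch below does
not use libavutil; widen_signed and widen_unsigned are hypothetical names
that mimic assigning a byte to a uint32_t without and with an unsigned
cast.

```c
#include <assert.h>
#include <stdint.h>

/* Mimics "val = (GET_BYTE)" when GET_BYTE yields a signed char:
 * a byte >= 0x80 is negative and gets sign-extended. */
static uint32_t widen_signed(signed char c)
{
    return (uint32_t)c;
}

/* Mimics the same assignment when the byte is treated as unsigned. */
static uint32_t widen_unsigned(signed char c)
{
    return (uint8_t)c;
}
```

For the UTF-8 lead byte 0xC3 (stored as -61 in a signed char), the first
helper yields 0xFFFFFFC3, which satisfies the val >= 0xFE check in GET_UTF8
and triggers ERROR, while the second yields 0xC3.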
---
 libavfilter/vf_drawtext.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavfilter/vf_drawtext.c b/libavfilter/vf_drawtext.c
index c0400b8272..2812de37f5 100644
--- a/libavfilter/vf_drawtext.c
+++ b/libavfilter/vf_drawtext.c
@@ -1395,7 +1395,7 @@ static int measure_text(AVFilterContext *ctx, TextMetrics *metrics)
 {
     DrawTextContext *s = ctx->priv;
     char *text = s->expanded_text.str;
-    char *textdup = NULL, *start = NULL;
+    char *textdup = NULL;
     int width64 = 0, w64 = 0;
     int cur_min_y64 = 0, first_max_y64 = -32000;
     int first_min_x64 = 32000, last_max_x64 = -32000;
@@ -1405,7 +1405,7 @@ static int measure_text(AVFilterContext *ctx, TextMetrics *metrics)
     Glyph *glyph = NULL;
 
     int i, tab_idx = 0, last_tab_idx = 0, line_offset = 0;
-    char* p;
+    uint8_t *start, *p;
     int ret = 0;
 
     // Count the lines and the tab characters
-- 
2.52.0


From 054651707e0125886abc4bc4b35197eaac0cb21a Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Fri, 14 Nov 2025 17:23:22 +0800
Subject: [PATCH 027/304] avutil/common: cast GET_BYTE/GET_16BIT returned value

Cast the value so that the macros work even when GET_BYTE/GET_16BIT
yields a signed value.
---
 libavutil/common.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavutil/common.h b/libavutil/common.h
index 3b830daf30..bf23aa50b0 100644
--- a/libavutil/common.h
+++ b/libavutil/common.h
@@ -486,13 +486,13 @@ static av_always_inline av_const int av_parity_c(uint32_t v)
  * to prevent undefined results.
  */
 #define GET_UTF8(val, GET_BYTE, ERROR)\
-    val= (GET_BYTE);\
+    val= (uint8_t)(GET_BYTE);\
     {\
         uint32_t top = (val & 128) >> 1;\
         if ((val & 0xc0) == 0x80 || val >= 0xFE)\
             {ERROR}\
         while (val & top) {\
-            unsigned int tmp = (GET_BYTE) - 128;\
+            unsigned int tmp = (uint8_t)(GET_BYTE) - 128;\
             if(tmp>>6)\
                 {ERROR}\
             val= (val<<6) + tmp;\
@@ -511,11 +511,11 @@ static av_always_inline av_const int av_parity_c(uint32_t v)
  *                  typically a goto statement.
  */
 #define GET_UTF16(val, GET_16BIT, ERROR)\
-    val = (GET_16BIT);\
+    val = (uint16_t)(GET_16BIT);\
     {\
         unsigned int hi = val - 0xD800;\
         if (hi < 0x800) {\
-            val = (GET_16BIT) - 0xDC00;\
+            val = (uint16_t)(GET_16BIT) - 0xDC00;\
             if (val > 0x3FFU || hi > 0x3FFU)\
                 {ERROR}\
             val += (hi<<10) + 0x10000;\
-- 
2.52.0


From 336507ff57119dde2d4b60c4d4a4a08df1c5a370 Mon Sep 17 00:00:00 2001
From: James Zern <jzern@google.com>
Date: Wed, 19 Nov 2025 10:38:58 -0800
Subject: [PATCH 028/304] avcodec/libaomenc: Fix use of uninitialized value

codecctl_intp() is used to populate values via aom_codec_control(), not
set them. This change moves the logging of the pointed-to value to
after the aom_codec_control() call to avoid a memory sanitizer warning
(use of uninitialized value).

Signed-off-by: James Zern <jzern@google.com>
---
 libavcodec/libaomenc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/libaomenc.c b/libavcodec/libaomenc.c
index 9fb97e74ff..1e5db4ad8c 100644
--- a/libavcodec/libaomenc.c
+++ b/libavcodec/libaomenc.c
@@ -392,17 +392,17 @@ static av_cold int codecctl_intp(AVCodecContext *avctx,
     int width = -30;
     int res;
 
-    snprintf(buf, sizeof(buf), "%s:", ctlidstr[id]);
-    av_log(avctx, AV_LOG_DEBUG, "  %*s%d\n", width, buf, *ptr);
-
     res = aom_codec_control(&ctx->encoder, id, ptr);
     if (res != AOM_CODEC_OK) {
-        snprintf(buf, sizeof(buf), "Failed to set %s codec control",
+        snprintf(buf, sizeof(buf), "Failed to get %s codec control",
                  ctlidstr[id]);
         log_encoder_error(avctx, buf);
         return AVERROR(EINVAL);
     }
 
+    snprintf(buf, sizeof(buf), "%s:", ctlidstr[id]);
+    av_log(avctx, AV_LOG_DEBUG, "  %*s%d\n", width, buf, *ptr);
+
     return 0;
 }
 #endif
-- 
2.52.0


From f96b7da1022cb116a30f2a5de865f798cb6af87d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Wed, 19 Nov 2025 19:37:03 +0100
Subject: [PATCH 029/304] Makefile: remove config_components.asm on distclean
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Forgotten in c607aae2b95b05bdc7066e3572737cb00a596e9f.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 17fadd58c2..2f78db02a5 100644
--- a/Makefile
+++ b/Makefile
@@ -184,7 +184,7 @@ clean::
 	$(RM) -rf coverage.info coverage.info.in lcov
 
 distclean:: clean
-	$(RM) .version config.asm config.h config_components.h mapfile  \
+	$(RM) .version config.asm config.h config_components.* mapfile  \
 		ffbuild/.config ffbuild/config.* libavutil/avconfig.h \
 		version.h libavutil/ffversion.h libavcodec/codec_names.h \
 		libavcodec/bsf_list.c libavformat/protocol_list.c \
-- 
2.52.0


From a6f257dbc4bf93217dda3bf1392ce5cb4efff779 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 17 Nov 2025 15:35:38 +0100
Subject: [PATCH 030/304] avcodec/mpeg12: Inline ff_mpeg1_clean_buffers() into
 its callers

This function is extremely small, so inlining it is appropriate (and
actually beneficial size-wise here). It also makes it possible to remove
the mpeg12codecs.h header and the mpeg12-encoders->mpeg12.o dependency.
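
The inlined code writes the DC predictor reset value as
128 << intra_dc_precision where the removed function used
1 << (7 + intra_dc_precision). A small sketch of the equivalence, with
hypothetical helper names and assuming intra_dc_precision stays in the
range 0..3:

```c
#include <assert.h>

/* Reset value as written in the removed ff_mpeg1_clean_buffers(). */
static int last_dc_reset_old(int intra_dc_precision)
{
    assert(intra_dc_precision >= 0 && intra_dc_precision <= 3);
    return 1 << (7 + intra_dc_precision);
}

/* Reset value as written in the inlined replacements. */
static int last_dc_reset_new(int intra_dc_precision)
{
    assert(intra_dc_precision >= 0 && intra_dc_precision <= 3);
    return 128 << intra_dc_precision;
}
```

Both forms give 128, 256, 512 and 1024 for precisions 0 through 3, so the
rewrite is purely cosmetic.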

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/Makefile        |  4 ++--
 libavcodec/mpeg12.c        | 10 ----------
 libavcodec/mpeg12codecs.h  | 29 -----------------------------
 libavcodec/mpeg12dec.c     |  7 +++++--
 libavcodec/mpeg12enc.h     |  8 ++++++++
 libavcodec/mpegvideo_enc.c |  3 +--
 6 files changed, 16 insertions(+), 45 deletions(-)
 delete mode 100644 libavcodec/mpeg12codecs.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 49c284ef9e..3e6f000868 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -566,14 +566,14 @@ OBJS-$(CONFIG_MPC7_DECODER)            += mpc7.o mpc.o
 OBJS-$(CONFIG_MPC8_DECODER)            += mpc8.o mpc.o
 OBJS-$(CONFIG_MPEGVIDEO_DECODER)       += mpeg12dec.o mpeg12.o mpeg12data.o
 OBJS-$(CONFIG_MPEG1VIDEO_DECODER)      += mpeg12dec.o mpeg12.o mpeg12data.o
-OBJS-$(CONFIG_MPEG1VIDEO_ENCODER)      += mpeg12enc.o mpeg12.o
+OBJS-$(CONFIG_MPEG1VIDEO_ENCODER)      += mpeg12enc.o
 OBJS-$(CONFIG_MPEG1_CUVID_DECODER)     += cuviddec.o
 OBJS-$(CONFIG_MPEG1_V4L2M2M_DECODER)   += v4l2_m2m_dec.o
 OBJS-$(CONFIG_MPEG2_MMAL_DECODER)      += mmaldec.o
 OBJS-$(CONFIG_MPEG2_QSV_DECODER)       += qsvdec.o
 OBJS-$(CONFIG_MPEG2_QSV_ENCODER)       += qsvenc_mpeg2.o
 OBJS-$(CONFIG_MPEG2VIDEO_DECODER)      += mpeg12dec.o mpeg12.o mpeg12data.o
-OBJS-$(CONFIG_MPEG2VIDEO_ENCODER)      += mpeg12enc.o mpeg12.o
+OBJS-$(CONFIG_MPEG2VIDEO_ENCODER)      += mpeg12enc.o
 OBJS-$(CONFIG_MPEG2_CUVID_DECODER)     += cuviddec.o
 OBJS-$(CONFIG_MPEG2_MEDIACODEC_DECODER) += mediacodecdec.o
 OBJS-$(CONFIG_MPEG2_VAAPI_ENCODER)     += vaapi_encode_mpeg2.o
diff --git a/libavcodec/mpeg12.c b/libavcodec/mpeg12.c
index 62a77b6806..61723f3a29 100644
--- a/libavcodec/mpeg12.c
+++ b/libavcodec/mpeg12.c
@@ -30,8 +30,6 @@
 #include "libavutil/attributes.h"
 #include "libavutil/thread.h"
 
-#include "mpegvideo.h"
-#include "mpeg12codecs.h"
 #include "mpeg12data.h"
 #include "mpeg12dec.h"
 #include "mpegutils.h"
@@ -122,14 +120,6 @@ av_cold void ff_init_2d_vlc_rl(const uint16_t table_vlc[][2], RL_VLC_ELEM rl_vlc
     }
 }
 
-void ff_mpeg1_clean_buffers(MpegEncContext *s)
-{
-    s->last_dc[0] = 1 << (7 + s->intra_dc_precision);
-    s->last_dc[1] = s->last_dc[0];
-    s->last_dc[2] = s->last_dc[0];
-    memset(s->last_mv, 0, sizeof(s->last_mv));
-}
-
 
 /******************************************/
 /* decoding */
diff --git a/libavcodec/mpeg12codecs.h b/libavcodec/mpeg12codecs.h
deleted file mode 100644
index f8cf5503e2..0000000000
--- a/libavcodec/mpeg12codecs.h
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * MPEG-1/2 codecs common code
- * Copyright (c) 2007 Aurelien Jacobs <aurel@gnuage.org>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#ifndef AVCODEC_MPEG12CODECS_H
-#define AVCODEC_MPEG12CODECS_H
-
-#include "mpegvideo.h"
-
-void ff_mpeg1_clean_buffers(MpegEncContext *s);
-
-#endif /* AVCODEC_MPEG12CODECS_H */
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 3ea8d02e1b..58ff925bfd 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -50,7 +50,6 @@
 #include "idctdsp.h"
 #include "mpeg_er.h"
 #include "mpeg12.h"
-#include "mpeg12codecs.h"
 #include "mpeg12data.h"
 #include "mpeg12dec.h"
 #include "mpegutils.h"
@@ -1375,7 +1374,6 @@ static int mpeg_decode_slice(Mpeg12SliceContext *const s, int mb_y,
     if (s->c.codec_id != AV_CODEC_ID_MPEG1VIDEO && s->c.mb_height > 2800/16)
         skip_bits(&s->gb, 3);
 
-    ff_mpeg1_clean_buffers(&s->c);
     s->c.interlaced_dct = 0;
 
     s->c.qscale = mpeg_get_qscale(&s->gb, s->c.q_scale_type);
@@ -1455,6 +1453,11 @@ static int mpeg_decode_slice(Mpeg12SliceContext *const s, int mb_y,
         }
     }
 
+    s->c.last_dc[0] = 128 << s->c.intra_dc_precision;
+    s->c.last_dc[1] = s->c.last_dc[0];
+    s->c.last_dc[2] = s->c.last_dc[0];
+    memset(s->c.last_mv, 0, sizeof(s->c.last_mv));
+
     for (int mb_skip_run = 0;;) {
         ret = mpeg_decode_mb(s, &mb_skip_run);
         if (ret < 0)
diff --git a/libavcodec/mpeg12enc.h b/libavcodec/mpeg12enc.h
index a8aeadbb3e..97007be8fe 100644
--- a/libavcodec/mpeg12enc.h
+++ b/libavcodec/mpeg12enc.h
@@ -36,4 +36,12 @@ static inline void ff_mpeg1_encode_init(MPVEncContext *s)
     s->c.c_dc_scale_table = ff_mpeg12_dc_scale_table[s->c.intra_dc_precision];
 }
 
+static inline void ff_mpeg1_clean_buffers(MPVEncContext *s)
+{
+    s->c.last_dc[0] = 128 << s->c.intra_dc_precision;
+    s->c.last_dc[1] = s->c.last_dc[0];
+    s->c.last_dc[2] = s->c.last_dc[0];
+    memset(s->c.last_mv, 0, sizeof(s->c.last_mv));
+}
+
 #endif /* AVCODEC_MPEG12ENC_H */
diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 9e83026b51..8015886dc4 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -47,7 +47,6 @@
 #include "avcodec.h"
 #include "encode.h"
 #include "idctdsp.h"
-#include "mpeg12codecs.h"
 #include "mpeg12data.h"
 #include "mpeg12enc.h"
 #include "mpegvideo.h"
@@ -3144,7 +3143,7 @@ static int encode_thread(AVCodecContext *c, void *arg){
                     case AV_CODEC_ID_MPEG2VIDEO:
                         if (CONFIG_MPEG1VIDEO_ENCODER || CONFIG_MPEG2VIDEO_ENCODER) {
                             ff_mpeg1_encode_slice_header(s);
-                            ff_mpeg1_clean_buffers(&s->c);
+                            ff_mpeg1_clean_buffers(s);
                         }
                     break;
 #if CONFIG_H263P_ENCODER
-- 
2.52.0


From 8adabb53109e3968dd8fbc06760fd58101f2da6a Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 17 Nov 2025 15:39:47 +0100
Subject: [PATCH 031/304] avcodec/Makefile: Remove mpegvideo_parser->mpeg12.o
 dependency

Forgotten in 3ceffe783965767e62d59e8e68ecd265c98460ec.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 3e6f000868..0cd2408865 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1275,8 +1275,7 @@ OBJS-$(CONFIG_MPEG4VIDEO_PARSER)       += mpeg4video_parser.o h263.o \
                                           mpeg4videodec.o mpeg4video.o \
                                           ituh263dec.o h263data.o
 OBJS-$(CONFIG_MPEGAUDIO_PARSER)        += mpegaudio_parser.o
-OBJS-$(CONFIG_MPEGVIDEO_PARSER)        += mpegvideo_parser.o    \
-                                          mpeg12.o mpeg12data.o
+OBJS-$(CONFIG_MPEGVIDEO_PARSER)        += mpegvideo_parser.o mpeg12data.o
 OBJS-$(CONFIG_OPUS_PARSER)             += vorbis_data.o
 OBJS-$(CONFIG_PNG_PARSER)              += png_parser.o
 OBJS-$(CONFIG_PNM_PARSER)              += pnm_parser.o pnm.o
-- 
2.52.0


From 6dc18bc860093a823b6749daf2e1a9ce32a50c25 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 17 Nov 2025 16:15:41 +0100
Subject: [PATCH 032/304] avcodec/mpegvideo: Move last_dc to
 {H263Dec,Mpeg12Slice,MPVEnc}Context

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/dnxhdenc.c      | 20 ++++++++++----------
 libavcodec/h263dec.c       |  6 +++---
 libavcodec/h263dec.h       |  3 +++
 libavcodec/ituh263dec.c    |  4 ++--
 libavcodec/mjpegenc.c      | 10 +++++-----
 libavcodec/mpeg12dec.c     | 22 ++++++++++++----------
 libavcodec/mpeg12enc.c     |  6 +++---
 libavcodec/mpeg12enc.h     |  6 +++---
 libavcodec/mpeg4videodec.c | 16 ++++++++--------
 libavcodec/mpegvideo.h     |  1 -
 libavcodec/mpegvideo_enc.c | 16 ++++++++--------
 libavcodec/mpegvideoenc.h  |  1 +
 libavcodec/msmpeg4dec.c    |  8 ++++----
 libavcodec/rv10.c          | 10 +++++-----
 libavcodec/speedhqenc.c    |  4 ++--
 15 files changed, 69 insertions(+), 64 deletions(-)

diff --git a/libavcodec/dnxhdenc.c b/libavcodec/dnxhdenc.c
index edb58ba25f..7994b1d497 100644
--- a/libavcodec/dnxhdenc.c
+++ b/libavcodec/dnxhdenc.c
@@ -575,8 +575,8 @@ void dnxhd_encode_block(PutBitContext *pb, DNXHDEncContext *ctx,
     int last_non_zero = 0;
     int slevel, i, j;
 
-    dnxhd_encode_dc(pb, ctx, block[0] - ctx->m.c.last_dc[n]);
-    ctx->m.c.last_dc[n] = block[0];
+    dnxhd_encode_dc(pb, ctx, block[0] - ctx->m.last_dc[n]);
+    ctx->m.last_dc[n] = block[0];
 
     for (i = 1; i <= last_index; i++) {
         j = ctx->m.c.intra_scantable.permutated[i];
@@ -822,9 +822,9 @@ static int dnxhd_calc_bits_thread(AVCodecContext *avctx, void *arg,
     LOCAL_ALIGNED_16(int16_t, block, [64]);
     ctx = ctx->thread[threadnr];
 
-    ctx->m.c.last_dc[0] =
-    ctx->m.c.last_dc[1] =
-    ctx->m.c.last_dc[2] = 1 << (ctx->bit_depth + 2);
+    ctx->m.last_dc[0] =
+    ctx->m.last_dc[1] =
+    ctx->m.last_dc[2] = 1 << (ctx->bit_depth + 2);
 
     for (int mb_x = 0; mb_x < ctx->m.c.mb_width; mb_x++) {
         unsigned mb = mb_y * ctx->m.c.mb_width + mb_x;
@@ -846,7 +846,7 @@ static int dnxhd_calc_bits_thread(AVCodecContext *avctx, void *arg,
                                              qscale, &overflow);
             ac_bits   += dnxhd_calc_ac_bits(ctx, block, last_index);
 
-            diff = block[0] - ctx->m.c.last_dc[n];
+            diff = block[0] - ctx->m.last_dc[n];
             if (diff < 0)
                 nbits = av_log2_16bit(-2 * diff);
             else
@@ -855,7 +855,7 @@ static int dnxhd_calc_bits_thread(AVCodecContext *avctx, void *arg,
             av_assert1(nbits < ctx->bit_depth + 4);
             dc_bits += ctx->cid_table->dc_bits[nbits] + nbits;
 
-            ctx->m.c.last_dc[n] = block[0];
+            ctx->m.last_dc[n] = block[0];
 
             if (avctx->mb_decision == FF_MB_DECISION_RD || !RC_VARIANCE) {
                 dnxhd_unquantize_c(ctx, block, i, qscale, last_index);
@@ -880,9 +880,9 @@ static int dnxhd_encode_thread(AVCodecContext *avctx, void *arg,
     init_put_bits(pb, (uint8_t *)arg + ctx->data_offset + ctx->slice_offs[jobnr],
                   ctx->slice_size[jobnr]);
 
-    ctx->m.c.last_dc[0] =
-    ctx->m.c.last_dc[1] =
-    ctx->m.c.last_dc[2] = 1 << (ctx->bit_depth + 2);
+    ctx->m.last_dc[0] =
+    ctx->m.last_dc[1] =
+    ctx->m.last_dc[2] = 1 << (ctx->bit_depth + 2);
     for (int mb_x = 0; mb_x < ctx->m.c.mb_width; mb_x++) {
         unsigned mb = mb_y * ctx->m.c.mb_width + mb_x;
         int qscale = ctx->mb_qscale[mb];
diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index 697fe7e572..b4c8aa38f5 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -247,9 +247,9 @@ static int decode_slice(H263DecContext *const h)
         }
 
         if (h->c.msmpeg4_version == MSMP4_V1) {
-            h->c.last_dc[0] =
-            h->c.last_dc[1] =
-            h->c.last_dc[2] = 128;
+            h->last_dc[0] =
+            h->last_dc[1] =
+            h->last_dc[2] = 128;
         }
 
         ff_init_block_index(&h->c);
diff --git a/libavcodec/h263dec.h b/libavcodec/h263dec.h
index ace210b036..4c25a833cf 100644
--- a/libavcodec/h263dec.h
+++ b/libavcodec/h263dec.h
@@ -79,6 +79,9 @@ typedef struct H263DecContext {
     /* MSMPEG4 specific */
     int slice_height;           ///< in macroblocks
 
+    /* MPEG-4 (Studio Profile), MSMPEG4 and RV10 specific */
+    int last_dc[3];             ///< last DC values, used by MPEG4, MSMPEG4V1, RV10
+
     /* RV10 specific */
     int rv10_version; ///< RV10 version: 0 or 3
     int rv10_first_dc_coded[3];
diff --git a/libavcodec/ituh263dec.c b/libavcodec/ituh263dec.c
index dc8847b53b..a1472ce0a0 100644
--- a/libavcodec/ituh263dec.c
+++ b/libavcodec/ituh263dec.c
@@ -551,14 +551,14 @@ static int h263_decode_block(H263DecContext *const h, int16_t block[64],
         if (CONFIG_RV10_DECODER && h->c.codec_id == AV_CODEC_ID_RV10) {
             if (h->rv10_version == 3 && h->c.pict_type == AV_PICTURE_TYPE_I) {
                 int component = (n <= 3 ? 0 : n - 4 + 1);
-                level = h->c.last_dc[component];
+                level = h->last_dc[component];
                 if (h->rv10_first_dc_coded[component]) {
                     int diff = ff_rv_decode_dc(h, n);
                     if (diff < 0)
                         return -1;
                     level += diff;
                     level = level & 0xff; /* handle wrap round */
-                    h->c.last_dc[component] = level;
+                    h->last_dc[component] = level;
                 } else {
                     h->rv10_first_dc_coded[component] = 1;
                 }
diff --git a/libavcodec/mjpegenc.c b/libavcodec/mjpegenc.c
index 214e2b0ec1..c5531b1c9f 100644
--- a/libavcodec/mjpegenc.c
+++ b/libavcodec/mjpegenc.c
@@ -279,7 +279,7 @@ int ff_mjpeg_encode_stuffing(MPVEncContext *const s)
 
 fail:
     for (int i = 0; i < 3; i++)
-        s->c.last_dc[i] = 128 << s->c.intra_dc_precision;
+        s->last_dc[i] = 128 << s->c.intra_dc_precision;
 
     return ret;
 }
@@ -371,11 +371,11 @@ static void record_block(MPVEncContext *const s, int16_t block[], int n)
     component = (n <= 3 ? 0 : (n&1) + 1);
     table_id = (n <= 3 ? 0 : 1);
     dc = block[0]; /* overflow is impossible */
-    val = dc - s->c.last_dc[component];
+    val = dc - s->last_dc[component];
 
     mjpeg_encode_coef(m, table_id, val, 0);
 
-    s->c.last_dc[component] = dc;
+    s->last_dc[component] = dc;
 
     /* AC coefs */
 
@@ -415,7 +415,7 @@ static void encode_block(MPVEncContext *const s, int16_t block[], int n)
     /* DC coef */
     component = (n <= 3 ? 0 : (n&1) + 1);
     dc = block[0]; /* overflow is impossible */
-    val = dc - s->c.last_dc[component];
+    val = dc - s->last_dc[component];
     if (n < 4) {
         ff_mjpeg_encode_dc(&s->pb, val, m->huff_size_dc_luminance, m->huff_code_dc_luminance);
         huff_size_ac = m->huff_size_ac_luminance;
@@ -425,7 +425,7 @@ static void encode_block(MPVEncContext *const s, int16_t block[], int n)
         huff_size_ac = m->huff_size_ac_chrominance;
         huff_code_ac = m->huff_code_ac_chrominance;
     }
-    s->c.last_dc[component] = dc;
+    s->last_dc[component] = dc;
 
     /* AC coefs */
 
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 58ff925bfd..1caa8279bd 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -73,6 +73,8 @@ typedef struct Mpeg12SliceContext {
     MPVContext c;
     GetBitContext gb;
 
+    int last_dc[3];                ///< last DC values
+
     DECLARE_ALIGNED_32(int16_t, block)[12][64];
 } Mpeg12SliceContext;
 
@@ -326,9 +328,9 @@ static inline int mpeg2_decode_block_intra(Mpeg12SliceContext *const s,
         component    = (n & 1) + 1;
     }
     diff = decode_dc(&s->gb, component);
-    dc  = s->c.last_dc[component];
+    dc  = s->last_dc[component];
     dc += diff;
-    s->c.last_dc[component] = dc;
+    s->last_dc[component] = dc;
     block[0] = dc * (1 << (3 - s->c.intra_dc_precision));
     ff_tlog(s->c.avctx, "dc=%d\n", block[0]);
     mismatch = block[0] ^ 1;
@@ -517,7 +519,7 @@ static int mpeg_decode_mb(Mpeg12SliceContext *const s, int *mb_skip_run)
                 ret = ff_mpeg1_decode_block_intra(&s->gb,
                                                   s->c.intra_matrix,
                                                   s->c.intra_scantable.permutated,
-                                                  s->c.last_dc, s->block[i],
+                                                  s->last_dc, s->block[i],
                                                   i, s->c.qscale);
                 if (ret < 0) {
                     av_log(s->c.avctx, AV_LOG_ERROR, "ac-tex damaged at %d %d\n",
@@ -713,7 +715,7 @@ static int mpeg_decode_mb(Mpeg12SliceContext *const s, int *mb_skip_run)
         }
 
         s->c.mb_intra = 0;
-        s->c.last_dc[0] = s->c.last_dc[1] = s->c.last_dc[2] = 128 << s->c.intra_dc_precision;
+        s->last_dc[0] = s->last_dc[1] = s->last_dc[2] = 128 << s->c.intra_dc_precision;
         if (HAS_CBP(mb_type)) {
             s->c.bdsp.clear_blocks(s->block[0]);
 
@@ -1453,9 +1455,9 @@ static int mpeg_decode_slice(Mpeg12SliceContext *const s, int mb_y,
         }
     }
 
-    s->c.last_dc[0] = 128 << s->c.intra_dc_precision;
-    s->c.last_dc[1] = s->c.last_dc[0];
-    s->c.last_dc[2] = s->c.last_dc[0];
+    s->last_dc[0] = 128 << s->c.intra_dc_precision;
+    s->last_dc[1] = s->last_dc[0];
+    s->last_dc[2] = s->last_dc[0];
     memset(s->c.last_mv, 0, sizeof(s->c.last_mv));
 
     for (int mb_skip_run = 0;;) {
@@ -1600,7 +1602,7 @@ static int mpeg_decode_slice(Mpeg12SliceContext *const s, int mb_y,
                 s->c.mb_intra = 0;
                 for (i = 0; i < 12; i++)
                     s->c.block_last_index[i] = -1;
-                s->c.last_dc[0] = s->c.last_dc[1] = s->c.last_dc[2] = 128 << s->c.intra_dc_precision;
+                s->last_dc[0] = s->last_dc[1] = s->last_dc[2] = 128 << s->c.intra_dc_precision;
                 if (s->c.picture_structure == PICT_FRAME)
                     s->c.mv_type = MV_TYPE_16X16;
                 else
@@ -2793,7 +2795,7 @@ static int ipu_decode_frame(AVCodecContext *avctx, AVFrame *frame,
                          s->flags & 0x10 ? ff_alternate_vertical_scan : ff_zigzag_direct,
                          m->idsp.idct_permutation);
 
-    m->last_dc[0] = m->last_dc[1] = m->last_dc[2] = 1 << (7 + (s->flags & 3));
+    s->m.last_dc[0] = s->m.last_dc[1] = s->m.last_dc[2] = 128 << (s->flags & 3);
     m->qscale = 1;
 
     for (int y = 0; y < avctx->height; y += 16) {
@@ -2825,7 +2827,7 @@ static int ipu_decode_frame(AVCodecContext *avctx, AVFrame *frame,
                     ret = ff_mpeg1_decode_block_intra(gb,
                                                       m->intra_matrix,
                                                       m->intra_scantable.permutated,
-                                                      m->last_dc, block[n],
+                                                      s->m.last_dc, block[n],
                                                       n, m->qscale);
                 } else {
                     ret = mpeg2_decode_block_intra(&s->m, block[n], n);
diff --git a/libavcodec/mpeg12enc.c b/libavcodec/mpeg12enc.c
index fb480d0eec..4b9d186396 100644
--- a/libavcodec/mpeg12enc.c
+++ b/libavcodec/mpeg12enc.c
@@ -589,9 +589,9 @@ static void mpeg1_encode_block(MPVEncContext *const s, const int16_t block[], in
     if (s->c.mb_intra) {
         component = (n <= 3 ? 0 : (n & 1) + 1);
         dc        = block[0];                   /* overflow is impossible */
-        diff      = dc - s->c.last_dc[component];
+        diff      = dc - s->last_dc[component];
         encode_dc(s, diff, component);
-        s->c.last_dc[component] = dc;
+        s->last_dc[component] = dc;
         i = 1;
         if (s->c.intra_vlc_format)
             table_vlc = ff_mpeg2_vlc_table;
@@ -936,7 +936,7 @@ static void mpeg12_encode_mb(MPVEncContext *const s, int16_t block[][64],
                              int motion_x, int motion_y)
 {
     if (!s->c.mb_intra)
-        s->c.last_dc[0] = s->c.last_dc[1] = s->c.last_dc[2] = 128 << s->c.intra_dc_precision;
+        s->last_dc[0] = s->last_dc[1] = s->last_dc[2] = 128 << s->c.intra_dc_precision;
     if (s->c.chroma_format == CHROMA_420)
         mpeg1_encode_mb_internal(s, block, motion_x, motion_y, 6, 1);
     else
diff --git a/libavcodec/mpeg12enc.h b/libavcodec/mpeg12enc.h
index 97007be8fe..e04c7dea38 100644
--- a/libavcodec/mpeg12enc.h
+++ b/libavcodec/mpeg12enc.h
@@ -38,9 +38,9 @@ static inline void ff_mpeg1_encode_init(MPVEncContext *s)
 
 static inline void ff_mpeg1_clean_buffers(MPVEncContext *s)
 {
-    s->c.last_dc[0] = 128 << s->c.intra_dc_precision;
-    s->c.last_dc[1] = s->c.last_dc[0];
-    s->c.last_dc[2] = s->c.last_dc[0];
+    s->last_dc[0] = 128 << s->c.intra_dc_precision;
+    s->last_dc[1] = s->last_dc[0];
+    s->last_dc[2] = s->last_dc[0];
     memset(s->c.last_mv, 0, sizeof(s->c.last_mv));
 }
 
diff --git a/libavcodec/mpeg4videodec.c b/libavcodec/mpeg4videodec.c
index 4a1385ea4d..e4a765d5ec 100644
--- a/libavcodec/mpeg4videodec.c
+++ b/libavcodec/mpeg4videodec.c
@@ -794,11 +794,11 @@ int ff_mpeg4_decode_video_packet_header(H263DecContext *const h)
 
 static void reset_studio_dc_predictors(Mpeg4DecContext *const ctx)
 {
-    MPVContext *const s = &ctx->h.c;
+    H263DecContext *const h = &ctx->h;
     /* Reset DC Predictors */
-    s->last_dc[0] =
-    s->last_dc[1] =
-    s->last_dc[2] = 1 << (s->avctx->bits_per_raw_sample + ctx->dct_precision + s->intra_dc_precision - 1);
+    h->last_dc[0] =
+    h->last_dc[1] =
+    h->last_dc[2] = 1 << (h->c.avctx->bits_per_raw_sample + ctx->dct_precision + h->c.intra_dc_precision - 1);
 }
 
 /**
@@ -2196,12 +2196,12 @@ static int mpeg4_decode_studio_block(Mpeg4DecContext *const ctx, int32_t block[6
 
     }
 
-    h->c.last_dc[cc] += dct_diff;
+    h->last_dc[cc] += dct_diff;
 
     if (ctx->mpeg_quant)
-        block[0] = h->c.last_dc[cc] * (8 >> h->c.intra_dc_precision);
+        block[0] = h->last_dc[cc] * (8 >> h->c.intra_dc_precision);
     else
-        block[0] = h->c.last_dc[cc] * (8 >> h->c.intra_dc_precision) * (8 >> ctx->dct_precision);
+        block[0] = h->last_dc[cc] * (8 >> h->c.intra_dc_precision) * (8 >> ctx->dct_precision);
     /* TODO: support mpeg_quant for AC coefficients */
 
     block[0] = av_clip(block[0], min, max);
@@ -2283,7 +2283,7 @@ static int mpeg4_decode_dpcm_macroblock(Mpeg4DecContext *const ctx,
         av_log(h->c.avctx, AV_LOG_ERROR, "Forbidden block_mean\n");
         return AVERROR_INVALIDDATA;
     }
-    h->c.last_dc[n] = block_mean * (1 << (ctx->dct_precision + h->c.intra_dc_precision));
+    h->last_dc[n] = block_mean * (1 << (ctx->dct_precision + h->c.intra_dc_precision));
 
     rice_parameter = get_bits(&h->gb, 4);
     if (rice_parameter == 0) {
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index a04166efa8..cb4b99acd3 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -133,7 +133,6 @@ typedef struct MpegEncContext {
      */
     MPVWorkPicture cur_pic;
 
-    int last_dc[3];                ///< last DC values for MPEG-1
     int16_t *dc_val_base;
     const uint8_t *y_dc_scale_table;     ///< qscale -> y_dc_scale table
     const uint8_t *c_dc_scale_table;     ///< qscale -> c_dc_scale table
diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 8015886dc4..e1f9623d65 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -2639,13 +2639,13 @@ typedef struct MBBackup {
         int mv[2][4][2];
         int last_mv[2][2][2];
         int mv_type, mv_dir;
-        int last_dc[3];
         int mb_intra, mb_skipped;
         int qscale;
         int block_last_index[8];
         int interlaced_dct;
     } c;
     int mb_skip_run;
+    int last_dc[3];
     int mv_bits, i_tex_bits, p_tex_bits, i_count, misc_bits, last_bits;
     int dquant;
     int esc3_level_length;
@@ -2663,7 +2663,7 @@ static inline void BEFORE ##_context_before_encode(DST_TYPE *const d,       \
     /* MPEG-1 */                                                            \
     d->mb_skip_run = s->mb_skip_run;                                        \
     for (int i = 0; i < 3; i++)                                             \
-        d->c.last_dc[i] = s->c.last_dc[i];                                  \
+        d->last_dc[i] = s->last_dc[i];                                      \
                                                                             \
     /* statistics */                                                        \
     d->mv_bits    = s->mv_bits;                                             \
@@ -2691,7 +2691,7 @@ static inline void AFTER ## _context_after_encode(DST_TYPE *const d,        \
     /* MPEG-1 */                                                            \
     d->mb_skip_run = s->mb_skip_run;                                        \
     for (int i = 0; i < 3; i++)                                             \
-        d->c.last_dc[i] = s->c.last_dc[i];                                  \
+        d->last_dc[i] = s->last_dc[i];                                      \
                                                                             \
     /* statistics */                                                        \
     d->mv_bits    = s->mv_bits;                                             \
@@ -3009,14 +3009,14 @@ static int encode_thread(AVCodecContext *c, void *arg){
     for(i=0; i<3; i++){
         /* init last dc values */
         /* note: quant matrix value (8) is implied here */
-        s->c.last_dc[i] = 128 << s->c.intra_dc_precision;
+        s->last_dc[i] = 128 << s->c.intra_dc_precision;
 
         s->encoding_error[i] = 0;
     }
     if (s->c.codec_id == AV_CODEC_ID_AMV) {
-        s->c.last_dc[0] = 128 * 8 / 13;
-        s->c.last_dc[1] = 128 * 8 / 14;
-        s->c.last_dc[2] = 128 * 8 / 14;
+        s->last_dc[0] = 128 * 8 / 13;
+        s->last_dc[1] = 128 * 8 / 14;
+        s->last_dc[2] = 128 * 8 / 14;
 #if CONFIG_MPEG4_ENCODER
     } else if (s->partitioned_frame) {
         av_assert1(s->c.codec_id == AV_CODEC_ID_MPEG4);
@@ -3039,7 +3039,7 @@ static int encode_thread(AVCodecContext *c, void *arg){
             mb_y = ff_speedhq_mb_y_order_to_mb(mb_y_order, s->c.mb_height, &first_in_slice);
             if (first_in_slice && mb_y_order != s->c.start_mb_y)
                 ff_speedhq_end_slice(s);
-            s->c.last_dc[0] = s->c.last_dc[1] = s->c.last_dc[2] = 1024 << s->c.intra_dc_precision;
+            s->last_dc[0] = s->last_dc[1] = s->last_dc[2] = 1024 << s->c.intra_dc_precision;
         } else {
             mb_y = mb_y_order;
         }
diff --git a/libavcodec/mpegvideoenc.h b/libavcodec/mpegvideoenc.h
index 131908c10a..4366e78f90 100644
--- a/libavcodec/mpegvideoenc.h
+++ b/libavcodec/mpegvideoenc.h
@@ -138,6 +138,7 @@ typedef struct MPVEncContext {
     int last_bits; ///< temp var used for calculating the above vars
 
     int mb_skip_run;
+    int last_dc[3];                ///< last DC values
 
     /* H.263 specific */
     int gob_index;
diff --git a/libavcodec/msmpeg4dec.c b/libavcodec/msmpeg4dec.c
index d2249559c9..f2ab99ecb5 100644
--- a/libavcodec/msmpeg4dec.c
+++ b/libavcodec/msmpeg4dec.c
@@ -48,7 +48,7 @@
 
 static const VLCElem *mv_tables[2];
 
-static inline int msmpeg4v1_pred_dc(MpegEncContext * s, int n,
+static inline int msmpeg4v1_pred_dc(H263DecContext *const h, int n,
                                     int32_t **dc_val_ptr)
 {
     int i;
@@ -59,8 +59,8 @@ static inline int msmpeg4v1_pred_dc(MpegEncContext * s, int n,
         i= n-3;
     }
 
-    *dc_val_ptr= &s->last_dc[i];
-    return s->last_dc[i];
+    *dc_val_ptr= &h->last_dc[i];
+    return h->last_dc[i];
 }
 
 /****************************************/
@@ -588,7 +588,7 @@ static int msmpeg4_decode_dc(MSMP4DecContext *const ms, int n, int *dir_ptr)
 
     if (h->c.msmpeg4_version == MSMP4_V1) {
         int32_t *dc_val;
-        pred = msmpeg4v1_pred_dc(&h->c, n, &dc_val);
+        pred = msmpeg4v1_pred_dc(h, n, &dc_val);
         level += pred;
 
         /* update predictor */
diff --git a/libavcodec/rv10.c b/libavcodec/rv10.c
index b4545f7624..1958f36c98 100644
--- a/libavcodec/rv10.c
+++ b/libavcodec/rv10.c
@@ -129,11 +129,11 @@ static int rv10_decode_picture_header(H263DecContext *const h)
     if (h->c.pict_type == AV_PICTURE_TYPE_I) {
         if (h->rv10_version == 3) {
             /* specific MPEG like DC coding not used */
-            h->c.last_dc[0] = get_bits(&h->gb, 8);
-            h->c.last_dc[1] = get_bits(&h->gb, 8);
-            h->c.last_dc[2] = get_bits(&h->gb, 8);
-            ff_dlog(h->c.avctx, "DC:%d %d %d\n", h->c.last_dc[0],
-                    h->c.last_dc[1], h->c.last_dc[2]);
+            h->last_dc[0] = get_bits(&h->gb, 8);
+            h->last_dc[1] = get_bits(&h->gb, 8);
+            h->last_dc[2] = get_bits(&h->gb, 8);
+            ff_dlog(h->c.avctx, "DC:%d %d %d\n", h->last_dc[0],
+                    h->last_dc[1], h->last_dc[2]);
         }
     }
     /* if multiple packets per frame are sent, the position at which
diff --git a/libavcodec/speedhqenc.c b/libavcodec/speedhqenc.c
index da7aba6ec9..3041c06864 100644
--- a/libavcodec/speedhqenc.c
+++ b/libavcodec/speedhqenc.c
@@ -172,9 +172,9 @@ static void encode_block(MPVEncContext *const s, const int16_t block[], int n)
     /* DC coef */
     component = (n <= 3 ? 0 : (n&1) + 1);
     dc = block[0]; /* overflow is impossible */
-    val = s->c.last_dc[component] - dc;  /* opposite of most codecs */
+    val = s->last_dc[component] - dc;  /* opposite of most codecs */
     encode_dc(&s->pb, val, component);
-    s->c.last_dc[component] = dc;
+    s->last_dc[component] = dc;
 
     /* now quantify & encode AC coefs */
     last_non_zero = 0;
-- 
2.52.0


From 316c293cc932e96ee443a1ba5a8bce6dd298a352 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Thu, 20 Nov 2025 18:15:23 -0300
Subject: [PATCH 033/304] swscale/x86/ops: fix signed integer related UB in
 normalize_clear()

Signed-off-by: James Almer <jamrial@gmail.com>
---
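Note (not part of the commit message): a minimal sketch of the two patterns this patch touches, assuming the usual 32-bit int. The helper names splat8 and splat16 are hypothetical and do not exist in libswscale. With a plain int constant the multiplication is done in int and overflows for inputs of 0x80 and above; a uint16_t operand is promoted to int before the shift, so a value with the top bit set is shifted into the sign bit.

```c
#include <stdint.h>

/* Replicate one byte into all four bytes of a 32-bit word.
 * The U suffix makes the constant unsigned, so the product is
 * computed in unsigned arithmetic and wraps instead of overflowing. */
static uint32_t splat8(uint8_t v)
{
    return 0x1010101U * v;
}

/* Replicate one 16-bit value into both halves of a 32-bit word.
 * Casting before the shift keeps the shift out of signed int. */
static uint32_t splat16(uint16_t v)
{
    return (uint32_t)v << 16 | v;
}
```

For example, splat8(0xff) gives 0xffffffff and splat16(0x8000) gives 0x80008000, both of which would not be representable in a 32-bit int.
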
 libswscale/x86/ops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libswscale/x86/ops.c b/libswscale/x86/ops.c
index 26f49582ae..97bee93f5b 100644
--- a/libswscale/x86/ops.c
+++ b/libswscale/x86/ops.c
@@ -616,8 +616,8 @@ static void normalize_clear(SwsOp *op)
         if (!op->c.q4[i].den)
             continue;
         switch (ff_sws_pixel_type_size(op->type)) {
-        case 1: c.u32 = 0x1010101 * priv.u8[i]; break;
-        case 2: c.u32 = priv.u16[i] << 16 | priv.u16[i]; break;
+        case 1: c.u32 = 0x1010101U * priv.u8[i]; break;
+        case 2: c.u32 = (uint32_t)priv.u16[i] << 16 | priv.u16[i]; break;
         case 4: c.u32 = priv.u32[i]; break;
         }
 
-- 
2.52.0


From d03c462ca4b70e18b868fd1f5532168d2ee394e8 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Thu, 20 Nov 2025 23:49:07 -0300
Subject: [PATCH 034/304] swscale/ops_tmpl_int: fix signed integer related UB
 when shifting values

Fixes:
src/libswscale/ops_tmpl_int.c:292:23: runtime error: left shift of 188 by 24 places cannot be represented in type 'int'
src/libswscale/ops_tmpl_int.c:290:23: runtime error: left shift of 158 by 24 places cannot be represented in type 'int'
src/libswscale/ops_tmpl_int.c:293:23: runtime error: left shift of 136 by 24 places cannot be represented in type 'int'
src/libswscale/ops_tmpl_int.c:291:23: runtime error: left shift of 160 by 24 places cannot be represented in type 'int'

Signed-off-by: James Almer <jamrial@gmail.com>
---
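Note (not part of the commit message): a small sketch of why only the first shift needs the cast, assuming the block elements are 8-bit unsigned values (the values in the sanitizer report suggest this) and that int is 32 bits wide. The function name expand8to32 is hypothetical. An 8-bit operand is promoted to int, and 188 << 24 is 3154116608, which exceeds INT_MAX; the shifts by 16 and 8 cannot exceed INT_MAX for inputs up to 255.

```c
#include <stdint.h>

/* Expand one 8-bit value into a 32-bit word by repeating the byte. */
static uint32_t expand8to32(uint8_t v)
{
    /* Only the shift by 24 can reach bit 31, so only that operand is
     * cast to uint32_t. The other shifts still happen in int after
     * integer promotion, and the | converts them to unsigned. */
    return (uint32_t)v << 24 | v << 16 | v << 8 | v;
}
```

With the values from the report, expand8to32(188) yields 0xbcbcbcbc.
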
 libswscale/ops_tmpl_int.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libswscale/ops_tmpl_int.c b/libswscale/ops_tmpl_int.c
index 857031ada9..da24c1985f 100644
--- a/libswscale/ops_tmpl_int.c
+++ b/libswscale/ops_tmpl_int.c
@@ -287,10 +287,10 @@ DECL_PATTERN(expand32)
 
     SWS_LOOP
     for (int i = 0; i < SWS_BLOCK_SIZE; i++) {
-        x32[i] = x[i] << 24 | x[i] << 16 | x[i] << 8 | x[i];
-        y32[i] = y[i] << 24 | y[i] << 16 | y[i] << 8 | y[i];
-        z32[i] = z[i] << 24 | z[i] << 16 | z[i] << 8 | z[i];
-        w32[i] = w[i] << 24 | w[i] << 16 | w[i] << 8 | w[i];
+        x32[i] = (uint32_t)x[i] << 24 | x[i] << 16 | x[i] << 8 | x[i];
+        y32[i] = (uint32_t)y[i] << 24 | y[i] << 16 | y[i] << 8 | y[i];
+        z32[i] = (uint32_t)z[i] << 24 | z[i] << 16 | z[i] << 8 | z[i];
+        w32[i] = (uint32_t)w[i] << 24 | w[i] << 16 | w[i] << 8 | w[i];
     }
 
     CONTINUE(u32block_t, x32, y32, z32, w32);
-- 
2.52.0


From ae783b9a3c43a76aa0b9b2a011c7c444c3889f82 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Thu, 20 Nov 2025 23:50:44 -0300
Subject: [PATCH 035/304] tests/checkasm/sw_ops: fix signed integer related UB
 when shifting values

Fixes:
src/tests/checkasm/sw_ops.c:441:34: runtime error: shift exponent 32 is too large for 32-bit type 'int'
src/tests/checkasm/sw_ops.c:591:37: runtime error: shift exponent 32 is too large for 32-bit type 'int'

Signed-off-by: James Almer <jamrial@gmail.com>
---
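Note (not part of the commit message): a sketch of the mask computation, with a hypothetical helper name low_bytes_mask. For a 4-byte type the old form (1 << 32) - 1 shifts by the full width of int, which is undefined. Shifting an all-ones value to the right keeps the shift count at 24, 16, 8 or 0 for sizes of 1 to 4 bytes.

```c
#include <stdint.h>

/* All-ones mask covering the low `bytes` bytes.
 * Valid for bytes in 1..4 only; bytes == 0 would again shift by 32. */
static uint32_t low_bytes_mask(int bytes)
{
    return UINT32_MAX >> (32 - bytes * 8);
}
```

This gives 0xff, 0xffff and 0xffffffff for sizes of 1, 2 and 4 bytes respectively.
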
 tests/checkasm/sw_ops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/sw_ops.c b/tests/checkasm/sw_ops.c
index 20b697bf25..aea25bbbbc 100644
--- a/tests/checkasm/sw_ops.c
+++ b/tests/checkasm/sw_ops.c
@@ -438,7 +438,7 @@ static AVRational rndq(SwsPixelType t)
 {
     const unsigned num = rnd();
     if (ff_sws_pixel_type_is_int(t)) {
-        const unsigned mask = (1 << (ff_sws_pixel_type_size(t) * 8)) - 1;
+        const unsigned mask = UINT_MAX >> (32 - ff_sws_pixel_type_size(t) * 8);
         return (AVRational) { num & mask, 1 };
     } else {
         const unsigned den = rnd();
@@ -588,7 +588,7 @@ static void check_convert(void)
                     .convert.to = o,
                 });
             } else if (isize > osize || !ff_sws_pixel_type_is_int(i)) {
-                uint32_t range = (1 << osize * 8) - 1;
+                uint32_t range = UINT32_MAX >> (32 - osize * 8);
                 CHECK_COMMON_RANGE(name, range, i, o, {
                     .op = SWS_OP_CONVERT,
                     .type = i,
-- 
2.52.0


From 62ad83bb5fafffed29af382b11ef5a9437494fd4 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 21 Nov 2025 15:52:46 +0100
Subject: [PATCH 036/304] avformat/oggenc: Schedule pagesize option for removal

Deprecated in 59220d559b5077c15fa6434e42df95f3b92f0199.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavformat/oggenc.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/libavformat/oggenc.c b/libavformat/oggenc.c
index e1bb7dd972..9a548a8d29 100644
--- a/libavformat/oggenc.c
+++ b/libavformat/oggenc.c
@@ -76,7 +76,9 @@ typedef struct OGGPageList {
 typedef struct OGGContext {
     const AVClass *class;
     OGGPageList *page_list;
+#if LIBAVFORMAT_VERSION_MAJOR < 63
     int pref_size; ///< preferred page size (0 => fill all segments)
+#endif
     int64_t pref_duration;      ///< preferred page duration (0 => fill all segments)
     int serial_offset;
 } OGGContext;
@@ -87,10 +89,12 @@ typedef struct OGGContext {
 static const AVOption options[] = {
     { "serial_offset", "serial number offset",
         OFFSET(serial_offset), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, PARAM },
+#if LIBAVFORMAT_VERSION_MAJOR < 63
     { "oggpagesize", "Set preferred Ogg page size.",
-      OFFSET(pref_size), AV_OPT_TYPE_INT, {.i64 = 0}, 0, MAX_PAGE_SIZE, PARAM},
-    { "pagesize", "preferred page size in bytes (deprecated)",
-        OFFSET(pref_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, MAX_PAGE_SIZE, PARAM },
+      OFFSET(pref_size), AV_OPT_TYPE_INT, {.i64 = 0}, 0, MAX_PAGE_SIZE, PARAM | AV_OPT_FLAG_DEPRECATED },
+    { "pagesize", "preferred page size in bytes",
+        OFFSET(pref_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, MAX_PAGE_SIZE, PARAM | AV_OPT_FLAG_DEPRECATED },
+#endif
     { "page_duration", "preferred page duration, in microseconds",
         OFFSET(pref_duration), AV_OPT_TYPE_INT64, { .i64 = 1000000 }, 0, INT64_MAX, PARAM },
     { NULL },
@@ -262,8 +266,12 @@ static int ogg_buffer_data(AVFormatContext *s, AVStream *st,
             if (page->segments_count == 255) {
                 ogg_buffer_page(s, oggstream);
             } else if (!header) {
+#if LIBAVFORMAT_VERSION_MAJOR < 63
                 if ((ogg->pref_size     > 0 && page->size   >= ogg->pref_size) ||
                     (ogg->pref_duration > 0 && next - start >= ogg->pref_duration)) {
+#else
+                if (ogg->pref_duration > 0 && next - start >= ogg->pref_duration) {
+#endif
                     ogg_buffer_page(s, oggstream);
                 }
             }
@@ -477,9 +485,6 @@ static int ogg_init(AVFormatContext *s)
     OGGStreamContext *oggstream = NULL;
     int i, j;
 
-    if (ogg->pref_size)
-        av_log(s, AV_LOG_WARNING, "The pagesize option is deprecated\n");
-
     for (i = 0; i < s->nb_streams; i++) {
         AVStream *st = s->streams[i];
         unsigned serial_num = i + ogg->serial_offset;
-- 
2.52.0


From 74cfc6e60266783320a812b6e647244d1981322c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin@martin.st>
Date: Wed, 19 Nov 2025 10:43:26 +0200
Subject: [PATCH 037/304] swscale: Remove the unused ff_sws_pixel_type_to_uint

This function uses ff_sws_pixel_type_size to switch on the
size of the provided type. However, ff_sws_pixel_type_size returns
a size in bytes (from sizeof()), not a size in bits. Therefore,
this would previously never return the right thing but always
hit the av_unreachable() below.

As the function is entirely unused, just remove it.

This fixes compilation with MSVC 2026 18.0 when targeting ARM64,
which previously hit an internal compiler error [1].

[1] https://developercommunity.visualstudio.com/t/Internal-Compiler-Error-targeting-ARM64-/10962922
---
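Note (not part of the commit message): a self-contained sketch of the bytes-versus-bits mismatch described above. All names here (PixType, pix_size, to_uint_bits, to_uint_bytes) are hypothetical stand-ins and not the libswscale definitions, and the sketch assumes sizeof(float) is 4. The patch itself removes the function; the byte-labelled variant is shown only to illustrate what the switch would have needed.

```c
#include <stdint.h>

typedef enum { PIX_NONE, PIX_U8, PIX_U16, PIX_U32, PIX_F32 } PixType;

/* Size of one element in bytes, as sizeof() reports it. */
static int pix_size(PixType t)
{
    switch (t) {
    case PIX_U8:  return sizeof(uint8_t);
    case PIX_U16: return sizeof(uint16_t);
    case PIX_U32: return sizeof(uint32_t);
    case PIX_F32: return sizeof(float);
    default:      return 0;
    }
}

/* Same shape as the removed function: the labels are bit counts,
 * but pix_size() returns 1, 2 or 4, so no case ever matches. */
static PixType to_uint_bits(PixType t)
{
    switch (pix_size(t)) {
    case 8:  return PIX_U8;
    case 16: return PIX_U16;
    case 32: return PIX_U32;
    }
    return PIX_NONE; /* always reached for valid types */
}

/* Labels in bytes match what pix_size() actually returns. */
static PixType to_uint_bytes(PixType t)
{
    switch (pix_size(t)) {
    case 1: return PIX_U8;
    case 2: return PIX_U16;
    case 4: return PIX_U32;
    }
    return PIX_NONE;
}
```

Here to_uint_bits() falls through for every valid type, whereas to_uint_bytes(PIX_F32) maps to PIX_U32.
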
 libswscale/ops.c | 15 ---------------
 libswscale/ops.h |  1 -
 2 files changed, 16 deletions(-)

diff --git a/libswscale/ops.c b/libswscale/ops.c
index 21aeb16931..1c408d7482 100644
--- a/libswscale/ops.c
+++ b/libswscale/ops.c
@@ -93,21 +93,6 @@ bool ff_sws_pixel_type_is_int(SwsPixelType type)
     return false;
 }
 
-SwsPixelType ff_sws_pixel_type_to_uint(SwsPixelType type)
-{
-    if (!type)
-        return type;
-
-    switch (ff_sws_pixel_type_size(type)) {
-    case 8:  return SWS_PIXEL_U8;
-    case 16: return SWS_PIXEL_U16;
-    case 32: return SWS_PIXEL_U32;
-    }
-
-    av_unreachable("Invalid pixel type!");
-    return SWS_PIXEL_NONE;
-}
-
 /* biased towards `a` */
 static AVRational av_min_q(AVRational a, AVRational b)
 {
diff --git a/libswscale/ops.h b/libswscale/ops.h
index 8f26ee832e..dccc00d2f0 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -39,7 +39,6 @@ typedef enum SwsPixelType {
 const char *ff_sws_pixel_type_name(SwsPixelType type);
 int ff_sws_pixel_type_size(SwsPixelType type) av_const;
 bool ff_sws_pixel_type_is_int(SwsPixelType type) av_const;
-SwsPixelType ff_sws_pixel_type_to_uint(SwsPixelType type) av_const;
 
 typedef enum SwsOpType {
     SWS_OP_INVALID = 0,
-- 
2.52.0


From a50316796887cc0fef90a657e45050789ed088e4 Mon Sep 17 00:00:00 2001
From: Anders Rein <anders@onemimir.com>
Date: Mon, 17 Nov 2025 23:52:49 +0100
Subject: [PATCH 038/304] avfilter/f_select: Added activate for aselect

During migration to the activation filter API the aselect filter was
accidentally turned into a no-op filter.
---
 libavfilter/f_select.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavfilter/f_select.c b/libavfilter/f_select.c
index 46208a7b76..3781e50814 100644
--- a/libavfilter/f_select.c
+++ b/libavfilter/f_select.c
@@ -504,6 +504,7 @@ const FFFilter ff_af_aselect = {
     .p.flags       = AVFILTER_FLAG_DYNAMIC_OUTPUTS,
     .init        = aselect_init,
     .uninit      = uninit,
+    .activate      = activate,
     .priv_size   = sizeof(SelectContext),
     FILTER_INPUTS(avfilter_af_aselect_inputs),
 };
-- 
2.52.0


From 018e64669a97c4f595f373efd235de48e178ce0b Mon Sep 17 00:00:00 2001
From: Anders Rein <anders@onemimir.com>
Date: Tue, 18 Nov 2025 11:22:18 +0100
Subject: [PATCH 039/304] fate/filter-audio: Added test for aselect

---
 tests/fate/filter-audio.mak   |  5 +++++
 tests/ref/fate/filter-aselect | 16 ++++++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 tests/ref/fate/filter-aselect

diff --git a/tests/fate/filter-audio.mak b/tests/fate/filter-audio.mak
index eee0209c59..b244f82bb7 100644
--- a/tests/fate/filter-audio.mak
+++ b/tests/fate/filter-audio.mak
@@ -259,6 +259,11 @@ fate-filter-aresample: CMD = pcm -analyzeduration 10000000 -i $(SRC) -af aresamp
 fate-filter-aresample: CMP = oneoff
 fate-filter-aresample: REF = $(SAMPLES)/nellymoser/nellymoser-discont.pcm
 
+FATE_AFILTER-$(call FILTERDEMDECENCMUX, ASELECT, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-aselect
+fate-filter-aselect: tests/data/asynth-44100-2.wav
+fate-filter-aselect: SRC = $(TARGET_PATH)/tests/data/asynth-44100-2.wav
+fate-filter-aselect: CMD = framecrc -i $(SRC) -af "aselect=gte(t\,1)*lt(t\,2)"
+
 FATE_ATRIM += fate-filter-atrim-duration
 fate-filter-atrim-duration: CMD = framecrc -i $(SRC) -af atrim=start=0.1:duration=0.01
 FATE_ATRIM += fate-filter-atrim-mixed
diff --git a/tests/ref/fate/filter-aselect b/tests/ref/fate/filter-aselect
new file mode 100644
index 0000000000..7d2192e227
--- /dev/null
+++ b/tests/ref/fate/filter-aselect
@@ -0,0 +1,16 @@
+#tb 0: 1/44100
+#media_type 0: audio
+#codec_id 0: pcm_s16le
+#sample_rate 0: 44100
+#channel_layout_name 0: stereo
+0,      45056,      45056,     4096,    16384, 0xe92bd835
+0,      49152,      49152,     4096,    16384, 0x1126dca3
+0,      53248,      53248,     4096,    16384, 0x9647edcf
+0,      57344,      57344,     4096,    16384, 0x5cc345aa
+0,      61440,      61440,     4096,    16384, 0x19d7bd51
+0,      65536,      65536,     4096,    16384, 0x19eccef7
+0,      69632,      69632,     4096,    16384, 0x4b68eeed
+0,      73728,      73728,     4096,    16384, 0x0b3d1bfc
+0,      77824,      77824,     4096,    16384, 0xe9b2e069
+0,      81920,      81920,     4096,    16384, 0xcaa5590e
+0,      86016,      86016,     4096,    16384, 0x47d0b227
-- 
2.52.0


From 3aef84ee849197527c83286361800d4a2600d704 Mon Sep 17 00:00:00 2001
From: Romain Beauxis <romain.beauxis@gmail.com>
Date: Sat, 8 Nov 2025 10:32:32 -0600
Subject: [PATCH 040/304] ffplay: print new metadata

---
 fftools/cmdutils.c | 25 +++++++++++++++++++++++++
 fftools/cmdutils.h |  8 ++++++++
 fftools/ffplay.c   | 21 ++++++++++++++++++++-
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/fftools/cmdutils.c b/fftools/cmdutils.c
index 35f786dba5..e906d4506d 100644
--- a/fftools/cmdutils.c
+++ b/fftools/cmdutils.c
@@ -1613,3 +1613,28 @@ int check_avoptions(AVDictionary *m)
 
     return 0;
 }
+
+void dump_dictionary(void *ctx, const AVDictionary *m,
+                     const char *name, const char *indent,
+                     int log_level)
+{
+    const AVDictionaryEntry *tag = NULL;
+
+    if (!m)
+        return;
+
+    av_log(ctx, log_level, "%s%s:\n", indent, name);
+    while ((tag = av_dict_iterate(m, tag))) {
+        const char *p = tag->value;
+        av_log(ctx, log_level, "%s  %-16s: ", indent, tag->key);
+        while (*p) {
+            size_t len = strcspn(p, "\x8\xa\xb\xc\xd");
+            av_log(ctx, log_level, "%.*s", (int)(FFMIN(255, len)), p);
+            p += len;
+            if (*p == 0xd) av_log(ctx, log_level, " ");
+            if (*p == 0xa) av_log(ctx, log_level, "\n%s  %-16s: ", indent, "");
+            if (*p) p++;
+        }
+        av_log(ctx, log_level, "\n");
+    }
+}
diff --git a/fftools/cmdutils.h b/fftools/cmdutils.h
index 93e05c7130..85b468f2af 100644
--- a/fftools/cmdutils.h
+++ b/fftools/cmdutils.h
@@ -549,4 +549,12 @@ int check_avoptions(AVDictionary *m);
 
 int cmdutils_isalnum(char c);
 
+/**
+ * This does the same as the corresponding function in libavformat/dump.c
+ * and should probably be kept in sync when that one changes.
+ */
+void dump_dictionary(void *ctx, const AVDictionary *m,
+                     const char *name, const char *indent,
+                     int log_level);
+
 #endif /* FFTOOLS_CMDUTILS_H */
diff --git a/fftools/ffplay.c b/fftools/ffplay.c
index dc2627521e..dcd20e70bc 100644
--- a/fftools/ffplay.c
+++ b/fftools/ffplay.c
@@ -2843,6 +2843,7 @@ static int read_thread(void *arg)
     int st_index[AVMEDIA_TYPE_NB];
     AVPacket *pkt = NULL;
     int64_t stream_start_time;
+    char metadata_description[96];
     int pkt_in_play_range = 0;
     const AVDictionaryEntry *t;
     SDL_mutex *wait_mutex = SDL_CreateMutex();
@@ -2950,8 +2951,10 @@ static int read_thread(void *arg)
 
     is->realtime = is_realtime(ic);
 
-    if (show_status)
+    if (show_status) {
+        fprintf(stderr, "\x1b[2K\r");
         av_dump_format(ic, 0, is->filename, 0);
+    }
 
     for (i = 0; i < ic->nb_streams; i++) {
         AVStream *st = ic->streams[i];
@@ -2960,6 +2963,9 @@ static int read_thread(void *arg)
         if (type >= 0 && wanted_stream_spec[type] && st_index[type] == -1)
             if (avformat_match_stream_specifier(ic, st, wanted_stream_spec[type]) > 0)
                 st_index[type] = i;
+        // Clear all pre-existing metadata update flags to avoid printing
+        // initial metadata as update.
+        st->event_flags &= ~AVSTREAM_EVENT_FLAG_METADATA_UPDATED;
     }
     for (i = 0; i < AVMEDIA_TYPE_NB; i++) {
         if (wanted_stream_spec[i] && st_index[i] == -1) {
@@ -3128,6 +3134,19 @@ static int read_thread(void *arg)
         } else {
             is->eof = 0;
         }
+
+        if (show_status && ic->streams[pkt->stream_index]->event_flags &
+            AVSTREAM_EVENT_FLAG_METADATA_UPDATED) {
+            fprintf(stderr, "\x1b[2K\r");
+            snprintf(metadata_description,
+                     sizeof(metadata_description),
+                     "\r  New metadata for stream %d",
+                     pkt->stream_index);
+            dump_dictionary(NULL, ic->streams[pkt->stream_index]->metadata,
+                               metadata_description, "    ", AV_LOG_INFO);
+        }
+        ic->streams[pkt->stream_index]->event_flags &= ~AVSTREAM_EVENT_FLAG_METADATA_UPDATED;
+
         /* check if packet is in play range specified by user, then queue, otherwise discard */
         stream_start_time = ic->streams[pkt->stream_index]->start_time;
         pkt_ts = pkt->pts == AV_NOPTS_VALUE ? pkt->dts : pkt->pts;
-- 
2.52.0


From 50d94db698c226edc0e35cbf251ebc3299cc9cf7 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Sat, 22 Nov 2025 12:02:27 -0300
Subject: [PATCH 041/304] configure: move libtls out of non-free libraries list

LibreSSL uses a permissive license, and the OpenSSL code has the same license as
OpenSSL < 3.

Signed-off-by: James Almer <jamrial@gmail.com>
---
 configure | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index e2caf3b24c..868226ac11 100755
--- a/configure
+++ b/configure
@@ -1927,7 +1927,6 @@ EXTERNAL_LIBRARY_NONFREE_LIST="
     decklink
     libfdk_aac
     libmpeghdec
-    libtls
 "
 
 EXTERNAL_LIBRARY_VERSION3_LIST="
@@ -2014,6 +2013,7 @@ EXTERNAL_LIBRARY_LIST="
     libtensorflow
     libtesseract
     libtheora
+    libtls
     libtorch
     libtwolame
     libuavs3d
@@ -7244,7 +7244,8 @@ enabled libsvtav1         && require_pkg_config libsvtav1 "SvtAv1Enc >= 0.9.0" E
 enabled libtensorflow     && require libtensorflow tensorflow/c/c_api.h TF_Version -ltensorflow
 enabled libtesseract      && require_pkg_config libtesseract tesseract tesseract/capi.h TessBaseAPICreate
 enabled libtheora         && require libtheora theora/theoraenc.h th_info_init -ltheoraenc -ltheoradec -logg
-enabled libtls            && require_pkg_config libtls libtls tls.h tls_configure
+enabled libtls            && require_pkg_config libtls libtls tls.h tls_configure &&
+                             { enabled gpl && ! enabled nonfree && die "ERROR: LibreSSL is incompatible with the gpl"; }
 enabled libtorch          && check_cxxflags -std=c++17 && require_cxx libtorch torch/torch.h "torch::Tensor" -ltorch -lc10 -ltorch_cpu -lstdc++ -lpthread
 enabled libtwolame        && require libtwolame twolame.h twolame_init -ltwolame &&
                              { check_lib libtwolame twolame.h twolame_encode_buffer_float32_interleaved -ltwolame ||
-- 
2.52.0


From 1a54240ce2e905a07160072debd8f9f3c8734a38 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Fri, 21 Nov 2025 16:58:13 -0300
Subject: [PATCH 042/304] avformat/mov: don't parse reserved ISOBMFF fields as
 if they were QT
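
Both branches in the hunks below have to consume the same number of bytes so
that the fields read afterwards stay aligned. A small sketch of that
bookkeeping, with the byte counts taken from the hunks in this patch; the enum
names are hypothetical and exist only for this illustration:

```c
#include <assert.h>

/* Byte counts taken from the two hunks in libavformat/mov.c. */
enum {
    /* ISOBMFF video branch: pre_defined + reserved + pre_defined */
    ISOM_VIDEO_SKIPPED = 2 + 2 + 12,
    /* QuickTime video branch: version, revision, vendor, temporal, spatial */
    QT_VIDEO_READ      = 2 + 2 + 4 + 4 + 4,
    /* ISOBMFF audio branch: reserved */
    ISOM_AUDIO_SKIPPED = 6,
    /* QuickTime audio branch: revision level + vendor */
    QT_AUDIO_READ      = 2 + 4
};

/* Compile-time checks: each pair of branches advances by the same amount. */
static_assert(ISOM_VIDEO_SKIPPED == QT_VIDEO_READ,
              "video branches must consume the same number of bytes");
static_assert(ISOM_AUDIO_SKIPPED == QT_AUDIO_READ,
              "audio branches must consume the same number of bytes");
```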

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/mov.c                                | 10 ++++++++++
 tests/ref/fate/h264-bsf-redundant-pps-side-data  |  4 ++--
 tests/ref/fate/h264-bsf-redundant-pps-side-data2 |  4 ++--
 tests/ref/fate/matroska-alac-remux               |  4 ++--
 tests/ref/fate/matroska-dovi-write-config7       |  4 ++--
 tests/ref/fate/mov-mp4-iamf-5_1_4                |  2 --
 tests/ref/fate/mov-mp4-iamf-7_1_4-video-first    |  2 --
 tests/ref/fate/mov-mp4-iamf-7_1_4-video-last     |  2 --
 tests/ref/fate/mov-mp4-iamf-ambisonic_1          |  2 --
 tests/ref/fate/mov-mp4-iamf-stereo               |  2 --
 10 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 721ffdcca0..12617e0eba 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2678,12 +2678,18 @@ static void mov_parse_stsd_video(MOVContext *c, AVIOContext *pb,
      * read in ff_mov_read_stsd_entries() */
     stsd_start = avio_tell(pb) - 16;
 
+    if (c->isom) {
+        avio_skip(pb, 2);  /* pre_defined */
+        avio_skip(pb, 2);  /* reserved */
+        avio_skip(pb, 12); /* pre_defined */
+    } else {
     avio_rb16(pb); /* version */
     avio_rb16(pb); /* revision level */
     id = avio_rl32(pb); /* vendor */
     av_dict_set(&st->metadata, "vendor_id", av_fourcc2str(id), 0);
     avio_rb32(pb); /* temporal quality */
     avio_rb32(pb); /* spatial quality */
+    }
 
     st->codecpar->width  = avio_rb16(pb); /* width */
     st->codecpar->height = avio_rb16(pb); /* height */
@@ -2733,9 +2739,13 @@ static void mov_parse_stsd_audio(MOVContext *c, AVIOContext *pb,
     AVDictionaryEntry *compatible_brands = av_dict_get(c->fc->metadata, "compatible_brands", NULL, AV_DICT_MATCH_CASE);
     int channel_count;
 
+    if (c->isom)
+        avio_skip(pb, 6); /* reserved */
+    else {
     avio_rb16(pb); /* revision level */
     id = avio_rl32(pb); /* vendor */
     av_dict_set(&st->metadata, "vendor_id", av_fourcc2str(id), 0);
+    }
 
     channel_count = avio_rb16(pb);
 
diff --git a/tests/ref/fate/h264-bsf-redundant-pps-side-data b/tests/ref/fate/h264-bsf-redundant-pps-side-data
index cecbe1acf9..da27c3fe59 100644
--- a/tests/ref/fate/h264-bsf-redundant-pps-side-data
+++ b/tests/ref/fate/h264-bsf-redundant-pps-side-data
@@ -1,5 +1,5 @@
-92fe70291f72acf94ba56b426bbaccb0 *tests/data/fate/h264-bsf-redundant-pps-side-data.nut
-596100 tests/data/fate/h264-bsf-redundant-pps-side-data.nut
+18c64ec6b4f0cc39c5f8f3564b372fef *tests/data/fate/h264-bsf-redundant-pps-side-data.nut
+596084 tests/data/fate/h264-bsf-redundant-pps-side-data.nut
 #extradata 0:       34, 0x850408e3
 #tb 0: 1/48000
 #media_type 0: video
diff --git a/tests/ref/fate/h264-bsf-redundant-pps-side-data2 b/tests/ref/fate/h264-bsf-redundant-pps-side-data2
index 2a483144e7..99918b80e4 100644
--- a/tests/ref/fate/h264-bsf-redundant-pps-side-data2
+++ b/tests/ref/fate/h264-bsf-redundant-pps-side-data2
@@ -1,5 +1,5 @@
-dd953f8d95d2927703ce9593a07fe2e7 *tests/data/fate/h264-bsf-redundant-pps-side-data2.nut
-5162 tests/data/fate/h264-bsf-redundant-pps-side-data2.nut
+f94ed2c25b6bbe63160743f08de33665 *tests/data/fate/h264-bsf-redundant-pps-side-data2.nut
+5138 tests/data/fate/h264-bsf-redundant-pps-side-data2.nut
 #tb 0: 1/25
 #media_type 0: video
 #codec_id 0: rawvideo
diff --git a/tests/ref/fate/matroska-alac-remux b/tests/ref/fate/matroska-alac-remux
index 9b73263acd..1e7a5b4c8e 100644
--- a/tests/ref/fate/matroska-alac-remux
+++ b/tests/ref/fate/matroska-alac-remux
@@ -1,5 +1,5 @@
-90c54a00ad8662c3eb93150791fa8328 *tests/data/fate/matroska-alac-remux.matroska
-1293824 tests/data/fate/matroska-alac-remux.matroska
+6075eb35f53692596194f3b0175cb184 *tests/data/fate/matroska-alac-remux.matroska
+1293794 tests/data/fate/matroska-alac-remux.matroska
 #extradata 0:       36, 0x562b05d8
 #tb 0: 1/1000
 #media_type 0: audio
diff --git a/tests/ref/fate/matroska-dovi-write-config7 b/tests/ref/fate/matroska-dovi-write-config7
index 5f3e000279..edb6757c68 100644
--- a/tests/ref/fate/matroska-dovi-write-config7
+++ b/tests/ref/fate/matroska-dovi-write-config7
@@ -1,5 +1,5 @@
-7adef53df9e14358e0b99f8a829e2d97 *tests/data/fate/matroska-dovi-write-config7.matroska
-72700 tests/data/fate/matroska-dovi-write-config7.matroska
+adafbb4d021db027f4ae4ef7ca1c56c2 *tests/data/fate/matroska-dovi-write-config7.matroska
+72640 tests/data/fate/matroska-dovi-write-config7.matroska
 #extradata 0:      116, 0x2b8d1669
 #extradata 1:      116, 0x2b8d1669
 #tb 0: 1/1000
diff --git a/tests/ref/fate/mov-mp4-iamf-5_1_4 b/tests/ref/fate/mov-mp4-iamf-5_1_4
index 18a1f5337f..9eaa5ee42d 100644
--- a/tests/ref/fate/mov-mp4-iamf-5_1_4
+++ b/tests/ref/fate/mov-mp4-iamf-5_1_4
@@ -160,7 +160,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x0
@@ -395,7 +394,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x0
diff --git a/tests/ref/fate/mov-mp4-iamf-7_1_4-video-first b/tests/ref/fate/mov-mp4-iamf-7_1_4-video-first
index d5a1fe1cad..55cadb3d02 100644
--- a/tests/ref/fate/mov-mp4-iamf-7_1_4-video-first
+++ b/tests/ref/fate/mov-mp4-iamf-7_1_4-video-first
@@ -207,7 +207,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=1
 id=0x2
@@ -465,7 +464,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=1
 id=0x2
diff --git a/tests/ref/fate/mov-mp4-iamf-7_1_4-video-last b/tests/ref/fate/mov-mp4-iamf-7_1_4-video-last
index caf89d41f6..80c924c821 100644
--- a/tests/ref/fate/mov-mp4-iamf-7_1_4-video-last
+++ b/tests/ref/fate/mov-mp4-iamf-7_1_4-video-last
@@ -207,7 +207,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x9
@@ -465,7 +464,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x9
diff --git a/tests/ref/fate/mov-mp4-iamf-ambisonic_1 b/tests/ref/fate/mov-mp4-iamf-ambisonic_1
index d0877f73c7..b6f14099c3 100644
--- a/tests/ref/fate/mov-mp4-iamf-ambisonic_1
+++ b/tests/ref/fate/mov-mp4-iamf-ambisonic_1
@@ -99,7 +99,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x0
@@ -264,7 +263,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x0
diff --git a/tests/ref/fate/mov-mp4-iamf-stereo b/tests/ref/fate/mov-mp4-iamf-stereo
index ca8f6a76f5..85e6e3efbb 100644
--- a/tests/ref/fate/mov-mp4-iamf-stereo
+++ b/tests/ref/fate/mov-mp4-iamf-stereo
@@ -52,7 +52,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x0
@@ -187,7 +186,6 @@ DISPOSITION:still_image=0
 DISPOSITION:multilayer=0
 TAG:language=und
 TAG:handler_name=SoundHandler
-TAG:vendor_id=[0][0][0][0]
 [STREAM]
 index=0
 id=0x0
-- 
2.52.0


From 31a1fd00190587120e0fae21902143f7e9a0bfee Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Fri, 21 Nov 2025 20:43:56 +0800
Subject: [PATCH 043/304] avformat/mov: fix incorrect sample rate by parsing the
 srat box
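
The payload layout that mov_read_srat() below relies on is 4 bytes of
version+flags followed by a 32-bit big-endian sample rate. A minimal sketch of
the same parsing on a plain byte buffer, assuming the buffer holds only the
payload (no size/type header); `read_be32` and `parse_srat_payload` are
hypothetical helpers, not libavformat functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: parse a 'srat' payload.
 * Returns the sample rate, or 0 if the payload is shorter than 8 bytes
 * or the stored value is not a positive 32-bit signed integer.
 */
static int32_t parse_srat_payload(const uint8_t *payload, size_t size)
{
    uint32_t raw;

    assert(payload != NULL);
    if (size < 8)
        return 0;

    /* bytes 0..3 are version + flags and are skipped */
    raw = read_be32(payload + 4);
    if (raw == 0 || raw > INT32_MAX)
        return 0;

    return (int32_t)raw;
}
```

For example, the payload bytes 00 00 00 00 00 01 77 00 yield a sample rate of
96000.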

---
 libavformat/mov.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 12617e0eba..11e600d861 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -1075,6 +1075,40 @@ fail:
 }
 #endif
 
+static int mov_read_srat(MOVContext *c, AVIOContext *pb, MOVAtom atom)
+{
+    AVStream *st;
+    int32_t sample_rate;
+
+    if (atom.size < 8 || c->fc->nb_streams < 1)
+        return 0;
+
+    st = c->fc->streams[c->fc->nb_streams-1];
+    if (st->codecpar->codec_type != AVMEDIA_TYPE_AUDIO) {
+        av_log(c->fc, AV_LOG_WARNING, "'srat' within non-audio sample entry, skip\n");
+        return 0;
+    }
+
+    if (!c->isom) {
+        av_log(c->fc, AV_LOG_WARNING, "'srat' within non-isom, skip\n");
+        return 0;
+    }
+
+    avio_skip(pb, 4); // version+flags
+    sample_rate = avio_rb32(pb);
+    if (sample_rate > 0) {
+        av_log(c->fc, AV_LOG_DEBUG,
+               "overwrite sample rate from %d to %d by 'srat'\n",
+               st->codecpar->sample_rate, sample_rate);
+        st->codecpar->sample_rate = sample_rate;
+    } else {
+        av_log(c->fc, AV_LOG_WARNING,
+               "ignore invalid sample rate %d in 'srat'\n", sample_rate);
+    }
+
+    return 0;
+}
+
 static int mov_read_dec3(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
     AVStream *st;
@@ -9490,6 +9524,7 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
 #if CONFIG_IAMFDEC
 { MKTAG('i','a','c','b'), mov_read_iacb },
 #endif
+{ MKTAG('s','r','a','t'), mov_read_srat },
 { 0, NULL }
 };
 
-- 
2.52.0


From bb551203f841794acc00a1187e8babb100d8b87e Mon Sep 17 00:00:00 2001
From: Dmitrii Okunev <xaionaro@dx.center>
Date: Sat, 22 Nov 2025 19:57:19 +0000
Subject: [PATCH 044/304] fftools: Fix MediaCodec on Android15+

On Android 15+, the MediaCodec HAL backend was switched from HIDL to AIDL.
As a result, MediaCodec operations started to hang, see:

    https://trac.ffmpeg.org/ticket/11363
    https://github.com/termux/termux-packages/issues/21264
    https://issuetracker.google.com/issues/382831999

To fix that, it is necessary to initialize the binder thread pool.

Signed-off-by: Dmitrii Okunev <xaionaro@dx.center>
---
 compat/android/binder.c | 114 ++++++++++++++++++++++++++++++++++++++++
 compat/android/binder.h |  31 +++++++++++
 configure               |   3 +-
 fftools/Makefile        |   1 +
 fftools/ffmpeg.c        |   7 +++
 5 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 compat/android/binder.c
 create mode 100644 compat/android/binder.h

diff --git a/compat/android/binder.c b/compat/android/binder.c
new file mode 100644
index 0000000000..a214d977cc
--- /dev/null
+++ b/compat/android/binder.c
@@ -0,0 +1,114 @@
+/*
+ * Android Binder handler
+ *
+ * Copyright (c) 2025 Dmitrii Okunev
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+
+#if defined(__ANDROID__)
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "libavutil/log.h"
+#include "binder.h"
+
+#define THREAD_POOL_SIZE 1
+
+static void *dlopen_libbinder_ndk(void)
+{
+    /*
+     * libbinder_ndk.so often does not contain the functions we need, so this
+     * dependency is optional: we use dlopen/dlsym instead of linking against it.
+     *
+     * See also: https://source.android.com/docs/core/architecture/aidl/aidl-backends
+     */
+
+    void *h = dlopen("libbinder_ndk.so", RTLD_NOW | RTLD_LOCAL);
+    if (h != NULL)
+        return h;
+
+    av_log(NULL, AV_LOG_WARNING,
+           "android/binder: unable to load libbinder_ndk.so: '%s'; skipping binder threadpool init (MediaCodec likely won't work)\n",
+           dlerror());
+    return NULL;
+}
+
+static void android_binder_threadpool_init(void)
+{
+    typedef int (*set_thread_pool_max_fn)(uint32_t);
+    typedef void (*start_thread_pool_fn)(void);
+
+    set_thread_pool_max_fn set_thread_pool_max = NULL;
+    start_thread_pool_fn start_thread_pool = NULL;
+
+    void *h = dlopen_libbinder_ndk();
+    if (h == NULL)
+        return;
+
+    unsigned thread_pool_size = THREAD_POOL_SIZE;
+
+    set_thread_pool_max =
+        (set_thread_pool_max_fn) dlsym(h,
+                                       "ABinderProcess_setThreadPoolMaxThreadCount");
+    start_thread_pool =
+        (start_thread_pool_fn) dlsym(h, "ABinderProcess_startThreadPool");
+
+    if (start_thread_pool == NULL) {
+        av_log(NULL, AV_LOG_WARNING,
+               "android/binder: ABinderProcess_startThreadPool not found; skipping threadpool init (MediaCodec likely won't work)\n");
+        return;
+    }
+
+    if (set_thread_pool_max != NULL) {
+        int ok = set_thread_pool_max(thread_pool_size);
+        av_log(NULL, AV_LOG_DEBUG,
+               "android/binder: ABinderProcess_setThreadPoolMaxThreadCount(%u) => %s\n",
+               thread_pool_size, ok ? "ok" : "fail");
+    } else {
+        av_log(NULL, AV_LOG_DEBUG,
+               "android/binder: ABinderProcess_setThreadPoolMaxThreadCount is unavailable; using the library default\n");
+    }
+
+    start_thread_pool();
+    av_log(NULL, AV_LOG_DEBUG,
+           "android/binder: ABinderProcess_startThreadPool() called\n");
+}
+
+void android_binder_threadpool_init_if_required(void)
+{
+#if __ANDROID_API__ >= 24
+    if (android_get_device_api_level() < 35) {
+        // the issue with the thread pool was introduced in Android 15 (API 35)
+        av_log(NULL, AV_LOG_DEBUG,
+               "android/binder: API<35, thus no need to initialize a thread pool\n");
+        return;
+    }
+    android_binder_threadpool_init();
+#else
+    // android_get_device_api_level was introduced in API 24, so we cannot use it
+    // to detect the device API level when built for API<24. For simplicity, we
+    // just assume that the system running this code has API level < 35.
+    av_log(NULL, AV_LOG_DEBUG,
+           "android/binder: is built with API<24, assuming this is not Android 15+\n");
+#endif
+}
+
+#endif                          /* __ANDROID__ */
diff --git a/compat/android/binder.h b/compat/android/binder.h
new file mode 100644
index 0000000000..2b1ca53fe8
--- /dev/null
+++ b/compat/android/binder.h
@@ -0,0 +1,31 @@
+/*
+ * Android Binder handler
+ *
+ * Copyright (c) 2025 Dmitrii Okunev
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef COMPAT_ANDROID_BINDER_H
+#define COMPAT_ANDROID_BINDER_H
+
+/**
+ * Initialize Android Binder thread pool.
+ */
+void android_binder_threadpool_init_if_required(void);
+
+#endif                          // COMPAT_ANDROID_BINDER_H
diff --git a/configure b/configure
index 868226ac11..0ebfc851e6 100755
--- a/configure
+++ b/configure
@@ -7312,7 +7312,8 @@ enabled mbedtls           && { check_pkg_config mbedtls mbedtls mbedtls/x509_crt
                                check_pkg_config mbedtls mbedtls mbedtls/ssl.h mbedtls_ssl_init ||
                                check_lib mbedtls mbedtls/ssl.h mbedtls_ssl_init -lmbedtls -lmbedx509 -lmbedcrypto ||
                                die "ERROR: mbedTLS not found"; }
-enabled mediacodec        && { enabled jni || die "ERROR: mediacodec requires --enable-jni"; }
+enabled mediacodec        && { enabled jni || die "ERROR: mediacodec requires --enable-jni"; } &&
+                               add_compat android/binder.o
 enabled mmal              && { check_lib mmal interface/mmal/mmal.h mmal_port_connect -lmmal_core -lmmal_util -lmmal_vc_client -lbcm_host ||
                                { ! enabled cross_compile &&
                                  add_cflags -isystem/opt/vc/include/ -isystem/opt/vc/include/interface/vmcs_host/linux -isystem/opt/vc/include/interface/vcos/pthreads -fgnu89-inline &&
diff --git a/fftools/Makefile b/fftools/Makefile
index bdb44fc5ce..01b16fa8f4 100644
--- a/fftools/Makefile
+++ b/fftools/Makefile
@@ -51,6 +51,7 @@ OBJS-ffprobe +=                       \
     fftools/textformat/tw_buffer.o    \
     fftools/textformat/tw_stdout.o    \
 
+OBJS-ffmpeg += $(COMPAT_OBJS:%=compat/%)
 OBJS-ffplay += fftools/ffplay_renderer.o
 
 define DOFFTOOL
diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 444d027c15..c2c85d46bd 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -78,6 +78,9 @@
 #include "libavdevice/avdevice.h"
 
 #include "cmdutils.h"
+#if CONFIG_MEDIACODEC
+#include "compat/android/binder.h"
+#endif
 #include "ffmpeg.h"
 #include "ffmpeg_sched.h"
 #include "ffmpeg_utils.h"
@@ -1019,6 +1022,10 @@ int main(int argc, char **argv)
         goto finish;
     }
 
+#if CONFIG_MEDIACODEC
+    android_binder_threadpool_init_if_required();
+#endif
+
     current_time = ti = get_benchmark_time_stamps();
     ret = transcode(sch);
     if (ret >= 0 && do_benchmark) {
-- 
2.52.0


From e9fdd33975c7b6f188b3b779687e236cd9f42efc Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Sun, 23 Nov 2025 17:01:39 -0300
Subject: [PATCH 045/304] avformat/mov: reindent after the previous change

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/mov.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 11e600d861..ee09478aef 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2717,12 +2717,12 @@ static void mov_parse_stsd_video(MOVContext *c, AVIOContext *pb,
         avio_skip(pb, 2);  /* reserved */
         avio_skip(pb, 12); /* pre_defined */
     } else {
-    avio_rb16(pb); /* version */
-    avio_rb16(pb); /* revision level */
-    id = avio_rl32(pb); /* vendor */
-    av_dict_set(&st->metadata, "vendor_id", av_fourcc2str(id), 0);
-    avio_rb32(pb); /* temporal quality */
-    avio_rb32(pb); /* spatial quality */
+        avio_rb16(pb); /* version */
+        avio_rb16(pb); /* revision level */
+        id = avio_rl32(pb); /* vendor */
+        av_dict_set(&st->metadata, "vendor_id", av_fourcc2str(id), 0);
+        avio_rb32(pb); /* temporal quality */
+        avio_rb32(pb); /* spatial quality */
     }
 
     st->codecpar->width  = avio_rb16(pb); /* width */
@@ -2776,9 +2776,9 @@ static void mov_parse_stsd_audio(MOVContext *c, AVIOContext *pb,
     if (c->isom)
         avio_skip(pb, 6); /* reserved */
     else {
-    avio_rb16(pb); /* revision level */
-    id = avio_rl32(pb); /* vendor */
-    av_dict_set(&st->metadata, "vendor_id", av_fourcc2str(id), 0);
+        avio_rb16(pb); /* revision level */
+        id = avio_rl32(pb); /* vendor */
+        av_dict_set(&st->metadata, "vendor_id", av_fourcc2str(id), 0);
     }
 
     channel_count = avio_rb16(pb);
-- 
2.52.0


From eea0776aeb493370f1e5b36ff1fd9ffc353bdf38 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 27 Oct 2025 04:33:37 +0100
Subject: [PATCH 046/304] fate: add skip_clean option
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is useful if one wants to inspect build artifacts after running
the fate.sh script.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 doc/fate_config.sh.template | 1 +
 tests/fate.sh               | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/fate_config.sh.template b/doc/fate_config.sh.template
index 4ff8629d44..36b554b948 100644
--- a/doc/fate_config.sh.template
+++ b/doc/fate_config.sh.template
@@ -6,6 +6,7 @@ workdir=                                 # directory in which to do all the work
 #fate_recv="ssh -T fate@fate.ffmpeg.org" # command to submit report
 comment=                                 # optional description
 build_only=     # set to "yes" for a compile-only instance that skips tests
+skip_clean=     # set to "yes" to preserve build/install directories
 ignore_tests=
 
 # the following are optional and map to configure options
diff --git a/tests/fate.sh b/tests/fate.sh
index 2d6313820f..a3195ccdf5 100755
--- a/tests/fate.sh
+++ b/tests/fate.sh
@@ -95,7 +95,7 @@ fate()(
 )
 
 clean(){
-    rm -rf ${build} ${inst}
+    test "$skip_clean" = "yes" || rm -rf ${build} ${inst}
 }
 
 report(){
-- 
2.52.0


From 2a6efa83da9ddf2b96967441fcb6a77612bb6845 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Tue, 18 Nov 2025 15:42:55 +0100
Subject: [PATCH 047/304] fate: add missing options in config template
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: f01c77157789b8e3a59ed2c9646faf8299e41641
Fixes: 523d688c2b7d5bb535bc203a2c3705d199ddf13d
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 doc/fate_config.sh.template | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/fate_config.sh.template b/doc/fate_config.sh.template
index 36b554b948..73031f54e2 100644
--- a/doc/fate_config.sh.template
+++ b/doc/fate_config.sh.template
@@ -12,16 +12,21 @@ ignore_tests=
 # the following are optional and map to configure options
 arch=
 cpu=
+toolchain=
 cross_prefix=
 as=
 cc=
+cxx=
 ld=
+nm=
 target_os=
 sysroot=
 target_exec=
 target_path=
 target_samples=
 extra_cflags=
+extra_cxxflags=
+extra_objcflags=
 extra_ldflags=
 extra_libs=
 extra_conf=     # extra configure options not covered above
-- 
2.52.0


From a91d900211fed39a6508d8470f29ada6a1046361 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Fri, 21 Nov 2025 23:06:25 -0300
Subject: [PATCH 048/304] avformat/movenc: add support for writing srat box

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/movenc.c       | 45 ++++++++++++++++++++++++++++++++++----
 libavformat/movenc.h       |  1 +
 tests/fate/mov.mak         |  4 ++--
 tests/ref/fate/mov-mp4-pcm | 24 ++++++++++----------
 4 files changed, 56 insertions(+), 18 deletions(-)
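
Note for readers: the sketch below is not part of the patch. It mirrors
the byte layout that mov_write_srat_tag() emits (size, 'srat' fourcc,
version & flags, 32-bit sample rate) and the MOV-mode fallback that halves
the rate until it fits the 16-bit field. The names put_be32(),
write_srat_box() and fit_rate_16() are hypothetical and only illustrate
the layout on a plain byte buffer instead of an AVIOContext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value big-endian, as avio_wb32() does on an AVIOContext. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: serialize a 'srat' box into out, which must hold
 * at least 16 bytes. Returns the number of bytes written. */
static size_t write_srat_box(uint8_t *out, uint32_t sample_rate)
{
    assert(out != NULL);
    put_be32(out, 16);               /* box size: 4 + 4 + 4 + 4 bytes */
    memcpy(out + 4, "srat", 4);      /* box type */
    put_be32(out + 8, 0);            /* version & flags */
    put_be32(out + 12, sample_rate); /* full 32-bit sample rate */
    return 16;
}

/* Mirrors the MOV-mode fallback in the patch: halve the rate until it
 * fits the 16-bit sample rate field of the sample entry. */
static unsigned fit_rate_16(unsigned sample_rate)
{
    while (sample_rate > UINT16_MAX)
        sample_rate >>= 1;
    return sample_rate;
}
```

In the patch itself the box size is not known up front; it is written as 0
and patched afterwards by update_size().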

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 42c8771496..afbb1151af 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -161,6 +161,18 @@ static int64_t update_size(AVIOContext *pb, int64_t pos)
     return curpos - pos;
 }
 
+static int64_t update_size_and_version(AVIOContext *pb, int64_t pos, int version)
+{
+    int64_t curpos = avio_tell(pb);
+    avio_seek(pb, pos, SEEK_SET);
+    avio_wb32(pb, curpos - pos); /* rewrite size */
+    avio_skip(pb, 4);
+    avio_w8(pb, version); /* rewrite version */
+    avio_seek(pb, curpos, SEEK_SET);
+
+    return curpos - pos;
+}
+
 static int co64_required(const MOVTrack *track)
 {
     if (track->entry > 0 && track->cluster[track->entry - 1].pos + track->data_offset > UINT32_MAX)
@@ -1344,6 +1356,18 @@ static int mov_write_pcmc_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *tra
     return update_size(pb, pos);
 }
 
+static int mov_write_srat_tag(AVIOContext *pb, MOVTrack *track)
+{
+    int64_t pos = avio_tell(pb);
+    avio_wb32(pb, 0); /* size */
+    ffio_wfourcc(pb, "srat");
+    avio_wb32(pb, 0); /* version & flags */
+
+    avio_wb32(pb, track->par->sample_rate);
+
+    return update_size(pb, pos);
+}
+
 static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext *mov, MOVTrack *track)
 {
     int64_t pos = avio_tell(pb);
@@ -1363,6 +1387,10 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
                    track->par->codec_id == AV_CODEC_ID_QDM2) {
             version = 1;
         }
+    } else if (track->mode == MODE_MP4) {
+        if (track->par->sample_rate > UINT16_MAX &&
+            (tag == MOV_MP4_IPCM_TAG || tag == MOV_MP4_FPCM_TAG))
+            version = 1;
     }
 
     avio_wb32(pb, 0); /* size */
@@ -1395,6 +1423,8 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
         avio_wb32(pb, track->sample_size);
         avio_wb32(pb, get_samples_per_packet(track));
     } else {
+        unsigned sample_rate = track->par->sample_rate;
+
         if (track->mode == MODE_MOV) {
             avio_wb16(pb, track->par->ch_layout.nb_channels);
             if (track->par->codec_id == AV_CODEC_ID_PCM_U8 ||
@@ -1415,6 +1445,9 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
                 avio_wb16(pb, 16);
             }
             avio_wb16(pb, 0);
+
+            while (sample_rate > UINT16_MAX)
+                sample_rate >>= 1;
         }
 
         avio_wb16(pb, 0); /* packet size (= 0) */
@@ -1425,14 +1458,13 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
         else if (track->par->codec_id == AV_CODEC_ID_TRUEHD)
             avio_wb32(pb, track->par->sample_rate);
         else
-            avio_wb16(pb, track->par->sample_rate <= UINT16_MAX ?
-                          track->par->sample_rate : 0);
+            avio_wb16(pb, sample_rate);
 
         if (track->par->codec_id != AV_CODEC_ID_TRUEHD)
             avio_wb16(pb, 0); /* Reserved */
     }
 
-    if (version == 1) { /* SoundDescription V1 extended info */
+    if (track->mode == MODE_MOV && version == 1) { /* SoundDescription V1 extended info */
         if (mov_pcm_le_gt16(track->par->codec_id) ||
             mov_pcm_be_gt16(track->par->codec_id))
             avio_wb32(pb, 1); /*  must be 1 for  uncompressed formats */
@@ -1478,6 +1510,8 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
     else if (track->par->codec_id == AV_CODEC_ID_TRUEHD)
         ret = mov_write_dmlp_tag(s, pb, track);
     else if (tag == MOV_MP4_IPCM_TAG || tag == MOV_MP4_FPCM_TAG) {
+        if (track->par->sample_rate > UINT16_MAX)
+            mov_write_srat_tag(pb, track);
         if (track->par->ch_layout.nb_channels > 1)
             ret = mov_write_chnl_tag(s, pb, track);
         if (ret < 0)
@@ -1508,6 +1542,9 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
             ((ret = mov_write_btrt_tag(pb, track)) < 0))
         return ret;
 
+    if (track->mode == MODE_MP4)
+        track->entry_version = version;
+
     ret = update_size(pb, pos);
     return ret;
 }
@@ -3122,7 +3159,7 @@ static int mov_write_stsd_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext
 
     track->last_stsd_index = stsd_index_back;
 
-    return update_size(pb, pos);
+    return update_size_and_version(pb, pos, track->entry_version);
 }
 
 static int mov_write_ctts_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *track)
diff --git a/libavformat/movenc.h b/libavformat/movenc.h
index 942ad905f7..eb12551ee5 100644
--- a/libavformat/movenc.h
+++ b/libavformat/movenc.h
@@ -86,6 +86,7 @@ typedef struct MOVFragmentInfo {
 
 typedef struct MOVTrack {
     int         mode;
+    int         entry_version;
     int         entry, entry_written;
     unsigned    timescale;
     uint64_t    time;
diff --git a/tests/fate/mov.mak b/tests/fate/mov.mak
index 1f2f589beb..15417a215d 100644
--- a/tests/fate/mov.mak
+++ b/tests/fate/mov.mak
@@ -251,8 +251,8 @@ fate-mov-channel-description: CMD = transcode wav $(TARGET_PATH)/tests/data/asyn
 # Test PCM in mp4 and channel layout
 FATE_MOV_FFMPEG-$(call TRANSCODE, PCM_S16LE, MP4 MOV, WAV_DEMUXER PAN_FILTER) \
                           += fate-mov-mp4-pcm
-fate-mov-mp4-pcm: tests/data/asynth-44100-1.wav tests/data/filtergraphs/mov-mp4-pcm
-fate-mov-mp4-pcm: CMD = transcode wav $(TARGET_PATH)/tests/data/asynth-44100-1.wav mp4 "-/filter_complex $(TARGET_PATH)/tests/data/filtergraphs/mov-mp4-pcm -map [mono] -map [stereo] -map [2.1] -map [5.1] -map [7.1] -c:a pcm_s16le" "-map 0 -c copy -frames:a 0"
+fate-mov-mp4-pcm: tests/data/asynth-96000-1.wav tests/data/filtergraphs/mov-mp4-pcm
+fate-mov-mp4-pcm: CMD = transcode wav $(TARGET_PATH)/tests/data/asynth-96000-1.wav mp4 "-/filter_complex $(TARGET_PATH)/tests/data/filtergraphs/mov-mp4-pcm -map [mono] -map [stereo] -map [2.1] -map [5.1] -map [7.1] -c:a pcm_s16le" "-map 0 -c copy -frames:a 0"
 
 # Test floating sample format PCM in mp4 and unusual channel layout
 FATE_MOV_FFMPEG-$(call TRANSCODE, PCM_F32LE, MP4 MOV, WAV_DEMUXER PAN_FILTER) \
diff --git a/tests/ref/fate/mov-mp4-pcm b/tests/ref/fate/mov-mp4-pcm
index 7cdca8629f..77c1584dcb 100644
--- a/tests/ref/fate/mov-mp4-pcm
+++ b/tests/ref/fate/mov-mp4-pcm
@@ -1,27 +1,27 @@
-0c6802135e9eb442201c0c1b001259d6 *tests/data/fate/mov-mp4-pcm.mp4
-10587977 tests/data/fate/mov-mp4-pcm.mp4
-#tb 0: 1/44100
+531c4a3389a66d305fb247691f4b14ab *tests/data/fate/mov-mp4-pcm.mp4
+23044177 tests/data/fate/mov-mp4-pcm.mp4
+#tb 0: 1/96000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
-#sample_rate 0: 44100
+#sample_rate 0: 96000
 #channel_layout_name 0: mono
-#tb 1: 1/44100
+#tb 1: 1/96000
 #media_type 1: audio
 #codec_id 1: pcm_s16le
-#sample_rate 1: 44100
+#sample_rate 1: 96000
 #channel_layout_name 1: stereo
-#tb 2: 1/44100
+#tb 2: 1/96000
 #media_type 2: audio
 #codec_id 2: pcm_s16le
-#sample_rate 2: 44100
+#sample_rate 2: 96000
 #channel_layout_name 2: 2.1
-#tb 3: 1/44100
+#tb 3: 1/96000
 #media_type 3: audio
 #codec_id 3: pcm_s16le
-#sample_rate 3: 44100
+#sample_rate 3: 96000
 #channel_layout_name 3: 5.1
-#tb 4: 1/44100
+#tb 4: 1/96000
 #media_type 4: audio
 #codec_id 4: pcm_s16le
-#sample_rate 4: 44100
+#sample_rate 4: 96000
 #channel_layout_name 4: 7.1
-- 
2.52.0


From adc1dc3297e8dc2fffdace69b5c2009cbf7fbe65 Mon Sep 17 00:00:00 2001
From: Neal Gompa <neal@gompa.dev>
Date: Fri, 21 Nov 2025 06:16:50 -0500
Subject: [PATCH 049/304] configure: Lower libdvdnav and libdvdread minimum
 versions for EL9

Red Hat Enterprise Linux 9 ships libdvdnav and libdvdread one patch
version lower than what FFmpeg currently requires. The slightly older
versions still result in a working build of FFmpeg with DVD support,
so allow those versions to be used to build FFmpeg.

Signed-off-by: Neal Gompa <neal@gompa.dev>
---
 configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 0ebfc851e6..7d6061b55c 100755
--- a/configure
+++ b/configure
@@ -7121,8 +7121,8 @@ enabled libdav1d          && require_pkg_config libdav1d "dav1d >= 1.0.0" "dav1d
 enabled libdavs2          && require_pkg_config libdavs2 "davs2 >= 1.6.0" davs2.h davs2_decoder_open
 enabled libdc1394         && require_pkg_config libdc1394 libdc1394-2 dc1394/dc1394.h dc1394_new
 enabled libdrm            && check_pkg_config libdrm libdrm xf86drm.h drmGetVersion
-enabled libdvdnav         && require_pkg_config libdvdnav "dvdnav >= 6.1.1" dvdnav/dvdnav.h dvdnav_open2
-enabled libdvdread        && require_pkg_config libdvdread "dvdread >= 6.1.2" dvdread/dvd_reader.h DVDOpen2
+enabled libdvdnav         && require_pkg_config libdvdnav "dvdnav >= 6.1.0" dvdnav/dvdnav.h dvdnav_open2
+enabled libdvdread        && require_pkg_config libdvdread "dvdread >= 6.1.1" dvdread/dvd_reader.h DVDOpen2
 enabled libfdk_aac        && { check_pkg_config libfdk_aac fdk-aac "fdk-aac/aacenc_lib.h" aacEncOpen ||
                                { require libfdk_aac fdk-aac/aacenc_lib.h aacEncOpen -lfdk-aac &&
                                  warn "using libfdk without pkg-config"; } }
-- 
2.52.0


From e53774405ba2c24af12ef0afafac13bea909a410 Mon Sep 17 00:00:00 2001
From: Frank Plowman <post@frankplowman.com>
Date: Sat, 8 Nov 2025 18:35:51 +0000
Subject: [PATCH 050/304] lavc/hevc: Fix usage of slice segment in invalid
 state

Previously, we set s->slice_initialized to 0 to prevent other slice
segments from depending on this slice segment only if hls_slice_header
failed.  If decode_slice fails for some other reason, however, before
decode_slice_data is called to bring the context back into a consistent
state, then slices could depend on this slice segment while it is in an
invalid state.  This can cause segmentation faults and other sorts of
nastiness.  Patch fixes this by always setting s->slice_initialized to 0
while the state is inconsistent.

Resolves #11652.
---
 libavcodec/hevc/hevcdec.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
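
Note for readers: the toy model below is not part of the patch and does
not use any HEVC decoder types. It only illustrates the ordering argument
from the commit message: the "initialized" flag is cleared as soon as the
header has been parsed, so any failure before the context is rebuilt
leaves dependent slices unable to use it. Ctx, parse_header(),
decode_slice() and decode_dependent() are all hypothetical names.

```c
#include <assert.h>

typedef struct Ctx {
    int slice_initialized; /* 1 only while header and state agree */
    int header_val;        /* stands in for the parsed slice header */
    int state_val;         /* stands in for state derived from the header */
} Ctx;

/* Overwrites the header in place; negative input simulates a parse error. */
static int parse_header(Ctx *c, int v)
{
    c->header_val = v;
    return v < 0 ? -1 : 0;
}

static int decode_slice(Ctx *c, int v, int fail_before_reinit)
{
    int ret = parse_header(c, v);

    /* Header and derived state now disagree, whether or not parsing
     * succeeded, so clear the flag unconditionally. */
    c->slice_initialized = 0;
    if (ret < 0)
        return ret;
    if (fail_before_reinit)
        return -1; /* some other failure before the state is rebuilt */

    c->state_val         = c->header_val; /* bring the state back in sync */
    c->slice_initialized = 1;
    return 0;
}

/* A dependent slice may only reuse the context when the flag is set. */
static int decode_dependent(const Ctx *c)
{
    assert(c != 0);
    return c->slice_initialized ? 0 : -1;
}
```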

diff --git a/libavcodec/hevc/hevcdec.c b/libavcodec/hevc/hevcdec.c
index 3f471dbc14..531d1c26f3 100644
--- a/libavcodec/hevc/hevcdec.c
+++ b/libavcodec/hevc/hevcdec.c
@@ -3411,7 +3411,6 @@ fail:
         ff_hevc_unref_frame(l->cur_frame, ~0);
     l->cur_frame = NULL;
     s->cur_frame = s->collocated_ref = NULL;
-    s->slice_initialized = 0;
     return ret;
 }
 
@@ -3544,9 +3543,11 @@ static int decode_slice(HEVCContext *s, unsigned nal_idx, GetBitContext *gb)
         return 0;
 
     ret = hls_slice_header(&s->sh, s, gb);
+    // Once hls_slice_header has been called, the context is inconsistent with the slice header
+    // until the context is reinitialized according to the contents of the new slice header
+    // at the start of decode_slice_data.
+    s->slice_initialized = 0;
     if (ret < 0) {
-        // hls_slice_header() does not cleanup on failure thus the state now is inconsistent so we cannot use it on dependent slices
-        s->slice_initialized = 0;
         return ret;
     }
 
-- 
2.52.0


From 0a0ceaa398f6ae9b6f6c6bef12b482d2892250d1 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Wed, 12 Nov 2025 21:56:36 +0800
Subject: [PATCH 051/304] avutil/hwcontext_vaapi: fix use of fourcc not
 supported by devices

1. An AVPixelFormat can map to multiple VA_FOURCCs, but before this
patch vaapi_format_from_pix_fmt() only returned the first match.
2. vaapi_frames_init() uses vaapi_format_from_pix_fmt() to get that
first match, whose fourcc may not be supported by the device.

This patch makes vaapi_format_from_pix_fmt() return all matching items
iteratively, and uses a strict check in vaapi_frames_init() to select
a fourcc the device supports.
---
 libavutil/hwcontext_vaapi.c | 82 ++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 33 deletions(-)
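
Note for readers: the sketch below is not part of the patch. It shows the
"pass the previous result to get the next match" lookup pattern on a
made-up table, followed by the strict check against the list of formats a
device reports. The enum values, fourcc numbers, and the names
next_match() and pick_supported_fourcc() are hypothetical; the real code
works on vaapi_format_map and the device's VAImageFormat list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for AVPixelFormat values. */
enum PixFmt { FMT_A, FMT_B };

typedef struct FormatDesc {
    unsigned    fourcc;
    enum PixFmt pix_fmt;
} FormatDesc;

/* One pixel format may appear more than once with different fourccs. */
static const FormatDesc format_map[] = {
    { 0x1001, FMT_A },
    { 0x2001, FMT_B },
    { 0x1002, FMT_A },
};
#define MAP_ELEMS (sizeof(format_map) / sizeof(format_map[0]))

/* Return the next entry matching pix_fmt after prev, or the first
 * matching entry when prev is NULL. Returns NULL when exhausted. */
static const FormatDesc *next_match(enum PixFmt pix_fmt,
                                    const FormatDesc *prev)
{
    const FormatDesc *end = format_map + MAP_ELEMS;
    const FormatDesc *cur = prev ? prev + 1 : format_map;

    for (; cur < end; cur++)
        if (cur->pix_fmt == pix_fmt)
            return cur;
    return NULL;
}

/* Strict check: return the first mapped fourcc that the device also
 * lists, or 0 if none of the candidates is supported. */
static unsigned pick_supported_fourcc(enum PixFmt pix_fmt,
                                      const unsigned *device,
                                      size_t nb_device)
{
    const FormatDesc *desc = NULL;

    assert(device != NULL);
    while ((desc = next_match(pix_fmt, desc)))
        for (size_t i = 0; i < nb_device; i++)
            if (device[i] == desc->fourcc)
                return desc->fourcc;
    return 0;
}
```

With a device that lists only 0x2001 and 0x1002, a lookup that stops at
the first table entry for FMT_A would pick the unsupported 0x1001, while
the iterating version reaches 0x1002.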

diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
index 753dcf8905..4f3502797b 100644
--- a/libavutil/hwcontext_vaapi.c
+++ b/libavutil/hwcontext_vaapi.c
@@ -195,12 +195,17 @@ static const VAAPIFormatDescriptor *
 }
 
 static const VAAPIFormatDescriptor *
-    vaapi_format_from_pix_fmt(enum AVPixelFormat pix_fmt)
+    vaapi_format_from_pix_fmt(enum AVPixelFormat pix_fmt, const VAAPIFormatDescriptor *prev)
 {
-    int i;
-    for (i = 0; i < FF_ARRAY_ELEMS(vaapi_format_map); i++)
-        if (vaapi_format_map[i].pix_fmt == pix_fmt)
-            return &vaapi_format_map[i];
+    const VAAPIFormatDescriptor *end = &vaapi_format_map[FF_ARRAY_ELEMS(vaapi_format_map)];
+    if (!prev)
+        prev = vaapi_format_map;
+    else
+        prev++;
+
+    for (; prev < end; prev++)
+        if (prev->pix_fmt == pix_fmt)
+            return prev;
     return NULL;
 }
 
@@ -214,27 +219,37 @@ static enum AVPixelFormat vaapi_pix_fmt_from_fourcc(unsigned int fourcc)
         return AV_PIX_FMT_NONE;
 }
 
+static int vaapi_get_img_desc_and_format(AVHWDeviceContext *hwdev,
+                                  enum AVPixelFormat pix_fmt,
+                                  const VAAPIFormatDescriptor **_desc,
+                                  VAImageFormat **image_format)
+{
+    VAAPIDeviceContext *ctx = hwdev->hwctx;
+    const VAAPIFormatDescriptor *desc = NULL;
+    int i;
+
+    while ((desc = vaapi_format_from_pix_fmt(pix_fmt, desc))) {
+        for (i = 0; i < ctx->nb_formats; i++) {
+            if (ctx->formats[i].fourcc == desc->fourcc) {
+                if (_desc)
+                    *_desc = desc;
+                if (image_format)
+                    *image_format = &ctx->formats[i].image_format;
+                return 0;
+            }
+        }
+    }
+
+    return AVERROR(ENOSYS);
+}
+
 static int vaapi_get_image_format(AVHWDeviceContext *hwdev,
                                   enum AVPixelFormat pix_fmt,
                                   VAImageFormat **image_format)
 {
-    VAAPIDeviceContext *ctx = hwdev->hwctx;
-    const VAAPIFormatDescriptor *desc;
-    int i;
-
-    desc = vaapi_format_from_pix_fmt(pix_fmt);
-    if (!desc || !image_format)
-        goto fail;
-
-    for (i = 0; i < ctx->nb_formats; i++) {
-        if (ctx->formats[i].fourcc == desc->fourcc) {
-            *image_format = &ctx->formats[i].image_format;
-            return 0;
-        }
-    }
-
-fail:
-    return AVERROR(ENOSYS);
+    if (!image_format)
+        return AVERROR(EINVAL);
+    return vaapi_get_img_desc_and_format(hwdev, pix_fmt, NULL, image_format);
 }
 
 static int vaapi_frames_get_constraints(AVHWDeviceContext *hwdev,
@@ -562,19 +577,23 @@ static int vaapi_frames_init(AVHWFramesContext *hwfc)
     VAAPIFramesContext     *ctx = hwfc->hwctx;
     AVVAAPIFramesContext  *avfc = &ctx->p;
     AVVAAPIDeviceContext *hwctx = hwfc->device_ctx->hwctx;
-    const VAAPIFormatDescriptor *desc;
-    VAImageFormat *expected_format;
+    const VAAPIFormatDescriptor *desc = NULL;
+    VAImageFormat *expected_format = NULL;
     AVBufferRef *test_surface = NULL;
     VASurfaceID test_surface_id;
     VAImage test_image;
     VAStatus vas;
     int err, i;
 
-    desc = vaapi_format_from_pix_fmt(hwfc->sw_format);
-    if (!desc) {
-        av_log(hwfc, AV_LOG_ERROR, "Unsupported format: %s.\n",
-               av_get_pix_fmt_name(hwfc->sw_format));
-        return AVERROR(EINVAL);
+    err = vaapi_get_img_desc_and_format(hwfc->device_ctx, hwfc->sw_format,
+                                        &desc, &expected_format);
+    if (err < 0) {
+        // Use a relaxed check when a pool exists; it may be an external pool.
+        if (!hwfc->pool || !vaapi_format_from_pix_fmt(hwfc->sw_format, NULL)) {
+            av_log(hwfc, AV_LOG_ERROR, "Unsupported format: %s.\n",
+                   av_get_pix_fmt_name(hwfc->sw_format));
+            return AVERROR(EINVAL);
+        }
     }
 
     if (!hwfc->pool) {
@@ -673,10 +692,7 @@ static int vaapi_frames_init(AVHWFramesContext *hwfc)
     test_surface_id = (VASurfaceID)(uintptr_t)test_surface->data;
 
     ctx->derive_works = 0;
-
-    err = vaapi_get_image_format(hwfc->device_ctx,
-                                 hwfc->sw_format, &expected_format);
-    if (err == 0) {
+    if (expected_format) {
         vas = vaDeriveImage(hwctx->display, test_surface_id, &test_image);
         if (vas == VA_STATUS_SUCCESS) {
             if (expected_format->fourcc == test_image.format.fourcc) {
-- 
2.52.0


From 2357c0af9db917d84c284b366fcd967f6d953688 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Sat, 15 Nov 2025 15:21:25 -0300
Subject: [PATCH 052/304] avformat: don't return EIO on demuxer errors

Demuxers should not generate this error code when they encounter truncated
or otherwise invalid files. It's a code the underlying protocol should generate
when there are legitimate reading errors.

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/4xm.c            |  8 +++++---
 libavformat/aacdec.c         |  4 ++--
 libavformat/aaxdec.c         |  7 ++++---
 libavformat/adxdec.c         |  2 +-
 libavformat/aiffdec.c        |  2 +-
 libavformat/alp.c            |  5 ++---
 libavformat/apc.c            |  7 +++++--
 libavformat/ape.c            |  2 +-
 libavformat/apm.c            |  5 ++---
 libavformat/argo_asf.c       |  9 +++------
 libavformat/argo_cvg.c       |  9 +++------
 libavformat/avs.c            | 16 +++++++--------
 libavformat/bethsoftvid.c    | 21 +++++++++----------
 libavformat/bfi.c            |  2 +-
 libavformat/bink.c           | 14 ++++++-------
 libavformat/binka.c          |  2 +-
 libavformat/bintext.c        | 39 ++++++++++++++++++------------------
 libavformat/bit.c            |  5 ++---
 libavformat/bmv.c            |  5 +++--
 libavformat/brstm.c          |  9 +++++----
 libavformat/c93.c            | 13 ++++++------
 libavformat/cafdec.c         | 11 +++++-----
 libavformat/cinedec.c        |  2 +-
 libavformat/dfa.c            |  4 ++--
 libavformat/dsicin.c         |  2 +-
 libavformat/dss.c            |  5 +++--
 libavformat/dv.c             | 14 +++++++------
 libavformat/dxa.c            | 10 ++++-----
 libavformat/electronicarts.c |  2 +-
 libavformat/filmstripdec.c   |  2 +-
 libavformat/flic.c           | 25 +++++++++--------------
 libavformat/gifdec.c         |  6 ++++--
 libavformat/gsmdec.c         |  2 +-
 libavformat/hca.c            |  7 ++++---
 libavformat/icoenc.c         |  2 +-
 libavformat/idcin.c          |  6 +++---
 libavformat/idroqdec.c       | 29 ++++++++++++---------------
 libavformat/iff.c            | 10 +++++----
 libavformat/img2dec.c        |  6 +++---
 libavformat/img2enc.c        |  3 +--
 libavformat/ingenientdec.c   |  2 +-
 libavformat/ipmovie.c        |  7 +++----
 libavformat/iss.c            |  2 +-
 libavformat/jvdec.c          |  2 +-
 libavformat/libmodplug.c     |  2 +-
 libavformat/lmlm4.c          |  2 +-
 libavformat/mca.c            |  2 +-
 libavformat/mgsts.c          |  4 ++--
 libavformat/mlvdec.c         |  4 ++--
 libavformat/mpc.c            |  2 +-
 libavformat/mtv.c            |  5 +++--
 libavformat/mvdec.c          |  4 ++--
 libavformat/mvi.c            |  2 +-
 libavformat/ncdec.c          |  4 ++--
 libavformat/nuv.c            |  2 +-
 libavformat/pdvdec.c         |  2 +-
 libavformat/pp_bnk.c         |  7 +++----
 libavformat/psxstr.c         | 10 +++++----
 libavformat/pva.c            | 10 ++++-----
 libavformat/qoadec.c         |  6 +++---
 libavformat/redspark.c       |  2 +-
 libavformat/rl2.c            |  2 +-
 libavformat/rmdec.c          |  6 +++---
 libavformat/rpl.c            | 18 ++++++++++-------
 libavformat/segafilm.c       | 24 ++++++++++++----------
 libavformat/sierravmd.c      | 19 +++++++-----------
 libavformat/siff.c           |  4 ++--
 libavformat/smush.c          |  2 +-
 libavformat/soxdec.c         |  6 ++++--
 libavformat/swfdec.c         |  4 ++--
 libavformat/thp.c            |  4 ++--
 libavformat/tiertexseq.c     | 13 +++++++-----
 libavformat/ty.c             |  2 +-
 libavformat/vc1test.c        |  2 +-
 libavformat/vividas.c        | 12 +++++------
 libavformat/voc_packet.c     |  2 +-
 libavformat/vpk.c            |  7 ++++---
 libavformat/wavarc.c         |  4 ++--
 libavformat/wc3movie.c       | 15 +++++++-------
 libavformat/westwood_aud.c   | 17 ++++++++--------
 libavformat/westwood_vqa.c   | 18 ++++++++---------
 libavformat/wsddec.c         |  6 ++++--
 libavformat/wtvdec.c         |  2 +-
 libavformat/wvdec.c          |  7 ++++---
 libavformat/xmv.c            | 19 +++++++++---------
 libavformat/yuv4mpegdec.c    |  2 +-
 libavformat/yuv4mpegenc.c    |  2 +-
 87 files changed, 324 insertions(+), 312 deletions(-)
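
Note for readers: the sketch below is not part of the patch. It models the
convention described in the commit message on a toy in-memory reader: a
genuine I/O failure from the lower layer is passed through unchanged,
while a short read (a truncated file) is reported as invalid data rather
than as an I/O error. The error constants, MemReader, reader_read() and
read_exact() are hypothetical stand-ins. read_exact() is written on the
assumption that ffio_read_size() behaves this way, which is what the
callers converted in this patch rely on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes standing in for AVERROR_EOF,
 * AVERROR_INVALIDDATA and AVERROR(EIO); the real values differ. */
#define ERR_EOF         (-1000)
#define ERR_INVALIDDATA (-1001)
#define ERR_IO          (-EIO)

typedef struct MemReader {
    const unsigned char *data;
    size_t size, pos;
    int    io_error; /* nonzero simulates a failure in the protocol layer */
} MemReader;

/* Returns the number of bytes read (possibly fewer than requested),
 * ERR_EOF at end of data, or the simulated I/O error. */
static int reader_read(MemReader *r, unsigned char *buf, int size)
{
    size_t left = r->size - r->pos;

    assert(size >= 0);
    if (r->io_error)
        return r->io_error;
    if (!left)
        return ERR_EOF;
    if ((size_t)size > left)
        size = (int)left;
    memcpy(buf, r->data + r->pos, (size_t)size);
    r->pos += (size_t)size;
    return size;
}

/* Read exactly size bytes: pass real I/O errors through, and map a
 * short read or EOF to ERR_INVALIDDATA. */
static int read_exact(MemReader *r, unsigned char *buf, int size)
{
    int ret = reader_read(r, buf, size);

    if (ret == size)
        return ret;
    if (ret < 0 && ret != ERR_EOF)
        return ret;
    return ERR_INVALIDDATA;
}
```

A demuxer written this way needs only one check, "if (ret < 0) return ret;",
which is the shape most of the hunks below converge on.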

diff --git a/libavformat/4xm.c b/libavformat/4xm.c
index 218ea837c5..d2442f3160 100644
--- a/libavformat/4xm.c
+++ b/libavformat/4xm.c
@@ -32,6 +32,7 @@
 #include "libavutil/mem.h"
 #include "libavcodec/internal.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -239,9 +240,10 @@ static int fourxm_read_header(AVFormatContext *s)
     header = av_malloc(header_size);
     if (!header)
         return AVERROR(ENOMEM);
-    if (avio_read(pb, header, header_size) != header_size) {
+    ret = ffio_read_size(pb, header, header_size);
+    if (ret < 0) {
         av_free(header);
-        return AVERROR(EIO);
+        return ret;
     }
 
     /* take the lazy approach and search for any and all vtrk and strk chunks */
@@ -312,7 +314,7 @@ static int fourxm_read_packet(AVFormatContext *s,
         fourcc_tag = AV_RL32(&header[0]);
         size       = AV_RL32(&header[4]);
         if (avio_feof(pb))
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         switch (fourcc_tag) {
         case LIST_TAG:
             /* this is a good time to bump the video pts */
diff --git a/libavformat/aacdec.c b/libavformat/aacdec.c
index 38ac9dcbe7..fef3c69f0b 100644
--- a/libavformat/aacdec.c
+++ b/libavformat/aacdec.c
@@ -175,7 +175,7 @@ retry:
         return ret;
 
     if (ret < ADTS_HEADER_SIZE)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     if ((AV_RB16(pkt->data) >> 4) != 0xfff) {
         // Parse all the ID3 headers between frames
@@ -184,7 +184,7 @@ retry:
         av_assert2(append > 0);
         ret = av_append_packet(s->pb, pkt, append);
         if (ret != append)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         if (!ff_id3v2_match(pkt->data, ID3v2_DEFAULT_MAGIC)) {
             av_packet_unref(pkt);
             ret = adts_aac_resync(s);
diff --git a/libavformat/aaxdec.c b/libavformat/aaxdec.c
index 40a088a35b..9926be0f4b 100644
--- a/libavformat/aaxdec.c
+++ b/libavformat/aaxdec.c
@@ -345,9 +345,10 @@ static int aax_read_packet(AVFormatContext *s, AVPacket *pkt)
             extradata = av_malloc(extradata_size + AV_INPUT_BUFFER_PADDING_SIZE);
             if (!extradata)
                 return AVERROR(ENOMEM);
-            if (avio_read(pb, extradata, extradata_size) != extradata_size) {
+            ret = ffio_read_size(pb, extradata, extradata_size);
+            if (ret < 0) {
                 av_free(extradata);
-                return AVERROR(EIO);
+                return ret;
             }
             memset(extradata + extradata_size, 0, AV_INPUT_BUFFER_PADDING_SIZE);
         }
@@ -356,7 +357,7 @@ static int aax_read_packet(AVFormatContext *s, AVPacket *pkt)
     ret = av_get_packet(pb, pkt, size);
     if (ret != size) {
         av_free(extradata);
-        return ret < 0 ? ret : AVERROR(EIO);
+        return ret < 0 ? ret : AVERROR_INVALIDDATA;
     }
     pkt->duration = 1;
     pkt->stream_index = 0;
diff --git a/libavformat/adxdec.c b/libavformat/adxdec.c
index 00b652315f..b9936799f9 100644
--- a/libavformat/adxdec.c
+++ b/libavformat/adxdec.c
@@ -75,7 +75,7 @@ static int adx_read_packet(AVFormatContext *s, AVPacket *pkt)
         av_shrink_packet(pkt, size);
         pkt->flags &= ~AV_PKT_FLAG_CORRUPT;
     } else if (ret < size) {
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     } else {
         size = ret;
     }
diff --git a/libavformat/aiffdec.c b/libavformat/aiffdec.c
index d9d580bdb5..ff47d8dc7b 100644
--- a/libavformat/aiffdec.c
+++ b/libavformat/aiffdec.c
@@ -60,7 +60,7 @@ static int64_t get_tag(AVIOContext *pb, uint32_t * tag)
     int64_t size;
 
     if (avio_feof(pb))
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     *tag = avio_rl32(pb);
     size = avio_rb32(pb);
diff --git a/libavformat/alp.c b/libavformat/alp.c
index ad8e160223..18cc636296 100644
--- a/libavformat/alp.c
+++ b/libavformat/alp.c
@@ -24,6 +24,7 @@
 
 #include "libavutil/channel_layout.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "mux.h"
@@ -90,10 +91,8 @@ static int alp_read_header(AVFormatContext *s)
         return AVERROR_INVALIDDATA;
     }
 
-    if ((ret = avio_read(s->pb, hdr->adpcm, sizeof(hdr->adpcm))) < 0)
+    if ((ret = ffio_read_size(s->pb, hdr->adpcm, sizeof(hdr->adpcm))) < 0)
         return ret;
-    else if (ret != sizeof(hdr->adpcm))
-        return AVERROR(EIO);
 
     if (strncmp("ADPCM", hdr->adpcm, sizeof(hdr->adpcm)) != 0)
         return AVERROR_INVALIDDATA;
diff --git a/libavformat/apc.c b/libavformat/apc.c
index d24f57d021..da3d80bb9e 100644
--- a/libavformat/apc.c
+++ b/libavformat/apc.c
@@ -73,8 +73,11 @@ static int apc_read_header(AVFormatContext *s)
 
 static int apc_read_packet(AVFormatContext *s, AVPacket *pkt)
 {
-    if (av_get_packet(s->pb, pkt, MAX_READ_SIZE) <= 0)
-        return AVERROR(EIO);
+    int ret = av_get_packet(s->pb, pkt, MAX_READ_SIZE);
+    if (ret < 0)
+        return ret;
+    else if (ret == 0)
+        return AVERROR_INVALIDDATA;
     pkt->stream_index = 0;
     return 0;
 }
diff --git a/libavformat/ape.c b/libavformat/ape.c
index f86ca5e894..7e6bf12961 100644
--- a/libavformat/ape.c
+++ b/libavformat/ape.c
@@ -395,7 +395,7 @@ static int ape_read_packet(AVFormatContext * s, AVPacket * pkt)
         av_log(s, AV_LOG_ERROR, "invalid packet size: %8"PRId64"\n",
                ape->frames[ape->currentframe].size);
         ape->currentframe++;
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     ret = av_new_packet(pkt, ape->frames[ape->currentframe].size + extra_size);
diff --git a/libavformat/apm.c b/libavformat/apm.c
index b3716c1d80..76ae9fd844 100644
--- a/libavformat/apm.c
+++ b/libavformat/apm.c
@@ -23,6 +23,7 @@
 #include "config_components.h"
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "mux.h"
@@ -153,10 +154,8 @@ static int apm_read_header(AVFormatContext *s)
                                  (int64_t)par->sample_rate *
                                  par->bits_per_coded_sample;
 
-    if ((ret = avio_read(s->pb, buf, APM_FILE_EXTRADATA_SIZE)) < 0)
+    if ((ret = ffio_read_size(s->pb, buf, APM_FILE_EXTRADATA_SIZE)) < 0)
         return ret;
-    else if (ret != APM_FILE_EXTRADATA_SIZE)
-        return AVERROR(EIO);
 
     apm_parse_extradata(&extradata, buf);
 
diff --git a/libavformat/argo_asf.c b/libavformat/argo_asf.c
index e08f029f80..116367516a 100644
--- a/libavformat/argo_asf.c
+++ b/libavformat/argo_asf.c
@@ -24,6 +24,7 @@
 
 #include "libavutil/avstring.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "mux.h"
@@ -188,10 +189,8 @@ static int argo_asf_read_header(AVFormatContext *s)
     if (!(st = avformat_new_stream(s, NULL)))
         return AVERROR(ENOMEM);
 
-    if ((ret = avio_read(pb, buf, ASF_FILE_HEADER_SIZE)) < 0)
+    if ((ret = ffio_read_size(pb, buf, ASF_FILE_HEADER_SIZE)) < 0)
         return ret;
-    else if (ret != ASF_FILE_HEADER_SIZE)
-        return AVERROR(EIO);
 
     ff_argo_asf_parse_file_header(&asf->fhdr, buf);
 
@@ -205,10 +204,8 @@ static int argo_asf_read_header(AVFormatContext *s)
     if ((ret = avio_skip(pb, asf->fhdr.chunk_offset - ASF_FILE_HEADER_SIZE)) < 0)
         return ret;
 
-    if ((ret = avio_read(pb, buf, ASF_CHUNK_HEADER_SIZE)) < 0)
+    if ((ret = ffio_read_size(pb, buf, ASF_CHUNK_HEADER_SIZE)) < 0)
         return ret;
-    else if (ret != ASF_CHUNK_HEADER_SIZE)
-        return AVERROR(EIO);
 
     ff_argo_asf_parse_chunk_header(&asf->ckhdr, buf);
 
diff --git a/libavformat/argo_cvg.c b/libavformat/argo_cvg.c
index 03ae6fa59e..932ec8b966 100644
--- a/libavformat/argo_cvg.c
+++ b/libavformat/argo_cvg.c
@@ -25,6 +25,7 @@
 #include "libavutil/avstring.h"
 #include "libavutil/channel_layout.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "mux.h"
@@ -139,10 +140,8 @@ static int argo_cvg_read_checksum(AVIOContext *pb, const ArgoCVGHeader *cvg, uin
         return ret;
 
     /* NB: Not using avio_rl32() because no error checking. */
-    if ((ret = avio_read(pb, buf, sizeof(buf))) < 0)
+    if ((ret = ffio_read_size(pb, buf, sizeof(buf))) < 0)
         return ret;
-    else if (ret != sizeof(buf))
-        return AVERROR(EIO);
 
     if ((ret = avio_seek(pb, ARGO_CVG_HEADER_SIZE, SEEK_SET)) < 0)
         return ret;
@@ -163,10 +162,8 @@ static int argo_cvg_read_header(AVFormatContext *s)
     if (!(st = avformat_new_stream(s, NULL)))
         return AVERROR(ENOMEM);
 
-    if ((ret = avio_read(s->pb, buf, ARGO_CVG_HEADER_SIZE)) < 0)
+    if ((ret = ffio_read_size(s->pb, buf, ARGO_CVG_HEADER_SIZE)) < 0)
         return ret;
-    else if (ret != ARGO_CVG_HEADER_SIZE)
-        return AVERROR(EIO);
 
     ctx->header.size   = AV_RL32(buf + 0);
     ctx->header.loop   = AV_RL32(buf + 4);
diff --git a/libavformat/avs.c b/libavformat/avs.c
index 3cd814836b..84cd1267e6 100644
--- a/libavformat/avs.c
+++ b/libavformat/avs.c
@@ -26,6 +26,7 @@
  */
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "voc.h"
 
@@ -113,10 +114,9 @@ avs_read_video_packet(AVFormatContext * s, AVPacket * pkt,
     pkt->data[palette_size + 1] = type;
     pkt->data[palette_size + 2] = size & 0xFF;
     pkt->data[palette_size + 3] = (size >> 8) & 0xFF;
-    ret = avio_read(s->pb, pkt->data + palette_size + 4, size - 4) + 4;
-    if (ret < size) {
-        return AVERROR(EIO);
-    }
+    ret = ffio_read_size(s->pb, pkt->data + palette_size + 4, size - 4) + 4;
+    if (ret < 0)
+        return ret;
 
     pkt->size = ret + palette_size;
     pkt->stream_index = avs->st_video->index;
@@ -168,7 +168,7 @@ static int avs_read_packet(AVFormatContext * s, AVPacket * pkt)
     while (1) {
         if (avs->remaining_frame_size <= 0) {
             if (!avio_rl16(s->pb))    /* found EOF */
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
             avs->remaining_frame_size = avio_rl16(s->pb) - 4;
         }
 
@@ -184,9 +184,9 @@ static int avs_read_packet(AVFormatContext * s, AVPacket * pkt)
             case AVS_PALETTE:
                 if (size - 4 > sizeof(palette))
                     return AVERROR_INVALIDDATA;
-                ret = avio_read(s->pb, palette, size - 4);
-                if (ret < size - 4)
-                    return AVERROR(EIO);
+                ret = ffio_read_size(s->pb, palette, size - 4);
+                if (ret < 0)
+                    return ret;
                 palette_size = size;
                 break;
 
diff --git a/libavformat/bethsoftvid.c b/libavformat/bethsoftvid.c
index e3c4758f30..f118f668f1 100644
--- a/libavformat/bethsoftvid.c
+++ b/libavformat/bethsoftvid.c
@@ -32,6 +32,7 @@
 #include "libavutil/intreadwrite.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "libavcodec/bethsoftvideo.h"
@@ -147,10 +148,9 @@ static int read_frame(BVID_DemuxContext *vid, AVIOContext *pb, AVPacket *pkt,
 
     // set the y offset if it exists (decoder header data should be in data section)
     if(block_type == VIDEO_YOFF_P_FRAME){
-        if (avio_read(pb, &vidbuf_start[vidbuf_nbytes], 2) != 2) {
-            ret = AVERROR(EIO);
+        ret = ffio_read_size(pb, &vidbuf_start[vidbuf_nbytes], 2);
+        if (ret < 0)
             goto fail;
-        }
         vidbuf_nbytes += 2;
     }
 
@@ -170,10 +170,9 @@ static int read_frame(BVID_DemuxContext *vid, AVIOContext *pb, AVPacket *pkt,
             if(block_type == VIDEO_I_FRAME)
                 vidbuf_start[vidbuf_nbytes++] = avio_r8(pb);
         } else if(code){ // plain sequence
-            if (avio_read(pb, &vidbuf_start[vidbuf_nbytes], code) != code) {
-                ret = AVERROR(EIO);
+            ret = ffio_read_size(pb, &vidbuf_start[vidbuf_nbytes], code);
+            if (ret < 0)
                 goto fail;
-            }
             vidbuf_nbytes += code;
         }
         bytes_copied += code & 0x7F;
@@ -238,9 +237,9 @@ static int vid_read_packet(AVFormatContext *s,
                 av_log(s, AV_LOG_WARNING, "discarding unused palette\n");
                 vid->has_palette = 0;
             }
-            if (avio_read(pb, vid->palette, BVID_PALETTE_SIZE) != BVID_PALETTE_SIZE) {
-                return AVERROR(EIO);
-            }
+            ret_value = ffio_read_size(pb, vid->palette, BVID_PALETTE_SIZE);
+            if (ret_value < 0)
+                return ret_value;
             vid->has_palette = 1;
             return vid_read_packet(s, pkt);
 
@@ -268,7 +267,7 @@ static int vid_read_packet(AVFormatContext *s,
                 if (ret_value < 0)
                     return ret_value;
                 av_log(s, AV_LOG_ERROR, "incomplete audio block\n");
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
             }
             pkt->stream_index = vid->audio_index;
             pkt->duration     = audio_length;
@@ -284,7 +283,7 @@ static int vid_read_packet(AVFormatContext *s,
             if(vid->nframes != 0)
                 av_log(s, AV_LOG_VERBOSE, "reached terminating character but not all frames read.\n");
             vid->is_finished = 1;
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         default:
             av_log(s, AV_LOG_ERROR, "unknown block (character = %c, decimal = %d, hex = %x)!!!\n",
                    block_type, block_type, block_type);
diff --git a/libavformat/bfi.c b/libavformat/bfi.c
index 06bf5d2c17..e6ae726404 100644
--- a/libavformat/bfi.c
+++ b/libavformat/bfi.c
@@ -131,7 +131,7 @@ static int bfi_read_packet(AVFormatContext * s, AVPacket * pkt)
         uint32_t state = 0;
         while(state != MKTAG('S','A','V','I')){
             if (avio_feof(pb))
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
             state = 256*state + avio_r8(pb);
         }
         /* Now that the chunk's location is confirmed, we proceed... */
diff --git a/libavformat/bink.c b/libavformat/bink.c
index 0632d390a2..18eaeba738 100644
--- a/libavformat/bink.c
+++ b/libavformat/bink.c
@@ -120,13 +120,13 @@ static int read_header(AVFormatContext *s)
 
     if (vst->duration > 1000000) {
         av_log(s, AV_LOG_ERROR, "invalid header: more than 1000000 frames\n");
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     if (avio_rl32(pb) > bink->file_size) {
         av_log(s, AV_LOG_ERROR,
                "invalid header: largest frame size greater than file size\n");
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     avio_skip(pb, 4);
@@ -140,7 +140,7 @@ static int read_header(AVFormatContext *s)
         av_log(s, AV_LOG_ERROR,
                "invalid header: invalid fps (%"PRIu32"/%"PRIu32")\n",
                fps_num, fps_den);
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
     avpriv_set_pts_info(vst, 64, fps_den, fps_num);
     vst->avg_frame_rate = av_inv_q(vst->time_base);
@@ -162,7 +162,7 @@ static int read_header(AVFormatContext *s)
         av_log(s, AV_LOG_ERROR,
                "invalid header: more than "AV_STRINGIFY(BINK_MAX_AUDIO_TRACKS)" audio tracks (%"PRIu32")\n",
                bink->num_audio_tracks);
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     signature = (vst->codecpar->codec_tag & 0xFFFFFF);
@@ -217,7 +217,7 @@ static int read_header(AVFormatContext *s)
 
         if (next_pos <= pos) {
             av_log(s, AV_LOG_ERROR, "invalid frame index table\n");
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
         if ((ret = av_add_index_entry(vst, pos, i, next_pos - pos, 0,
                                       keyframe ? AVINDEX_KEYFRAME : 0)) < 0)
@@ -253,7 +253,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
             av_log(s, AV_LOG_ERROR,
                    "could not find index entry for frame %"PRId64"\n",
                    bink->video_pts);
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
 
         bink->remain_packet_size = sti->index_entries[index_entry].size;
@@ -267,7 +267,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
             av_log(s, AV_LOG_ERROR,
                    "frame %"PRId64": audio size in header (%"PRIu32") > size of packet left (%"PRIu32")\n",
                    bink->video_pts, audio_size, bink->remain_packet_size);
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
         bink->remain_packet_size -= 4 + audio_size;
         bink->current_track++;
diff --git a/libavformat/binka.c b/libavformat/binka.c
index cc5f2555ca..df853890c1 100644
--- a/libavformat/binka.c
+++ b/libavformat/binka.c
@@ -75,7 +75,7 @@ static int binka_read_packet(AVFormatContext *s, AVPacket *pkt)
     avio_skip(pb, 2);
     pkt_size = avio_rl16(pb) + 4;
     if (pkt_size <= 4)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     ret = av_new_packet(pkt, pkt_size);
     if (ret < 0)
         return ret;
diff --git a/libavformat/bintext.c b/libavformat/bintext.c
index c96c14ccd9..5439323cb3 100644
--- a/libavformat/bintext.c
+++ b/libavformat/bintext.c
@@ -36,6 +36,7 @@
 #include "libavutil/opt.h"
 #include "libavutil/parseutils.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "sauce.h"
@@ -99,8 +100,8 @@ static int next_tag_read(AVFormatContext *avctx, uint64_t *fsize)
         return AVERROR_INVALIDDATA;
 
     avio_seek(pb, start_pos - 256, SEEK_SET);
-    if (avio_read(pb, buf, sizeof(next_magic)) != sizeof(next_magic))
-        return -1;
+    if ((len = ffio_read_size(pb, buf, sizeof(next_magic))) < 0)
+        return len;
     if (memcmp(buf, next_magic, sizeof(next_magic)))
         return -1;
     if (avio_r8(pb) != 0x01)
@@ -244,8 +245,8 @@ static int xbin_read_header(AVFormatContext *s)
         return ret;
     st->codecpar->extradata[0] = fontheight;
     st->codecpar->extradata[1] = flags;
-    if (avio_read(pb, st->codecpar->extradata + 2, st->codecpar->extradata_size - 2) < 0)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, st->codecpar->extradata + 2, st->codecpar->extradata_size - 2)) < 0)
+        return ret;
 
     if (pb->seekable & AVIO_SEEKABLE_NORMAL) {
         int64_t fsize =  avio_size(pb);
@@ -281,13 +282,13 @@ static int adf_read_header(AVFormatContext *s)
     st->codecpar->extradata[0] = 16;
     st->codecpar->extradata[1] = BINTEXT_PALETTE|BINTEXT_FONT;
 
-    if (avio_read(pb, st->codecpar->extradata + 2, 24) < 0)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, st->codecpar->extradata + 2, 24)) < 0)
+        return ret;
     avio_skip(pb, 144);
-    if (avio_read(pb, st->codecpar->extradata + 2 + 24, 24) < 0)
-        return AVERROR(EIO);
-    if (avio_read(pb, st->codecpar->extradata + 2 + 48, 4096) < 0)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, st->codecpar->extradata + 2 + 24, 24)) < 0)
+        return ret;
+    if ((ret = ffio_read_size(pb, st->codecpar->extradata + 2 + 48, 4096)) < 0)
+        return ret;
 
     if (pb->seekable & AVIO_SEEKABLE_NORMAL) {
         int got_width = 0;
@@ -330,7 +331,7 @@ static int idf_read_header(AVFormatContext *s)
     int64_t fsize;
 
     if (!(pb->seekable & AVIO_SEEKABLE_NORMAL))
-        return AVERROR(EIO);
+        return AVERROR(ENOSYS);
 
     st = init_stream(s);
     if (!st)
@@ -349,10 +350,10 @@ static int idf_read_header(AVFormatContext *s)
 
     avio_seek(pb, bin->fsize + 12, SEEK_SET);
 
-    if (avio_read(pb, st->codecpar->extradata + 2 + 48, 4096) < 0)
-        return AVERROR(EIO);
-    if (avio_read(pb, st->codecpar->extradata + 2, 48) < 0)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, st->codecpar->extradata + 2 + 48, 4096)) < 0)
+        return ret;
+    if ((ret = ffio_read_size(pb, st->codecpar->extradata + 2, 48)) < 0)
+        return ret;
 
     ff_sauce_read(s, &bin->fsize, &got_width, 0);
     if (st->codecpar->width < 8)
@@ -371,15 +372,15 @@ static int read_packet(AVFormatContext *s,
 
     if (bin->fsize > 0) {
         if (av_get_packet(s->pb, pkt, bin->fsize) < 0)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         bin->fsize = -1; /* done */
     } else if (!bin->fsize) {
         if (avio_feof(s->pb))
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         if (av_get_packet(s->pb, pkt, bin->chars_per_frame) < 0)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
     } else {
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     pkt->flags |= AV_PKT_FLAG_KEY;
diff --git a/libavformat/bit.c b/libavformat/bit.c
index 5c3eb31c57..1f3dc31f38 100644
--- a/libavformat/bit.c
+++ b/libavformat/bit.c
@@ -22,6 +22,7 @@
 #include "config_components.h"
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "mux.h"
@@ -93,11 +94,9 @@ static int read_packet(AVFormatContext *s,
     if(packet_size > MAX_FRAME_SIZE)
         return AVERROR_INVALIDDATA;
 
-    ret = avio_read(pb, (uint8_t*)buf, (8 * packet_size) * sizeof(uint16_t));
+    ret = ffio_read_size(pb, (uint8_t*)buf, (8 * packet_size) * sizeof(uint16_t));
     if(ret<0)
         return ret;
-    if(ret != 8 * packet_size * sizeof(uint16_t))
-        return AVERROR(EIO);
 
     if ((ret = av_new_packet(pkt, packet_size)) < 0)
         return ret;
diff --git a/libavformat/bmv.c b/libavformat/bmv.c
index 84ab2aac5a..db2b4076c0 100644
--- a/libavformat/bmv.c
+++ b/libavformat/bmv.c
@@ -22,6 +22,7 @@
 #include "libavutil/channel_layout.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -88,8 +89,8 @@ static int bmv_read_packet(AVFormatContext *s, AVPacket *pkt)
         if ((err = av_reallocp(&c->packet, c->size + 1)) < 0)
             return err;
         c->packet[0] = type;
-        if (avio_read(s->pb, c->packet + 1, c->size) != c->size)
-            return AVERROR(EIO);
+        if ((err = ffio_read_size(s->pb, c->packet + 1, c->size)) < 0)
+            return err;
         if (type & BMV_AUDIO) {
             int audio_size = c->packet[1] * 65 + 1;
             if (audio_size >= c->size) {
diff --git a/libavformat/brstm.c b/libavformat/brstm.c
index d29004155b..3fe19fff72 100644
--- a/libavformat/brstm.c
+++ b/libavformat/brstm.c
@@ -23,6 +23,7 @@
 #include "libavutil/mem.h"
 #include "libavcodec/bytestream.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -425,11 +426,11 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
                                     (b->current_block - 1), 4 * channels);
 
         for (i = 0; i < channels; i++) {
-            ret = avio_read(s->pb, dst, size);
+            ret = ffio_read_size(s->pb, dst, size);
             dst += size;
             avio_skip(s->pb, skip);
-            if (ret != size) {
-                return AVERROR(EIO);
+            if (ret < 0) {
+                return ret;
             }
         }
         pkt->duration = samples;
@@ -441,7 +442,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
     pkt->stream_index = 0;
 
     if (ret != size)
-        ret = AVERROR(EIO);
+        ret = AVERROR_INVALIDDATA;
 
     return ret;
 }
diff --git a/libavformat/c93.c b/libavformat/c93.c
index 933fe4a99e..1fbc093612 100644
--- a/libavformat/c93.c
+++ b/libavformat/c93.c
@@ -20,6 +20,7 @@
  */
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "voc.h"
@@ -157,9 +158,9 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
     pkt->data[0] = 0;
     pkt->size = datasize + 1;
 
-    ret = avio_read(pb, pkt->data + 1, datasize);
-    if (ret < datasize) {
-        return AVERROR(EIO);
+    ret = ffio_read_size(pb, pkt->data + 1, datasize);
+    if (ret < 0) {
+        return ret;
     }
 
     datasize = avio_rl16(pb); /* palette size */
@@ -169,9 +170,9 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
             return AVERROR_INVALIDDATA;
         }
         pkt->data[0] |= C93_HAS_PALETTE;
-        ret = avio_read(pb, pkt->data + pkt->size, datasize);
-        if (ret < datasize) {
-            return AVERROR(EIO);
+        ret = ffio_read_size(pb, pkt->data + pkt->size, datasize);
+        if (ret < 0) {
+            return ret;
         }
         pkt->size += 768;
     }
diff --git a/libavformat/cafdec.c b/libavformat/cafdec.c
index 5d7dbe8f41..99ae041364 100644
--- a/libavformat/cafdec.c
+++ b/libavformat/cafdec.c
@@ -28,6 +28,7 @@
 #include <inttypes.h>
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "isom.h"
@@ -142,9 +143,9 @@ static int read_kuki_chunk(AVFormatContext *s, int64_t size)
             avio_skip(pb, size);
             return AVERROR_INVALIDDATA;
         }
-        if (avio_read(pb, preamble, ALAC_PREAMBLE) != ALAC_PREAMBLE) {
+        if ((ret = ffio_read_size(pb, preamble, ALAC_PREAMBLE)) < 0) {
             av_log(s, AV_LOG_ERROR, "failed to read preamble\n");
-            return AVERROR_INVALIDDATA;
+            return ret;
         }
 
         if ((ret = ff_alloc_extradata(st->codecpar, ALAC_HEADER)) < 0)
@@ -443,7 +444,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
         if (!left)
             return AVERROR_EOF;
         if (left < 0)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
     }
 
     pkt_frames = caf->frames_per_packet;
@@ -461,12 +462,12 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
             pkt_size   = caf->num_bytes - sti->index_entries[caf->packet_cnt].pos;
             pkt_frames = st->duration   - sti->index_entries[caf->packet_cnt].timestamp;
         } else {
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
     }
 
     if (pkt_size == 0 || pkt_frames == 0 || pkt_size > left)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     res = av_get_packet(pb, pkt, pkt_size);
     if (res < 0)
diff --git a/libavformat/cinedec.c b/libavformat/cinedec.c
index cd13f132c3..7bbf198b19 100644
--- a/libavformat/cinedec.c
+++ b/libavformat/cinedec.c
@@ -354,7 +354,7 @@ static int cine_read_seek(AVFormatContext *avctx, int stream_index, int64_t time
         return AVERROR(ENOSYS);
 
     if (!(avctx->pb->seekable & AVIO_SEEKABLE_NORMAL))
-        return AVERROR(EIO);
+        return AVERROR(ENOSYS);
 
     cine->pts = timestamp;
     return 0;
diff --git a/libavformat/dfa.c b/libavformat/dfa.c
index 1d78c348b1..580e3ddd24 100644
--- a/libavformat/dfa.c
+++ b/libavformat/dfa.c
@@ -89,7 +89,7 @@ static int dfa_read_packet(AVFormatContext *s, AVPacket *pkt)
         return AVERROR_EOF;
 
     if (av_get_packet(pb, pkt, 12) != 12)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     while (!avio_feof(pb)) {
         if (!first) {
             ret = av_append_packet(pb, pkt, 12);
@@ -101,7 +101,7 @@ static int dfa_read_packet(AVFormatContext *s, AVPacket *pkt)
         frame_size = AV_RL32(pkt->data + pkt->size - 8);
         if (frame_size > INT_MAX - 4) {
             av_log(s, AV_LOG_ERROR, "Too large chunk size: %"PRIu32"\n", frame_size);
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
         if (AV_RL32(pkt->data + pkt->size - 12) == MKTAG('E', 'O', 'F', 'R')) {
             if (frame_size) {
diff --git a/libavformat/dsicin.c b/libavformat/dsicin.c
index 6eff38e010..3c325f7cfd 100644
--- a/libavformat/dsicin.c
+++ b/libavformat/dsicin.c
@@ -153,7 +153,7 @@ static int cin_read_frame_header(CinDemuxContext *cin, AVIOContext *pb) {
     hdr->audio_frame_size = avio_rl32(pb);
 
     if (avio_feof(pb) || pb->error)
-        return AVERROR(EIO);
+        return pb->error ? pb->error : AVERROR_INVALIDDATA;
 
     if (avio_rl32(pb) != 0xAA55AA55)
         return AVERROR_INVALIDDATA;
diff --git a/libavformat/dss.c b/libavformat/dss.c
index 47c8f49d67..6cabdb5421 100644
--- a/libavformat/dss.c
+++ b/libavformat/dss.c
@@ -116,6 +116,7 @@ static int dss_read_header(AVFormatContext *s)
     DSSDemuxContext *ctx = s->priv_data;
     AVIOContext *pb = s->pb;
     AVStream *st;
+    int64_t ret64;
     int ret, version;
 
     st = avformat_new_stream(s, NULL);
@@ -164,8 +165,8 @@ static int dss_read_header(AVFormatContext *s)
 
     /* Jump over header */
 
-    if (avio_seek(pb, ctx->dss_header_size, SEEK_SET) != ctx->dss_header_size)
-        return AVERROR(EIO);
+    if ((ret64 = avio_seek(pb, ctx->dss_header_size, SEEK_SET)) < 0)
+        return (int)ret64;
 
     ctx->counter = 0;
     ctx->swap    = 0;
diff --git a/libavformat/dv.c b/libavformat/dv.c
index 8af0d5a652..84677284bd 100644
--- a/libavformat/dv.c
+++ b/libavformat/dv.c
@@ -33,6 +33,7 @@
 
 #include <time.h>
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "libavcodec/dv_profile.h"
@@ -576,6 +577,7 @@ static int dv_read_header(AVFormatContext *s)
 {
     unsigned state, marker_pos = 0;
     RawDVContext *c = s->priv_data;
+    int64_t ret64;
     int ret;
 
     if ((ret = dv_init_demux(s, &c->dv_demux)) < 0)
@@ -598,10 +600,10 @@ static int dv_read_header(AVFormatContext *s)
     }
     AV_WB32(c->buf, state);
 
-    if (avio_read(s->pb, c->buf + 4, DV_PROFILE_BYTES - 4) != DV_PROFILE_BYTES - 4 ||
-        avio_seek(s->pb, -DV_PROFILE_BYTES, SEEK_CUR) < 0) {
-        return AVERROR(EIO);
-    }
+    if ((ret = ffio_read_size(s->pb, c->buf + 4, DV_PROFILE_BYTES - 4)) < 0)
+        return ret;
+    if ((ret64 = avio_seek(s->pb, -DV_PROFILE_BYTES, SEEK_CUR)) < 0)
+        return (int)ret64;
 
     c->dv_demux.sys = av_dv_frame_profile(c->dv_demux.sys,
                                            c->buf,
@@ -633,13 +635,13 @@ static int dv_read_packet(AVFormatContext *s, AVPacket *pkt)
         int ret;
         int64_t pos = avio_tell(s->pb);
         if (!c->dv_demux.sys)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         size = c->dv_demux.sys->frame_size;
         ret = avio_read(s->pb, c->buf, size);
         if (ret < 0) {
             return ret;
         } else if (ret == 0) {
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
 
         size = avpriv_dv_produce_packet(&c->dv_demux, pkt, c->buf, size, pos);
diff --git a/libavformat/dxa.c b/libavformat/dxa.c
index 56b19a7fca..76bc7a543d 100644
--- a/libavformat/dxa.c
+++ b/libavformat/dxa.c
@@ -23,6 +23,7 @@
 
 #include "libavutil/intreadwrite.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "riff.h"
@@ -170,7 +171,7 @@ static int dxa_read_packet(AVFormatContext *s, AVPacket *pkt)
         ret = av_get_packet(s->pb, pkt, size);
         pkt->stream_index = 1;
         if(ret != size)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         c->bytes_left -= size;
         c->wavpos = avio_tell(s->pb);
         return 0;
@@ -214,10 +215,9 @@ static int dxa_read_packet(AVFormatContext *s, AVPacket *pkt)
             if (ret < 0)
                 return ret;
             memcpy(pkt->data + pal_size, buf, DXA_EXTRA_SIZE);
-            ret = avio_read(s->pb, pkt->data + DXA_EXTRA_SIZE + pal_size, size);
-            if(ret != size){
-                return AVERROR(EIO);
-            }
+            ret = ffio_read_size(s->pb, pkt->data + DXA_EXTRA_SIZE + pal_size, size);
+            if (ret < 0)
+                return ret;
             if(pal_size) memcpy(pkt->data, pal, pal_size);
             pkt->stream_index = 0;
             c->frames--;
diff --git a/libavformat/electronicarts.c b/libavformat/electronicarts.c
index 04acf3a409..74a050fec6 100644
--- a/libavformat/electronicarts.c
+++ b/libavformat/electronicarts.c
@@ -537,7 +537,7 @@ static int ea_read_header(AVFormatContext *s)
     AVStream *st;
 
     if (process_ea_header(s)<=0)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     if (init_video_stream(s, &ea->video) || init_video_stream(s, &ea->alpha))
         return AVERROR(ENOMEM);
diff --git a/libavformat/filmstripdec.c b/libavformat/filmstripdec.c
index 5ce0af234c..1a3f45f61c 100644
--- a/libavformat/filmstripdec.c
+++ b/libavformat/filmstripdec.c
@@ -43,7 +43,7 @@ static int read_header(AVFormatContext *s)
     AVStream *st;
 
     if (!(s->pb->seekable & AVIO_SEEKABLE_NORMAL))
-        return AVERROR(EIO);
+        return AVERROR(ENOSYS);
 
     avio_seek(pb, avio_size(pb) - 36, SEEK_SET);
     if (avio_rb32(pb) != RAND_TAG) {
diff --git a/libavformat/flic.c b/libavformat/flic.c
index 41dfb4f39e..01cd4698cf 100644
--- a/libavformat/flic.c
+++ b/libavformat/flic.c
@@ -34,6 +34,7 @@
 #include "libavutil/channel_layout.h"
 #include "libavutil/intreadwrite.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -97,8 +98,8 @@ static int flic_read_header(AVFormatContext *s)
     flic->frame_number = 0;
 
     /* load the whole header and pull out the width and height */
-    if (avio_read(pb, header, FLIC_HEADER_SIZE) != FLIC_HEADER_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, header, FLIC_HEADER_SIZE)) < 0)
+        return ret;
 
     magic_number = AV_RL16(&header[4]);
     speed = AV_RL32(&header[0x10]);
@@ -131,9 +132,9 @@ static int flic_read_header(AVFormatContext *s)
     memcpy(st->codecpar->extradata, header, FLIC_HEADER_SIZE);
 
     /* peek at the preamble to detect TFTD videos - they seem to always start with an audio chunk */
-    if (avio_read(pb, preamble, FLIC_PREAMBLE_SIZE) != FLIC_PREAMBLE_SIZE) {
+    if ((ret = ffio_read_size(pb, preamble, FLIC_PREAMBLE_SIZE)) < 0) {
         av_log(s, AV_LOG_ERROR, "Failed to peek at preamble\n");
-        return AVERROR(EIO);
+        return ret;
     }
 
     avio_seek(pb, -FLIC_PREAMBLE_SIZE, SEEK_CUR);
@@ -206,11 +207,8 @@ static int flic_read_packet(AVFormatContext *s,
 
     while (!packet_read && !avio_feof(pb)) {
 
-        if ((ret = avio_read(pb, preamble, FLIC_PREAMBLE_SIZE)) !=
-            FLIC_PREAMBLE_SIZE) {
-            ret = AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, preamble, FLIC_PREAMBLE_SIZE)) < 0)
             break;
-        }
 
         size = AV_RL32(&preamble[0]);
         magic = AV_RL16(&preamble[4]);
@@ -222,11 +220,8 @@ static int flic_read_packet(AVFormatContext *s,
             pkt->stream_index = flic->video_stream_index;
             pkt->pos = pos;
             memcpy(pkt->data, preamble, FLIC_PREAMBLE_SIZE);
-            ret = avio_read(pb, pkt->data + FLIC_PREAMBLE_SIZE,
+            ret = ffio_read_size(pb, pkt->data + FLIC_PREAMBLE_SIZE,
                 size - FLIC_PREAMBLE_SIZE);
-            if (ret != size - FLIC_PREAMBLE_SIZE) {
-                ret = AVERROR(EIO);
-            }
             pkt->flags = flic->frame_number == 0 ? AV_PKT_FLAG_KEY : 0;
             pkt->pts = flic->frame_number;
             if (flic->frame_number == 0)
@@ -243,12 +238,10 @@ static int flic_read_packet(AVFormatContext *s,
             pkt->stream_index = flic->audio_stream_index;
             pkt->pos = pos;
             pkt->flags = AV_PKT_FLAG_KEY;
-            ret = avio_read(pb, pkt->data, size);
+            ret = ffio_read_size(pb, pkt->data, size);
 
-            if (ret != size) {
-                ret = AVERROR(EIO);
+            if (ret < 0)
                 break;
-            }
 
             packet_read = 1;
         } else {
diff --git a/libavformat/gifdec.c b/libavformat/gifdec.c
index d5f06adc64..28722dc50b 100644
--- a/libavformat/gifdec.c
+++ b/libavformat/gifdec.c
@@ -118,6 +118,7 @@ static int gif_read_header(AVFormatContext *s)
     AVStream        *st;
     int type, width, height, ret, n, flags;
     int64_t nb_frames = 0, duration = 0, pos;
+    int64_t ret64;
 
     if ((ret = resync(pb)) < 0)
         return ret;
@@ -215,8 +216,9 @@ static int gif_read_header(AVFormatContext *s)
 
 skip:
     /* jump to start because gif decoder needs header data too */
-    if (avio_seek(pb, pos - 6, SEEK_SET) != pos - 6)
-        return AVERROR(EIO);
+    ret64 = avio_seek(pb, pos - 6, SEEK_SET);
+    if (ret64 < 0)
+        return (int)ret64;
 
     /* GIF format operates with time in "hundredths of second",
      * therefore timebase is 1/100 */
diff --git a/libavformat/gsmdec.c b/libavformat/gsmdec.c
index 10fba212e9..649d67c009 100644
--- a/libavformat/gsmdec.c
+++ b/libavformat/gsmdec.c
@@ -63,7 +63,7 @@ static int gsm_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     ret = av_get_packet(s->pb, pkt, size);
     if (ret < GSM_BLOCK_SIZE) {
-        return ret < 0 ? ret : AVERROR(EIO);
+        return ret < 0 ? ret : AVERROR_INVALIDDATA;
     }
     pkt->duration = 1;
     pkt->pts      = pkt->pos / GSM_BLOCK_SIZE;
diff --git a/libavformat/hca.c b/libavformat/hca.c
index 713082f8b0..e24a21e081 100644
--- a/libavformat/hca.c
+++ b/libavformat/hca.c
@@ -24,6 +24,7 @@
 #include "libavcodec/bytestream.h"
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -76,9 +77,9 @@ static int hca_read_header(AVFormatContext *s)
     if (ret < 0)
         return ret;
 
-    ret = avio_read(pb, par->extradata + 8, par->extradata_size - 8 - 10);
-    if (ret < par->extradata_size - 8 - 10)
-        return AVERROR(EIO);
+    ret = ffio_read_size(pb, par->extradata + 8, par->extradata_size - 8 - 10);
+    if (ret < 0)
+        return ret;
     AV_WL32(par->extradata, MKTAG('H', 'C', 'A', 0));
     AV_WB16(par->extradata + 4, version);
     AV_WB16(par->extradata + 6, data_offset);
diff --git a/libavformat/icoenc.c b/libavformat/icoenc.c
index 7a7d839d84..9eba9e7926 100644
--- a/libavformat/icoenc.c
+++ b/libavformat/icoenc.c
@@ -122,7 +122,7 @@ static int ico_write_packet(AVFormatContext *s, AVPacket *pkt)
 
     if (ico->current_image >= ico->nb_images) {
         av_log(s, AV_LOG_ERROR, "ICO already contains %d images\n", ico->current_image);
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     image = &ico->images[ico->current_image++];
diff --git a/libavformat/idcin.c b/libavformat/idcin.c
index 561715d3d9..e2064acac3 100644
--- a/libavformat/idcin.c
+++ b/libavformat/idcin.c
@@ -267,7 +267,7 @@ static int idcin_read_packet(AVFormatContext *s,
     if (idcin->next_chunk_is_video) {
         command = avio_rl32(pb);
         if (command == 2) {
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         } else if (command == 1) {
             /* trigger a palette change */
             ret = avio_read(pb, palette_buffer, 768);
@@ -275,7 +275,7 @@ static int idcin_read_packet(AVFormatContext *s,
                 return ret;
             } else if (ret != 768) {
                 av_log(s, AV_LOG_ERROR, "incomplete packet\n");
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
             }
             /* scale the palette as necessary */
             palette_scale = 2;
@@ -312,7 +312,7 @@ static int idcin_read_packet(AVFormatContext *s,
             return ret;
         else if (ret != chunk_size) {
             av_log(s, AV_LOG_ERROR, "incomplete packet\n");
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
         if (command == 1) {
             uint8_t *pal;
diff --git a/libavformat/idroqdec.c b/libavformat/idroqdec.c
index 67bc1246e6..9c3aaec260 100644
--- a/libavformat/idroqdec.c
+++ b/libavformat/idroqdec.c
@@ -74,11 +74,11 @@ static int roq_read_header(AVFormatContext *s)
     RoqDemuxContext *roq = s->priv_data;
     AVIOContext *pb = s->pb;
     unsigned char preamble[RoQ_CHUNK_PREAMBLE_SIZE];
+    int ret;
 
     /* get the main header */
-    if (avio_read(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE) !=
-        RoQ_CHUNK_PREAMBLE_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE)) < 0)
+        return ret;
     roq->frame_rate = AV_RL16(&preamble[6]);
 
     /* init private context parameters */
@@ -111,9 +111,8 @@ static int roq_read_packet(AVFormatContext *s,
             return AVERROR_EOF;
 
         /* get the next chunk preamble */
-        if ((ret = avio_read(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE)) !=
-            RoQ_CHUNK_PREAMBLE_SIZE)
-            return AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE)) < 0)
+            return ret;
 
         chunk_type = AV_RL16(&preamble[0]);
         chunk_size = AV_RL32(&preamble[2]);
@@ -135,8 +134,8 @@ static int roq_read_packet(AVFormatContext *s,
                 st->codecpar->codec_id     = AV_CODEC_ID_ROQ;
                 st->codecpar->codec_tag    = 0;  /* no fourcc */
 
-                if (avio_read(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE) != RoQ_CHUNK_PREAMBLE_SIZE)
-                    return AVERROR(EIO);
+                if ((ret = ffio_read_size(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE)) < 0)
+                    return ret;
                 st->codecpar->width  = roq->width  = AV_RL16(preamble);
                 st->codecpar->height = roq->height = AV_RL16(preamble + 2);
                 break;
@@ -152,9 +151,8 @@ static int roq_read_packet(AVFormatContext *s,
             codebook_offset = avio_tell(pb) - RoQ_CHUNK_PREAMBLE_SIZE;
             codebook_size = chunk_size;
             avio_skip(pb, codebook_size);
-            if (avio_read(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE) !=
-                RoQ_CHUNK_PREAMBLE_SIZE)
-                return AVERROR(EIO);
+            if ((ret = ffio_read_size(pb, preamble, RoQ_CHUNK_PREAMBLE_SIZE)) < 0)
+                return ret;
             chunk_size = AV_RL32(&preamble[2]) + RoQ_CHUNK_PREAMBLE_SIZE * 2 +
                 codebook_size;
 
@@ -167,7 +165,7 @@ static int roq_read_packet(AVFormatContext *s,
             /* load up the packet */
             ret= av_get_packet(pb, pkt, chunk_size);
             if (ret != chunk_size)
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
             pkt->stream_index = roq->video_stream_index;
             pkt->pts = roq->video_pts++;
 
@@ -220,11 +218,10 @@ static int roq_read_packet(AVFormatContext *s,
             }
 
             pkt->pos= avio_tell(pb);
-            ret = avio_read(pb, pkt->data + RoQ_CHUNK_PREAMBLE_SIZE,
+            ret = ffio_read_size(pb, pkt->data + RoQ_CHUNK_PREAMBLE_SIZE,
                 chunk_size);
-            if (ret != chunk_size) {
-                return AVERROR(EIO);
-            }
+            if (ret < 0)
+                return ret;
 
             packet_read = 1;
             break;
diff --git a/libavformat/iff.c b/libavformat/iff.c
index 44ba5a9023..fc40ef1aea 100644
--- a/libavformat/iff.c
+++ b/libavformat/iff.c
@@ -37,6 +37,7 @@
 #include "libavutil/mem.h"
 #include "libavcodec/bytestream.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "id3v2.h"
 #include "internal.h"
@@ -135,13 +136,14 @@ static int get_metadata(AVFormatContext *s,
                         const unsigned data_size)
 {
     uint8_t *buf = ((data_size + 1) == 0) ? NULL : av_malloc(data_size + 1);
+    int res;
 
     if (!buf)
         return AVERROR(ENOMEM);
 
-    if (avio_read(s->pb, buf, data_size) != data_size) {
+    if ((res = ffio_read_size(s->pb, buf, data_size)) < 0) {
         av_free(buf);
-        return AVERROR(EIO);
+        return res;
     }
     buf[data_size] = 0;
     av_dict_set(&s->metadata, tag, buf, AV_DICT_DONT_STRDUP_VAL);
@@ -563,10 +565,10 @@ static int iff_read_header(AVFormatContext *s)
                                      data_size + IFF_EXTRA_VIDEO_SIZE);
             if (res < 0)
                 return res;
-            if (avio_read(pb, stv->codecpar->extradata + IFF_EXTRA_VIDEO_SIZE, data_size) < 0) {
+            if ((res = avio_read(pb, stv->codecpar->extradata + IFF_EXTRA_VIDEO_SIZE, data_size)) < 0) {
                 av_freep(&stv->codecpar->extradata);
                 stv->codecpar->extradata_size = 0;
-                return AVERROR(EIO);
+                return res;
             }
             break;
 
diff --git a/libavformat/img2dec.c b/libavformat/img2dec.c
index 789f8b94c9..1f7e0fcce1 100644
--- a/libavformat/img2dec.c
+++ b/libavformat/img2dec.c
@@ -400,13 +400,13 @@ int ff_img_read_packet(AVFormatContext *s1, AVPacket *pkt)
                 !s->loop &&
                 !s->split_planes) {
                 f[i] = s1->pb;
-            } else if (s1->io_open(s1, &f[i], filename.str, AVIO_FLAG_READ, NULL) < 0) {
+            } else if ((res = s1->io_open(s1, &f[i], filename.str, AVIO_FLAG_READ, NULL)) < 0) {
                 if (i >= 1)
                     break;
                 av_log(s1, AV_LOG_ERROR, "Could not open file : %s\n",
                        filename.str);
                 av_bprint_finalize(&filename, NULL);
-                return AVERROR(EIO);
+                return res;
             }
             size[i] = avio_size(f[i]);
 
@@ -466,7 +466,7 @@ int ff_img_read_packet(AVFormatContext *s1, AVPacket *pkt)
         struct stat img_stat;
         av_assert0(!s->is_pipe); // The ts_from_file option is not supported by piped input demuxers
         if (stat(filename.str, &img_stat)) {
-            res = AVERROR(EIO);
+            res = AVERROR(errno);
             goto fail;
         }
         pkt->pts = (int64_t)img_stat.st_mtime;
diff --git a/libavformat/img2enc.c b/libavformat/img2enc.c
index d14bc5ea3f..fb51151090 100644
--- a/libavformat/img2enc.c
+++ b/libavformat/img2enc.c
@@ -194,9 +194,8 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
                 goto fail;
             }
         }
-        if (s->io_open(s, &pb[i], tmp[i] ? tmp[i] : filename.str, AVIO_FLAG_WRITE, &options) < 0) {
+        if ((ret = s->io_open(s, &pb[i], tmp[i] ? tmp[i] : filename.str, AVIO_FLAG_WRITE, &options)) < 0) {
             av_log(s, AV_LOG_ERROR, "Could not open file : %s\n", tmp[i] ? tmp[i] : filename.str);
-            ret = AVERROR(EIO);
             goto fail;
         }
         if (options) {
diff --git a/libavformat/ingenientdec.c b/libavformat/ingenientdec.c
index 63624372a6..64b5e8a407 100644
--- a/libavformat/ingenientdec.c
+++ b/libavformat/ingenientdec.c
@@ -39,7 +39,7 @@ static int ingenient_read_packet(AVFormatContext *s, AVPacket *pkt)
     int ret, size, w, h, unk1, unk2;
 
     if (avio_rl32(s->pb) != MKTAG('M', 'J', 'P', 'G'))
-        return AVERROR(EIO); // FIXME
+        return AVERROR_INVALIDDATA; // FIXME
 
     size = avio_rl32(s->pb);
 
diff --git a/libavformat/ipmovie.c b/libavformat/ipmovie.c
index 5a8abde842..1ef357e088 100644
--- a/libavformat/ipmovie.c
+++ b/libavformat/ipmovie.c
@@ -639,9 +639,8 @@ static int ipmovie_read_header(AVFormatContext *s)
 
     /* peek ahead to the next chunk-- if it is an init audio chunk, process
      * it; if it is the first video chunk, this is a silent file */
-    if (avio_read(pb, chunk_preamble, CHUNK_PREAMBLE_SIZE) !=
-        CHUNK_PREAMBLE_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, chunk_preamble, CHUNK_PREAMBLE_SIZE)) < 0)
+        return ret;
     chunk_type = AV_RL16(&chunk_preamble[2]);
     avio_seek(pb, -CHUNK_PREAMBLE_SIZE, SEEK_CUR);
 
@@ -688,7 +687,7 @@ static int ipmovie_read_packet(AVFormatContext *s,
         if (ret == CHUNK_BAD)
             ret = AVERROR_INVALIDDATA;
         else if (ret == CHUNK_EOF)
-            ret = AVERROR(EIO);
+            ret = AVERROR_INVALIDDATA;
         else if (ret == CHUNK_NOMEM)
             ret = AVERROR(ENOMEM);
         else if (ret == CHUNK_END || ret == CHUNK_SHUTDOWN)
diff --git a/libavformat/iss.c b/libavformat/iss.c
index 7a68fcaf63..b0e17994d4 100644
--- a/libavformat/iss.c
+++ b/libavformat/iss.c
@@ -136,7 +136,7 @@ static int iss_read_packet(AVFormatContext *s, AVPacket *pkt)
     int ret = av_get_packet(s->pb, pkt, iss->packet_size);
 
     if(ret != iss->packet_size)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     pkt->stream_index = 0;
     pkt->pts = avio_tell(s->pb) - iss->sample_start_pos;
diff --git a/libavformat/jvdec.c b/libavformat/jvdec.c
index c4580b6a01..4f4566f64b 100644
--- a/libavformat/jvdec.c
+++ b/libavformat/jvdec.c
@@ -217,7 +217,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
     if (s->pb->eof_reached)
         return AVERROR_EOF;
 
-    return AVERROR(EIO);
+    return AVERROR_INVALIDDATA;
 }
 
 static int read_seek(AVFormatContext *s, int stream_index,
diff --git a/libavformat/libmodplug.c b/libavformat/libmodplug.c
index 680a5fe9bc..01566d121e 100644
--- a/libavformat/libmodplug.c
+++ b/libavformat/libmodplug.c
@@ -348,7 +348,7 @@ static int modplug_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     pkt->size = ModPlug_Read(modplug->f, pkt->data, AUDIO_PKT_SIZE);
     if (pkt->size <= 0) {
-        return pkt->size == 0 ? AVERROR_EOF : AVERROR(EIO);
+        return pkt->size == 0 ? AVERROR_EOF : AVERROR_EXTERNAL;
     }
     return 0;
 }
diff --git a/libavformat/lmlm4.c b/libavformat/lmlm4.c
index cec2f7ca05..aeb5580620 100644
--- a/libavformat/lmlm4.c
+++ b/libavformat/lmlm4.c
@@ -103,7 +103,7 @@ static int lmlm4_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     frame_size  = packet_size - 8;
     if ((ret = av_get_packet(pb, pkt, frame_size)) <= 0)
-        return ret < 0 ? ret : AVERROR(EIO);
+        return ret < 0 ? ret : AVERROR_INVALIDDATA;
 
     avio_skip(pb, padding);
 
diff --git a/libavformat/mca.c b/libavformat/mca.c
index e707de3c3b..74a33e25c3 100644
--- a/libavformat/mca.c
+++ b/libavformat/mca.c
@@ -105,7 +105,7 @@ static int read_header(AVFormatContext *s)
     if (version <= 4) {
         // version <= 4 needs to use the file size to calculate the offsets
         if (file_size < 0) {
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
         if (file_size - data_size > UINT32_MAX)
             return AVERROR_INVALIDDATA;
diff --git a/libavformat/mgsts.c b/libavformat/mgsts.c
index 07ea66163c..f8392888bd 100644
--- a/libavformat/mgsts.c
+++ b/libavformat/mgsts.c
@@ -44,7 +44,7 @@ static int read_header(AVFormatContext *s)
     avio_skip(pb, 4);
     chunk_size = avio_rb32(pb);
     if (chunk_size != 80)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     avio_skip(pb, 20);
 
     st = avformat_new_stream(s, 0);
@@ -84,7 +84,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
     payload_size = avio_rb32(pb);
 
     if (chunk_size < payload_size + 16)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     ret = av_get_packet(pb, pkt, payload_size);
     if (ret < 0)
diff --git a/libavformat/mlvdec.c b/libavformat/mlvdec.c
index a0d5e7fb55..3a5d211085 100644
--- a/libavformat/mlvdec.c
+++ b/libavformat/mlvdec.c
@@ -548,7 +548,7 @@ static int read_packet(AVFormatContext *avctx, AVPacket *pkt)
     index = av_index_search_timestamp(st, mlv->pts, AVSEEK_FLAG_ANY);
     if (index < 0) {
         av_log(avctx, AV_LOG_ERROR, "could not find index entry for frame %"PRId64"\n", mlv->pts);
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     pb = mlv->pb[sti->index_entries[index].size];
@@ -611,7 +611,7 @@ static int read_seek(AVFormatContext *avctx, int stream_index, int64_t timestamp
         return AVERROR(ENOSYS);
 
     if (!(avctx->pb->seekable & AVIO_SEEKABLE_NORMAL))
-        return AVERROR(EIO);
+        return AVERROR(ENOSYS);
 
     mlv->pts = timestamp;
     return 0;
diff --git a/libavformat/mpc.c b/libavformat/mpc.c
index 1e0e170c7d..45ba76c703 100644
--- a/libavformat/mpc.c
+++ b/libavformat/mpc.c
@@ -171,7 +171,7 @@ static int mpc_read_packet(AVFormatContext *s, AVPacket *pkt)
     if(c->curbits)
         avio_seek(s->pb, -4, SEEK_CUR);
     if(ret < size){
-        return ret < 0 ? ret : AVERROR(EIO);
+        return ret < 0 ? ret : AVERROR_INVALIDDATA;
     }
     pkt->size = ret + 4;
 
diff --git a/libavformat/mtv.c b/libavformat/mtv.c
index 01379a18e7..c888443b7a 100644
--- a/libavformat/mtv.c
+++ b/libavformat/mtv.c
@@ -105,6 +105,7 @@ static int mtv_read_header(AVFormatContext *s)
     AVIOContext   *pb  = s->pb;
     AVStream        *st;
     unsigned int    audio_subsegments;
+    int64_t ret64;
 
     avio_skip(pb, 3);
     mtv->file_size         = avio_rl32(pb);
@@ -190,8 +191,8 @@ static int mtv_read_header(AVFormatContext *s)
 
     // Jump over header
 
-    if(avio_seek(pb, MTV_HEADER_SIZE, SEEK_SET) != MTV_HEADER_SIZE)
-        return AVERROR(EIO);
+    if ((ret64 = avio_seek(pb, MTV_HEADER_SIZE, SEEK_SET)) < 0)
+        return (int)ret64;
 
     return 0;
 
diff --git a/libavformat/mvdec.c b/libavformat/mvdec.c
index aa45d23b39..113f133687 100644
--- a/libavformat/mvdec.c
+++ b/libavformat/mvdec.c
@@ -489,7 +489,7 @@ static int mv_read_packet(AVFormatContext *avctx, AVPacket *pkt)
             avio_skip(pb, index->pos - pos);
         else if (index->pos < pos) {
             if (!(pb->seekable & AVIO_SEEKABLE_NORMAL))
-                return AVERROR(EIO);
+                return AVERROR(ENOSYS);
             ret = avio_seek(pb, index->pos, SEEK_SET);
             if (ret < 0)
                 return ret;
@@ -531,7 +531,7 @@ static int mv_read_seek(AVFormatContext *avctx, int stream_index,
         return AVERROR(ENOSYS);
 
     if (!(avctx->pb->seekable & AVIO_SEEKABLE_NORMAL))
-        return AVERROR(EIO);
+        return AVERROR(ENOSYS);
 
     frame = av_index_search_timestamp(st, timestamp, flags);
     if (frame < 0)
diff --git a/libavformat/mvi.c b/libavformat/mvi.c
index 05aa25f348..42127bbb5d 100644
--- a/libavformat/mvi.c
+++ b/libavformat/mvi.c
@@ -119,7 +119,7 @@ static int read_packet(AVFormatContext *s, AVPacket *pkt)
     if (mvi->video_frame_size == 0) {
         mvi->video_frame_size = (mvi->get_int)(pb);
         if (mvi->audio_size_left == 0)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         if (mvi->audio_size_counter + 512 > UINT64_MAX - mvi->audio_frame_size ||
             mvi->audio_size_counter + 512 + mvi->audio_frame_size >= ((uint64_t)INT32_MAX) << MVI_FRAC_BITS)
             return AVERROR_INVALIDDATA;
diff --git a/libavformat/ncdec.c b/libavformat/ncdec.c
index 050d98bf5d..6eb693093a 100644
--- a/libavformat/ncdec.c
+++ b/libavformat/ncdec.c
@@ -69,7 +69,7 @@ static int nc_read_packet(AVFormatContext *s, AVPacket *pkt)
     uint32_t state=-1;
     while (state != NC_VIDEO_FLAG) {
         if (avio_feof(s->pb))
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         state = (state<<8) + avio_r8(s->pb);
     }
 
@@ -84,7 +84,7 @@ static int nc_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     ret = av_get_packet(s->pb, pkt, size);
     if (ret != size) {
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     pkt->stream_index = 0;
diff --git a/libavformat/nuv.c b/libavformat/nuv.c
index 49915ecf16..17a041b254 100644
--- a/libavformat/nuv.c
+++ b/libavformat/nuv.c
@@ -320,7 +320,7 @@ static int nuv_packet(AVFormatContext *s, AVPacket *pkt)
         }
     }
 
-    return AVERROR(EIO);
+    return AVERROR_INVALIDDATA;
 }
 
 /**
diff --git a/libavformat/pdvdec.c b/libavformat/pdvdec.c
index cd118f0e37..792188b019 100644
--- a/libavformat/pdvdec.c
+++ b/libavformat/pdvdec.c
@@ -112,7 +112,7 @@ static int pdv_read_packet(AVFormatContext *s, AVPacket *pkt)
         return AVERROR_EOF;
 
     if (p->current_frame >= sti->nb_index_entries)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     pos   = sti->index_entries[p->current_frame].pos;
     flags = sti->index_entries[p->current_frame].flags;
diff --git a/libavformat/pp_bnk.c b/libavformat/pp_bnk.c
index 5360b7c5d7..15bb49ea26 100644
--- a/libavformat/pp_bnk.c
+++ b/libavformat/pp_bnk.c
@@ -20,6 +20,7 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "libavutil/intreadwrite.h"
@@ -117,10 +118,8 @@ static int pp_bnk_read_header(AVFormatContext *s)
     uint8_t buf[FFMAX(PP_BNK_FILE_HEADER_SIZE, PP_BNK_TRACK_SIZE)];
     PPBnkHeader hdr;
 
-    if ((ret = avio_read(s->pb, buf, PP_BNK_FILE_HEADER_SIZE)) < 0)
+    if ((ret = ffio_read_size(s->pb, buf, PP_BNK_FILE_HEADER_SIZE)) < 0)
         return ret;
-    else if (ret != PP_BNK_FILE_HEADER_SIZE)
-        return AVERROR(EIO);
 
     pp_bnk_parse_header(&hdr, buf);
 
@@ -246,7 +245,7 @@ static int pp_bnk_read_packet(AVFormatContext *s, AVPacket *pkt)
         if ((ret = avio_seek(s->pb, trk->data_offset + trk->bytes_read, SEEK_SET)) < 0)
             return ret;
         else if (ret != trk->data_offset + trk->bytes_read)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
         size = FFMIN(trk->data_size - trk->bytes_read, PP_BNK_MAX_READ_SIZE);
 
diff --git a/libavformat/psxstr.c b/libavformat/psxstr.c
index 0dd4e8d377..06148713b2 100644
--- a/libavformat/psxstr.c
+++ b/libavformat/psxstr.c
@@ -33,6 +33,7 @@
 #include "libavutil/internal.h"
 #include "libavutil/intreadwrite.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -134,11 +135,12 @@ static int str_read_header(AVFormatContext *s)
     StrDemuxContext *str = s->priv_data;
     unsigned char sector[RAW_CD_SECTOR_SIZE];
     int start;
+    int ret;
     int i;
 
     /* skip over any RIFF header */
-    if (avio_read(pb, sector, RIFF_HEADER_SIZE) != RIFF_HEADER_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, sector, RIFF_HEADER_SIZE)) < 0)
+        return ret;
     if (AV_RL32(&sector[0]) == RIFF_TAG)
         start = RIFF_HEADER_SIZE;
     else
@@ -173,7 +175,7 @@ static int str_read_packet(AVFormatContext *s,
             return AVERROR_EOF;
 
         if (read != RAW_CD_SECTOR_SIZE)
-            return AVERROR(EIO);
+            return read < 0 ? read : AVERROR_INVALIDDATA;
 
         channel = sector[0x11];
         if (channel >= 32)
@@ -287,7 +289,7 @@ static int str_read_packet(AVFormatContext *s,
         }
 
         if (avio_feof(pb))
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
     }
 }
 
diff --git a/libavformat/pva.c b/libavformat/pva.c
index 047c93c9c4..9de1a186a9 100644
--- a/libavformat/pva.c
+++ b/libavformat/pva.c
@@ -103,18 +103,18 @@ recover:
 
     if (syncword != PVA_MAGIC) {
         pva_log(s, AV_LOG_ERROR, "invalid syncword\n");
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
     if (streamid != PVA_VIDEO_PAYLOAD && streamid != PVA_AUDIO_PAYLOAD) {
         pva_log(s, AV_LOG_ERROR, "invalid streamid\n");
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
     if (reserved != 0x55) {
         pva_log(s, AV_LOG_WARNING, "expected reserved byte to be 0x55\n");
     }
     if (length > PVA_MAX_PAYLOAD_LENGTH) {
         pva_log(s, AV_LOG_ERROR, "invalid payload length %u\n", length);
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     if (streamid == PVA_VIDEO_PAYLOAD && pts_flag) {
@@ -145,7 +145,7 @@ recover:
                                           "trying to recover\n");
                 avio_skip(pb, length - 9);
                 if (!read_packet)
-                    return AVERROR(EIO);
+                    return AVERROR_INVALIDDATA;
                 goto recover;
             }
 
@@ -192,7 +192,7 @@ static int pva_read_packet(AVFormatContext *s, AVPacket *pkt) {
 
     if (read_part_of_packet(s, &pva_pts, &length, &streamid, 1) < 0 ||
        (ret = av_get_packet(pb, pkt, length)) <= 0)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     pkt->stream_index = streamid - 1;
     pkt->pts = pva_pts;
diff --git a/libavformat/qoadec.c b/libavformat/qoadec.c
index a9632c46c3..6f8fe111ad 100644
--- a/libavformat/qoadec.c
+++ b/libavformat/qoadec.c
@@ -93,9 +93,9 @@ static int qoa_read_packet(AVFormatContext *s, AVPacket *pkt)
         return ret;
 
     memcpy(pkt->data, hdr, sizeof(hdr));
-    ret = avio_read(pb, pkt->data + sizeof(hdr), size - sizeof(hdr));
-    if (ret != size - sizeof(hdr))
-        return AVERROR(EIO);
+    ret = ffio_read_size(pb, pkt->data + sizeof(hdr), size - sizeof(hdr));
+    if (ret < 0)
+        return ret;
     pkt->stream_index = 0;
     pkt->pos = pos;
     pkt->duration = duration;
diff --git a/libavformat/redspark.c b/libavformat/redspark.c
index 2642d7af67..fded46ab43 100644
--- a/libavformat/redspark.c
+++ b/libavformat/redspark.c
@@ -141,7 +141,7 @@ static int redspark_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     ret = av_get_packet(s->pb, pkt, size);
     if (ret != size) {
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     pkt->duration = 14;
diff --git a/libavformat/rl2.c b/libavformat/rl2.c
index aa59332783..e26e14f9cd 100644
--- a/libavformat/rl2.c
+++ b/libavformat/rl2.c
@@ -259,7 +259,7 @@ static int rl2_read_packet(AVFormatContext *s,
     /** fill the packet */
     ret = av_get_packet(pb, pkt, sample->size);
     if(ret != sample->size){
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     pkt->stream_index = stream_id;
diff --git a/libavformat/rmdec.c b/libavformat/rmdec.c
index 85643a358f..2909698cda 100644
--- a/libavformat/rmdec.c
+++ b/libavformat/rmdec.c
@@ -569,7 +569,7 @@ static int rm_read_header(AVFormatContext *s)
         /* very old .ra format */
         return rm_read_header_old(s);
     } else if (tag != MKTAG('.', 'R', 'M', 'F') && tag != MKTAG('.', 'R', 'M', 'P')) {
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     tag_size = avio_rb32(pb);
@@ -1064,7 +1064,7 @@ static int rm_read_packet(AVFormatContext *s, AVPacket *pkt)
             if (avio_feof(s->pb))
                 return AVERROR_EOF;
             if (len <= 0)
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
 
             res = ff_rm_parse_packet (s, s->pb, st, st->priv_data, len, pkt,
                                       &seq, flags, timestamp);
@@ -1410,7 +1410,7 @@ static int ivr_read_packet(AVFormatContext *s, AVPacket *pkt)
                 }
             } else {
                 av_log(s, AV_LOG_ERROR, "Unsupported opcode=%d at %"PRIX64"\n", opcode, avio_tell(pb) - 1);
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
             }
         }
 
diff --git a/libavformat/rpl.c b/libavformat/rpl.c
index 781dabf7ba..06e3f354fc 100644
--- a/libavformat/rpl.c
+++ b/libavformat/rpl.c
@@ -313,7 +313,7 @@ static int rpl_read_header(AVFormatContext *s)
     }
 
     if (error)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     return 0;
 }
@@ -341,8 +341,9 @@ static int rpl_read_packet(AVFormatContext *s, AVPacket *pkt)
     index_entry = &sti->index_entries[rpl->chunk_number];
 
     if (rpl->frame_in_part == 0) {
-        if (avio_seek(pb, index_entry->pos, SEEK_SET) < 0)
-            return AVERROR(EIO);
+        int64_t ret64 = avio_seek(pb, index_entry->pos, SEEK_SET);
+        if (ret64 < 0)
+            return (int)ret64;
     }
 
     if (stream->codecpar->codec_type == AVMEDIA_TYPE_VIDEO &&
@@ -350,17 +351,20 @@ static int rpl_read_packet(AVFormatContext *s, AVPacket *pkt)
         // We have to split Escape 124 frames because there are
         // multiple frames per chunk in Escape 124 samples.
         uint32_t frame_size;
+        int64_t ret64;
 
         avio_skip(pb, 4); /* flags */
         frame_size = avio_rl32(pb);
-        if (avio_feof(pb) || avio_seek(pb, -8, SEEK_CUR) < 0 || !frame_size)
-            return AVERROR(EIO);
+        if (avio_feof(pb) || !frame_size)
+            return AVERROR_INVALIDDATA;
+        if ((ret64 = avio_seek(pb, -8, SEEK_CUR)) < 0)
+            return (int)ret64;
 
         ret = av_get_packet(pb, pkt, frame_size);
         if (ret < 0)
             return ret;
         if (ret != frame_size)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
         pkt->duration = 1;
         pkt->pts = index_entry->timestamp + rpl->frame_in_part;
@@ -376,7 +380,7 @@ static int rpl_read_packet(AVFormatContext *s, AVPacket *pkt)
         if (ret < 0)
             return ret;
         if (ret != index_entry->size)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
         if (stream->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
             // frames_per_chunk should always be one here; the header
diff --git a/libavformat/segafilm.c b/libavformat/segafilm.c
index e72d872f96..2b853017db 100644
--- a/libavformat/segafilm.c
+++ b/libavformat/segafilm.c
@@ -30,6 +30,7 @@
 #include "libavutil/intreadwrite.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -95,20 +96,21 @@ static int film_read_header(AVFormatContext *s)
     unsigned int data_offset;
     unsigned int audio_frame_counter;
     unsigned int video_frame_counter;
+    int ret;
 
     film->sample_table = NULL;
 
     /* load the main FILM header */
-    if (avio_read(pb, scratch, 16) != 16)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, scratch, 16)) < 0)
+        return ret;
     data_offset = AV_RB32(&scratch[4]);
     film->version = AV_RB32(&scratch[8]);
 
     /* load the FDSC chunk */
     if (film->version == 0) {
         /* special case for Lemmings .film files; 20-byte header */
-        if (avio_read(pb, scratch, 20) != 20)
-            return AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, scratch, 20)) < 0)
+            return ret;
         /* make some assumptions about the audio parameters */
         film->audio_type = AV_CODEC_ID_PCM_S8;
         film->audio_samplerate = 22050;
@@ -116,8 +118,8 @@ static int film_read_header(AVFormatContext *s)
         film->audio_bits = 8;
     } else {
         /* normal Saturn .cpk files; 32-byte header */
-        if (avio_read(pb, scratch, 32) != 32)
-            return AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, scratch, 32)) < 0)
+            return ret;
         film->audio_samplerate = AV_RB16(&scratch[24]);
         film->audio_channels = scratch[21];
         film->audio_bits = scratch[22];
@@ -196,8 +198,8 @@ static int film_read_header(AVFormatContext *s)
     }
 
     /* load the sample table */
-    if (avio_read(pb, scratch, 16) != 16)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, scratch, 16)) < 0)
+        return ret;
     if (AV_RB32(&scratch[0]) != STAB_TAG)
         return AVERROR_INVALIDDATA;
     film->base_clock = AV_RB32(&scratch[8]);
@@ -217,8 +219,8 @@ static int film_read_header(AVFormatContext *s)
     audio_frame_counter = video_frame_counter = 0;
     for (i = 0; i < film->sample_count; i++) {
         /* load the next sample record and transfer it to an internal struct */
-        if (avio_read(pb, scratch, 16) != 16)
-            return AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, scratch, 16)) < 0)
+            return ret;
         film->sample_table[i].sample_offset =
             data_offset + AV_RB32(&scratch[0]);
         film->sample_table[i].sample_size = AV_RB32(&scratch[4]);
@@ -294,7 +296,7 @@ static int film_read_packet(AVFormatContext *s,
 
     ret = av_get_packet(pb, pkt, sample->sample_size);
     if (ret != sample->sample_size)
-        ret = AVERROR(EIO);
+        ret = AVERROR_INVALIDDATA;
 
     pkt->stream_index = sample->stream;
     pkt->dts = sample->pts;
diff --git a/libavformat/sierravmd.c b/libavformat/sierravmd.c
index 2103ff64db..bb1d1c5df7 100644
--- a/libavformat/sierravmd.c
+++ b/libavformat/sierravmd.c
@@ -31,6 +31,7 @@
 #include "libavutil/intreadwrite.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "avio_internal.h"
@@ -102,8 +103,8 @@ static int vmd_read_header(AVFormatContext *s)
 
     /* fetch the main header, including the 2 header length bytes */
     avio_seek(pb, 0, SEEK_SET);
-    if (avio_read(pb, vmd->vmd_header, VMD_HEADER_SIZE) != VMD_HEADER_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, vmd->vmd_header, VMD_HEADER_SIZE)) < 0)
+        return ret;
 
     width = AV_RL16(&vmd->vmd_header[12]);
     height = AV_RL16(&vmd->vmd_header[14]);
@@ -192,11 +193,8 @@ static int vmd_read_header(AVFormatContext *s)
         ret = AVERROR(ENOMEM);
         goto error;
     }
-    if (avio_read(pb, raw_frame_table, raw_frame_table_size) !=
-        raw_frame_table_size) {
-        ret = AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, raw_frame_table, raw_frame_table_size)) < 0)
         goto error;
-    }
 
     total_frames = 0;
     for (i = 0; i < vmd->frame_count; i++) {
@@ -279,21 +277,18 @@ static int vmd_read_packet(AVFormatContext *s,
     avio_seek(pb, frame->frame_offset, SEEK_SET);
 
     if(ffio_limit(pb, frame->frame_size) != frame->frame_size)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     ret = av_new_packet(pkt, frame->frame_size + BYTES_PER_FRAME_RECORD);
     if (ret < 0)
         return ret;
     pkt->pos= avio_tell(pb);
     memcpy(pkt->data, frame->frame_record, BYTES_PER_FRAME_RECORD);
     if(vmd->is_indeo3 && frame->frame_record[0] == 0x02)
-        ret = avio_read(pb, pkt->data, frame->frame_size);
+        ret = ffio_read_size(pb, pkt->data, frame->frame_size);
     else
-        ret = avio_read(pb, pkt->data + BYTES_PER_FRAME_RECORD,
+        ret = ffio_read_size(pb, pkt->data + BYTES_PER_FRAME_RECORD,
             frame->frame_size);
 
-    if (ret != frame->frame_size) {
-        ret = AVERROR(EIO);
-    }
     pkt->stream_index = frame->stream_index;
     pkt->pts = frame->pts;
     av_log(s, AV_LOG_DEBUG, " dispatching %s frame with %d bytes and pts %"PRId64"\n",
diff --git a/libavformat/siff.c b/libavformat/siff.c
index b33746d51d..b0fff80b09 100644
--- a/libavformat/siff.c
+++ b/libavformat/siff.c
@@ -232,7 +232,7 @@ static int siff_read_packet(AVFormatContext *s, AVPacket *pkt)
         } else {
             int pktsize = av_get_packet(s->pb, pkt, c->sndsize - 4);
             if (pktsize < 0)
-                return AVERROR(EIO);
+                return pktsize;
             pkt->stream_index = 1;
             pkt->duration     = pktsize;
             c->curstrm        = 0;
@@ -246,7 +246,7 @@ static int siff_read_packet(AVFormatContext *s, AVPacket *pkt)
         if (!pktsize)
             return AVERROR_EOF;
         if (pktsize <= 0)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         pkt->duration = pktsize;
     }
     return pkt->size;
diff --git a/libavformat/smush.c b/libavformat/smush.c
index d380bfbff1..587125dbc0 100644
--- a/libavformat/smush.c
+++ b/libavformat/smush.c
@@ -235,7 +235,7 @@ static int smush_read_packet(AVFormatContext *ctx, AVPacket *pkt)
             if (size < 13)
                 return AVERROR_INVALIDDATA;
             if (av_get_packet(pb, pkt, size) < 13)
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
 
             pkt->stream_index = smush->audio_stream_index;
             pkt->flags       |= AV_PKT_FLAG_KEY;
diff --git a/libavformat/soxdec.c b/libavformat/soxdec.c
index ba349c870e..c710b9a152 100644
--- a/libavformat/soxdec.c
+++ b/libavformat/soxdec.c
@@ -34,6 +34,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "pcm.h"
@@ -106,11 +107,12 @@ static int sox_read_header(AVFormatContext *s)
 
     if (comment_size && comment_size < UINT_MAX) {
         char *comment = av_malloc(comment_size+1);
+        int ret;
         if(!comment)
             return AVERROR(ENOMEM);
-        if (avio_read(pb, comment, comment_size) != comment_size) {
+        if ((ret = ffio_read_size(pb, comment, comment_size)) < 0) {
             av_freep(&comment);
-            return AVERROR(EIO);
+            return ret;
         }
         comment[comment_size] = 0;
 
diff --git a/libavformat/swfdec.c b/libavformat/swfdec.c
index 29eefc68a2..1290f2c70f 100644
--- a/libavformat/swfdec.c
+++ b/libavformat/swfdec.c
@@ -174,10 +174,10 @@ static int swf_read_header(AVFormatContext *s)
         pb = swf->zpb;
 #else
         av_log(s, AV_LOG_ERROR, "zlib support is required to read SWF compressed files\n");
-        return AVERROR(EIO);
+        return AVERROR(ENOSYS);
 #endif
     } else if (tag != MKBETAG('F', 'W', 'S', 0))
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     /* skip rectangle size */
     nbits = avio_r8(pb) >> 3;
     len = (4 * nbits - 3 + 7) / 8;
diff --git a/libavformat/thp.c b/libavformat/thp.c
index 76db7fc581..c1a7418a2c 100644
--- a/libavformat/thp.c
+++ b/libavformat/thp.c
@@ -193,7 +193,7 @@ static int thp_read_packet(AVFormatContext *s,
         if (ret < 0)
             return ret;
         if (ret != size) {
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
 
         pkt->stream_index = thp->video_stream_index;
@@ -202,7 +202,7 @@ static int thp_read_packet(AVFormatContext *s,
         if (ret < 0)
             return ret;
         if (ret != thp->audiosize) {
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         }
 
         pkt->stream_index = thp->audio_stream_index;
diff --git a/libavformat/tiertexseq.c b/libavformat/tiertexseq.c
index 844b98e182..5dc7d9f110 100644
--- a/libavformat/tiertexseq.c
+++ b/libavformat/tiertexseq.c
@@ -27,6 +27,7 @@
 #include "libavutil/channel_layout.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -109,6 +110,7 @@ static int seq_init_frame_buffers(SeqDemuxContext *seq, AVIOContext *pb)
 static int seq_fill_buffer(SeqDemuxContext *seq, AVIOContext *pb, int buffer_num, unsigned int data_offs, int data_size)
 {
     TiertexSeqFrameBuffer *seq_buffer;
+    int ret;
 
     if (buffer_num >= SEQ_NUM_FRAME_BUFFERS)
         return AVERROR_INVALIDDATA;
@@ -118,8 +120,8 @@ static int seq_fill_buffer(SeqDemuxContext *seq, AVIOContext *pb, int buffer_num
         return AVERROR_INVALIDDATA;
 
     avio_seek(pb, seq->current_frame_offs + data_offs, SEEK_SET);
-    if (avio_read(pb, seq_buffer->data + seq_buffer->fill_size, data_size) != data_size)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, seq_buffer->data + seq_buffer->fill_size, data_size)) < 0)
+        return ret;
 
     seq_buffer->fill_size += data_size;
     return 0;
@@ -273,10 +275,11 @@ static int seq_read_packet(AVFormatContext *s, AVPacket *pkt)
 
             pkt->data[0] = 0;
             if (seq->current_pal_data_size) {
+                int ret;
                 pkt->data[0] |= 1;
                 avio_seek(pb, seq->current_frame_offs + seq->current_pal_data_offs, SEEK_SET);
-                if (avio_read(pb, &pkt->data[1], seq->current_pal_data_size) != seq->current_pal_data_size)
-                    return AVERROR(EIO);
+                if ((ret = ffio_read_size(pb, &pkt->data[1], seq->current_pal_data_size)) < 0)
+                    return ret;
             }
             if (seq->current_video_data_size) {
                 pkt->data[0] |= 2;
@@ -295,7 +298,7 @@ static int seq_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     /* audio packet */
     if (seq->current_audio_data_offs == 0) /* end of data reached */
-        return AVERROR(EIO);
+        return AVERROR_EOF;
 
     avio_seek(pb, seq->current_frame_offs + seq->current_audio_data_offs, SEEK_SET);
     rc = av_get_packet(pb, pkt, seq->current_audio_data_size);
diff --git a/libavformat/ty.c b/libavformat/ty.c
index f524b74bad..acd5e62157 100644
--- a/libavformat/ty.c
+++ b/libavformat/ty.c
@@ -303,7 +303,7 @@ static int ty_read_header(AVFormatContext *s)
     if (ty->tivo_series == TIVO_SERIES_UNKNOWN ||
         ty->audio_type == TIVO_AUDIO_UNKNOWN ||
         ty->tivo_type == TIVO_TYPE_UNKNOWN)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     st = avformat_new_stream(s, NULL);
     if (!st)
diff --git a/libavformat/vc1test.c b/libavformat/vc1test.c
index 394a70c1ac..239b35bf33 100644
--- a/libavformat/vc1test.c
+++ b/libavformat/vc1test.c
@@ -108,7 +108,7 @@ static int vc1t_read_packet(AVFormatContext *s,
         keyframe = 1;
     pts = avio_rl32(pb);
     if(av_get_packet(pb, pkt, frame_size) < 0)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     if(s->streams[0]->time_base.den == 1000)
         pkt->pts = pts;
     pkt->flags |= keyframe ? AV_PKT_FLAG_KEY : 0;
diff --git a/libavformat/vividas.c b/libavformat/vividas.c
index dd25539201..b708d71c65 100644
--- a/libavformat/vividas.c
+++ b/libavformat/vividas.c
@@ -601,7 +601,7 @@ static int viv_read_header(AVFormatContext *s)
         k2 = b22_key;
         buf = read_vblock(pb, &v, b22_key, &k2, 0);
         if (!buf)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
         av_free(buf);
     }
@@ -609,7 +609,7 @@ static int viv_read_header(AVFormatContext *s)
     k2 = key;
     buf = read_vblock(pb, &v, key, &k2, 0);
     if (!buf)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     ret = track_header(viv, s, buf, v);
     av_free(buf);
     if (ret < 0)
@@ -617,7 +617,7 @@ static int viv_read_header(AVFormatContext *s)
 
     buf = read_vblock(pb, &v, key, &k2, v);
     if (!buf)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     ret = track_index(viv, s, buf, v);
     av_free(buf);
     if (ret < 0)
@@ -643,7 +643,7 @@ static int viv_read_packet(AVFormatContext *s,
     int ret;
 
     if (!viv->sb_pb)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     if (avio_feof(viv->sb_pb))
         return AVERROR_EOF;
 
@@ -670,7 +670,7 @@ static int viv_read_packet(AVFormatContext *s,
 
     if (viv->current_sb_entry >= viv->n_sb_entries) {
         if (viv->current_sb+1 >= viv->n_sb_blocks)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
         viv->current_sb++;
 
         load_sb_block(s, viv, 0);
@@ -679,7 +679,7 @@ static int viv_read_packet(AVFormatContext *s,
 
     pb = viv->sb_pb;
     if (!pb)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     off = avio_tell(pb);
 
     if (viv->current_sb_entry >= viv->n_sb_entries)
diff --git a/libavformat/voc_packet.c b/libavformat/voc_packet.c
index 32f8b29aa7..cee6ac5746 100644
--- a/libavformat/voc_packet.c
+++ b/libavformat/voc_packet.c
@@ -53,7 +53,7 @@ ff_voc_get_packet(AVFormatContext *s, AVPacket *pkt, AVStream *st, int max_size)
         if (!voc->remaining_size) {
             int64_t filesize;
             if (!(s->pb->seekable & AVIO_SEEKABLE_NORMAL))
-                return AVERROR(EIO);
+                return AVERROR(ENOSYS);
             filesize = avio_size(pb);
             if (filesize - avio_tell(pb) > INT_MAX)
                 return AVERROR_INVALIDDATA;
diff --git a/libavformat/vpk.c b/libavformat/vpk.c
index 001ad33555..f6270a11ae 100644
--- a/libavformat/vpk.c
+++ b/libavformat/vpk.c
@@ -21,6 +21,7 @@
 
 #include "libavutil/intreadwrite.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -94,10 +95,10 @@ static int vpk_read_packet(AVFormatContext *s, AVPacket *pkt)
         if (ret < 0)
             return ret;
         for (i = 0; i < par->ch_layout.nb_channels; i++) {
-            ret = avio_read(s->pb, pkt->data + i * size, size);
+            ret = ffio_read_size(s->pb, pkt->data + i * size, size);
             avio_skip(s->pb, skip);
-            if (ret != size) {
-                return AVERROR(EIO);
+            if (ret < 0) {
+                return ret;
             }
         }
         pkt->pos = pos;
diff --git a/libavformat/wavarc.c b/libavformat/wavarc.c
index 9d7029f209..6467d7d578 100644
--- a/libavformat/wavarc.c
+++ b/libavformat/wavarc.c
@@ -71,8 +71,8 @@ static int wavarc_read_header(AVFormatContext *s)
         return AVERROR_INVALIDDATA;
     id = avio_rl32(pb);
     w->data_end = avio_tell(pb);
-    if (avio_read(pb, data, sizeof(data)) != sizeof(data))
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, data, sizeof(data))) < 0)
+        return ret;
     w->data_end += 16LL + AV_RL32(data + 4);
     fmt_len = AV_RL32(data + 32);
     if (fmt_len < 12)
diff --git a/libavformat/wc3movie.c b/libavformat/wc3movie.c
index f4063353b6..b4e7f1e31d 100644
--- a/libavformat/wc3movie.c
+++ b/libavformat/wc3movie.c
@@ -33,6 +33,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/mem.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -141,9 +142,9 @@ static int wc3_read_header(AVFormatContext *s)
             buffer = av_malloc(size+1);
             if (!buffer)
                 return AVERROR(ENOMEM);
-            if ((ret = avio_read(pb, buffer, size)) != size) {
+            if ((ret = ffio_read_size(pb, buffer, size)) < 0) {
                 av_freep(&buffer);
-                return AVERROR(EIO);
+                return ret;
             }
             buffer[size] = 0;
             av_dict_set(&s->metadata, "title", buffer,
@@ -172,7 +173,7 @@ static int wc3_read_header(AVFormatContext *s)
         /* chunk sizes are 16-bit aligned */
         size = (avio_rb32(pb) + 1) & (~1);
         if (avio_feof(pb))
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
     } while (fourcc_tag != BRCH_TAG);
 
@@ -223,7 +224,7 @@ static int wc3_read_packet(AVFormatContext *s,
         /* chunk sizes are 16-bit aligned */
         size = (avio_rb32(pb) + 1) & (~1);
         if (avio_feof(pb))
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
         switch (fourcc_tag) {
 
@@ -252,9 +253,9 @@ static int wc3_read_packet(AVFormatContext *s,
 
         case TEXT_TAG:
             /* subtitle chunk */
-            if ((unsigned)size > sizeof(text) || (ret = avio_read(pb, text, size)) != size)
-                ret = AVERROR(EIO);
-            else {
+            if ((unsigned)size > sizeof(text))
+                ret = AVERROR_INVALIDDATA;
+            else if ((ret = ffio_read_size(pb, text, size)) == size) {
                 int i = 0;
                 av_log (s, AV_LOG_DEBUG, "Subtitle time!\n");
                 if (i >= size || av_strnlen(&text[i + 1], size - i - 1) >= size - i - 1)
diff --git a/libavformat/westwood_aud.c b/libavformat/westwood_aud.c
index f83913a22f..68b60be5b1 100644
--- a/libavformat/westwood_aud.c
+++ b/libavformat/westwood_aud.c
@@ -36,6 +36,7 @@
 #include "libavutil/channel_layout.h"
 #include "libavutil/intreadwrite.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 
@@ -87,9 +88,10 @@ static int wsaud_read_header(AVFormatContext *s)
     AVStream *st;
     unsigned char header[AUD_HEADER_SIZE];
     int sample_rate, channels, codec;
+    int ret;
 
-    if (avio_read(pb, header, AUD_HEADER_SIZE) != AUD_HEADER_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, header, AUD_HEADER_SIZE)) < 0)
+        return ret;
 
     sample_rate = AV_RL16(&header[0]);
     channels    = (header[10] & 0x1) + 1;
@@ -134,9 +136,8 @@ static int wsaud_read_packet(AVFormatContext *s,
     int ret = 0;
     AVStream *st = s->streams[0];
 
-    if (avio_read(pb, preamble, AUD_CHUNK_PREAMBLE_SIZE) !=
-        AUD_CHUNK_PREAMBLE_SIZE)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, preamble, AUD_CHUNK_PREAMBLE_SIZE)) < 0)
+        return ret;
 
     /* validate the chunk */
     if (AV_RL32(&preamble[4]) != AUD_CHUNK_SIGNATURE)
@@ -152,8 +153,8 @@ static int wsaud_read_packet(AVFormatContext *s,
         int out_size = AV_RL16(&preamble[2]);
         if ((ret = av_new_packet(pkt, chunk_size + 4)) < 0)
             return ret;
-        if ((ret = avio_read(pb, &pkt->data[4], chunk_size)) != chunk_size)
-            return ret < 0 ? ret : AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, &pkt->data[4], chunk_size)) < 0)
+            return ret;
         AV_WL16(&pkt->data[0], out_size);
         AV_WL16(&pkt->data[2], chunk_size);
 
@@ -161,7 +162,7 @@ static int wsaud_read_packet(AVFormatContext *s,
     } else {
         ret = av_get_packet(pb, pkt, chunk_size);
         if (ret != chunk_size)
-            return AVERROR(EIO);
+            return AVERROR_INVALIDDATA;
 
         if (st->codecpar->ch_layout.nb_channels <= 0) {
             av_log(s, AV_LOG_ERROR, "invalid number of channels %d\n",
diff --git a/libavformat/westwood_vqa.c b/libavformat/westwood_vqa.c
index 9755fcc9c1..8e5179f697 100644
--- a/libavformat/westwood_vqa.c
+++ b/libavformat/westwood_vqa.c
@@ -138,8 +138,8 @@ static int wsvqa_read_header(AVFormatContext *s)
     /* there are 0 or more chunks before the FINF chunk; iterate until
      * FINF has been skipped and the file will be ready to be demuxed */
     do {
-        if (avio_read(pb, scratch, VQA_PREAMBLE_SIZE) != VQA_PREAMBLE_SIZE)
-            return AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, scratch, VQA_PREAMBLE_SIZE)) < 0)
+            return ret;
         chunk_tag = AV_RB32(&scratch[0]);
         chunk_size = AV_RB32(&scratch[4]);
 
@@ -211,7 +211,7 @@ static int wsvqa_read_packet(AVFormatContext *s,
 
             ret= av_get_packet(pb, pkt, chunk_size);
             if (ret<0)
-                return AVERROR(EIO);
+                return AVERROR_INVALIDDATA;
 
             switch (chunk_type) {
             case SND0_TAG:
@@ -272,20 +272,20 @@ static int wsvqa_read_packet(AVFormatContext *s,
                 /* if a new codebook is available inside an earlier a VQFL chunk then
                  * append it to 'pkt' */
                 if (wsvqa->vqfl_chunk_size > 0) {
-                    int64_t current_pos = pkt->pos;
+                    int64_t ret64, current_pos = pkt->pos;
 
-                    if (avio_seek(pb, wsvqa->vqfl_chunk_pos, SEEK_SET) < 0)
-                        return AVERROR(EIO);
+                    if ((ret64 = avio_seek(pb, wsvqa->vqfl_chunk_pos, SEEK_SET)) < 0)
+                        return (int)ret64;
 
                     /* the decoder expects chunks to be 16-bit aligned */
                     if (wsvqa->vqfl_chunk_size % 2 == 1)
                         wsvqa->vqfl_chunk_size++;
 
                     if (av_append_packet(pb, pkt, wsvqa->vqfl_chunk_size) < 0)
-                        return AVERROR(EIO);
+                        return AVERROR_INVALIDDATA;
 
-                    if (avio_seek(pb, current_pos, SEEK_SET) < 0)
-                        return AVERROR(EIO);
+                    if ((ret64 = avio_seek(pb, current_pos, SEEK_SET)) < 0)
+                        return (int)ret64;
 
                     wsvqa->vqfl_chunk_pos = 0;
                     wsvqa->vqfl_chunk_size = 0;
diff --git a/libavformat/wsddec.c b/libavformat/wsddec.c
index b0bf49cb04..f36c254621 100644
--- a/libavformat/wsddec.c
+++ b/libavformat/wsddec.c
@@ -24,6 +24,7 @@
 #include "libavutil/mem.h"
 #include "libavutil/timecode.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "rawdec.h"
 
@@ -73,6 +74,7 @@ static int wsd_to_av_channel_layoyt(AVFormatContext *s, int bit)
 static int get_metadata(AVFormatContext *s, const char *const tag, const unsigned size)
 {
     uint8_t *buf;
+    int ret;
     if (!(size + 1))
         return AVERROR(ENOMEM);
 
@@ -80,9 +82,9 @@ static int get_metadata(AVFormatContext *s, const char *const tag, const unsigne
     if (!buf)
         return AVERROR(ENOMEM);
 
-    if (avio_read(s->pb, buf, size) != size) {
+    if ((ret = ffio_read_size(s->pb, buf, size)) < 0) {
         av_free(buf);
-        return AVERROR(EIO);
+        return ret;
     }
 
     if (empty_string(buf, size)) {
diff --git a/libavformat/wtvdec.c b/libavformat/wtvdec.c
index 9d26e35e22..1f299510c9 100644
--- a/libavformat/wtvdec.c
+++ b/libavformat/wtvdec.c
@@ -761,7 +761,7 @@ static int recover(WtvContext *wtv, uint64_t broken_pos)
             return 0;
          }
      }
-     return AVERROR(EIO);
+     return AVERROR_INVALIDDATA;
 }
 
 /**
diff --git a/libavformat/wvdec.c b/libavformat/wvdec.c
index e2a79957f7..e69f42baf5 100644
--- a/libavformat/wvdec.c
+++ b/libavformat/wvdec.c
@@ -23,6 +23,7 @@
 #include "libavutil/intreadwrite.h"
 #include "libavutil/dict.h"
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "apetag.h"
@@ -295,9 +296,9 @@ static int wv_read_packet(AVFormatContext *s, AVPacket *pkt)
     if ((ret = av_new_packet(pkt, wc->header.blocksize + WV_HEADER_SIZE)) < 0)
         return ret;
     memcpy(pkt->data, wc->block_header, WV_HEADER_SIZE);
-    ret = avio_read(s->pb, pkt->data + WV_HEADER_SIZE, wc->header.blocksize);
-    if (ret != wc->header.blocksize) {
-        return AVERROR(EIO);
+    ret = ffio_read_size(s->pb, pkt->data + WV_HEADER_SIZE, wc->header.blocksize);
+    if (ret < 0) {
+        return ret;
     }
     while (!(wc->header.flags & WV_FLAG_FINAL_BLOCK)) {
         if ((ret = wv_read_block_header(s, s->pb)) < 0) {
diff --git a/libavformat/xmv.c b/libavformat/xmv.c
index ed59f7b85b..c0b402860e 100644
--- a/libavformat/xmv.c
+++ b/libavformat/xmv.c
@@ -31,6 +31,7 @@
 #include "libavutil/mem.h"
 
 #include "avformat.h"
+#include "avio_internal.h"
 #include "demux.h"
 #include "internal.h"
 #include "riff.h"
@@ -273,8 +274,8 @@ static int xmv_process_packet_header(AVFormatContext *s)
 
     /* Packet video header */
 
-    if (avio_read(pb, data, 8) != 8)
-        return AVERROR(EIO);
+    if ((ret = ffio_read_size(pb, data, 8)) < 0)
+        return ret;
 
     xmv->video.data_size     = AV_RL32(data) & 0x007FFFFF;
 
@@ -325,8 +326,8 @@ static int xmv_process_packet_header(AVFormatContext *s)
     for (audio_track = 0; audio_track < xmv->audio_track_count; audio_track++) {
         XMVAudioPacket *packet = &xmv->audio[audio_track];
 
-        if (avio_read(pb, data, 4) != 4)
-            return AVERROR(EIO);
+        if ((ret = ffio_read_size(pb, data, 4)) < 0)
+            return ret;
 
         if (!packet->created) {
             AVStream *ast = avformat_new_stream(s, NULL);
@@ -417,12 +418,12 @@ static int xmv_fetch_new_packet(AVFormatContext *s)
     /* Seek to it */
     xmv->this_packet_offset = xmv->next_packet_offset;
     if (avio_seek(pb, xmv->this_packet_offset, SEEK_SET) != xmv->this_packet_offset)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     /* Update the size */
     xmv->this_packet_size = xmv->next_packet_size;
     if (xmv->this_packet_size < (12 + xmv->audio_track_count * 4))
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     /* Process the header */
     result = xmv_process_packet_header(s);
@@ -448,7 +449,7 @@ static int xmv_fetch_audio_packet(AVFormatContext *s,
 
     /* Seek to it */
     if (avio_seek(pb, audio->data_offset, SEEK_SET) != audio->data_offset)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     if ((xmv->video.current_frame + 1) < xmv->video.frame_count)
         /* Not the last frame, get at most frame_size bytes. */
@@ -495,7 +496,7 @@ static int xmv_fetch_video_packet(AVFormatContext *s,
 
     /* Seek to it */
     if (avio_seek(pb, video->data_offset, SEEK_SET) != video->data_offset)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     /* Read the frame header */
     frame_header = avio_rl32(pb);
@@ -504,7 +505,7 @@ static int xmv_fetch_video_packet(AVFormatContext *s,
     frame_timestamp = (frame_header >> 17);
 
     if ((frame_size + 4) > video->data_size)
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
 
     /* Get the packet data */
     result = av_get_packet(pb, pkt, frame_size);
diff --git a/libavformat/yuv4mpegdec.c b/libavformat/yuv4mpegdec.c
index 2b66a1e596..5c21858908 100644
--- a/libavformat/yuv4mpegdec.c
+++ b/libavformat/yuv4mpegdec.c
@@ -291,7 +291,7 @@ static int yuv4_read_packet(AVFormatContext *s, AVPacket *pkt)
     if (ret < 0)
         return ret;
     else if (ret != s->packet_size - Y4M_FRAME_MAGIC_LEN) {
-        return s->pb->eof_reached ? AVERROR_EOF : AVERROR(EIO);
+        return s->pb->eof_reached ? AVERROR_EOF : AVERROR_INVALIDDATA;
     }
     pkt->stream_index = 0;
     pkt->pts = (off - ffformatcontext(s)->data_offset) / s->packet_size;
diff --git a/libavformat/yuv4mpegenc.c b/libavformat/yuv4mpegenc.c
index 35397cbde0..371d3745c1 100644
--- a/libavformat/yuv4mpegenc.c
+++ b/libavformat/yuv4mpegenc.c
@@ -282,7 +282,7 @@ static int yuv4_init(AVFormatContext *s)
                "gray9, gray10, gray12 "
                "and gray16 pixel formats. "
                "Use -pix_fmt to select one.\n");
-        return AVERROR(EIO);
+        return AVERROR_INVALIDDATA;
     }
 
     return 0;
-- 
2.52.0


From 5c33b249877c2cdc72e0f50476a321e68de89c0e Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 21 Nov 2025 13:58:22 +0100
Subject: [PATCH 053/304] av{codec,util}/tests: Remove pointless undefs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before commit e96d90eed66a198566c409958432d282e1b03869, lavu/internal.h
redefined various discouraged/forbidden functions to induce
compilation failures upon use, e.g.
 #define malloc please_use_av_malloc
In order to use these functions, some files had to undefine these
macros. This commit removes the remaining pointless undefs.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
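
Note for readers unfamiliar with the old mechanism: the sketch below shows
how such a poisoning macro behaves and why an undef was once needed. All
names in it (`demo_alloc`, `legacy_alloc`, `needs_legacy_alloc`) are
hypothetical; FFmpeg's real macros targeted libc functions such as malloc,
which this example deliberately avoids redefining.

```c
#include <stdlib.h>

/* The wrapper the project wants everyone to call (hypothetical). */
void *demo_alloc(size_t size)
{
    return malloc(size);
}

/* A discouraged function that still exists (hypothetical). */
void *legacy_alloc(size_t size)
{
    return malloc(size);
}

/* Poison the discouraged name: any later call would expand to the
 * undeclared identifier please_use_demo_alloc and fail to compile,
 * with the macro name telling the developer what to use instead. */
#define legacy_alloc please_use_demo_alloc

/* A file that genuinely needs the old function has to lift the poison
 * first. Once the poisoning #define is gone from the shared header,
 * an #undef like this one has nothing left to undo. */
#undef legacy_alloc

void *needs_legacy_alloc(size_t size)
{
    return legacy_alloc(size);
}
```

Removing the `#undef` line above while keeping the `#define` would break the
build at `needs_legacy_alloc`, which is the effect the old header relied on.
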
 libavcodec/jacosubdec.c          | 2 --
 libavcodec/tests/motion.c        | 2 --
 libavcodec/tests/snowenc.c       | 4 ----
 libavfilter/tests/formats.c      | 2 --
 libavformat/sbgdec.c             | 1 -
 libavutil/file_open.c            | 1 -
 libavutil/tests/bprint.c         | 2 --
 libavutil/tests/error.c          | 2 --
 libavutil/tests/file.c           | 2 --
 libavutil/tests/imgutils.c       | 1 -
 libavutil/tests/pca.c            | 1 -
 libavutil/tests/random_seed.c    | 1 -
 libswresample/tests/swresample.c | 2 --
 tools/fourcc2pixfmt.c            | 3 ---
 14 files changed, 26 deletions(-)

diff --git a/libavcodec/jacosubdec.c b/libavcodec/jacosubdec.c
index 08349a9ec8..fd90dcc4b6 100644
--- a/libavcodec/jacosubdec.c
+++ b/libavcodec/jacosubdec.c
@@ -32,8 +32,6 @@
 #include "libavutil/bprint.h"
 #include "libavutil/time_internal.h"
 
-#undef time
-
 static int insert_text(AVBPrint *dst, const char *in, const char *arg)
 {
     av_bprintf(dst, "%s", arg);
diff --git a/libavcodec/tests/motion.c b/libavcodec/tests/motion.c
index 719fba537d..ad2d65ec78 100644
--- a/libavcodec/tests/motion.c
+++ b/libavcodec/tests/motion.c
@@ -36,8 +36,6 @@
 #include "libavutil/mem.h"
 #include "libavutil/time.h"
 
-#undef printf
-
 #define WIDTH 64
 #define HEIGHT 64
 
diff --git a/libavcodec/tests/snowenc.c b/libavcodec/tests/snowenc.c
index 311374e5d4..35feedba06 100644
--- a/libavcodec/tests/snowenc.c
+++ b/libavcodec/tests/snowenc.c
@@ -20,10 +20,6 @@
 
 #include "libavcodec/snowenc.c"
 
-#undef malloc
-#undef free
-#undef printf
-
 #include "libavutil/lfg.h"
 #include "libavutil/mathematics.h"
 #include "libavutil/mem.h"
diff --git a/libavfilter/tests/formats.c b/libavfilter/tests/formats.c
index 5cc3ca3371..3fdad7427e 100644
--- a/libavfilter/tests/formats.c
+++ b/libavfilter/tests/formats.c
@@ -22,8 +22,6 @@
 #include "libavfilter/audio.h"
 #include "libavfilter/formats.c"
 
-#undef printf
-
 const int64_t avfilter_all_channel_layouts[] = {
     AV_CH_FRONT_CENTER,
     AV_CH_FRONT_CENTER|AV_CH_LOW_FREQUENCY,
diff --git a/libavformat/sbgdec.c b/libavformat/sbgdec.c
index 4afb51b844..0e3d860d9f 100644
--- a/libavformat/sbgdec.c
+++ b/libavformat/sbgdec.c
@@ -905,7 +905,6 @@ static int expand_timestamps(void *log, struct sbg_script *s)
         av_log(log, AV_LOG_WARNING,
                "Scripts with mixed absolute and relative timestamps can give "
                "unexpected results (pause, seeking, time zone change).\n");
-#undef time
         time(&now0);
         tm = localtime_r(&now0, &tmpbuf);
         now = tm ? tm->tm_hour * 3600 + tm->tm_min * 60 + tm->tm_sec :
diff --git a/libavutil/file_open.c b/libavutil/file_open.c
index 4692035d55..121a562fcb 100644
--- a/libavutil/file_open.c
+++ b/libavutil/file_open.c
@@ -120,7 +120,6 @@ int avpriv_tempfile(const char *prefix, char **filename, int log_offset, void *l
     if(!ptr)
         ptr= tempnam(".", prefix);
     *filename = av_strdup(ptr);
-#undef free
     free(ptr);
 #else
     return AVERROR(ENOSYS);
diff --git a/libavutil/tests/bprint.c b/libavutil/tests/bprint.c
index d7f9abf23e..9a87283302 100644
--- a/libavutil/tests/bprint.c
+++ b/libavutil/tests/bprint.c
@@ -21,8 +21,6 @@
 #include "libavutil/avassert.h"
 #include "libavutil/bprint.c"
 
-#undef printf
-
 static void bprint_pascal(AVBPrint *b, unsigned size)
 {
     unsigned i, j;
diff --git a/libavutil/tests/error.c b/libavutil/tests/error.c
index 16efc8ac45..b7b253b7b5 100644
--- a/libavutil/tests/error.c
+++ b/libavutil/tests/error.c
@@ -18,8 +18,6 @@
 
 #include "libavutil/error.c"
 
-#undef printf
-
 int main(void)
 {
     int i;
diff --git a/libavutil/tests/file.c b/libavutil/tests/file.c
index 3608bcccbe..0b151de9a1 100644
--- a/libavutil/tests/file.c
+++ b/libavutil/tests/file.c
@@ -18,8 +18,6 @@
 
 #include "libavutil/file.c"
 
-#undef printf
-
 int main(void)
 {
     uint8_t *buf;
diff --git a/libavutil/tests/imgutils.c b/libavutil/tests/imgutils.c
index 6a5097bc35..cd363145ad 100644
--- a/libavutil/tests/imgutils.c
+++ b/libavutil/tests/imgutils.c
@@ -20,7 +20,6 @@
 #include "libavutil/crc.h"
 #include "libavutil/mem.h"
 
-#undef printf
 static int check_image_fill(enum AVPixelFormat pix_fmt, int w, int h) {
     uint8_t *data[4];
     size_t sizes[4];
diff --git a/libavutil/tests/pca.c b/libavutil/tests/pca.c
index b2afbea3b5..a2a620799a 100644
--- a/libavutil/tests/pca.c
+++ b/libavutil/tests/pca.c
@@ -22,7 +22,6 @@
 #include "libavutil/pca.c"
 #include "libavutil/lfg.h"
 
-#undef printf
 #include <stdio.h>
 #include <stdlib.h>
 
diff --git a/libavutil/tests/random_seed.c b/libavutil/tests/random_seed.c
index bf0c6c7986..614c958d42 100644
--- a/libavutil/tests/random_seed.c
+++ b/libavutil/tests/random_seed.c
@@ -21,7 +21,6 @@
 #define TEST 1
 #include "libavutil/random_seed.c"
 
-#undef printf
 #define N 256
 #define F 2
 #include <stdio.h>
diff --git a/libswresample/tests/swresample.c b/libswresample/tests/swresample.c
index b627aa4ad0..2a5f9d59c4 100644
--- a/libswresample/tests/swresample.c
+++ b/libswresample/tests/swresample.c
@@ -26,9 +26,7 @@
 
 #include "libswresample/swresample.h"
 
-#undef time
 #include <time.h>
-#undef fprintf
 
 #define SAMPLES 1000
 
diff --git a/tools/fourcc2pixfmt.c b/tools/fourcc2pixfmt.c
index 519cc1cad0..0b107db2ff 100644
--- a/tools/fourcc2pixfmt.c
+++ b/tools/fourcc2pixfmt.c
@@ -29,9 +29,6 @@
 #include "libavcodec/raw.h"
 #include "libavcodec/raw_pix_fmt_tags.h"
 
-#undef printf
-#undef fprintf
-
 #if !HAVE_GETOPT
 #include "compat/getopt.c"
 #endif
-- 
2.52.0


From 3aa9b5258a8a962e9134100ac51f50bb2a2c1ffb Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 21 Nov 2025 14:18:05 +0100
Subject: [PATCH 054/304] avcodec/j2kenc: Remove dead, disabled debug code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Seems to have never worked, even when this was added in
83654c7b1b598add9041c7add6b77478eb91177f.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/j2kenc.c | 70 ---------------------------------------------
 1 file changed, 70 deletions(-)

diff --git a/libavcodec/j2kenc.c b/libavcodec/j2kenc.c
index 7bec5bbe36..cba4d18adf 100644
--- a/libavcodec/j2kenc.c
+++ b/libavcodec/j2kenc.c
@@ -148,76 +148,6 @@ typedef struct {
 } Jpeg2000EncoderContext;
 
 
-/* debug */
-#if 0
-#undef ifprintf
-#undef printf
-
-static void nspaces(FILE *fd, int n)
-{
-    while(n--) putc(' ', fd);
-}
-
-static void printcomp(Jpeg2000Component *comp)
-{
-    int i;
-    for (i = 0; i < comp->y1 - comp->y0; i++)
-        ff_jpeg2000_printv(comp->i_data + i * (comp->x1 - comp->x0), comp->x1 - comp->x0);
-}
-
-static void dump(Jpeg2000EncoderContext *s, FILE *fd)
-{
-    int tileno, compno, reslevelno, bandno, precno;
-    fprintf(fd, "XSiz = %d, YSiz = %d, tile_width = %d, tile_height = %d\n"
-                "numXtiles = %d, numYtiles = %d, ncomponents = %d\n"
-                "tiles:\n",
-            s->width, s->height, s->tile_width, s->tile_height,
-            s->numXtiles, s->numYtiles, s->ncomponents);
-    for (tileno = 0; tileno < s->numXtiles * s->numYtiles; tileno++){
-        Jpeg2000Tile *tile = s->tile + tileno;
-        nspaces(fd, 2);
-        fprintf(fd, "tile %d:\n", tileno);
-        for(compno = 0; compno < s->ncomponents; compno++){
-            Jpeg2000Component *comp = tile->comp + compno;
-            nspaces(fd, 4);
-            fprintf(fd, "component %d:\n", compno);
-            nspaces(fd, 4);
-            fprintf(fd, "x0 = %d, x1 = %d, y0 = %d, y1 = %d\n",
-                        comp->x0, comp->x1, comp->y0, comp->y1);
-            for(reslevelno = 0; reslevelno < s->nreslevels; reslevelno++){
-                Jpeg2000ResLevel *reslevel = comp->reslevel + reslevelno;
-                nspaces(fd, 6);
-                fprintf(fd, "reslevel %d:\n", reslevelno);
-                nspaces(fd, 6);
-                fprintf(fd, "x0 = %d, x1 = %d, y0 = %d, y1 = %d, nbands = %d\n",
-                        reslevel->x0, reslevel->x1, reslevel->y0,
-                        reslevel->y1, reslevel->nbands);
-                for(bandno = 0; bandno < reslevel->nbands; bandno++){
-                    Jpeg2000Band *band = reslevel->band + bandno;
-                    nspaces(fd, 8);
-                    fprintf(fd, "band %d:\n", bandno);
-                    nspaces(fd, 8);
-                    fprintf(fd, "x0 = %d, x1 = %d, y0 = %d, y1 = %d,"
-                                "codeblock_width = %d, codeblock_height = %d cblknx = %d cblkny = %d\n",
-                                band->x0, band->x1,
-                                band->y0, band->y1,
-                                band->codeblock_width, band->codeblock_height,
-                                band->cblknx, band->cblkny);
-                    for (precno = 0; precno < reslevel->num_precincts_x * reslevel->num_precincts_y; precno++){
-                        Jpeg2000Prec *prec = band->prec + precno;
-                        nspaces(fd, 10);
-                        fprintf(fd, "prec %d:\n", precno);
-                        nspaces(fd, 10);
-                        fprintf(fd, "xi0 = %d, xi1 = %d, yi0 = %d, yi1 = %d\n",
-                                     prec->xi0, prec->xi1, prec->yi0, prec->yi1);
-                    }
-                }
-            }
-        }
-    }
-}
-#endif
-
 /* bitstream routines */
 
 /** put n times val bit */
-- 
2.52.0


From 40debd537f6f457e62d9e7fd8d87a17a31489be8 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 21 Nov 2025 14:53:21 +0100
Subject: [PATCH 055/304] avutil/error: Avoid relocations and unused
 information
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Use a string table instead of an array of string pointers to avoid
relocations, and don't add the tag strings to libavutil/error.c: they
are only used by the test program, so move them there.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
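
Note: a minimal sketch of the general string-table technique, not the code
in this patch. It assumes a hypothetical three-entry error list
(`DEMO_ERROR_LIST`) and hypothetical names (`demo_strings`, `demo_offsets`,
`demo_strerror`). The strings live back to back in one struct and are found
through integer offsets, so the table holds no pointers that the dynamic
linker would have to relocate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error list for illustration; not FFmpeg's actual table. */
#define DEMO_ERROR_LIST(E)              \
    E(NOT_FOUND, "Item not found")      \
    E(TOO_SMALL, "Buffer too small")    \
    E(BAD_DATA,  "Invalid data")

/* Error codes 0..DEMO_ERR_COUNT-1, generated from the list. */
enum {
#define DEMO_CODE(name, str) DEMO_ERR_ ## name,
    DEMO_ERROR_LIST(DEMO_CODE)
#undef DEMO_CODE
    DEMO_ERR_COUNT
};

/* One char array per message, laid out back to back. */
struct demo_string_table {
#define DEMO_MEMBER(name, str) char name[sizeof(str)];
    DEMO_ERROR_LIST(DEMO_MEMBER)
#undef DEMO_MEMBER
};

static const struct demo_string_table demo_strings = {
#define DEMO_INIT(name, str) str,
    DEMO_ERROR_LIST(DEMO_INIT)
#undef DEMO_INIT
};

static_assert(sizeof(struct demo_string_table) <= 65535,
              "offsets must fit in 16 bits");

/* Offsets of each message inside demo_strings; plain integers. */
static const unsigned short demo_offsets[DEMO_ERR_COUNT] = {
#define DEMO_OFFSET(name, str) offsetof(struct demo_string_table, name),
    DEMO_ERROR_LIST(DEMO_OFFSET)
#undef DEMO_OFFSET
};

/* Look up a message by code; out-of-range codes get a fallback string. */
const char *demo_strerror(int code)
{
    if (code < 0 || code >= DEMO_ERR_COUNT)
        return "Unknown error";
    return (const char *)&demo_strings + demo_offsets[code];
}
```

Compared with an array of `const char *`, this layout can stay entirely in
read-only data in position-independent builds, at the cost of one extra
addition per lookup.
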
 libavutil/error.c       | 197 ++++++++++++++++++++++------------------
 libavutil/tests/error.c |  13 ++-
 2 files changed, 119 insertions(+), 91 deletions(-)

diff --git a/libavutil/error.c b/libavutil/error.c
index 90bab7b9d3..23a2a0a53a 100644
--- a/libavutil/error.c
+++ b/libavutil/error.c
@@ -25,109 +25,128 @@
 #include "error.h"
 #include "macros.h"
 
-struct error_entry {
-    int num;
-    const char *tag;
-    const char *str;
+#define AVERROR_INPUT_AND_OUTPUT_CHANGED (AVERROR_INPUT_CHANGED | AVERROR_OUTPUT_CHANGED)
+
+#define AVERROR_LIST(E, E2)                                                                     \
+    E(BSF_NOT_FOUND,            "Bitstream filter not found")                                   \
+    E(BUG,                      "Internal bug, should not have happened")                       \
+    E2(BUG2, BUG,               "Internal bug, should not have happened")                       \
+    E(BUFFER_TOO_SMALL,         "Buffer too small")                                             \
+    E(DECODER_NOT_FOUND,        "Decoder not found")                                            \
+    E(DEMUXER_NOT_FOUND,        "Demuxer not found")                                            \
+    E(ENCODER_NOT_FOUND,        "Encoder not found")                                            \
+    E(EOF,                      "End of file")                                                  \
+    E(EXIT,                     "Immediate exit requested")                                     \
+    E(EXTERNAL,                 "Generic error in an external library")                         \
+    E(FILTER_NOT_FOUND,         "Filter not found")                                             \
+    E(INPUT_CHANGED,            "Input changed")                                                \
+    E(INVALIDDATA,              "Invalid data found when processing input")                     \
+    E(MUXER_NOT_FOUND,          "Muxer not found")                                              \
+    E(OPTION_NOT_FOUND,         "Option not found")                                             \
+    E(OUTPUT_CHANGED,           "Output changed")                                               \
+    E(PATCHWELCOME,             "Not yet implemented in FFmpeg, patches welcome")               \
+    E(PROTOCOL_NOT_FOUND,       "Protocol not found")                                           \
+    E(STREAM_NOT_FOUND,         "Stream not found")                                             \
+    E(UNKNOWN,                  "Unknown error occurred")                                       \
+    E(EXPERIMENTAL,             "Experimental feature")                                         \
+    E(INPUT_AND_OUTPUT_CHANGED, "Input and output changed")                                     \
+    E(HTTP_BAD_REQUEST,         "Server returned 400 Bad Request")                              \
+    E(HTTP_UNAUTHORIZED,        "Server returned 401 Unauthorized (authorization failed)")      \
+    E(HTTP_FORBIDDEN,           "Server returned 403 Forbidden (access denied)")                \
+    E(HTTP_NOT_FOUND,           "Server returned 404 Not Found")                                \
+    E(HTTP_TOO_MANY_REQUESTS,   "Server returned 429 Too Many Requests")                        \
+    E(HTTP_OTHER_4XX,           "Server returned 4XX Client Error, but not one of 40{0,1,3,4}") \
+    E(HTTP_SERVER_ERROR,        "Server returned 5XX Server Error reply")                       \
+
+#define STRERROR_LIST(E)                                                     \
+    E(E2BIG,             "Argument list too long")                           \
+    E(EACCES,            "Permission denied")                                \
+    E(EAGAIN,            "Resource temporarily unavailable")                 \
+    E(EBADF,             "Bad file descriptor")                              \
+    E(EBUSY,             "Device or resource busy")                          \
+    E(ECHILD,            "No child processes")                               \
+    E(EDEADLK,           "Resource deadlock avoided")                        \
+    E(EDOM,              "Numerical argument out of domain")                 \
+    E(EEXIST,            "File exists")                                      \
+    E(EFAULT,            "Bad address")                                      \
+    E(EFBIG,             "File too large")                                   \
+    E(EILSEQ,            "Illegal byte sequence")                            \
+    E(EINTR,             "Interrupted system call")                          \
+    E(EINVAL,            "Invalid argument")                                 \
+    E(EIO,               "I/O error")                                        \
+    E(EISDIR,            "Is a directory")                                   \
+    E(EMFILE,            "Too many open files")                              \
+    E(EMLINK,            "Too many links")                                   \
+    E(ENAMETOOLONG,      "File name too long")                               \
+    E(ENFILE,            "Too many open files in system")                    \
+    E(ENODEV,            "No such device")                                   \
+    E(ENOENT,            "No such file or directory")                        \
+    E(ENOEXEC,           "Exec format error")                                \
+    E(ENOLCK,            "No locks available")                               \
+    E(ENOMEM,            "Cannot allocate memory")                           \
+    E(ENOSPC,            "No space left on device")                          \
+    E(ENOSYS,            "Function not implemented")                         \
+    E(ENOTDIR,           "Not a directory")                                  \
+    E(ENOTEMPTY,         "Directory not empty")                              \
+    E(ENOTTY,            "Inappropriate I/O control operation")              \
+    E(ENXIO,             "No such device or address")                        \
+    E(EPERM,             "Operation not permitted")                          \
+    E(EPIPE,             "Broken pipe")                                      \
+    E(ERANGE,            "Result too large")                                 \
+    E(EROFS,             "Read-only file system")                            \
+    E(ESPIPE,            "Illegal seek")                                     \
+    E(ESRCH,             "No such process")                                  \
+    E(EXDEV,             "Cross-device link")                                \
+
+enum {
+#define OFFSET(CODE, DESC)     \
+    ERROR_ ## CODE ## _OFFSET, \
+    ERROR_ ## CODE ## _END_OFFSET = ERROR_ ## CODE ## _OFFSET + sizeof(DESC) - 1,
+#define NOTHING(CODE, CODE2, DESC)
+    AVERROR_LIST(OFFSET, NOTHING)
+#if !HAVE_STRERROR_R
+    STRERROR_LIST(OFFSET)
+#endif
+    ERROR_LIST_SIZE
 };
 
-#define ERROR_TAG(tag) AVERROR_##tag, #tag
-#define EERROR_TAG(tag) AVERROR(tag), #tag
-#define AVERROR_INPUT_AND_OUTPUT_CHANGED (AVERROR_INPUT_CHANGED | AVERROR_OUTPUT_CHANGED)
-static const struct error_entry error_entries[] = {
-    { ERROR_TAG(BSF_NOT_FOUND),      "Bitstream filter not found"                     },
-    { ERROR_TAG(BUG),                "Internal bug, should not have happened"         },
-    { ERROR_TAG(BUG2),               "Internal bug, should not have happened"         },
-    { ERROR_TAG(BUFFER_TOO_SMALL),   "Buffer too small"                               },
-    { ERROR_TAG(DECODER_NOT_FOUND),  "Decoder not found"                              },
-    { ERROR_TAG(DEMUXER_NOT_FOUND),  "Demuxer not found"                              },
-    { ERROR_TAG(ENCODER_NOT_FOUND),  "Encoder not found"                              },
-    { ERROR_TAG(EOF),                "End of file"                                    },
-    { ERROR_TAG(EXIT),               "Immediate exit requested"                       },
-    { ERROR_TAG(EXTERNAL),           "Generic error in an external library"           },
-    { ERROR_TAG(FILTER_NOT_FOUND),   "Filter not found"                               },
-    { ERROR_TAG(INPUT_CHANGED),      "Input changed"                                  },
-    { ERROR_TAG(INVALIDDATA),        "Invalid data found when processing input"       },
-    { ERROR_TAG(MUXER_NOT_FOUND),    "Muxer not found"                                },
-    { ERROR_TAG(OPTION_NOT_FOUND),   "Option not found"                               },
-    { ERROR_TAG(OUTPUT_CHANGED),     "Output changed"                                 },
-    { ERROR_TAG(PATCHWELCOME),       "Not yet implemented in FFmpeg, patches welcome" },
-    { ERROR_TAG(PROTOCOL_NOT_FOUND), "Protocol not found"                             },
-    { ERROR_TAG(STREAM_NOT_FOUND),   "Stream not found"                               },
-    { ERROR_TAG(UNKNOWN),            "Unknown error occurred"                         },
-    { ERROR_TAG(EXPERIMENTAL),       "Experimental feature"                           },
-    { ERROR_TAG(INPUT_AND_OUTPUT_CHANGED), "Input and output changed"                 },
-    { ERROR_TAG(HTTP_BAD_REQUEST),   "Server returned 400 Bad Request"         },
-    { ERROR_TAG(HTTP_UNAUTHORIZED),  "Server returned 401 Unauthorized (authorization failed)" },
-    { ERROR_TAG(HTTP_FORBIDDEN),     "Server returned 403 Forbidden (access denied)" },
-    { ERROR_TAG(HTTP_NOT_FOUND),     "Server returned 404 Not Found"           },
-    { ERROR_TAG(HTTP_TOO_MANY_REQUESTS), "Server returned 429 Too Many Requests"      },
-    { ERROR_TAG(HTTP_OTHER_4XX),     "Server returned 4XX Client Error, but not one of 40{0,1,3,4}" },
-    { ERROR_TAG(HTTP_SERVER_ERROR),  "Server returned 5XX Server Error reply" },
+#define STRING(CODE, DESC) DESC "\0"
+static const char error_stringtable[ERROR_LIST_SIZE] =
+    AVERROR_LIST(STRING, NOTHING)
 #if !HAVE_STRERROR_R
-    { EERROR_TAG(E2BIG),             "Argument list too long" },
-    { EERROR_TAG(EACCES),            "Permission denied" },
-    { EERROR_TAG(EAGAIN),            "Resource temporarily unavailable" },
-    { EERROR_TAG(EBADF),             "Bad file descriptor" },
-    { EERROR_TAG(EBUSY),             "Device or resource busy" },
-    { EERROR_TAG(ECHILD),            "No child processes" },
-    { EERROR_TAG(EDEADLK),           "Resource deadlock avoided" },
-    { EERROR_TAG(EDOM),              "Numerical argument out of domain" },
-    { EERROR_TAG(EEXIST),            "File exists" },
-    { EERROR_TAG(EFAULT),            "Bad address" },
-    { EERROR_TAG(EFBIG),             "File too large" },
-    { EERROR_TAG(EILSEQ),            "Illegal byte sequence" },
-    { EERROR_TAG(EINTR),             "Interrupted system call" },
-    { EERROR_TAG(EINVAL),            "Invalid argument" },
-    { EERROR_TAG(EIO),               "I/O error" },
-    { EERROR_TAG(EISDIR),            "Is a directory" },
-    { EERROR_TAG(EMFILE),            "Too many open files" },
-    { EERROR_TAG(EMLINK),            "Too many links" },
-    { EERROR_TAG(ENAMETOOLONG),      "File name too long" },
-    { EERROR_TAG(ENFILE),            "Too many open files in system" },
-    { EERROR_TAG(ENODEV),            "No such device" },
-    { EERROR_TAG(ENOENT),            "No such file or directory" },
-    { EERROR_TAG(ENOEXEC),           "Exec format error" },
-    { EERROR_TAG(ENOLCK),            "No locks available" },
-    { EERROR_TAG(ENOMEM),            "Cannot allocate memory" },
-    { EERROR_TAG(ENOSPC),            "No space left on device" },
-    { EERROR_TAG(ENOSYS),            "Function not implemented" },
-    { EERROR_TAG(ENOTDIR),           "Not a directory" },
-    { EERROR_TAG(ENOTEMPTY),         "Directory not empty" },
-    { EERROR_TAG(ENOTTY),            "Inappropriate I/O control operation" },
-    { EERROR_TAG(ENXIO),             "No such device or address" },
-    { EERROR_TAG(EPERM),             "Operation not permitted" },
-    { EERROR_TAG(EPIPE),             "Broken pipe" },
-    { EERROR_TAG(ERANGE),            "Result too large" },
-    { EERROR_TAG(EROFS),             "Read-only file system" },
-    { EERROR_TAG(ESPIPE),            "Illegal seek" },
-    { EERROR_TAG(ESRCH),             "No such process" },
-    { EERROR_TAG(EXDEV),             "Cross-device link" },
+    STRERROR_LIST(STRING)
+#endif
+;
+
+static const struct ErrorEntry {
+    int num;
+    unsigned offset;
+} error_entries[] = {
+#define ENTRY(CODE, DESC) { .num = AVERROR_ ## CODE, .offset = ERROR_ ## CODE ## _OFFSET },
+#define ENTRY2(CODE, CODE2, DESC) { .num = AVERROR_ ## CODE, .offset = ERROR_ ## CODE2 ## _OFFSET },
+    AVERROR_LIST(ENTRY, ENTRY2)
+#if !HAVE_STRERROR_R
+#undef ENTRY
+#define ENTRY(CODE, DESC) { .num = AVERROR(CODE), .offset = ERROR_ ## CODE ## _OFFSET },
+    STRERROR_LIST(ENTRY)
 #endif
 };
 
 int av_strerror(int errnum, char *errbuf, size_t errbuf_size)
 {
-    int ret = 0, i;
-    const struct error_entry *entry = NULL;
-
-    for (i = 0; i < FF_ARRAY_ELEMS(error_entries); i++) {
+    for (size_t i = 0; i < FF_ARRAY_ELEMS(error_entries); ++i) {
         if (errnum == error_entries[i].num) {
-            entry = &error_entries[i];
-            break;
+            av_strlcpy(errbuf, error_stringtable + error_entries[i].offset, errbuf_size);
+            return 0;
         }
     }
-    if (entry) {
-        av_strlcpy(errbuf, entry->str, errbuf_size);
-    } else {
 #if HAVE_STRERROR_R
-        ret = AVERROR(strerror_r(AVUNERROR(errnum), errbuf, errbuf_size));
+    int ret = AVERROR(strerror_r(AVUNERROR(errnum), errbuf, errbuf_size));
 #else
-        ret = -1;
+    int ret = -1;
 #endif
-        if (ret < 0)
-            snprintf(errbuf, errbuf_size, "Error number %d occurred", errnum);
-    }
+    if (ret < 0)
+        snprintf(errbuf, errbuf_size, "Error number %d occurred", errnum);
 
     return ret;
 }
diff --git a/libavutil/tests/error.c b/libavutil/tests/error.c
index b7b253b7b5..774c71fcb9 100644
--- a/libavutil/tests/error.c
+++ b/libavutil/tests/error.c
@@ -18,13 +18,22 @@
 
 #include "libavutil/error.c"
 
+static const char *const tag_list[] = {
+#define ERROR_TAG(CODE, DESC) #CODE,
+#define ERROR_TAG2(CODE, CODE2, DESC) #CODE,
+    AVERROR_LIST(ERROR_TAG, ERROR_TAG2)
+#if !HAVE_STRERROR_R
+    STRERROR_LIST(ERROR_TAG)
+#endif
+};
+
 int main(void)
 {
     int i;
 
     for (i = 0; i < FF_ARRAY_ELEMS(error_entries); i++) {
-        const struct error_entry *entry = &error_entries[i];
-        printf("%d: %s [%s]\n", entry->num, av_err2str(entry->num), entry->tag);
+        const struct ErrorEntry *entry = &error_entries[i];
+        printf("%d: %s [%s]\n", entry->num, av_err2str(entry->num), tag_list[i]);
     }
 
     for (i = 0; i < 256; i++) {
-- 
2.52.0


From 12fd7d7477d605d52a383c4c0061095428b9918c Mon Sep 17 00:00:00 2001
From: Ayose <ayosec@gmail.com>
Date: Mon, 24 Nov 2025 15:40:20 +0000
Subject: [PATCH 056/304] fftools/tf_mermaid: logic to complete subgraphs in a
 reusable function.

`subgraph` blocks can be closed in different places. The logic to complete the
header is moved to a function, so it can be reused later.
---
 fftools/textformat/tf_mermaid.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/fftools/textformat/tf_mermaid.c b/fftools/textformat/tf_mermaid.c
index ef730d570b..eb28ffb6ab 100644
--- a/fftools/textformat/tf_mermaid.c
+++ b/fftools/textformat/tf_mermaid.c
@@ -240,6 +240,20 @@ static void set_str(const char **dst, const char *src)
         *dst = av_strdup(src);
 }
 
+static void mermaid_subgraph_complete_start(MermaidContext *mmc, AVTextFormatContext *tfc, int level) {
+    struct section_data parent_sec_data = mmc->section_data[level];
+    AVBPrint *parent_buf = &tfc->section_pbuf[level];
+
+    if (parent_sec_data.subgraph_start_incomplete) {
+        if (parent_buf->len > 0)
+            writer_printf(tfc, "%s", parent_buf->str);
+
+        writer_put_str(tfc, "</div>\"]\n");
+
+        mmc->section_data[level].subgraph_start_incomplete = 0;
+    }
+}
+
 #define MM_INDENT() writer_printf(tfc, "%*c", mmc->indent_level * 2, ' ')
 
 static void mermaid_print_section_header(AVTextFormatContext *tfc, const void *data)
@@ -296,19 +310,7 @@ static void mermaid_print_section_header(AVTextFormatContext *tfc, const void *d
     }
 
     if (parent_section && parent_section->flags & AV_TEXTFORMAT_SECTION_FLAG_IS_SUBGRAPH) {
-
-        struct section_data parent_sec_data = mmc->section_data[tfc->level - 1];
-        AVBPrint *parent_buf = &tfc->section_pbuf[tfc->level - 1];
-
-        if (parent_sec_data.subgraph_start_incomplete) {
-
-            if (parent_buf->len > 0)
-                writer_printf(tfc, "%s", parent_buf->str);
-
-            writer_put_str(tfc, "</div>\"]\n");
-
-            mmc->section_data[tfc->level - 1].subgraph_start_incomplete = 0;
-        }
+        mermaid_subgraph_complete_start(mmc, tfc, tfc->level - 1);
     }
 
     av_freep(&mmc->section_data[tfc->level].section_id);
-- 
2.52.0


From 1d5e65340f2165b5b2143646ffaa90640acb1702 Mon Sep 17 00:00:00 2001
From: Ayose <ayosec@gmail.com>
Date: Mon, 24 Nov 2025 15:43:51 +0000
Subject: [PATCH 057/304] fftools/tf_mermaid: close subgraph header when there
 are no inputs.

Ensure that the fragment to close the header (`</div>\"]`) is written when the
function `mermaid_print_section_header` is called only once, which happens when
the filtergraph has no inputs.
---
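The two tf_mermaid patches rely on a "complete the pending header at most once" helper that can be called from both the header and the footer path. A minimal sketch of that idea follows; `DemoSection`, the `demo_*` functions and the `subgraph g[...]` prefix are hypothetical stand-ins and do not reproduce the real writer API:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-section state. */
typedef struct DemoSection {
    int  start_incomplete; /* header opened, closing fragment still pending */
    char out[128];         /* stands in for the writer output */
} DemoSection;

static void demo_append(DemoSection *s, const char *text)
{
    strncat(s->out, text, sizeof(s->out) - strlen(s->out) - 1);
}

/* Open a subgraph header but leave it unterminated for now. */
static void demo_open_subgraph(DemoSection *s, const char *title)
{
    snprintf(s->out, sizeof(s->out), "subgraph g[\"<div>%s", title);
    s->start_incomplete = 1;
}

/* Write the closing fragment once; later calls are no-ops. */
static void demo_complete_start(DemoSection *s)
{
    if (!s->start_incomplete)
        return;
    demo_append(s, "</div>\"]\n");
    s->start_incomplete = 0;
}

/* Footer path: complete the header first, which covers the case where
 * no child section triggered the completion earlier. */
static void demo_close_subgraph(DemoSection *s)
{
    demo_complete_start(s);
    demo_append(s, "end\n");
}
```

Whether a child section calls `demo_complete_start()` before the footer or not, the output contains the closing fragment exactly once.
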
 fftools/textformat/tf_mermaid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fftools/textformat/tf_mermaid.c b/fftools/textformat/tf_mermaid.c
index eb28ffb6ab..fae53d9c40 100644
--- a/fftools/textformat/tf_mermaid.c
+++ b/fftools/textformat/tf_mermaid.c
@@ -456,6 +456,8 @@ static void mermaid_print_section_footer(AVTextFormatContext *tfc)
 
     } else if ((section->flags & AV_TEXTFORMAT_SECTION_FLAG_IS_SUBGRAPH)) {
 
+        mermaid_subgraph_complete_start(mmc, tfc, tfc->level);
+
         MM_INDENT();
         writer_put_str(tfc, "end\n");
 
-- 
2.52.0


From 671d80f427415ef30d6f6f8b60777b9b97e30655 Mon Sep 17 00:00:00 2001
From: Dmitrii Ovchinnikov <ovchinnikov.dmitrii@gmail.com>
Date: Thu, 13 Nov 2025 17:40:26 +0100
Subject: [PATCH 058/304] avutil/hwcontext_amf: move AVMutex to internal
 context

---
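A minimal sketch of how the new `lock`/`unlock`/`lock_ctx` callbacks are meant to be used, assuming a caller that supplies its own locking. `DemoDeviceContext` only mirrors the three fields added to `AVAMFDeviceContext`; `DemoLock` and the `demo_*` functions are hypothetical, and the counter stands in for a real mutex so the sketch has no threading dependency:

```c
#include <stddef.h>

/* Hypothetical mirror of the three fields added by this patch. */
typedef struct DemoDeviceContext {
    void (*lock)(void *lock_ctx);
    void (*unlock)(void *lock_ctx);
    void *lock_ctx;
} DemoDeviceContext;

/* Hypothetical caller-owned lock state; a real application would wrap
 * its own mutex here instead of counters. */
typedef struct DemoLock {
    int depth;        /* 1 while held, 0 otherwise */
    int acquisitions; /* number of times lock() was called */
} DemoLock;

static void demo_lock(void *opaque)
{
    DemoLock *l = opaque;
    l->depth++;
    l->acquisitions++;
}

static void demo_unlock(void *opaque)
{
    DemoLock *l = opaque;
    l->depth--;
}

/* Example work item: reports whether the lock is held while it runs. */
static int demo_work(void *arg)
{
    const DemoLock *l = arg;
    return l->depth;
}

/* Same calling pattern as amf_submit_frame_locked() in this patch:
 * both callbacks are optional and receive lock_ctx as their argument. */
int demo_run_locked(DemoDeviceContext *ctx, int (*fn)(void *), void *arg)
{
    int ret;

    if (ctx->lock)
        ctx->lock(ctx->lock_ctx);
    ret = fn(arg);
    if (ctx->unlock)
        ctx->unlock(ctx->lock_ctx);
    return ret;
}
```

In the patch itself, `amf_device_init()` installs a default mutex-backed pair when `lock` is left NULL, so callers only need to set these fields if they want to share their own lock with the device context.
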
 doc/APIchanges            |  3 +++
 libavcodec/amfenc.c       |  6 ++++--
 libavutil/hwcontext_amf.c | 32 +++++++++++++++++++++++++++++---
 libavutil/hwcontext_amf.h |  6 ++++--
 libavutil/version.h       |  2 +-
 5 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index bfea83c73a..93c6f92704 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -2,6 +2,9 @@ The last version increases of all libraries were on 2025-03-28
 
 API changes, most recent first:
 
+2025-11-18 - xxxxxxxxxx - lavu 60.19.100 - hwcontext_amf.h
+  avutil/hwcontext_amf: add lock and unlock for AVAMFDeviceContext.
+
 2025-11-16 - xxxxxxxxxx - lavu 60.18.100 - cpu.h
   Deprecate AV_CPU_FLAG_FORCE without replacement.
 
diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 2174c5bdb2..b07da236c7 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -564,9 +564,11 @@ static int amf_submit_frame_locked(AVCodecContext *avctx, AVFrame *frame, AMFSur
     AVHWDeviceContext     *hw_device_ctx = (AVHWDeviceContext*)ctx->device_ctx_ref->data;
     AVAMFDeviceContext    *amf_device_ctx = (AVAMFDeviceContext *)hw_device_ctx->hwctx;
 
-    ff_mutex_lock(&amf_device_ctx->mutex);
+    if (amf_device_ctx->lock)
+        amf_device_ctx->lock(amf_device_ctx->lock_ctx);
     ret = amf_submit_frame(avctx, frame, surface_resubmit);
-    ff_mutex_unlock(&amf_device_ctx->mutex);
+    if (amf_device_ctx->unlock)
+        amf_device_ctx->unlock(amf_device_ctx->lock_ctx);
 
     return ret;
 }
diff --git a/libavutil/hwcontext_amf.c b/libavutil/hwcontext_amf.c
index acd9627c68..c754dc4ee5 100644
--- a/libavutil/hwcontext_amf.c
+++ b/libavutil/hwcontext_amf.c
@@ -39,6 +39,7 @@
 #include "pixdesc.h"
 #include "pixfmt.h"
 #include "imgutils.h"
+#include "thread.h"
 #include "libavutil/avassert.h"
 #include <AMF/core/Surface.h>
 #include <AMF/core/Trace.h>
@@ -49,6 +50,15 @@
 #endif
 #define FFMPEG_AMF_WRITER_ID L"ffmpeg_amf"
 
+static void amf_lock_default(void *opaque)
+{
+    ff_mutex_lock((AVMutex*)opaque);
+}
+
+static void amf_unlock_default(void *opaque)
+{
+    ff_mutex_unlock((AVMutex*)opaque);
+}
 
 typedef struct AmfTraceWriter {
     AMFTraceWriterVtbl *vtblp;
@@ -352,7 +362,7 @@ static int amf_transfer_data_from(AVHWFramesContext *ctx, AVFrame *dst,
 
 static void amf_device_uninit(AVHWDeviceContext *device_ctx)
 {
-    AVAMFDeviceContext      *amf_ctx = device_ctx->hwctx;
+    AVAMFDeviceContext *amf_ctx = device_ctx->hwctx;
     AMF_RESULT          res = AMF_NOT_INITIALIZED;
     AMFTrace           *trace;
 
@@ -377,8 +387,14 @@ static void amf_device_uninit(AVHWDeviceContext *device_ctx)
         amf_writer_free(amf_ctx->trace_writer);
     }
 
+    if (amf_ctx->lock == amf_lock_default) {
+        ff_mutex_destroy((AVMutex*)amf_ctx->lock_ctx);
+        av_freep(&amf_ctx->lock_ctx);
+        amf_ctx->lock = NULL;
+        amf_ctx->unlock = NULL;
+    }
+
     amf_ctx->version = 0;
-    ff_mutex_destroy(&amf_ctx->mutex);
 }
 
 static int amf_device_init(AVHWDeviceContext *ctx)
@@ -387,6 +403,16 @@ static int amf_device_init(AVHWDeviceContext *ctx)
     AMFContext1 *context1 = NULL;
     AMF_RESULT res;
 
+    if (!amf_ctx->lock) {
+        amf_ctx->lock_ctx = av_mallocz(sizeof(AVMutex));
+        if (!amf_ctx->lock_ctx) {
+            return AVERROR(ENOMEM);
+        }
+        ff_mutex_init((AVMutex*)amf_ctx->lock_ctx, NULL);
+        amf_ctx->lock   = amf_lock_default;
+        amf_ctx->unlock = amf_unlock_default;
+    }
+
 #ifdef _WIN32
     res = amf_ctx->context->pVtbl->InitDX11(amf_ctx->context, NULL, AMF_DX11_1);
     if (res == AMF_OK || res == AMF_ALREADY_INITIALIZED) {
@@ -415,7 +441,7 @@ static int amf_device_init(AVHWDeviceContext *ctx)
         }
      }
 #endif
-    ff_mutex_init(&amf_ctx->mutex, NULL);
+
     return 0;
 }
 
diff --git a/libavutil/hwcontext_amf.h b/libavutil/hwcontext_amf.h
index 5b726e3b9e..6f2cabc878 100644
--- a/libavutil/hwcontext_amf.h
+++ b/libavutil/hwcontext_amf.h
@@ -26,7 +26,6 @@
 #include <AMF/core/Context.h>
 #include <AMF/core/Trace.h>
 #include <AMF/core/Debug.h>
-#include "thread.h"
 
 /**
  * This struct is allocated as AVHWDeviceContext.hwctx
@@ -39,7 +38,10 @@ typedef struct AVAMFDeviceContext {
     int64_t             version; ///< version of AMF runtime
     AMFContext         *context;
     AMF_MEMORY_TYPE     memory_type;
-	AVMutex             mutex;
+
+    void (*lock)(void *lock_ctx);
+    void (*unlock)(void *lock_ctx);
+    void *lock_ctx;
 } AVAMFDeviceContext;
 
 enum AMF_SURFACE_FORMAT av_av_to_amf_format(enum AVPixelFormat fmt);
diff --git a/libavutil/version.h b/libavutil/version.h
index d1385a8829..db250d5c9e 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  60
-#define LIBAVUTIL_VERSION_MINOR  18
+#define LIBAVUTIL_VERSION_MINOR  19
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
-- 
2.52.0


From 15eb3770fc75d1a50c78a230c5d2fdb5ab34d79f Mon Sep 17 00:00:00 2001
From: Martin Storsjö <martin@martin.st>
Date: Tue, 18 Nov 2025 12:13:37 +0200
Subject: [PATCH 059/304] tools: Make indent_arm_assembly.pl able to reformat a
 file in place

This allows using the tool for one-off reindentations without needing
the check_arm_indent.sh script (e.g. for use outside of ffmpeg), and
without having to pipe the file through stdin/stdout.
---
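The in-place mode added here writes to `file.tmp` and then renames it over the original. A minimal C sketch of that write-then-rename pattern follows, given in C to match the rest of the series rather than Perl. `demo_rewrite_in_place` is a hypothetical name, stripping trailing whitespace stands in for the real reindentation, lines are assumed to fit the buffer, and `rename()` replacing an existing file is assumed to behave as on POSIX:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Rewrite path in place: filter into "<path>.tmp", then rename the
 * temporary file over the original. Returns 0 on success, -1 on error.
 * Assumes every line is shorter than the 4096-byte buffer. */
int demo_rewrite_in_place(const char *path)
{
    char tmp[4096], line[4096];
    FILE *in, *out;
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;
    in = fopen(path, "r");
    if (!in)
        return -1;
    out = fopen(tmp, "w");
    if (!out) {
        fclose(in);
        return -1;
    }
    while (fgets(line, sizeof(line), in)) {
        size_t len = strlen(line);
        /* Stand-in transformation: drop trailing whitespace and newline. */
        while (len > 0 && isspace((unsigned char)line[len - 1]))
            len--;
        line[len] = '\0';
        fprintf(out, "%s\n", line);
    }
    fclose(in);
    if (fclose(out) != 0) {
        remove(tmp);
        return -1;
    }
    /* The original is only replaced once the new content is complete. */
    return rename(tmp, path) == 0 ? 0 : -1;
}
```

Writing to a temporary file first means an error midway leaves the original untouched, which is the same property the script gets from its `$tempfile` handling.
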
 tools/indent_arm_assembly.pl | 48 ++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/tools/indent_arm_assembly.pl b/tools/indent_arm_assembly.pl
index 7d5d1ecef2..359c2bcf4f 100755
--- a/tools/indent_arm_assembly.pl
+++ b/tools/indent_arm_assembly.pl
@@ -32,8 +32,11 @@
 # - Optionally align operand columns vertically according to their
 #   maximum width (accommodating for e.g. x0 vs x10, or v0.8b vs v16.16b).
 #
-# The input code is passed to stdin, and the reformatted code is written
-# on stdout.
+# The script can be executed as "indent_arm_assembly.pl file [outfile]".
+# If no outfile is specified, the given file is overwritten in place.
+#
+# Alternatively, if no file parameters are given, the script reads input
+# code on stdin, and outputs the reformatted code on stdout.
 
 use strict;
 
@@ -41,6 +44,8 @@ my $indent_operands = 0;
 my $instr_indent = 8;
 my $operand_indent = 24;
 my $match_indent = 0;
+my $file;
+my $outfile;
 
 while (@ARGV) {
     my $opt = shift;
@@ -54,7 +59,13 @@ while (@ARGV) {
     } elsif ($opt eq "-match-indent") {
         $match_indent = 1;
     } else {
-        die "Unrecognized parameter $opt\n";
+        if (!$file) {
+            $file = $opt;
+        } elsif (!$outfile) {
+            $outfile = $opt;
+        } else {
+            die "Unrecognized parameter $opt\n";
+        }
     }
 }
 
@@ -130,7 +141,26 @@ sub columns {
     return indentcolumns($rest, 3);
 }
 
-while (<STDIN>) {
+my $in;
+my $out;
+my $tempfile;
+
+if ($file) {
+    open(INPUT, "$file") or die "Unable to open $file: $!";
+    $in = *INPUT;
+    if ($outfile) {
+        open(OUTPUT, ">$outfile") or die "Unable to open $outfile: $!";
+    } else {
+        $tempfile = "$file.tmp";
+        open(OUTPUT, ">$tempfile") or die "Unable to open $tempfile: $!";
+    }
+    $out = *OUTPUT;
+} else {
+    $in = *STDIN;
+    $out = *STDOUT;
+}
+
+while (<$in>) {
     # Trim off trailing whitespace.
     chomp;
     if (/^([\.\w\d]+:)?(\s+)([\w\\][\w\\\.]*)(?:(\s+)(.*)|$)/) {
@@ -201,5 +231,13 @@ while (<STDIN>) {
             $_ = $label . $indent . $instr . $operand_space . $rest;
         }
     }
-    print $_ . "\n";
+    print $out $_ . "\n";
+}
+
+if ($file) {
+    close(INPUT);
+    close(OUTPUT);
+}
+if ($tempfile) {
+    rename($tempfile, $file) or die "Unable to rename $tempfile to $file: $!";
 }
-- 
2.52.0


From b44d9866135428a980e167e238351c009869d2ad Mon Sep 17 00:00:00 2001
From: Georgii Zagoruiko <george.zaguri@gmail.com>
Date: Mon, 24 Nov 2025 19:51:26 +0000
Subject: [PATCH 060/304] aarch64/vvc: Optimisations of put_luma_h() functions
 for 10/12-bit

RPi4 (auto-vectorisation is turned on)
put_luma_h_10_4x4_c:                                   282.8 ( 1.00x)
put_luma_h_10_8x8_c:                                  1069.5 ( 1.00x)
put_luma_h_10_8x8_neon:                                207.5 ( 5.15x)
put_luma_h_10_16x16_c:                                1999.6 ( 1.00x)
put_luma_h_10_16x16_neon:                              777.5 ( 2.57x)
put_luma_h_10_32x32_c:                                6612.9 ( 1.00x)
put_luma_h_10_32x32_neon:                             3201.6 ( 2.07x)
put_luma_h_10_64x64_c:                               25059.0 ( 1.00x)
put_luma_h_10_64x64_neon:                            13623.5 ( 1.84x)
put_luma_h_10_128x128_c:                             91310.1 ( 1.00x)
put_luma_h_10_128x128_neon:                          50358.3 ( 1.81x)
put_luma_h_12_4x4_c:                                   282.1 ( 1.00x)
put_luma_h_12_8x8_c:                                  1068.4 ( 1.00x)
put_luma_h_12_8x8_neon:                                207.7 ( 5.14x)
put_luma_h_12_16x16_c:                                1998.0 ( 1.00x)
put_luma_h_12_16x16_neon:                              777.5 ( 2.57x)
put_luma_h_12_32x32_c:                                6612.0 ( 1.00x)
put_luma_h_12_32x32_neon:                             3201.6 ( 2.07x)
put_luma_h_12_64x64_c:                               25036.8 ( 1.00x)
put_luma_h_12_64x64_neon:                            13595.1 ( 1.84x)
put_luma_h_12_128x128_c:                             91305.8 ( 1.00x)
put_luma_h_12_128x128_neon:                          50359.7 ( 1.81x)

Apple M2 Air (auto-vectorisation is turned on)
put_luma_h_10_4x4_c:                                     0.3 ( 1.00x)
put_luma_h_10_8x8_c:                                     1.0 ( 1.00x)
put_luma_h_10_8x8_neon:                                  0.4 ( 2.59x)
put_luma_h_10_16x16_c:                                   2.9 ( 1.00x)
put_luma_h_10_16x16_neon:                                1.4 ( 2.01x)
put_luma_h_10_32x32_c:                                   9.4 ( 1.00x)
put_luma_h_10_32x32_neon:                                5.8 ( 1.62x)
put_luma_h_10_64x64_c:                                  35.6 ( 1.00x)
put_luma_h_10_64x64_neon:                               23.6 ( 1.51x)
put_luma_h_10_128x128_c:                               131.1 ( 1.00x)
put_luma_h_10_128x128_neon:                             92.6 ( 1.42x)
put_luma_h_12_4x4_c:                                     0.3 ( 1.00x)
put_luma_h_12_8x8_c:                                     1.0 ( 1.00x)
put_luma_h_12_8x8_neon:                                  0.4 ( 2.58x)
put_luma_h_12_16x16_c:                                   2.9 ( 1.00x)
put_luma_h_12_16x16_neon:                                1.4 ( 2.00x)
put_luma_h_12_32x32_c:                                   9.4 ( 1.00x)
put_luma_h_12_32x32_neon:                                5.8 ( 1.61x)
put_luma_h_12_64x64_c:                                  35.3 ( 1.00x)
put_luma_h_12_64x64_neon:                               23.3 ( 1.52x)
put_luma_h_12_128x128_c:                               131.2 ( 1.00x)
put_luma_h_12_128x128_neon:                             92.4 ( 1.42x)
---
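For readers comparing against the NEON code, here is a scalar sketch of what the `put_luma_h_x8_vector_filter` macro computes per output sample: an 8-tap horizontal filter starting three samples to the left, a truncating right shift, and saturation to 16 bits (as `sqshrn` does). `demo_put_luma_h` and `demo_sat16` are hypothetical names, the stride is given in samples rather than bytes, and the shift is passed in as a parameter as in the macro. Its value is not shown in this excerpt; bit depth minus 8 is an assumption in the usage note.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PB_SIZE 128 /* mirrors VVC_MAX_PB_SIZE in the patch */

/* Saturate a 32-bit value to the int16_t range. */
static int16_t demo_sat16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Scalar reference. src must have 3 valid samples to the left and 4 to
 * the right of each row; src_stride is in samples. Right-shifting a
 * negative sum is assumed to be an arithmetic shift. */
void demo_put_luma_h(int16_t *dst, const uint16_t *src, ptrdiff_t src_stride,
                     int height, const int8_t *hf, int width, int shift)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int32_t sum = 0;
            for (int i = 0; i < 8; i++)
                sum += hf[i] * src[x + i - 3];
            dst[x] = demo_sat16(sum >> shift);
        }
        src += src_stride;
        dst += DEMO_MAX_PB_SIZE; /* destination rows use a fixed stride */
    }
}
```

With a filter whose only non-zero tap is 64 at index 3 and a shift of 2 (assuming the 10-bit case), every output equals the input sample times 16, which makes a convenient sanity check against the assembly.
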
 libavcodec/aarch64/vvc/dsp_init.c |  22 ++++++
 libavcodec/aarch64/vvc/inter.S    | 119 ++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index b7dc1d89f8..aa75d22b78 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -30,6 +30,18 @@
 #define BDOF_BLOCK_SIZE         16
 #define BDOF_MIN_BLOCK_SIZE     4
 
+void ff_vvc_put_luma_h8_10_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+                                const int height, const int8_t *hf, const int8_t *vf, const int width);
+void ff_vvc_put_luma_h16_10_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+                                 const int height, const int8_t *hf, const int8_t *vf, const int width);
+void ff_vvc_put_luma_h_x16_10_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+                                   const int height, const int8_t *hf, const int8_t *vf, const int width);
+void ff_vvc_put_luma_h8_12_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+                                const int height, const int8_t *hf, const int8_t *vf, const int width);
+void ff_vvc_put_luma_h16_12_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+                                 const int height, const int8_t *hf, const int8_t *vf, const int width);
+void ff_vvc_put_luma_h_x16_12_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+                                   const int height, const int8_t *hf, const int8_t *vf, const int width);
 
 void ff_alf_classify_sum_neon(int *sum0, int *sum1, int16_t *grad, uint32_t gshift, uint32_t steps);
 
@@ -245,6 +257,11 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.dmvr[0][1] = ff_vvc_dmvr_h_10_neon;
         c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_10_neon;
         c->inter.apply_bdof = ff_vvc_apply_bdof_10_neon;
+        c->inter.put[0][2][0][1] = ff_vvc_put_luma_h8_10_neon;
+        c->inter.put[0][3][0][1] = ff_vvc_put_luma_h16_10_neon;
+        c->inter.put[0][4][0][1] =
+        c->inter.put[0][5][0][1] =
+        c->inter.put[0][6][0][1] = ff_vvc_put_luma_h_x16_10_neon;
 
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;
@@ -256,6 +273,11 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.dmvr[0][1] = ff_vvc_dmvr_h_12_neon;
         c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_12_neon;
         c->inter.apply_bdof = ff_vvc_apply_bdof_12_neon;
+        c->inter.put[0][2][0][1] = ff_vvc_put_luma_h8_12_neon;
+        c->inter.put[0][3][0][1] = ff_vvc_put_luma_h16_12_neon;
+        c->inter.put[0][4][0][1] =
+        c->inter.put[0][5][0][1] =
+        c->inter.put[0][6][0][1] = ff_vvc_put_luma_h_x16_12_neon;
 
         c->alf.filter[LUMA] = alf_filter_luma_12_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_12_neon;
diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
index a874edf889..41444ec44c 100644
--- a/libavcodec/aarch64/vvc/inter.S
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -1713,3 +1713,122 @@ endfunc
 #undef GRADIENT_V1_OFFSET
 #undef VX_OFFSET
 #undef VY_OFFSET
+
+#define VVC_MAX_PB_SIZE 128
+
+.macro put_luma_h_x8_vector_filter shift
+        // 8 bytes from hf loaded to v0.8h
+        // 32 bytes from _src loaded to v20.8h & v21.8h where v21.8h is loaded for shift to v1.8h,..,v6.8h,v17.8h
+        // v24.4h & v25.4h are output vectors to store
+        ext             v1.16b, v20.16b, v21.16b, #2
+        ext             v2.16b, v20.16b, v21.16b, #4
+        ext             v3.16b, v20.16b, v21.16b, #6
+        ext             v4.16b, v20.16b, v21.16b, #8
+        ext             v5.16b, v20.16b, v21.16b, #10
+        ext             v6.16b, v20.16b, v21.16b, #12
+        ext             v17.16b, v20.16b, v21.16b, #14
+        smull           v24.4s, v20.4h, v0.h[0]
+        smull2          v25.4s, v20.8h, v0.h[0]
+        smlal           v24.4s, v1.4h, v0.h[1]
+        smlal2          v25.4s, v1.8h, v0.h[1]
+        smlal           v24.4s, v2.4h, v0.h[2]
+        smlal2          v25.4s, v2.8h, v0.h[2]
+        smlal           v24.4s, v3.4h, v0.h[3]
+        smlal2          v25.4s, v3.8h, v0.h[3]
+        smlal           v24.4s, v4.4h, v0.h[4]
+        smlal2          v25.4s, v4.8h, v0.h[4]
+        smlal           v24.4s, v5.4h, v0.h[5]
+        smlal2          v25.4s, v5.8h, v0.h[5]
+        smlal           v24.4s, v6.4h, v0.h[6]
+        smlal2          v25.4s, v6.8h, v0.h[6]
+        smlal           v24.4s, v17.4h, v0.h[7]
+        smlal2          v25.4s, v17.8h, v0.h[7]
+        sqshrn          v24.4h, v24.4s, #(\shift)
+        sqshrn          v25.4h, v25.4s, #(\shift)
+.endm
+
+.macro put_luma_h8_xx_neon shift
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+        ld1             {v0.8b}, [x4]
+        sub             x1, x1, #6
+        sxtl            v0.8h, v0.8b
+1:
+        ld1             {v20.8h, v21.8h}, [x1], x2
+        put_luma_h_x8_vector_filter \shift
+        subs            w3, w3, #1
+        st1             {v24.4h, v25.4h}, [x0], x9
+        b.gt            1b
+        ret
+.endm
+
+.macro put_luma_h16_xx_neon shift
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+        ld1             {v0.8b}, [x4]
+        sub             x9, x9, #16
+        sub             x1, x1, #6
+        sxtl            v0.8h, v0.8b
+1:
+        ld1             {v20.8h, v21.8h, v22.8h}, [x1], x2
+        put_luma_h_x8_vector_filter \shift
+        mov             v20.16b, v21.16b
+        mov             v21.16b, v22.16b
+        st1             {v24.4h, v25.4h}, [x0], #16
+        put_luma_h_x8_vector_filter \shift
+        subs            w3, w3, #1
+        st1             {v24.4h, v25.4h}, [x0], x9
+        b.gt            1b
+        ret
+.endm
+
+.macro put_luma_h_x16_xx_neon shift
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+        ld1             {v0.8b}, [x4]
+        sub             x9, x9, w6, uxtw #1
+        sub             x2, x2, w6, uxtw #1
+        sxtl            v0.8h, v0.8b
+        sub             x1, x1, #6
+        sub             x2, x2, #16
+1:
+        ld1             {v20.8h}, [x1], #16
+        mov             w8, w6
+2:
+        ld1             {v21.8h, v22.8h}, [x1], #32
+        put_luma_h_x8_vector_filter \shift
+        mov             v20.16b, v21.16b
+        mov             v21.16b, v22.16b
+        st1             {v24.4h, v25.4h}, [x0], #16
+        put_luma_h_x8_vector_filter \shift
+        mov             v20.16b, v21.16b
+        subs            w8, w8, #16
+        st1             {v24.4h, v25.4h}, [x0], #16
+        b.gt            2b
+        subs            w3, w3, #1
+        add             x0, x0, x9
+        add             x1, x1, x2
+        b.gt            1b
+        ret
+.endm
+
+function ff_vvc_put_luma_h8_10_neon, export=1
+        put_luma_h8_xx_neon 2
+endfunc
+
+function ff_vvc_put_luma_h8_12_neon, export=1
+        put_luma_h8_xx_neon 4
+endfunc
+
+function ff_vvc_put_luma_h16_10_neon, export=1
+        put_luma_h16_xx_neon 2
+endfunc
+
+function ff_vvc_put_luma_h16_12_neon, export=1
+        put_luma_h16_xx_neon 4
+endfunc
+
+function ff_vvc_put_luma_h_x16_10_neon, export=1
+        put_luma_h_x16_xx_neon 2
+endfunc
+
+function ff_vvc_put_luma_h_x16_12_neon, export=1
+        put_luma_h_x16_xx_neon 4
+endfunc
-- 
2.52.0


From d26901c5074255dd122cab0e002d16b10ec04a74 Mon Sep 17 00:00:00 2001
From: Jack Lau <jacklau1222gm@gmail.com>
Date: Mon, 17 Nov 2025 13:32:05 +0800
Subject: [PATCH 061/304] configure: replace openssl header check with 1.1.1
 API

Fix #20571

Avoid build errors with openssl forks (like libressl)
that lack some APIs.

This patch replaces the header check for OPENSSL_init_ssl
(added in 1.1.0) with one for DTLS_get_data_mtu, which is
new in OpenSSL 1.1.1.

Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
---
 configure | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 7d6061b55c..99734e9d03 100755
--- a/configure
+++ b/configure
@@ -7353,12 +7353,12 @@ enabled omx_rpi           && { test_code cc OMX_Core.h OMX_IndexConfigBrcmVideoR
 enabled omx               && require_headers OMX_Core.h && \
     warn "The OpenMAX encoders are deprecated and will be removed in future versions"
 
-enabled openssl           && { { check_pkg_config openssl "openssl >= 3.0.0" openssl/ssl.h OPENSSL_init_ssl &&
+enabled openssl           && { { check_pkg_config openssl "openssl >= 3.0.0" openssl/ssl.h DTLS_get_data_mtu &&
                                  { enabled gplv3 || ! enabled gpl || enabled nonfree || die "ERROR: OpenSSL >=3.0.0 requires --enable-version3"; }; } ||
                                { enabled gpl && ! enabled nonfree && die "ERROR: OpenSSL <3.0.0 is incompatible with the gpl"; } ||
-                               check_pkg_config openssl "openssl >= 1.1.1" openssl/ssl.h OPENSSL_init_ssl ||
-                               check_lib openssl openssl/ssl.h OPENSSL_init_ssl -lssl -lcrypto ||
-                               check_lib openssl openssl/ssl.h OPENSSL_init_ssl -lssl -lcrypto -lws2_32 -lgdi32 ||
+                               check_pkg_config openssl "openssl >= 1.1.1" openssl/ssl.h DTLS_get_data_mtu ||
+                               check_lib openssl openssl/ssl.h DTLS_get_data_mtu -lssl -lcrypto ||
+                               check_lib openssl openssl/ssl.h DTLS_get_data_mtu -lssl -lcrypto -lws2_32 -lgdi32 ||
                                die "ERROR: openssl (>= 1.1.1) not found"; }
 enabled pocketsphinx      && require_pkg_config pocketsphinx pocketsphinx pocketsphinx/pocketsphinx.h ps_init
 enabled rkmpp             && { require_pkg_config rkmpp rockchip_mpp  rockchip/rk_mpi.h mpp_create &&
-- 
2.52.0


From 4132785530872a8da15fec804c94266f29f28c16 Mon Sep 17 00:00:00 2001
From: Gyan Doshi <ffmpeg@gyani.pro>
Date: Tue, 25 Nov 2025 12:41:37 +0530
Subject: [PATCH 062/304] avfilter/zscale: add support for resize filter
 spline64

Fixes #20928
---
 doc/filters.texi        | 1 +
 libavfilter/vf_zscale.c | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index 7605812428..168ea0d2da 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26504,6 +26504,7 @@ Possible values are:
 @item bicubic
 @item spline16
 @item spline36
+@item spline64
 @item lanczos
 @end table
 
diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c
index 9ea50133c6..96cbd3c4b2 100644
--- a/libavfilter/vf_zscale.c
+++ b/libavfilter/vf_zscale.c
@@ -940,13 +940,14 @@ static const AVOption zscale_options[] = {
     {     "ordered",          0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_DITHER_ORDERED},  0, 0, FLAGS, .unit = "dither" },
     {     "random",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_DITHER_RANDOM},   0, 0, FLAGS, .unit = "dither" },
     {     "error_diffusion",  0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_DITHER_ERROR_DIFFUSION}, 0, 0, FLAGS, .unit = "dither" },
-    { "filter", "set filter type",     OFFSET(filter),    AV_OPT_TYPE_INT, {.i64 = ZIMG_RESIZE_BILINEAR}, 0, ZIMG_RESIZE_LANCZOS, FLAGS, .unit = "filter" },
-    { "f",      "set filter type",     OFFSET(filter),    AV_OPT_TYPE_INT, {.i64 = ZIMG_RESIZE_BILINEAR}, 0, ZIMG_RESIZE_LANCZOS, FLAGS, .unit = "filter" },
+    { "filter", "set filter type",     OFFSET(filter),    AV_OPT_TYPE_INT, {.i64 = ZIMG_RESIZE_BILINEAR}, 0, ZIMG_RESIZE_SPLINE64, FLAGS, .unit = "filter" },
+    { "f",      "set filter type",     OFFSET(filter),    AV_OPT_TYPE_INT, {.i64 = ZIMG_RESIZE_BILINEAR}, 0, ZIMG_RESIZE_SPLINE64, FLAGS, .unit = "filter" },
     {     "point",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_POINT},    0, 0, FLAGS, .unit = "filter" },
     {     "bilinear",         0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_BILINEAR}, 0, 0, FLAGS, .unit = "filter" },
     {     "bicubic",          0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_BICUBIC},  0, 0, FLAGS, .unit = "filter" },
     {     "spline16",         0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_SPLINE16}, 0, 0, FLAGS, .unit = "filter" },
     {     "spline36",         0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_SPLINE36}, 0, 0, FLAGS, .unit = "filter" },
+    {     "spline64",         0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_SPLINE64}, 0, 0, FLAGS, .unit = "filter" },
     {     "lanczos",          0,       0,                 AV_OPT_TYPE_CONST, {.i64 = ZIMG_RESIZE_LANCZOS},  0, 0, FLAGS, .unit = "filter" },
     { "out_range", "set color range",  OFFSET(range),     AV_OPT_TYPE_INT, {.i64 = -1}, -1, ZIMG_RANGE_FULL, FLAGS, .unit = "range" },
     { "range", "set color range",      OFFSET(range),     AV_OPT_TYPE_INT, {.i64 = -1}, -1, ZIMG_RANGE_FULL, FLAGS, .unit = "range" },
-- 
2.52.0


From 5d5f06828093f1a0be59499ccfef981a06bdf9b9 Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sun, 2 Nov 2025 20:23:28 +0100
Subject: [PATCH 063/304] vulkan/prores: Adopt the same IDCT routine as the
 prores-raw hwaccel

The added rounding at the final output conforms
to the SMPTE document and reduces the deviation
from the software decoder.
---
 libavcodec/vulkan/prores_idct.comp | 105 +++++++++++++++++++----------
 1 file changed, 68 insertions(+), 37 deletions(-)

diff --git a/libavcodec/vulkan/prores_idct.comp b/libavcodec/vulkan/prores_idct.comp
index 4b39b3d8ae..05ba8e4967 100644
--- a/libavcodec/vulkan/prores_idct.comp
+++ b/libavcodec/vulkan/prores_idct.comp
@@ -37,47 +37,77 @@ void put_px(uint tex_idx, ivec2 pos, uint v)
 #endif
 }
 
+const float idct_8x8_scales[] = {
+    0.353553390593274f, // cos(4 * pi/16) / 2
+    0.490392640201615f, // cos(1 * pi/16) / 2
+    0.461939766255643f, // cos(2 * pi/16) / 2
+    0.415734806151273f, // cos(3 * pi/16) / 2
+    0.353553390593274f, // cos(4 * pi/16) / 2
+    0.277785116509801f, // cos(5 * pi/16) / 2
+    0.191341716182545f, // cos(6 * pi/16) / 2
+    0.097545161008064f, // cos(7 * pi/16) / 2
+};
+
 /* 7.4 Inverse Transform */
 void idct(uint block, uint offset, uint stride)
 {
-    float c0 = blocks[block][0*stride + offset];
-    float c1 = blocks[block][1*stride + offset];
-    float c2 = blocks[block][2*stride + offset];
-    float c3 = blocks[block][3*stride + offset];
-    float c4 = blocks[block][4*stride + offset];
-    float c5 = blocks[block][5*stride + offset];
-    float c6 = blocks[block][6*stride + offset];
-    float c7 = blocks[block][7*stride + offset];
+    float t0, t1, t2, t3, t4, t5, t6, t7, u8;
+    float u0, u1, u2, u3, u4, u5, u6, u7;
 
-    float tmp1 = c6 * 1.4142134189605712891 + (c2 - c6);
-    float tmp2 = c6 * 1.4142134189605712891 - (c2 - c6);
+    /* Input */
+    t0 = blocks[block][0*stride + offset];
+    u4 = blocks[block][1*stride + offset];
+    t2 = blocks[block][2*stride + offset];
+    u6 = blocks[block][3*stride + offset];
+    t1 = blocks[block][4*stride + offset];
+    u5 = blocks[block][5*stride + offset];
+    t3 = blocks[block][6*stride + offset];
+    u7 = blocks[block][7*stride + offset];
 
-    float a1 = (c0 + c4) * 0.35355341434478759766 + tmp1 * 0.46193981170654296875;
-    float a4 = (c0 + c4) * 0.35355341434478759766 - tmp1 * 0.46193981170654296875;
+    /* Embedded scaled inverse 4-point Type-II DCT */
+    u0 = t0 + t1;
+    u1 = t0 - t1;
+    u3 = t2 + t3;
+    u2 = (t2 - t3)*(1.4142135623730950488016887242097f) - u3;
+    t0 = u0 + u3;
+    t3 = u0 - u3;
+    t1 = u1 + u2;
+    t2 = u1 - u2;
 
-    float a3 = (c0 - c4) * 0.35355341434478759766 + tmp2 * 0.19134169816970825195;
-    float a2 = (c0 - c4) * 0.35355341434478759766 - tmp2 * 0.19134169816970825195;
+    /* Embedded scaled inverse 4-point Type-IV DST */
+    t5 = u5 + u6;
+    t6 = u5 - u6;
+    t7 = u4 + u7;
+    t4 = u4 - u7;
+    u7 = t7 + t5;
+    u5 = (t7 - t5)*(1.4142135623730950488016887242097f);
+    u8 = (t4 + t6)*(1.8477590650225735122563663787936f);
+    u4 = u8 - t4*(1.0823922002923939687994464107328f);
+    u6 = u8 - t6*(2.6131259297527530557132863468544f);
+    t7 = u7;
+    t6 = t7 - u6;
+    t5 = t6 + u5;
+    t4 = t5 - u4;
 
-    float tmp3 = (c3 - c5) * 0.70710682868957519531 + c7;
-    float tmp4 = (c3 - c5) * 0.70710682868957519531 - c7;
+    /* Butterflies */
+    u0 = t0 + t7;
+    u7 = t0 - t7;
+    u6 = t1 + t6;
+    u1 = t1 - t6;
+    u2 = t2 + t5;
+    u5 = t2 - t5;
+    u4 = t3 + t4;
+    u3 = t3 - t4;
 
-    float tmp5 = (c5 - c7) *  1.4142134189605712891 + (c5 - c7) + (c1 - c3);
-    float tmp6 = (c5 - c7) * -1.4142134189605712891 + (c5 - c7) + (c1 - c3);
-
-    float m1 = tmp3 *  2.6131260395050048828 + tmp5;
-    float m4 = tmp3 * -2.6131260395050048828 + tmp5;
-
-    float m2 = tmp4 *  1.0823919773101806641 + tmp6;
-    float m3 = tmp4 * -1.0823919773101806641 + tmp6;
-
-    blocks[block][0*stride + offset] = m1 *  0.49039259552955627441  + a1;
-    blocks[block][7*stride + offset] = m1 * -0.49039259552955627441  + a1;
-    blocks[block][1*stride + offset] = m2 *  0.41573479771614074707  + a2;
-    blocks[block][6*stride + offset] = m2 * -0.41573479771614074707  + a2;
-    blocks[block][2*stride + offset] = m3 *  0.27778509259223937988  + a3;
-    blocks[block][5*stride + offset] = m3 * -0.27778509259223937988  + a3;
-    blocks[block][3*stride + offset] = m4 *  0.097545139491558074951 + a4;
-    blocks[block][4*stride + offset] = m4 * -0.097545139491558074951 + a4;
+    /* Output */
+    blocks[block][0*stride + offset] = u0;
+    blocks[block][1*stride + offset] = u1;
+    blocks[block][2*stride + offset] = u2;
+    blocks[block][3*stride + offset] = u3;
+    blocks[block][4*stride + offset] = u4;
+    blocks[block][5*stride + offset] = u5;
+    blocks[block][6*stride + offset] = u6;
+    blocks[block][7*stride + offset] = u7;
 }
 
 void main(void)
@@ -90,7 +120,7 @@ void main(void)
     /* Coalesced load of DCT coeffs in shared memory, inverse quantization */
     if (act) {
         /**
-         * According to spec indexing an array in push constant memory with
+         * According to the VK spec indexing an array in push constant memory with
          * a non-dynamically uniform value is illegal ($15.9.1 in v1.4.326),
          * so copy the whole matrix locally.
          */
@@ -101,8 +131,9 @@ void main(void)
         int qscale = qidx > 128 ? (qidx - 96) << 2 : qidx;
 
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-            int v = sign_extend(int(get_px(comp, ivec2(gid.x, (gid.y << 3) + i))), 16);
-            blocks[block][i * 9 + idx] = float(v * qscale * int(qmat[(i << 3) + idx]));
+            int   c = sign_extend(int(get_px(comp, ivec2(gid.x, (gid.y << 3) + i))), 16);
+            float v = float(c * qscale * int(qmat[(i << 3) + idx]));
+            blocks[block][i * 9 + idx] = v * idct_8x8_scales[idx] * idct_8x8_scales[i];
         }
     }
 
@@ -121,7 +152,7 @@ void main(void)
     barrier();
     if (act) {
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-            float v = blocks[block][i * 9 + idx] * fact + off;
+            float v = round(blocks[block][i * 9 + idx] * fact + off);
             put_px(comp, ivec2(gid.x, (gid.y << 3) + i), clamp(int(v), 0, maxv));
         }
     }
-- 
2.52.0


From b5ad68e76a0b89e0f6749b279b83d616235670ea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Tue, 25 Nov 2025 18:06:33 +0100
Subject: [PATCH 064/304] avfilter/vf_drawvg: round color values to avoid
 differences on some platforms
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This ensures consistent color conversion between double and u8 and
guarantees that values remain consistent across different platforms,
especially when x87 math is used.

Note that libcairo also performs rounding internally when converting
doubles to integers, see _cairo_color_double_to_short().

Fixes: fate-filter-drawvg-interpreter
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavfilter/tests/drawvg.c               | 2 +-
 libavfilter/vf_drawvg.c                  | 2 +-
 tests/ref/fate/filter-drawvg-interpreter | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavfilter/tests/drawvg.c b/libavfilter/tests/drawvg.c
index 2d151bf4d2..9fb233d969 100644
--- a/libavfilter/tests/drawvg.c
+++ b/libavfilter/tests/drawvg.c
@@ -167,7 +167,7 @@ void cairo_set_source(cairo_t *cr, cairo_pattern_t *source) {
     printf("%s", __func__);
 
 #define PRINT_COLOR(prefix) \
-    printf(prefix "#%02x%02x%02x%02x", (int)(r*255), (int)(g*255), (int)(b*255), (int)(a*255))
+    printf(prefix "#%02lx%02lx%02lx%02lx", lround(r*255), lround(g*255), lround(b*255), lround(a*255))
 
     switch (cairo_pattern_get_type(source)) {
     case CAIRO_PATTERN_TYPE_SOLID:
diff --git a/libavfilter/vf_drawvg.c b/libavfilter/vf_drawvg.c
index fd9270ee13..5d3008084c 100644
--- a/libavfilter/vf_drawvg.c
+++ b/libavfilter/vf_drawvg.c
@@ -1981,7 +1981,7 @@ static int vgs_eval(
                 b = numerics[3];
             }
 
-            #define C(v, o) ((uint32_t)(av_clipd(v, 0, 1) * 255) << o)
+            #define C(v, o) ((uint32_t)lround(av_clipd(v, 0, 1) * 255) << o)
 
             state->vars[user_var] = (double)(
                 C(r, 24)
diff --git a/tests/ref/fate/filter-drawvg-interpreter b/tests/ref/fate/filter-drawvg-interpreter
index 21c6ccd848..3fc33e9c07 100644
--- a/tests/ref/fate/filter-drawvg-interpreter
+++ b/tests/ref/fate/filter-drawvg-interpreter
@@ -64,16 +64,16 @@ cairo_set_dash [ -1.0 ] 4.0
 cairo_set_dash [ ] 0.0
 cairo_move_to 1.0 2.0
 cairo_rel_line_to -1.0 -2.0
-cairo_set_source #19334c66
+cairo_set_source #1a334d66
 cairo_set_fill_rule 0
 cairo_fill
-cairo_set_source #475b3d66
+cairo_set_source #475c3d66
 cairo_set_fill_rule 0
 cairo_fill
-cairo_set_source #7f99b2cc
+cairo_set_source #8099b3cc
 cairo_set_fill_rule 0
 cairo_fill
-cairo_set_source #a8d7efe5
+cairo_set_source #a8d8f0e6
 cairo_set_fill_rule 0
 cairo_fill
 cairo_rel_line_to 1.0 3.0
-- 
2.52.0


From 12c7dd4c3082df378b25629bc299e0a32e7665d0 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 17 Nov 2025 10:50:54 +0100
Subject: [PATCH 065/304] avfilter/x86/f_ebur128: only use filter_channels_avx
 for >= 2 channels

The approach of this ASM routine is to process two channels at a time using
AVX instructions. Obviously, there is no point in doing this if there is only
a single channel, in which case the scalar loop is the better choice.

Fixes a performance regression when filtering mono audio on certain CPUs,
notably e.g. the Intel N100.
---
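Illustration only, not part of the patch: a minimal C sketch of the dispatch
pattern, where the generic routine stays in place unless the paired routine
can actually pay off. DemoDSP, filter_scalar, filter_paired, demo_dsp_init
and the have_simd flag are hypothetical stand-ins, not the real ebur128 DSP
context or functions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*filter_fn)(const double *in, double *out, int nb_channels);

typedef struct DemoDSP {
    filter_fn filter_channels;
} DemoDSP;

/* Generic fallback: one channel per iteration. */
static void filter_scalar(const double *in, double *out, int nb_channels)
{
    for (int ch = 0; ch < nb_channels; ch++)
        out[ch] = in[ch];
}

/* Placeholder for a routine that handles channels two at a time. */
static void filter_paired(const double *in, double *out, int nb_channels)
{
    int ch = 0;
    for (; ch + 1 < nb_channels; ch += 2) {
        out[ch]     = in[ch];
        out[ch + 1] = in[ch + 1];
    }
    if (ch < nb_channels) /* odd channel left over */
        out[ch] = in[ch];
}

static void demo_dsp_init(DemoDSP *dsp, int nb_channels, int have_simd)
{
    assert(dsp && nb_channels > 0);
    dsp->filter_channels = filter_scalar;  /* generic default */
    if (have_simd && nb_channels >= 2)     /* mirrors the new guard */
        dsp->filter_channels = filter_paired;
}
```

With nb_channels == 1 the pointer stays on the scalar routine even when
have_simd is set, which is the case the patch is about.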
 libavfilter/x86/f_ebur128_init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavfilter/x86/f_ebur128_init.c b/libavfilter/x86/f_ebur128_init.c
index 4a7109746d..6d43b4887f 100644
--- a/libavfilter/x86/f_ebur128_init.c
+++ b/libavfilter/x86/f_ebur128_init.c
@@ -33,7 +33,8 @@ av_cold void ff_ebur128_init_x86(EBUR128DSPContext *dsp, int nb_channels)
     int cpu_flags = av_get_cpu_flags();
 
     if (ARCH_X86_64 && EXTERNAL_AVX(cpu_flags)) {
-        dsp->filter_channels = ff_ebur128_filter_channels_avx;
+        if (nb_channels >= 2)
+            dsp->filter_channels = ff_ebur128_filter_channels_avx;
         if (nb_channels == 2)
             dsp->find_peak = ff_ebur128_find_peak_2ch_avx;
     }
-- 
2.52.0


From d8e4f9e3208e2b1b3b782058b82fe4c055f09da0 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 15:29:19 +0100
Subject: [PATCH 066/304] avcodec/x86/me_cmp: Avoid manual stack handling

Use x86inc's stack alignment feature instead of allocating the stack
manually*; this means that this code now also automatically supports
unaligned stacks, so that the SSE2 and SSSE3 functions will now be
available everywhere.

*: The code for this was also buggy: it left the stack pointer
at 4 mod 8 on x64 for the mmxext version before it was disabled
in 542765ce3eccbca587d54262a512cbdb1407230d, because it hardcoded 4
instead of using gprsize.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
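Illustration only, not part of the patch: the arithmetic behind the footnote
above, as a small C model. It assumes stack_offset is 0 where the padding is
computed and that rsp is 8 mod 16 on function entry on x64 (return address
just pushed). old_pad and sp_mod_after_sub are hypothetical helper names;
the slot count 14 and mmsize 8 are taken from the removed mmx wrapper call.

```c
#include <assert.h>

/* Padding as computed by the removed macro:
 *   pad = slots * mmsize - ((ret_bytes + stack_offset) & (mmsize - 1))
 * The old code hardcoded ret_bytes to 4. */
static unsigned old_pad(unsigned slots, unsigned mmsize,
                        unsigned ret_bytes, unsigned stack_offset)
{
    assert(mmsize && (mmsize & (mmsize - 1)) == 0); /* power of two */
    return slots * mmsize - ((ret_bytes + stack_offset) & (mmsize - 1));
}

/* Stack pointer modulo `align` after "SUB rsp, pad". */
static unsigned long sp_mod_after_sub(unsigned long sp_at_entry,
                                      unsigned pad, unsigned align)
{
    return (sp_at_entry - pad) % align;
}
```

Under these assumptions, old_pad(14, 8, 4, 0) is 108, which leaves the stack
pointer at 4 mod 8, while passing 8 (gprsize on x64) gives 112 and keeps it
8-byte aligned.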
 libavcodec/x86/me_cmp.asm    | 23 ++++-------------------
 libavcodec/x86/me_cmp_init.c |  4 ----
 2 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
index 4545eae276..e123089ba3 100644
--- a/libavcodec/x86/me_cmp.asm
+++ b/libavcodec/x86/me_cmp.asm
@@ -152,23 +152,11 @@ SECTION .text
 %endmacro
 
 %macro hadamard8_16_wrapper 2
-cglobal hadamard8_diff, 4, 4, %1
-%ifndef m8
-    %assign pad %2*mmsize-(4+stack_offset&(mmsize-1))
-    SUB            rsp, pad
-%endif
+cglobal hadamard8_diff, 4, 4, %1, %2*mmsize
     call hadamard8x8_diff %+ SUFFIX
-%ifndef m8
-    ADD            rsp, pad
-%endif
     RET
 
-cglobal hadamard8_diff16, 5, 6, %1
-%ifndef m8
-    %assign pad %2*mmsize-(4+stack_offset&(mmsize-1))
-    SUB            rsp, pad
-%endif
-
+cglobal hadamard8_diff16, 5, 6, %1, %2*mmsize
     call hadamard8x8_diff %+ SUFFIX
     mov            r5d, eax
 
@@ -192,9 +180,6 @@ cglobal hadamard8_diff16, 5, 6, %1
 
 .done:
     mov            eax, r5d
-%ifndef m8
-    ADD            rsp, pad
-%endif
     RET
 %endmacro
 
@@ -215,7 +200,7 @@ hadamard8x8_diff %+ SUFFIX:
     and                         eax, 0xFFFF
     ret
 
-hadamard8_16_wrapper %1, 3
+hadamard8_16_wrapper %1, 2*ARCH_X86_32
 %elif cpuflag(mmx)
 ALIGN 16
 ; int ff_hadamard8_diff_ ## cpu(MPVEncContext *s, const uint8_t *src1,
@@ -261,7 +246,7 @@ hadamard8x8_diff %+ SUFFIX:
     and                         rax, 0xFFFF
     ret
 
-hadamard8_16_wrapper 0, 14
+hadamard8_16_wrapper 0, 13
 %endif
 %endmacro
 
diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
index d4503eef3b..35abbbf7f5 100644
--- a/libavcodec/x86/me_cmp_init.c
+++ b/libavcodec/x86/me_cmp_init.c
@@ -146,10 +146,8 @@ av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
         c->pix_abs[0][2] = ff_sad16_y2_sse2;
         c->pix_abs[0][3] = ff_sad16_xy2_sse2;
 
-#if HAVE_ALIGNED_STACK
         c->hadamard8_diff[0] = ff_hadamard8_diff16_sse2;
         c->hadamard8_diff[1] = ff_hadamard8_diff_sse2;
-#endif
         if (avctx->codec_id != AV_CODEC_ID_SNOW) {
             c->sad[0]        = ff_sad16_sse2;
 
@@ -179,10 +177,8 @@ av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
         c->nsse[1]           = nsse8_ssse3;
 
         c->sum_abs_dctelem   = ff_sum_abs_dctelem_ssse3;
-#if HAVE_ALIGNED_STACK
         c->hadamard8_diff[0] = ff_hadamard8_diff16_ssse3;
         c->hadamard8_diff[1] = ff_hadamard8_diff_ssse3;
-#endif
     }
 #endif
 }
-- 
2.52.0


From f808b78ebbc07af24b550f27242585459ca37b85 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 15:45:44 +0100
Subject: [PATCH 067/304] avcodec/me_cmp: Remove MMXEXT hadamard diff functions

The SSE2 and SSSE3 functions are now available everywhere,
making the MMXEXT functions irrelevant.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/me_cmp.asm    | 88 ++----------------------------------
 libavcodec/x86/me_cmp_init.c |  6 ---
 2 files changed, 4 insertions(+), 90 deletions(-)

diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
index e123089ba3..2a196d03bb 100644
--- a/libavcodec/x86/me_cmp.asm
+++ b/libavcodec/x86/me_cmp.asm
@@ -112,7 +112,6 @@ SECTION .text
 ; about 100k on extreme inputs. But that's very unlikely to occur in natural video,
 ; and it's even more unlikely to not have any alternative mvs/modes with lower cost.
 %macro HSUM 3
-%if cpuflag(sse2)
     movhlps         %2, %1
     paddusw         %1, %2
     pshuflw         %2, %1, 0xE
@@ -120,35 +119,6 @@ SECTION .text
     pshuflw         %2, %1, 0x1
     paddusw         %1, %2
     movd            %3, %1
-%elif cpuflag(mmxext)
-    pshufw          %2, %1, 0xE
-    paddusw         %1, %2
-    pshufw          %2, %1, 0x1
-    paddusw         %1, %2
-    movd            %3, %1
-%elif cpuflag(mmx)
-    mova            %2, %1
-    psrlq           %1, 32
-    paddusw         %1, %2
-    mova            %2, %1
-    psrlq           %1, 16
-    paddusw         %1, %2
-    movd            %3, %1
-%endif
-%endmacro
-
-%macro STORE4 5
-    mova [%1+mmsize*0], %2
-    mova [%1+mmsize*1], %3
-    mova [%1+mmsize*2], %4
-    mova [%1+mmsize*3], %5
-%endmacro
-
-%macro LOAD4 5
-    mova            %2, [%1+mmsize*0]
-    mova            %3, [%1+mmsize*1]
-    mova            %4, [%1+mmsize*2]
-    mova            %5, [%1+mmsize*3]
 %endmacro
 
 %macro hadamard8_16_wrapper 2
@@ -183,8 +153,10 @@ cglobal hadamard8_diff16, 5, 6, %1, %2*mmsize
     RET
 %endmacro
 
-%macro HADAMARD8_DIFF 0-1
-%if cpuflag(sse2)
+%macro HADAMARD8_DIFF 1
+; r1, r2 and r3 are not clobbered in this function, so 16x16 can
+; simply call this 2x2x (and that's why we access rsp+gprsize
+; everywhere, which is rsp of calling function)
 hadamard8x8_diff %+ SUFFIX:
     lea                          r0, [r3*3]
     DIFF_PIXELS_8                r1, r2,  0, r3, r0, rsp+gprsize
@@ -201,60 +173,8 @@ hadamard8x8_diff %+ SUFFIX:
     ret
 
 hadamard8_16_wrapper %1, 2*ARCH_X86_32
-%elif cpuflag(mmx)
-ALIGN 16
-; int ff_hadamard8_diff_ ## cpu(MPVEncContext *s, const uint8_t *src1,
-;                               const uint8_t *src2, ptrdiff_t stride, int h)
-; r0 = void *s = unused, int h = unused (always 8)
-; note how r1, r2 and r3 are not clobbered in this function, so 16x16
-; can simply call this 2x2x (and that's why we access rsp+gprsize
-; everywhere, which is rsp of calling func
-hadamard8x8_diff %+ SUFFIX:
-    lea                          r0, [r3*3]
-
-    ; first 4x8 pixels
-    DIFF_PIXELS_8                r1, r2,  0, r3, r0, rsp+gprsize+0x60
-    HADAMARD8
-    mova         [rsp+gprsize+0x60], m7
-    TRANSPOSE4x4W                 0,  1,  2,  3,  7
-    STORE4              rsp+gprsize, m0, m1, m2, m3
-    mova                         m7, [rsp+gprsize+0x60]
-    TRANSPOSE4x4W                 4,  5,  6,  7,  0
-    STORE4         rsp+gprsize+0x40, m4, m5, m6, m7
-
-    ; second 4x8 pixels
-    DIFF_PIXELS_8                r1, r2,  4, r3, r0, rsp+gprsize+0x60
-    HADAMARD8
-    mova         [rsp+gprsize+0x60], m7
-    TRANSPOSE4x4W                 0,  1,  2,  3,  7
-    STORE4         rsp+gprsize+0x20, m0, m1, m2, m3
-    mova                         m7, [rsp+gprsize+0x60]
-    TRANSPOSE4x4W                 4,  5,  6,  7,  0
-
-    LOAD4          rsp+gprsize+0x40, m0, m1, m2, m3
-    HADAMARD8
-    ABS_SUM_8x8_32 rsp+gprsize+0x60
-    mova         [rsp+gprsize+0x60], m0
-
-    LOAD4          rsp+gprsize     , m0, m1, m2, m3
-    LOAD4          rsp+gprsize+0x20, m4, m5, m6, m7
-    HADAMARD8
-    ABS_SUM_8x8_32 rsp+gprsize
-    paddusw                      m0, [rsp+gprsize+0x60]
-
-    HSUM                         m0, m1, eax
-    and                         rax, 0xFFFF
-    ret
-
-hadamard8_16_wrapper 0, 13
-%endif
 %endmacro
 
-%if HAVE_ALIGNED_STACK == 0
-INIT_MMX mmxext
-HADAMARD8_DIFF
-%endif
-
 INIT_XMM sse2
 %if ARCH_X86_64
 %define ABS_SUM_8x8 ABS_SUM_8x8_64
diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
index 35abbbf7f5..3a8b46f4e1 100644
--- a/libavcodec/x86/me_cmp_init.c
+++ b/libavcodec/x86/me_cmp_init.c
@@ -77,7 +77,6 @@ int ff_vsad16u_approx_sse2(MPVEncContext *v, const uint8_t *pix1, const uint8_t
     int ff_hadamard8_diff16_ ## cpu(MPVEncContext *s, const uint8_t *src1,       \
                                     const uint8_t *src2, ptrdiff_t stride, int h);
 
-hadamard_func(mmxext)
 hadamard_func(sse2)
 hadamard_func(ssse3)
 
@@ -116,11 +115,6 @@ av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
-#if !HAVE_ALIGNED_STACK
-        c->hadamard8_diff[0] = ff_hadamard8_diff16_mmxext;
-        c->hadamard8_diff[1] = ff_hadamard8_diff_mmxext;
-#endif
-
         c->sad[1] = ff_sad8_mmxext;
 
         c->pix_abs[1][0] = ff_sad8_mmxext;
-- 
2.52.0


From 7601da20b7c7982e07501d4b0a997f524ed60d8f Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 16:28:51 +0100
Subject: [PATCH 068/304] avcodec/x86/me_cmp: Avoid call on UNIX64

The internal functions for calculating the hadamard difference
of two 8x8 blocks have no epilogue on UNIX64, so one can avoid
the call altogether by placing the 8x8 function so that it directly
falls into the internal function.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/me_cmp.asm | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
index 2a196d03bb..3ac8acee2c 100644
--- a/libavcodec/x86/me_cmp.asm
+++ b/libavcodec/x86/me_cmp.asm
@@ -121,12 +121,8 @@ SECTION .text
     movd            %3, %1
 %endmacro
 
-%macro hadamard8_16_wrapper 2
-cglobal hadamard8_diff, 4, 4, %1, %2*mmsize
-    call hadamard8x8_diff %+ SUFFIX
-    RET
-
-cglobal hadamard8_diff16, 5, 6, %1, %2*mmsize
+%macro HADAMARD8_DIFF 1
+cglobal hadamard8_diff16, 5, 6, %1, 2*mmsize*ARCH_X86_32
     call hadamard8x8_diff %+ SUFFIX
     mov            r5d, eax
 
@@ -151,9 +147,10 @@ cglobal hadamard8_diff16, 5, 6, %1, %2*mmsize
 .done:
     mov            eax, r5d
     RET
-%endmacro
 
-%macro HADAMARD8_DIFF 1
+cglobal hadamard8_diff, 4, 4, %1, 2*mmsize*ARCH_X86_32
+    TAIL_CALL hadamard8x8_diff %+ SUFFIX, 0
+
 ; r1, r2 and r3 are not clobbered in this function, so 16x16 can
 ; simply call this 2x2x (and that's why we access rsp+gprsize
 ; everywhere, which is rsp of calling function)
@@ -171,8 +168,6 @@ hadamard8x8_diff %+ SUFFIX:
     HSUM                        m0, m1, eax
     and                         eax, 0xFFFF
     ret
-
-hadamard8_16_wrapper %1, 2*ARCH_X86_32
 %endmacro
 
 INIT_XMM sse2
-- 
2.52.0


From 19e1941b120ec6d9a52e661a55910dd0122698e3 Mon Sep 17 00:00:00 2001
From: Araz Iusubov <Primeadvice@gmail.com>
Date: Mon, 10 Nov 2025 17:23:25 +0100
Subject: [PATCH 069/304] avcodec/d3d12va_encode: D3D12 AV1 encoding support

Implement AV1 hardware encoding using the
Direct3D 12 Video API (D3D12VA).
---
 Changelog                       |    1 +
 configure                       |    3 +
 libavcodec/Makefile             |    1 +
 libavcodec/allcodecs.c          |    1 +
 libavcodec/d3d12va_encode.c     |   45 +-
 libavcodec/d3d12va_encode.h     |   13 +
 libavcodec/d3d12va_encode_av1.c | 1182 +++++++++++++++++++++++++++++++
 7 files changed, 1241 insertions(+), 5 deletions(-)
 create mode 100644 libavcodec/d3d12va_encode_av1.c

diff --git a/Changelog b/Changelog
index b4c23ac1ee..4648899ca3 100644
--- a/Changelog
+++ b/Changelog
@@ -10,6 +10,7 @@ version <next>:
 - D3D12 H.264 encoder
 - drawvg filter via libcairo
 - ffmpeg CLI tiled HEIF support
+- D3D12 AV1 encoder
 
 
 version 8.0:
diff --git a/configure b/configure
index 99734e9d03..7ef50095a3 100755
--- a/configure
+++ b/configure
@@ -3444,6 +3444,8 @@ amrwb_mediacodec_decoder_select="amr_parser"
 av1_amf_encoder_deps="amf"
 av1_amf_decoder_deps="amf"
 av1_cuvid_decoder_deps="cuvid CUVIDAV1PICPARAMS"
+av1_d3d12va_encoder_deps="d3d12va d3d12va_av1_headers"
+av1_d3d12va_encoder_select="cbs_av1 d3d12va_encode"
 av1_mediacodec_decoder_deps="mediacodec"
 av1_mediacodec_encoder_deps="mediacodec"
 av1_mediacodec_encoder_select="extract_extradata_bsf"
@@ -6958,6 +6960,7 @@ check_type "windows.h d3d12video.h" "ID3D12VideoDecoder"
 check_type "windows.h d3d12video.h" "ID3D12VideoEncoder"
 test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_VIDEO feature = D3D12_FEATURE_VIDEO_ENCODER_CODEC" && \
 test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_DATA_VIDEO_ENCODER_RESOURCE_REQUIREMENTS req" && enable d3d12_encoder_feature
+test_code cc "windows.h d3d12video.h" "D3D12_VIDEO_ENCODER_CODEC c = D3D12_VIDEO_ENCODER_CODEC_AV1; (void)c;" && enable d3d12va_av1_headers
 check_type "windows.h" "DPI_AWARENESS_CONTEXT" -D_WIN32_WINNT=0x0A00
 check_type "windows.h security.h schnlsp.h" SecPkgContext_KeyingMaterialInfo -DSECURITY_WIN32
 check_type "d3d9.h dxva2api.h" DXVA2_ConfigPictureDecode -D_WIN32_WINNT=0x0602
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 0cd2408865..fba9f0aff0 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -274,6 +274,7 @@ OBJS-$(CONFIG_AURA_DECODER)            += cyuv.o
 OBJS-$(CONFIG_AURA2_DECODER)           += aura.o
 OBJS-$(CONFIG_AV1_DECODER)             += av1dec.o av1_parse.o
 OBJS-$(CONFIG_AV1_CUVID_DECODER)       += cuviddec.o
+OBJS-$(CONFIG_AV1_D3D12VA_ENCODER)     += d3d12va_encode_av1.o av1_levels.o
 OBJS-$(CONFIG_AV1_MEDIACODEC_DECODER)  += mediacodecdec.o
 OBJS-$(CONFIG_AV1_MEDIACODEC_ENCODER)  += mediacodecenc.o
 OBJS-$(CONFIG_AV1_NVENC_ENCODER)       += nvenc_av1.o nvenc.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index 251ffae390..cce23d1541 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -855,6 +855,7 @@ extern const FFCodec ff_libaom_av1_decoder;
 /* hwaccel hooks only, so prefer external decoders */
 extern const FFCodec ff_av1_decoder;
 extern const FFCodec ff_av1_cuvid_decoder;
+extern const FFCodec ff_av1_d3d12va_encoder;
 extern const FFCodec ff_av1_mediacodec_decoder;
 extern const FFCodec ff_av1_mediacodec_encoder;
 extern const FFCodec ff_av1_nvenc_encoder;
diff --git a/libavcodec/d3d12va_encode.c b/libavcodec/d3d12va_encode.c
index aa8a5982be..d28447c3c9 100644
--- a/libavcodec/d3d12va_encode.c
+++ b/libavcodec/d3d12va_encode.c
@@ -29,6 +29,7 @@
 #include "libavutil/hwcontext_d3d12va_internal.h"
 #include "libavutil/hwcontext_d3d12va.h"
 
+#include "config_components.h"
 #include "avcodec.h"
 #include "d3d12va_encode.h"
 #include "encode.h"
@@ -144,6 +145,12 @@ static int d3d12va_encode_create_metadata_buffers(AVCodecContext *avctx,
 {
     D3D12VAEncodeContext *ctx = avctx->priv_data;
     int width = sizeof(D3D12_VIDEO_ENCODER_OUTPUT_METADATA) + sizeof(D3D12_VIDEO_ENCODER_FRAME_SUBREGION_METADATA);
+#if CONFIG_AV1_D3D12VA_ENCODER
+    if (ctx->codec->d3d12_codec == D3D12_VIDEO_ENCODER_CODEC_AV1) {
+        width += sizeof(D3D12_VIDEO_ENCODER_AV1_PICTURE_CONTROL_SUBREGIONS_LAYOUT_DATA_TILES)
+            + sizeof(D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES);
+    }
+#endif
     D3D12_HEAP_PROPERTIES encoded_meta_props = { .Type = D3D12_HEAP_TYPE_DEFAULT }, resolved_meta_props;
     D3D12_HEAP_TYPE resolved_heap_type = D3D12_HEAP_TYPE_READBACK;
     HRESULT hr;
@@ -211,7 +218,7 @@ static int d3d12va_encode_issue(AVCodecContext *avctx,
             .RateControl = ctx->rc,
             .PictureTargetResolution = ctx->resolution,
             .SelectedLayoutMode = D3D12_VIDEO_ENCODER_FRAME_SUBREGION_LAYOUT_MODE_FULL_FRAME,
-            .FrameSubregionsLayoutData = { 0 },
+            .FrameSubregionsLayoutData = ctx->subregions_layout,
             .CodecGopSequence = ctx->gop,
         },
         .pInputFrame = pic->input_surface->texture,
@@ -732,16 +739,21 @@ end:
 static int d3d12va_encode_output(AVCodecContext *avctx,
                                  FFHWBaseEncodePicture *base_pic, AVPacket *pkt)
 {
+    D3D12VAEncodeContext       *ctx = avctx->priv_data;
     FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
-    D3D12VAEncodePicture *pic = base_pic->priv;
-    AVPacket *pkt_ptr = pkt;
-    int err;
+    D3D12VAEncodePicture       *pic = base_pic->priv;
+    AVPacket               *pkt_ptr = pkt;
+    int                         err = 0;
 
     err = d3d12va_encode_wait(avctx, base_pic);
     if (err < 0)
         return err;
 
-    err = d3d12va_encode_get_coded_data(avctx, pic, pkt);
+    if (ctx->codec->get_coded_data)
+        err = ctx->codec->get_coded_data(avctx, pic, pkt);
+    else
+        err = d3d12va_encode_get_coded_data(avctx, pic, pkt);
+
     if (err < 0)
         return err;
 
@@ -1129,6 +1141,9 @@ static int d3d12va_encode_init_gop_structure(AVCodecContext *avctx)
     union {
         D3D12_VIDEO_ENCODER_CODEC_PICTURE_CONTROL_SUPPORT_H264 h264;
         D3D12_VIDEO_ENCODER_CODEC_PICTURE_CONTROL_SUPPORT_HEVC hevc;
+#if CONFIG_AV1_D3D12VA_ENCODER
+        D3D12_VIDEO_ENCODER_CODEC_AV1_PICTURE_CONTROL_SUPPORT  av1;
+#endif
     } codec_support;
 
     support.NodeIndex = 0;
@@ -1146,6 +1161,13 @@ static int d3d12va_encode_init_gop_structure(AVCodecContext *avctx)
             support.PictureSupport.pHEVCSupport = &codec_support.hevc;
             break;
 
+#if CONFIG_AV1_D3D12VA_ENCODER
+        case D3D12_VIDEO_ENCODER_CODEC_AV1:
+            memset(&codec_support.av1, 0, sizeof(codec_support.av1));
+            support.PictureSupport.DataSize = sizeof(codec_support.av1);
+            support.PictureSupport.pAV1Support = &codec_support.av1;
+            break;
+#endif
         default:
             av_assert0(0);
     }
@@ -1171,6 +1193,13 @@ static int d3d12va_encode_init_gop_structure(AVCodecContext *avctx)
                 ref_l1 = support.PictureSupport.pHEVCSupport->MaxL1ReferencesForB;
                 break;
 
+#if CONFIG_AV1_D3D12VA_ENCODER
+            case D3D12_VIDEO_ENCODER_CODEC_AV1:
+                ref_l0 = support.PictureSupport.pAV1Support->MaxUniqueReferencesPerFrame;
+                // AV1 doesn't use traditional L1 references like H.264/HEVC
+                ref_l1 = 0;
+                break;
+#endif
             default:
                 av_assert0(0);
         }
@@ -1521,6 +1550,12 @@ int ff_d3d12va_encode_init(AVCodecContext *avctx)
     if (err < 0)
         goto fail;
 
+    if (ctx->codec->set_tile) {
+        err = ctx->codec->set_tile(avctx);
+        if (err < 0)
+            goto fail;
+    }
+
     err = d3d12va_encode_init_rate_control(avctx);
     if (err < 0)
         goto fail;
diff --git a/libavcodec/d3d12va_encode.h b/libavcodec/d3d12va_encode.h
index 5bd1eedb7f..b7cfea0582 100644
--- a/libavcodec/d3d12va_encode.h
+++ b/libavcodec/d3d12va_encode.h
@@ -264,6 +264,8 @@ typedef struct D3D12VAEncodeContext {
     D3D12_VIDEO_ENCODER_SEQUENCE_GOP_STRUCTURE gop;
 
     D3D12_VIDEO_ENCODER_LEVEL_SETTING level;
+
+    D3D12_VIDEO_ENCODER_PICTURE_CONTROL_SUBREGIONS_LAYOUT_DATA subregions_layout;
 } D3D12VAEncodeContext;
 
 typedef struct D3D12VAEncodeType {
@@ -306,6 +308,11 @@ typedef struct D3D12VAEncodeType {
      */
     int (*set_level)(AVCodecContext *avctx);
 
+    /**
+     * Set codec-specific tile settings.
+     */
+    int (*set_tile)(AVCodecContext *avctx);
+
     /**
      * The size of any private data structure associated with each
      * picture (can be zero if not required).
@@ -327,6 +334,12 @@ typedef struct D3D12VAEncodeType {
      */
     int (*write_sequence_header)(AVCodecContext *avctx,
                                  char *data, size_t *data_len);
+
+    /**
+     * Fill the coded data into the AVPacket.
+     */
+    int (*get_coded_data)(AVCodecContext *avctx,
+                          D3D12VAEncodePicture *pic, AVPacket *pkt);
 } D3D12VAEncodeType;
 
 int ff_d3d12va_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt);
diff --git a/libavcodec/d3d12va_encode_av1.c b/libavcodec/d3d12va_encode_av1.c
new file mode 100644
index 0000000000..e7a115a2ee
--- /dev/null
+++ b/libavcodec/d3d12va_encode_av1.c
@@ -0,0 +1,1182 @@
+/*
+ * Direct3D 12 HW acceleration video encoder
+ *
+ * Copyright (c) 2024 Intel Corporation
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/opt.h"
+#include "libavutil/common.h"
+#include "libavutil/mem.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/hwcontext_d3d12va_internal.h"
+
+#include "config_components.h"
+#include "avcodec.h"
+#include "cbs.h"
+#include "cbs_av1.h"
+#include "av1_levels.h"
+#include "codec_internal.h"
+#include "d3d12va_encode.h"
+#include "encode.h"
+#include "hw_base_encode.h"
+
+#include <d3d12.h>
+#include <d3d12video.h>
+
+#ifndef D3D12_VIDEO_ENCODER_AV1_INVALID_DPB_RESOURCE_INDEX
+#define D3D12_VIDEO_ENCODER_AV1_INVALID_DPB_RESOURCE_INDEX ( 0xff )
+#endif
+
+typedef struct D3D12VAHWBaseEncodeAV1 {
+    AV1RawOBU    raw_sequence_header;
+    AV1RawOBU       raw_frame_header;
+    AV1RawOBU         raw_tile_group;
+} D3D12VAHWBaseEncodeAV1;
+
+typedef struct D3D12VAHWBaseEncodeAV1Opts {
+    int                 tier; // 0: Main tier, 1: High tier
+    int                level; // AV1 level (2.0-7.3 map to 0-23)
+
+    int          enable_cdef; // Constrained Directional Enhancement Filter
+    int   enable_restoration; // loop restoration
+    int      enable_superres; // super-resolution
+    int enable_ref_frame_mvs;
+
+    int            enable_jnt_comp;
+    int  enable_128x128_superblock;
+
+    int       enable_warped_motion;
+    int   enable_intra_edge_filter;
+    int enable_interintra_compound;
+    int     enable_masked_compound;
+    int        enable_filter_intra;
+
+    int         enable_loop_filter;
+    int   enable_loop_filter_delta;
+    int         enable_dual_filter;
+
+    int             enable_palette;
+    int    enable_intra_block_copy;
+} D3D12VAHWBaseEncodeAV1Opts;
+
+typedef struct D3D12VAEncodeAV1Picture {
+    uint8_t     temporal_id;
+    uint8_t      spatial_id;
+    uint8_t      show_frame;
+    uint8_t      frame_type;
+    uint16_t last_idr_frame;
+    uint8_t            slot;
+} D3D12VAEncodeAV1Picture;
+
+typedef struct D3D12VAEncodeAV1Context {
+    D3D12VAEncodeContext common;
+    // User options.
+    int      qp;
+    int profile;
+    int   level;
+    int    tier;
+
+    uint8_t q_idx_idr;
+    uint8_t   q_idx_p;
+
+    // Writer structures.
+    D3D12VAHWBaseEncodeAV1         units;
+    D3D12VAHWBaseEncodeAV1Opts unit_opts;
+
+    CodedBitstreamContext              *cbc;
+    CodedBitstreamFragment      current_obu;
+    D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES_FLAGS post_encode_values_flag;
+    AVFifo             *picture_header_list;
+} D3D12VAEncodeAV1Context;
+
+typedef struct D3D12VAEncodeAV1Level {
+    uint8_t                              level;
+    D3D12_VIDEO_ENCODER_AV1_LEVELS d3d12_level;
+} D3D12VAEncodeAV1Level;
+
+
+static const D3D12VAEncodeAV1Level av1_levels[] = {
+    { 0,  D3D12_VIDEO_ENCODER_AV1_LEVELS_2_0 },
+    { 1,  D3D12_VIDEO_ENCODER_AV1_LEVELS_2_1 },
+    { 2,  D3D12_VIDEO_ENCODER_AV1_LEVELS_2_2 },
+    { 3,  D3D12_VIDEO_ENCODER_AV1_LEVELS_2_3 },
+    { 4,  D3D12_VIDEO_ENCODER_AV1_LEVELS_3_0 },
+    { 5,  D3D12_VIDEO_ENCODER_AV1_LEVELS_3_1 },
+    { 6,  D3D12_VIDEO_ENCODER_AV1_LEVELS_3_2 },
+    { 7,  D3D12_VIDEO_ENCODER_AV1_LEVELS_3_3 },
+    { 8,  D3D12_VIDEO_ENCODER_AV1_LEVELS_4_0 },
+    { 9,  D3D12_VIDEO_ENCODER_AV1_LEVELS_4_1 },
+    { 10, D3D12_VIDEO_ENCODER_AV1_LEVELS_4_2 },
+    { 11, D3D12_VIDEO_ENCODER_AV1_LEVELS_4_3 },
+    { 12, D3D12_VIDEO_ENCODER_AV1_LEVELS_5_0 },
+    { 13, D3D12_VIDEO_ENCODER_AV1_LEVELS_5_1 },
+    { 14, D3D12_VIDEO_ENCODER_AV1_LEVELS_5_2 },
+    { 15, D3D12_VIDEO_ENCODER_AV1_LEVELS_5_3 },
+    { 16, D3D12_VIDEO_ENCODER_AV1_LEVELS_6_0 },
+    { 17, D3D12_VIDEO_ENCODER_AV1_LEVELS_6_1 },
+    { 18, D3D12_VIDEO_ENCODER_AV1_LEVELS_6_2 },
+    { 19, D3D12_VIDEO_ENCODER_AV1_LEVELS_6_3 },
+    { 20, D3D12_VIDEO_ENCODER_AV1_LEVELS_7_0 },
+    { 21, D3D12_VIDEO_ENCODER_AV1_LEVELS_7_1 },
+    { 22, D3D12_VIDEO_ENCODER_AV1_LEVELS_7_2 },
+    { 23, D3D12_VIDEO_ENCODER_AV1_LEVELS_7_3 },
+};
+
+static const D3D12_VIDEO_ENCODER_AV1_PROFILE         profile_main = D3D12_VIDEO_ENCODER_AV1_PROFILE_MAIN;
+static const D3D12_VIDEO_ENCODER_AV1_PROFILE         profile_high = D3D12_VIDEO_ENCODER_AV1_PROFILE_HIGH;
+static const D3D12_VIDEO_ENCODER_AV1_PROFILE profile_professional = D3D12_VIDEO_ENCODER_AV1_PROFILE_PROFESSIONAL;
+
+#define D3D_PROFILE_DESC(name) \
+    { sizeof(D3D12_VIDEO_ENCODER_AV1_PROFILE), { .pAV1Profile = (D3D12_VIDEO_ENCODER_AV1_PROFILE *)&profile_ ## name } }
+static const D3D12VAEncodeProfile d3d12va_encode_av1_profiles[] = {
+    { AV_PROFILE_AV1_MAIN,          8, 3, 1, 1, D3D_PROFILE_DESC(main)         },
+    { AV_PROFILE_AV1_MAIN,         10, 3, 1, 1, D3D_PROFILE_DESC(main)         },
+    { AV_PROFILE_AV1_HIGH,         10, 3, 1, 1, D3D_PROFILE_DESC(high)         },
+    { AV_PROFILE_AV1_PROFESSIONAL,  8, 3, 1, 1, D3D_PROFILE_DESC(professional) },
+    { AV_PROFILE_AV1_PROFESSIONAL, 10, 3, 1, 1, D3D_PROFILE_DESC(professional) },
+    { AV_PROFILE_AV1_PROFESSIONAL, 12, 3, 1, 1, D3D_PROFILE_DESC(professional) },
+    { AV_PROFILE_UNKNOWN },
+};
+
+static int d3d12va_encode_av1_write_obu(AVCodecContext *avctx,
+                                        char *data, size_t *data_len,
+                                        CodedBitstreamFragment *obu)
+{
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+    int                       err = 0;
+
+    err = ff_cbs_write_fragment_data(priv->cbc, obu);
+    if (err < 0) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to write packed OBU data.\n");
+        return err;
+    }
+
+    memcpy(data, obu->data, obu->data_size);
+    *data_len = (8 * obu->data_size) - obu->data_bit_padding;
+
+    return 0;
+}
+
+static int d3d12va_encode_av1_add_obu(AVCodecContext *avctx,
+                                      CodedBitstreamFragment *au,
+                                      CodedBitstreamUnitType obu_type,
+                                      void *obu_unit)
+{
+    int err = 0;
+
+    err = ff_cbs_insert_unit_content(au, -1, obu_type, obu_unit, NULL);
+    if (err < 0) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to add OBU unit: "
+            "type = %d.\n", obu_type);
+        return err;
+    }
+    return 0;
+}
+
+static int d3d12va_encode_av1_write_sequence_header(AVCodecContext *avctx,
+                                                    char *data, size_t *data_len)
+{
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+    CodedBitstreamFragment   *obu = &priv->current_obu;
+    int                       err = 0;
+
+    priv->units.raw_sequence_header.header.obu_type = AV1_OBU_SEQUENCE_HEADER;
+    err = d3d12va_encode_av1_add_obu(avctx, obu, AV1_OBU_SEQUENCE_HEADER, &priv->units.raw_sequence_header);
+    if (err < 0)
+        goto fail;
+
+    err = d3d12va_encode_av1_write_obu(avctx, data, data_len, obu);
+
+fail:
+    ff_cbs_fragment_reset(obu);
+    return err;
+}
+
+static int d3d12va_encode_av1_update_current_frame_picture_header(AVCodecContext *avctx,
+                                                                  D3D12VAEncodePicture *pic,
+                                                                  AV1RawOBU *frameheader_obu)
+{
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+    AV1RawFrameHeader         *fh = &frameheader_obu->obu.frame_header;
+    uint8_t                 *data = NULL;
+    HRESULT                    hr = S_OK;
+    int                       err = 0;
+    D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES *post_encode_values = NULL;
+
+    // Update the frame header according to the picture post_encode_values
+    hr = ID3D12Resource_Map(pic->resolved_metadata, 0, NULL, (void **)&data);
+    if (FAILED(hr)) {
+        err = AVERROR_UNKNOWN;
+        return err;
+    }
+    post_encode_values = (D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES*) (data +
+            sizeof(D3D12_VIDEO_ENCODER_OUTPUT_METADATA) +
+            sizeof(D3D12_VIDEO_ENCODER_FRAME_SUBREGION_METADATA) +
+            sizeof(D3D12_VIDEO_ENCODER_AV1_PICTURE_CONTROL_SUBREGIONS_LAYOUT_DATA_TILES));
+
+    if (priv->post_encode_values_flag & D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES_FLAG_QUANTIZATION) {
+        fh->base_q_idx = post_encode_values->Quantization.BaseQIndex;
+        fh->delta_q_y_dc = post_encode_values->Quantization.YDCDeltaQ;
+        fh->delta_q_u_dc = post_encode_values->Quantization.UDCDeltaQ;
+        fh->delta_q_u_ac = post_encode_values->Quantization.UACDeltaQ;
+        fh->delta_q_v_dc = post_encode_values->Quantization.VDCDeltaQ;
+        fh->delta_q_v_ac = post_encode_values->Quantization.VACDeltaQ;
+        fh->using_qmatrix = post_encode_values->Quantization.UsingQMatrix;
+        fh->qm_y = post_encode_values->Quantization.QMY;
+        fh->qm_u = post_encode_values->Quantization.QMU;
+        fh->qm_v = post_encode_values->Quantization.QMV;
+    }
+
+    if (priv->post_encode_values_flag & D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES_FLAG_LOOP_FILTER) {
+        fh->loop_filter_level[0] = post_encode_values->LoopFilter.LoopFilterLevel[0];
+        fh->loop_filter_level[1] = post_encode_values->LoopFilter.LoopFilterLevel[1];
+        fh->loop_filter_level[2] = post_encode_values->LoopFilter.LoopFilterLevelU;
+        fh->loop_filter_level[3] = post_encode_values->LoopFilter.LoopFilterLevelV;
+        fh->loop_filter_sharpness = post_encode_values->LoopFilter.LoopFilterSharpnessLevel;
+        fh->loop_filter_delta_enabled = post_encode_values->LoopFilter.LoopFilterDeltaEnabled;
+        if (fh->loop_filter_delta_enabled) {
+            for (int i = 0; i < AV1_TOTAL_REFS_PER_FRAME; i++) {
+                fh->loop_filter_ref_deltas[i] = post_encode_values->LoopFilter.RefDeltas[i];
+                fh->update_ref_delta[i]       = post_encode_values->LoopFilter.RefDeltas[i];
+            }
+            for (int i = 0; i < 2; i++) {
+                fh->loop_filter_mode_deltas[i] = post_encode_values->LoopFilter.ModeDeltas[i];
+                fh->update_mode_delta[i]       = post_encode_values->LoopFilter.ModeDeltas[i];
+            }
+        }
+    }
+    if (priv->post_encode_values_flag & D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES_FLAG_CDEF_DATA) {
+        fh->cdef_damping_minus_3 = post_encode_values->CDEF.CdefDampingMinus3;
+        fh->cdef_bits = post_encode_values->CDEF.CdefBits;
+        for (int i = 0; i < 8; i++) {
+            fh->cdef_y_pri_strength[i]  = post_encode_values->CDEF.CdefYPriStrength[i];
+            fh->cdef_y_sec_strength[i]  = post_encode_values->CDEF.CdefYSecStrength[i];
+            fh->cdef_uv_pri_strength[i] = post_encode_values->CDEF.CdefUVPriStrength[i];
+            fh->cdef_uv_sec_strength[i] = post_encode_values->CDEF.CdefUVSecStrength[i];
+        }
+    }
+    if (priv->post_encode_values_flag & D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES_FLAG_QUANTIZATION_DELTA) {
+        fh->delta_q_present = post_encode_values->QuantizationDelta.DeltaQPresent;
+        fh->delta_q_res = post_encode_values->QuantizationDelta.DeltaQRes;
+    }
+
+    if (priv->post_encode_values_flag & D3D12_VIDEO_ENCODER_AV1_POST_ENCODE_VALUES_FLAG_REFERENCE_INDICES) {
+        for (int i = 0; i < AV1_REFS_PER_FRAME; i++) {
+            fh->ref_frame_idx[i] = post_encode_values->ReferenceIndices[i];
+        }
+    }
+
+    ID3D12Resource_Unmap(pic->resolved_metadata, 0, NULL);
+    return 0;
+}
+
+static int d3d12va_encode_av1_write_picture_header(AVCodecContext *avctx,
+                                                   D3D12VAEncodePicture *pic,
+                                                   char *data, size_t *data_len)
+{
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+    CodedBitstreamFragment  *obu  = &priv->current_obu;
+    AV1RawOBU    *frameheader_obu = av_mallocz(sizeof(AV1RawOBU));
+    int                       err = 0;
+
+    av_fifo_read(priv->picture_header_list, frameheader_obu, 1);
+    err = d3d12va_encode_av1_update_current_frame_picture_header(avctx, pic, frameheader_obu);
+    if (err < 0) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to update current frame picture header: %d.\n", err);
+        goto fail; // free frameheader_obu instead of leaking it
+    }
+
+    // Add the frame header OBU
+    frameheader_obu->header.obu_has_size_field = 1;
+
+    err = d3d12va_encode_av1_add_obu(avctx, obu, AV1_OBU_FRAME_HEADER, frameheader_obu);
+    if (err < 0)
+        goto fail;
+    err = d3d12va_encode_av1_write_obu(avctx, data, data_len, obu);
+
+fail:
+    ff_cbs_fragment_reset(obu);
+    av_freep(&frameheader_obu);
+    return err;
+}
+
+static int d3d12va_encode_av1_write_tile_group(AVCodecContext *avctx,
+                                               uint8_t *tile_group,
+                                               uint32_t tile_group_size,
+                                               char *data, size_t *data_len)
+{
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+    CodedBitstreamFragment  *obu  = &priv->current_obu;
+    AV1RawOBU     *tile_group_obu = &priv->units.raw_tile_group;
+    AV1RawTileGroup           *tg = &tile_group_obu->obu.tile_group;
+    int                       err = 0;
+
+    tg->tile_data.data = tile_group;
+    tg->tile_data.data_ref = NULL;
+    tg->tile_data.data_size = tile_group_size;
+    tile_group_obu->header.obu_has_size_field = 1;
+    tile_group_obu->header.obu_type = AV1_OBU_TILE_GROUP;
+
+    err = d3d12va_encode_av1_add_obu(avctx, obu, AV1_OBU_TILE_GROUP, tile_group_obu);
+    if (err < 0)
+        goto fail;
+    err = d3d12va_encode_av1_write_obu(avctx, data, data_len, obu);
+
+fail:
+    ff_cbs_fragment_reset(obu);
+    return err;
+}
+
+static int d3d12va_encode_av1_get_buffer_size(AVCodecContext *avctx,
+                                              D3D12VAEncodePicture *pic, size_t *size)
+{
+    D3D12VAEncodeContext                                    *ctx = avctx->priv_data;
+    D3D12_VIDEO_ENCODER_OUTPUT_METADATA                    *meta = NULL;
+    D3D12_VIDEO_ENCODER_FRAME_SUBREGION_METADATA *subregion_meta = NULL;
+    uint8_t                                                *data = NULL;
+    HRESULT                                                   hr = S_OK;
+    int                                                      err = 0;
+
+    hr = ID3D12Resource_Map(pic->resolved_metadata, 0, NULL, (void **)&data);
+    if (FAILED(hr)) {
+        err = AVERROR_UNKNOWN;
+        return err;
+    }
+
+    subregion_meta = (D3D12_VIDEO_ENCODER_FRAME_SUBREGION_METADATA*)(data + sizeof(D3D12_VIDEO_ENCODER_OUTPUT_METADATA));
+    if (subregion_meta->bSize == 0) {
+        av_log(avctx, AV_LOG_ERROR, "No subregion metadata found\n");
+        err = AVERROR(EINVAL);
+    } else {
+        *size = subregion_meta->bSize;
+    }
+
+    ID3D12Resource_Unmap(pic->resolved_metadata, 0, NULL);
+
+    return err;
+}
+
+static int d3d12va_encode_av1_get_coded_data(AVCodecContext *avctx,
+                                             D3D12VAEncodePicture *pic, AVPacket *pkt)
+{
+    int                   err = 0;
+    uint8_t              *ptr = NULL;
+    uint8_t      *mapped_data = NULL;
+    size_t         total_size = 0;
+    HRESULT                hr = S_OK;
+    size_t    av1_pic_hd_size = 0;
+    int tile_group_extra_size = 0;
+    size_t            bit_len = 0;
+    D3D12VAEncodeContext *ctx = avctx->priv_data;
+
+    char pic_hd_data[MAX_PARAM_BUFFER_SIZE] = { 0 };
+
+    err = d3d12va_encode_av1_get_buffer_size(avctx, pic, &total_size);
+    if (err < 0)
+        goto end;
+
+    // Update the picture header and calculate the picture header size
+    memset(pic_hd_data, 0, sizeof(pic_hd_data));
+    err = d3d12va_encode_av1_write_picture_header(avctx, pic, pic_hd_data, &av1_pic_hd_size);
+    if (err < 0) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to write picture header: %d.\n", err);
+        goto end;
+    }
+    av1_pic_hd_size /= 8;
+    av_log(avctx, AV_LOG_DEBUG, "AV1 picture header size: %zu bytes.\n", av1_pic_hd_size);
+
+
+    tile_group_extra_size = (av_log2(total_size) + 7) / 7 + 1; // 1 byte for obu header, rest for tile group LEB128 size
+    av_log(avctx, AV_LOG_DEBUG, "Tile group extra size: %d bytes.\n", tile_group_extra_size);
+
+    total_size += (pic->header_size + tile_group_extra_size + av1_pic_hd_size);
+    av_log(avctx, AV_LOG_DEBUG, "Output buffer size %"SIZE_SPECIFIER"\n", total_size);
+
+    hr = ID3D12Resource_Map(pic->output_buffer, 0, NULL, (void **)&mapped_data);
+    if (FAILED(hr)) {
+        err = AVERROR_UNKNOWN;
+        goto end;
+    }
+
+    err = ff_get_encode_buffer(avctx, pkt, total_size, 0);
+    if (err < 0)
+        goto end;
+    ptr = pkt->data;
+
+    memcpy(ptr, mapped_data, pic->header_size);
+
+    ptr += pic->header_size;
+    mapped_data += pic->aligned_header_size;
+    total_size -= pic->header_size;
+
+    memcpy(ptr, pic_hd_data, av1_pic_hd_size);
+    ptr += av1_pic_hd_size;
+    total_size -= av1_pic_hd_size;
+    av_log(avctx, AV_LOG_DEBUG, "AV1 total_size after write picture header: %"SIZE_SPECIFIER".\n", total_size);
+
+    total_size -= tile_group_extra_size;
+    err = d3d12va_encode_av1_write_tile_group(avctx, mapped_data, total_size, ptr, &bit_len);
+    if (err < 0) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to write tile group: %d.\n", err);
+        goto end;
+    }
+    assert((total_size + tile_group_extra_size) * 8 == bit_len);
+
+    ID3D12Resource_Unmap(pic->output_buffer, 0, NULL);
+
+end:
+    av_buffer_unref(&pic->output_buffer_ref);
+    pic->output_buffer = NULL;
+    return err;
+}
+
+static int d3d12va_hw_base_encode_init_params_av1(FFHWBaseEncodeContext *base_ctx,
+                                                  AVCodecContext *avctx,
+                                                  D3D12VAHWBaseEncodeAV1 *common,
+                                                  D3D12VAHWBaseEncodeAV1Opts *opts)
+{
+    AV1RawOBU      *seqheader_obu = &common->raw_sequence_header;
+    AV1RawSequenceHeader     *seq = &seqheader_obu->obu.sequence_header;
+    const AVPixFmtDescriptor *desc;
+
+    seq->seq_profile = avctx->profile;
+    if (!seq->seq_force_screen_content_tools)
+        seq->seq_force_integer_mv = AV1_SELECT_INTEGER_MV;
+    seq->seq_tier[0] = opts->tier;
+
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
+    seq->color_config = (AV1RawColorConfig){
+        .high_bitdepth = desc->comp[0].depth == 8 ? 0 : 1,
+        .color_primaries = avctx->color_primaries,
+        .transfer_characteristics = avctx->color_trc,
+        .matrix_coefficients = avctx->colorspace,
+        .color_description_present_flag = (avctx->color_primaries != AVCOL_PRI_UNSPECIFIED ||
+                                           avctx->color_trc != AVCOL_TRC_UNSPECIFIED ||
+                                           avctx->colorspace != AVCOL_SPC_UNSPECIFIED),
+        .color_range = avctx->color_range == AVCOL_RANGE_JPEG,
+        .subsampling_x = desc->log2_chroma_w,
+        .subsampling_y = desc->log2_chroma_h,
+    };
+
+    switch (avctx->chroma_sample_location) {
+    case AVCHROMA_LOC_LEFT:
+        seq->color_config.chroma_sample_position = AV1_CSP_VERTICAL;
+        break;
+    case AVCHROMA_LOC_TOPLEFT:
+        seq->color_config.chroma_sample_position = AV1_CSP_COLOCATED;
+        break;
+    default:
+        seq->color_config.chroma_sample_position = AV1_CSP_UNKNOWN;
+        break;
+    }
+
+    if (avctx->level != AV_LEVEL_UNKNOWN) {
+        seq->seq_level_idx[0] = avctx->level;
+    }
+    else {
+        const AV1LevelDescriptor *level;
+        float framerate;
+
+        if (avctx->framerate.num > 0 && avctx->framerate.den > 0)
+            framerate = (float)avctx->framerate.num / avctx->framerate.den;
+        else
+            framerate = 0;
+
+        // Currently only a single tile is supported.
+        level = ff_av1_guess_level(avctx->bit_rate, opts->tier,
+            base_ctx->surface_width, base_ctx->surface_height,
+            /*priv->tile_rows*/1 * 1/*priv->tile_cols*/,
+            /*priv->tile_cols*/1, framerate);
+        if (level) {
+            av_log(avctx, AV_LOG_VERBOSE, "Using level %s.\n", level->name);
+            seq->seq_level_idx[0] = level->level_idx;
+        }
+        else {
+            av_log(avctx, AV_LOG_VERBOSE, "Stream will not conform to "
+                "any normal level, using maximum parameters level by default.\n");
+            seq->seq_level_idx[0] = 31;
+            seq->seq_tier[0] = 1;
+        }
+    }
+
+    // Still picture mode
+    seq->still_picture = (base_ctx->gop_size == 1);
+    seq->reduced_still_picture_header = seq->still_picture;
+
+    // Feature flags
+    seq->enable_filter_intra = opts->enable_filter_intra;
+    seq->enable_intra_edge_filter = opts->enable_intra_edge_filter;
+    seq->enable_interintra_compound = opts->enable_interintra_compound;
+    seq->enable_masked_compound = opts->enable_masked_compound;
+    seq->enable_warped_motion = opts->enable_warped_motion;
+    seq->enable_dual_filter = opts->enable_dual_filter;
+    seq->enable_order_hint = !seq->still_picture;
+    if (seq->enable_order_hint) {
+        seq->order_hint_bits_minus_1 = 7;
+    }
+    seq->enable_jnt_comp = opts->enable_jnt_comp && seq->enable_order_hint;
+    seq->enable_ref_frame_mvs = opts->enable_ref_frame_mvs && seq->enable_order_hint;
+    seq->enable_superres = opts->enable_superres;
+    seq->enable_cdef = opts->enable_cdef;
+    seq->enable_restoration = opts->enable_restoration;
+
+    return 0;
+
+}
+
+static int d3d12va_encode_av1_init_sequence_params(AVCodecContext *avctx)
+{
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
+    D3D12VAEncodeContext      *ctx  = avctx->priv_data;
+    D3D12VAEncodeAV1Context   *priv = avctx->priv_data;
+    AVD3D12VAFramesContext   *hwctx = base_ctx->input_frames->hwctx;
+    AV1RawOBU        *seqheader_obu = &priv->units.raw_sequence_header;
+    AV1RawSequenceHeader       *seq = &priv->units.raw_sequence_header.obu.sequence_header;
+
+    D3D12_VIDEO_ENCODER_AV1_PROFILE profile = D3D12_VIDEO_ENCODER_AV1_PROFILE_MAIN;
+    D3D12_VIDEO_ENCODER_AV1_LEVEL_TIER_CONSTRAINTS level = { 0 };
+    HRESULT hr;
+    int err;
+
+    D3D12_FEATURE_DATA_VIDEO_ENCODER_SUPPORT1 support = {
+        .NodeIndex                        = 0,
+        .Codec                            = D3D12_VIDEO_ENCODER_CODEC_AV1,
+        .InputFormat                      = hwctx->format,
+        .RateControl                      = ctx->rc,
+        .IntraRefresh                     = D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE,
+        .SubregionFrameEncoding           = D3D12_VIDEO_ENCODER_FRAME_SUBREGION_LAYOUT_MODE_FULL_FRAME,
+        .ResolutionsListCount             = 1,
+        .pResolutionList                  = &ctx->resolution,
+        .CodecGopSequence                 = ctx->gop,
+        .MaxReferenceFramesInDPB          = AV1_NUM_REF_FRAMES,
+        .CodecConfiguration               = ctx->codec_conf,
+        .SuggestedProfile.DataSize        = sizeof(D3D12_VIDEO_ENCODER_AV1_PROFILE),
+        .SuggestedProfile.pAV1Profile     = &profile,
+        .SuggestedLevel.DataSize          = sizeof(D3D12_VIDEO_ENCODER_AV1_LEVEL_TIER_CONSTRAINTS),
+        .SuggestedLevel.pAV1LevelSetting  = &level,
+        .pResolutionDependentSupport      = &ctx->res_limits,
+        .SubregionFrameEncodingData.pTilesPartition_AV1 = ctx->subregions_layout.pTilesPartition_AV1,
+    };
+
+    hr = ID3D12VideoDevice3_CheckFeatureSupport(ctx->video_device3, D3D12_FEATURE_VIDEO_ENCODER_SUPPORT1,
+                                                &support, sizeof(support));
+
+    if (FAILED(hr)) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to check encoder support(%lx).\n", (long)hr);
+        return AVERROR(EINVAL);
+    }
+
+    if (!(support.SupportFlags & D3D12_VIDEO_ENCODER_SUPPORT_FLAG_GENERAL_SUPPORT_OK)) {
+        av_log(avctx, AV_LOG_ERROR, "Driver does not support some requested D3D12VA AV1 features. %#x\n",
+               support.ValidationFlags);
+        return AVERROR(EINVAL);
+    }
+
+    if (support.SupportFlags & D3D12_VIDEO_ENCODER_SUPPORT_FLAG_RECONSTRUCTED_FRAMES_REQUIRE_TEXTURE_ARRAYS) {
+        ctx->is_texture_array = 1;
+        av_log(avctx, AV_LOG_DEBUG, "D3D12 video encode on this device uses texture array mode.\n");
+    }
+
+    memset(seqheader_obu, 0, sizeof(*seqheader_obu));
+    seq->seq_profile = profile;
+    seq->seq_level_idx[0] = level.Level;
+    seq->seq_tier[0] = level.Tier;
+
+    seq->max_frame_width_minus_1 = ctx->resolution.Width - 1;
+    seq->max_frame_height_minus_1 = ctx->resolution.Height - 1;
+    seq->frame_width_bits_minus_1 = av_log2(ctx->resolution.Width);
+    seq->frame_height_bits_minus_1 = av_log2(ctx->resolution.Height);
+
+    seqheader_obu->header.obu_type = AV1_OBU_SEQUENCE_HEADER;
+
+    err = d3d12va_hw_base_encode_init_params_av1(base_ctx, avctx,
+                                                 &priv->units, &priv->unit_opts);
+    if (err < 0)
+        return err;
+
+    if (avctx->level == AV_LEVEL_UNKNOWN)
+        avctx->level = level.Level;
+
+    return 0;
+}
+
+static int d3d12va_encode_av1_get_encoder_caps(AVCodecContext *avctx)
+{
+    HRESULT                      hr = S_OK;
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
+    D3D12VAEncodeContext       *ctx = avctx->priv_data;
+    D3D12VAEncodeAV1Context   *priv = avctx->priv_data;
+
+    D3D12_VIDEO_ENCODER_AV1_CODEC_CONFIGURATION *config;
+    D3D12_VIDEO_ENCODER_AV1_CODEC_CONFIGURATION_SUPPORT av1_caps;
+
+    D3D12_FEATURE_DATA_VIDEO_ENCODER_CODEC_CONFIGURATION_SUPPORT codec_caps = {
+        .NodeIndex                   = 0,
+        .Codec                       = D3D12_VIDEO_ENCODER_CODEC_AV1,
+        .Profile                     = ctx->profile->d3d12_profile,
+        .CodecSupportLimits.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_CODEC_CONFIGURATION_SUPPORT),
+    };
+
+    codec_caps.CodecSupportLimits.pAV1Support = &av1_caps;
+
+    hr = ID3D12VideoDevice3_CheckFeatureSupport(ctx->video_device3, D3D12_FEATURE_VIDEO_ENCODER_CODEC_CONFIGURATION_SUPPORT,
+                                                &codec_caps, sizeof(codec_caps));
+    if (!(SUCCEEDED(hr) && codec_caps.IsSupported))
+        return AVERROR(EINVAL);
+
+    ctx->codec_conf.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_CODEC_CONFIGURATION);
+    ctx->codec_conf.pAV1Config = av_mallocz(ctx->codec_conf.DataSize);
+    if (!ctx->codec_conf.pAV1Config)
+        return AVERROR(ENOMEM);
+
+    priv->post_encode_values_flag = av1_caps.PostEncodeValuesFlags;
+    config = ctx->codec_conf.pAV1Config;
+
+    config->FeatureFlags = D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_NONE;
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_128x128_SUPERBLOCK) {
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_128x128_SUPERBLOCK;
+        priv->unit_opts.enable_128x128_superblock = 1;
+    }
+
+    base_ctx->surface_width  = FFALIGN(avctx->width,  priv->unit_opts.enable_128x128_superblock ? 128 : 64);
+    base_ctx->surface_height = FFALIGN(avctx->height, priv->unit_opts.enable_128x128_superblock ? 128 : 64);
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_LOOP_RESTORATION_FILTER) {
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_LOOP_RESTORATION_FILTER;
+        priv->unit_opts.enable_loop_filter = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_PALETTE_ENCODING) {
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_PALETTE_ENCODING;
+        priv->unit_opts.enable_palette = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_INTRA_BLOCK_COPY) {
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_INTRA_BLOCK_COPY;
+        priv->unit_opts.enable_intra_block_copy = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_LOOP_FILTER_DELTAS) {
+        // Loop filter deltas
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_LOOP_FILTER_DELTAS;
+        priv->unit_opts.enable_loop_filter_delta = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_CDEF_FILTERING) {
+        // CDEF (Constrained Directional Enhancement Filter)
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_CDEF_FILTERING;
+        priv->unit_opts.enable_cdef = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_DUAL_FILTER) {
+        // Dual filter
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_DUAL_FILTER;
+        priv->unit_opts.enable_dual_filter = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_JNT_COMP) {
+        // Joint compound prediction
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_JNT_COMP;
+        priv->unit_opts.enable_jnt_comp = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_FRAME_REFERENCE_MOTION_VECTORS) {
+        // Frame reference motion vectors
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_FRAME_REFERENCE_MOTION_VECTORS;
+        priv->unit_opts.enable_ref_frame_mvs = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_SUPER_RESOLUTION) {
+        // Super-resolution
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_SUPER_RESOLUTION;
+        priv->unit_opts.enable_superres = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_WARPED_MOTION) {
+        // Warped motion
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_WARPED_MOTION;
+        priv->unit_opts.enable_warped_motion = 1;
+    }
+
+    if (av1_caps.SupportedFeatureFlags & D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_INTERINTRA_COMPOUND) {
+        // Inter-intra compound prediction
+        config->FeatureFlags |= D3D12_VIDEO_ENCODER_AV1_FEATURE_FLAG_INTERINTRA_COMPOUND;
+        priv->unit_opts.enable_interintra_compound = 1;
+    }
+
+    return 0;
+}
+
+static int d3d12va_encode_av1_configure(AVCodecContext *avctx)
+{
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
+    D3D12VAEncodeContext       *ctx = avctx->priv_data;
+    D3D12VAEncodeAV1Context   *priv = avctx->priv_data;
+    int                         err = 0;
+    int fixed_qp_key, fixed_qp_inter;
+
+    err = ff_cbs_init(&priv->cbc, AV_CODEC_ID_AV1, avctx);
+    if (err < 0)
+        return err;
+
+    if (ctx->rc.Mode == D3D12_VIDEO_ENCODER_RATE_CONTROL_MODE_CQP) {
+        D3D12_VIDEO_ENCODER_RATE_CONTROL_CQP *cqp_ctl;
+        fixed_qp_inter = av_clip_uintp2(ctx->rc_quality, 8);
+
+        if (avctx->i_quant_factor > 0.0)
+            fixed_qp_key = av_clip_uintp2((avctx->i_quant_factor * fixed_qp_inter +
+                                    avctx->i_quant_offset) + 0.5, 8);
+        else
+            fixed_qp_key = fixed_qp_inter;
+
+        av_log(avctx, AV_LOG_DEBUG, "Using fixed QP = "
+               "%d / %d for Key / Inter frames.\n",
+               fixed_qp_key, fixed_qp_inter);
+
+        ctx->rc.ConfigParams.DataSize = sizeof(D3D12_VIDEO_ENCODER_RATE_CONTROL_CQP);
+        cqp_ctl = av_mallocz(ctx->rc.ConfigParams.DataSize);
+        if (!cqp_ctl)
+            return AVERROR(ENOMEM);
+
+        cqp_ctl->ConstantQP_FullIntracodedFrame                  = fixed_qp_key;
+        cqp_ctl->ConstantQP_InterPredictedFrame_PrevRefOnly      = fixed_qp_inter;
+        cqp_ctl->ConstantQP_InterPredictedFrame_BiDirectionalRef = fixed_qp_inter;
+
+        ctx->rc.ConfigParams.pConfiguration_CQP = cqp_ctl;
+
+        priv->q_idx_idr = fixed_qp_key;
+        priv->q_idx_p   = fixed_qp_inter;
+
+    }
+
+    // GOP configuration for AV1
+    ctx->gop.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_SEQUENCE_STRUCTURE);
+    ctx->gop.pAV1SequenceStructure = av_mallocz(ctx->gop.DataSize);
+    if (!ctx->gop.pAV1SequenceStructure)
+        return AVERROR(ENOMEM);
+
+    ctx->gop.pAV1SequenceStructure->IntraDistance = base_ctx->gop_size;
+    ctx->gop.pAV1SequenceStructure->InterFramePeriod = base_ctx->b_per_p + 1;
+
+    return 0;
+}
+
+static int d3d12va_encode_av1_set_level(AVCodecContext *avctx)
+{
+    D3D12VAEncodeContext     *ctx = avctx->priv_data;
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+    int                         i = 0;
+
+    ctx->level.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_LEVEL_TIER_CONSTRAINTS);
+    ctx->level.pAV1LevelSetting = av_mallocz(ctx->level.DataSize);
+    if (!ctx->level.pAV1LevelSetting)
+        return AVERROR(ENOMEM);
+
+    if (avctx->level != AV_LEVEL_UNKNOWN) {
+        for (i = 0; i < FF_ARRAY_ELEMS(av1_levels); i++) {
+            if (avctx->level == av1_levels[i].level) {
+                ctx->level.pAV1LevelSetting->Level = av1_levels[i].d3d12_level;
+                break;
+            }
+        }
+
+        if (i == FF_ARRAY_ELEMS(av1_levels) ) {
+            av_log(avctx, AV_LOG_ERROR, "Invalid AV1 level %d.\n", avctx->level);
+            return AVERROR(EINVAL);
+        }
+    } else {
+        ctx->level.pAV1LevelSetting->Level = D3D12_VIDEO_ENCODER_AV1_LEVELS_5_2;
+        avctx->level = D3D12_VIDEO_ENCODER_AV1_LEVELS_5_2;
+        av_log(avctx, AV_LOG_DEBUG, "Using default AV1 level 5.2\n");
+    }
+
+    if (priv->tier == 1 || avctx->bit_rate > 30000000) {
+        ctx->level.pAV1LevelSetting->Tier = D3D12_VIDEO_ENCODER_AV1_TIER_HIGH;
+        av_log(avctx, AV_LOG_DEBUG, "Using AV1 High tier\n");
+    } else {
+        ctx->level.pAV1LevelSetting->Tier = D3D12_VIDEO_ENCODER_AV1_TIER_MAIN;
+        av_log(avctx, AV_LOG_DEBUG, "Using AV1 Main tier\n");
+    }
+
+    if (priv->tier >= 0) {
+        ctx->level.pAV1LevelSetting->Tier = priv->tier == 0 ?
+                                            D3D12_VIDEO_ENCODER_AV1_TIER_MAIN :
+                                            D3D12_VIDEO_ENCODER_AV1_TIER_HIGH;
+    }
+
+    av_log(avctx, AV_LOG_DEBUG, "AV1 level set to %d, tier: %s\n",
+           ctx->level.pAV1LevelSetting->Level,
+           ctx->level.pAV1LevelSetting->Tier == D3D12_VIDEO_ENCODER_AV1_TIER_MAIN ? "Main" : "High");
+
+    return 0;
+}
+
+static int d3d12va_encode_av1_set_tile(AVCodecContext *avctx)
+{
+    D3D12VAEncodeContext *ctx = avctx->priv_data;
+
+    ctx->subregions_layout.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_PICTURE_CONTROL_SUBREGIONS_LAYOUT_DATA_TILES);
+    D3D12_VIDEO_ENCODER_AV1_PICTURE_CONTROL_SUBREGIONS_LAYOUT_DATA_TILES *tiles_layout = av_mallocz(ctx->subregions_layout.DataSize);
+    if (!tiles_layout)
+        return AVERROR(ENOMEM);
+
+    ctx->subregions_layout.pTilesPartition_AV1 = tiles_layout;
+
+    // Currently only support 1 tile
+    tiles_layout->RowCount = 1;
+    tiles_layout->ColCount = 1;
+
+    return 0;
+}
+
+static void d3d12va_encode_av1_free_picture_params(D3D12VAEncodePicture *pic)
+{
+    if (!pic->pic_ctl.pAV1PicData)
+        return;
+
+    av_freep(&pic->pic_ctl.pAV1PicData);
+}
+
+static int d3d12va_encode_av1_init_picture_params(AVCodecContext *avctx,
+                                                  FFHWBaseEncodePicture *pic)
+{
+    FFHWBaseEncodeContext             *base_ctx = avctx->priv_data;
+    D3D12VAEncodeAV1Context               *priv = avctx->priv_data;
+    D3D12VAEncodeContext                   *ctx = avctx->priv_data;
+    D3D12VAEncodePicture           *d3d12va_pic = pic->priv;
+    D3D12VAEncodeAV1Picture               *hpic = pic->codec_priv;
+    CodedBitstreamAV1Context             *cbctx = priv->cbc->priv_data;
+    AV1RawOBU                  *frameheader_obu = &priv->units.raw_frame_header;
+    AV1RawFrameHeader                       *fh = &frameheader_obu->obu.frame_header;
+
+    FFHWBaseEncodePicture *ref;
+    D3D12VAEncodeAV1Picture *href;
+    int i;
+
+    static const int8_t default_loop_filter_ref_deltas[AV1_TOTAL_REFS_PER_FRAME] =
+        { 1, 0, 0, 0, -1, 0, -1, -1 };
+
+    memset(frameheader_obu, 0, sizeof(*frameheader_obu));
+
+    frameheader_obu->header.obu_type = AV1_OBU_FRAME_HEADER;
+
+    d3d12va_pic->pic_ctl.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_PICTURE_CONTROL_CODEC_DATA);
+    d3d12va_pic->pic_ctl.pAV1PicData = av_mallocz(d3d12va_pic->pic_ctl.DataSize);
+    if (!d3d12va_pic->pic_ctl.pAV1PicData)
+        return AVERROR(ENOMEM);
+
+    // Initialize frame type and reference frame management
+    switch(pic->type) {
+        case FF_HW_PICTURE_TYPE_IDR:
+            fh->frame_type = AV1_FRAME_KEY;
+            fh->refresh_frame_flags = 0xFF;
+            fh->base_q_idx = priv->q_idx_idr;
+            hpic->slot = 0;
+            hpic->last_idr_frame = pic->display_order;
+            fh->tx_mode = AV1_TX_MODE_LARGEST;
+            break;
+
+        case FF_HW_PICTURE_TYPE_P:
+            fh->frame_type = AV1_FRAME_INTER;
+            fh->base_q_idx = priv->q_idx_p;
+            fh->tx_mode = AV1_TX_MODE_SELECT;
+
+            ref = pic->refs[0][pic->nb_refs[0] - 1];
+            href = ref->codec_priv;
+
+            /**
+             * The encoder uses a simple alternating reference frame strategy:
+             * - For P-frames, it uses the last reconstructed frame as a reference.
+             * - To simplify the reference model of the encoder, the encoder alternates between
+             * two reference frame slots (typically slot 0 and slot 1) for storing reconstructed
+             * images and providing prediction references for the next frame.
+             */
+            if (base_ctx->ref_l0 > 1) {
+                hpic->slot = !href->slot;
+            } else {
+                hpic->slot = 0;
+            }
+            hpic->last_idr_frame = href->last_idr_frame;
+            fh->refresh_frame_flags = 1 << hpic->slot;
+
+            // Set the nearest frame in L0 as all reference frame.
+            for (i = 0; i < AV1_REFS_PER_FRAME; i++)
+                fh->ref_frame_idx[i] = href->slot;
+
+            fh->primary_ref_frame = href->slot;
+            fh->ref_order_hint[href->slot] = ref->display_order - href->last_idr_frame;
+
+            // Set the 2nd nearest frame in L0 as Golden frame.
+            if (pic->nb_refs[0] > 1) {
+                ref = pic->refs[0][pic->nb_refs[0] - 2];
+                href = ref->codec_priv;
+                // Reference frame index 3 is the GOLDEN_FRAME
+                fh->ref_frame_idx[3] = href->slot;
+                fh->ref_order_hint[href->slot] = ref->display_order - href->last_idr_frame;
+            } else if (base_ctx->ref_l0 == 1) {
+                fh->ref_order_hint[!href->slot] = cbctx->ref[!href->slot].order_hint;
+            }
+            break;
+
+        case FF_HW_PICTURE_TYPE_B:
+            av_log(avctx, AV_LOG_ERROR, "B-frames were requested, but the D3D12 AV1 encoder "
+                "does not implement them yet.\n");
+            return AVERROR_PATCHWELCOME;
+        default:
+            av_log(avctx, AV_LOG_ERROR, "Unsupported picture type %d.\n", pic->type);
+            return AVERROR(EINVAL);
+    }
+
+
+    cbctx->seen_frame_header = 0;
+
+    fh->show_frame                = pic->display_order <= pic->encode_order;
+    fh->showable_frame            = fh->frame_type != AV1_FRAME_KEY;
+    fh->order_hint                = pic->display_order - hpic->last_idr_frame;
+    fh->frame_width_minus_1       = ctx->resolution.Width - 1;
+    fh->frame_height_minus_1      = ctx->resolution.Height - 1;
+    fh->render_width_minus_1      = fh->frame_width_minus_1;
+    fh->render_height_minus_1     = fh->frame_height_minus_1;
+    fh->is_filter_switchable      = 1;
+    fh->interpolation_filter      = AV1_INTERPOLATION_FILTER_SWITCHABLE;
+    fh->uniform_tile_spacing_flag = 1;
+    fh->width_in_sbs_minus_1[0]   = (ctx->resolution.Width  + 63 >> 6) -1; // 64x64 superblock size
+    fh->height_in_sbs_minus_1[0]  = (ctx->resolution.Height + 63 >> 6) -1; // 64x64 superblock size
+
+    memcpy(fh->loop_filter_ref_deltas, default_loop_filter_ref_deltas,
+           AV1_TOTAL_REFS_PER_FRAME * sizeof(int8_t));
+
+    if (fh->frame_type == AV1_FRAME_KEY && fh->show_frame)
+        fh->error_resilient_mode = 1;
+
+    if (fh->frame_type == AV1_FRAME_KEY || fh->error_resilient_mode)
+        fh->primary_ref_frame = AV1_PRIMARY_REF_NONE;
+
+    d3d12va_pic->pic_ctl.pAV1PicData->FrameType = fh->frame_type;
+    d3d12va_pic->pic_ctl.pAV1PicData->TxMode = fh->tx_mode;
+    d3d12va_pic->pic_ctl.pAV1PicData->RefreshFrameFlags = fh->refresh_frame_flags;
+    d3d12va_pic->pic_ctl.pAV1PicData->TemporalLayerIndexPlus1 = hpic->temporal_id + 1;
+    d3d12va_pic->pic_ctl.pAV1PicData->SpatialLayerIndexPlus1 = hpic->spatial_id + 1;
+    d3d12va_pic->pic_ctl.pAV1PicData->PictureIndex = pic->display_order;
+    d3d12va_pic->pic_ctl.pAV1PicData->InterpolationFilter = D3D12_VIDEO_ENCODER_AV1_INTERPOLATION_FILTERS_SWITCHABLE;
+    d3d12va_pic->pic_ctl.pAV1PicData->PrimaryRefFrame = fh->primary_ref_frame;
+    if (fh->error_resilient_mode)
+        d3d12va_pic->pic_ctl.pAV1PicData->Flags |= D3D12_VIDEO_ENCODER_AV1_PICTURE_CONTROL_FLAG_ENABLE_ERROR_RESILIENT_MODE;
+
+    if (pic->type == FF_HW_PICTURE_TYPE_IDR)
+    {
+        for (int i = 0; i < AV1_NUM_REF_FRAMES; i++) {
+            d3d12va_pic->pic_ctl.pAV1PicData->ReferenceFramesReconPictureDescriptors[i].ReconstructedPictureResourceIndex =
+            D3D12_VIDEO_ENCODER_AV1_INVALID_DPB_RESOURCE_INDEX;
+        }
+    } else if (pic->type == FF_HW_PICTURE_TYPE_P) {
+        for (i = 0; i < pic->nb_refs[0]; i++) {
+            FFHWBaseEncodePicture *ref_pic = pic->refs[0][i];
+            d3d12va_pic->pic_ctl.pAV1PicData->ReferenceFramesReconPictureDescriptors[i].ReconstructedPictureResourceIndex =
+            ((D3D12VAEncodeAV1Picture*)ref_pic->codec_priv)->slot;
+        }
+    }
+    // Set reference frame management
+    memset(d3d12va_pic->pic_ctl.pAV1PicData->ReferenceIndices, 0, sizeof(UINT) * AV1_REFS_PER_FRAME);
+    if (pic->type == FF_HW_PICTURE_TYPE_P) {
+        for (i = 0; i < AV1_REFS_PER_FRAME; i++)
+            d3d12va_pic->pic_ctl.pAV1PicData->ReferenceIndices[i] = fh->ref_frame_idx[i];
+    }
+
+    int ret = av_fifo_write(priv->picture_header_list, &priv->units.raw_frame_header, 1);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+
+static const D3D12VAEncodeType d3d12va_encode_type_av1 = {
+    .profiles               = d3d12va_encode_av1_profiles,
+
+    .d3d12_codec            = D3D12_VIDEO_ENCODER_CODEC_AV1,
+
+    .flags                  = FF_HW_FLAG_B_PICTURES |
+                              FF_HW_FLAG_B_PICTURE_REFERENCES |
+                              FF_HW_FLAG_NON_IDR_KEY_PICTURES,
+
+    .default_quality        = 25,
+
+    .get_encoder_caps       = &d3d12va_encode_av1_get_encoder_caps,
+
+    .configure              = &d3d12va_encode_av1_configure,
+
+    .set_level              = &d3d12va_encode_av1_set_level,
+
+    .set_tile               = &d3d12va_encode_av1_set_tile,
+
+    .picture_priv_data_size = sizeof(D3D12VAEncodeAV1Picture),
+
+    .init_sequence_params   = &d3d12va_encode_av1_init_sequence_params,
+
+    .init_picture_params    = &d3d12va_encode_av1_init_picture_params,
+
+    .free_picture_params    = &d3d12va_encode_av1_free_picture_params,
+
+    .write_sequence_header  = &d3d12va_encode_av1_write_sequence_header,
+
+#if CONFIG_AV1_D3D12VA_ENCODER
+    .get_coded_data         = &d3d12va_encode_av1_get_coded_data,
+#endif
+};
+
+static int d3d12va_encode_av1_init(AVCodecContext *avctx)
+{
+    D3D12VAEncodeContext     *ctx = avctx->priv_data;
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+
+    ctx->codec = &d3d12va_encode_type_av1;
+
+    if (avctx->profile == AV_PROFILE_UNKNOWN)
+        avctx->profile = priv->profile;
+    if (avctx->level == AV_LEVEL_UNKNOWN)
+        avctx->level = priv->level;
+
+    if (avctx->level != AV_LEVEL_UNKNOWN && avctx->level & ~0xff) {
+        av_log(avctx, AV_LOG_ERROR, "Invalid level %d: must fit "
+               "in 8-bit unsigned integer.\n", avctx->level);
+        return AVERROR(EINVAL);
+    }
+
+    if (priv->qp > 0)
+        ctx->explicit_qp = priv->qp;
+
+    priv->picture_header_list = av_fifo_alloc2(2, sizeof(AV1RawOBU), AV_FIFO_FLAG_AUTO_GROW);
+    if (!priv->picture_header_list)
+        return AVERROR(ENOMEM);
+
+    return ff_d3d12va_encode_init(avctx);
+}
+
+static int d3d12va_encode_av1_close(AVCodecContext *avctx)
+{
+    D3D12VAEncodeAV1Context *priv = avctx->priv_data;
+
+    ff_cbs_fragment_free(&priv->current_obu);
+    ff_cbs_close(&priv->cbc);
+
+    av_freep(&priv->common.codec_conf.pAV1Config);
+    av_freep(&priv->common.gop.pAV1SequenceStructure);
+    av_freep(&priv->common.level.pAV1LevelSetting);
+    av_freep(&priv->common.subregions_layout.pTilesPartition_AV1);
+
+    av_fifo_freep2(&priv->picture_header_list);
+
+    return ff_d3d12va_encode_close(avctx);
+}
+
+#define OFFSET(x) offsetof(D3D12VAEncodeAV1Context, x)
+#define FLAGS (AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_ENCODING_PARAM)
+static const AVOption d3d12va_encode_av1_options[] = {
+    HW_BASE_ENCODE_COMMON_OPTIONS,
+    D3D12VA_ENCODE_RC_OPTIONS,
+
+    { "qp", "Constant QP (for P-frames; scaled by qfactor/qoffset for I/B)",
+      OFFSET(qp), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, 52, FLAGS },
+
+    { "profile", "Set profile (seq_profile)",
+      OFFSET(profile), AV_OPT_TYPE_INT,
+      { .i64 = AV_PROFILE_UNKNOWN }, AV_PROFILE_UNKNOWN, 0xff, FLAGS, "profile" },
+
+#define PROFILE(name, value)  name, NULL, 0, AV_OPT_TYPE_CONST, \
+      { .i64 = value }, 0, 0, FLAGS, "profile"
+    { PROFILE("main",             AV_PROFILE_AV1_MAIN) },
+    { PROFILE("high",             AV_PROFILE_AV1_HIGH) },
+    { PROFILE("professional",     AV_PROFILE_AV1_PROFESSIONAL) },
+#undef PROFILE
+
+    { "tier", "Set tier (seq_tier)",
+      OFFSET(unit_opts.tier), AV_OPT_TYPE_INT,
+      { .i64 = 0 }, 0, 1, FLAGS, "tier" },
+    { "main", NULL, 0, AV_OPT_TYPE_CONST,
+      { .i64 = 0 }, 0, 0, FLAGS, "tier" },
+    { "high", NULL, 0, AV_OPT_TYPE_CONST,
+      { .i64 = 1 }, 0, 0, FLAGS, "tier" },
+
+    { "level", "Set level (seq_level_idx)",
+      OFFSET(level), AV_OPT_TYPE_INT,
+      { .i64 = AV_LEVEL_UNKNOWN }, AV_LEVEL_UNKNOWN, 0xff, FLAGS, "level" },
+
+#define LEVEL(name, value) name, NULL, 0, AV_OPT_TYPE_CONST, \
+      { .i64 = value }, 0, 0, FLAGS, "level"
+    { LEVEL("2.0",  0) },
+    { LEVEL("2.1",  1) },
+    { LEVEL("2.2",  2) },
+    { LEVEL("2.3",  3) },
+    { LEVEL("3.0",  4) },
+    { LEVEL("3.1",  5) },
+    { LEVEL("3.2",  6) },
+    { LEVEL("3.3",  7) },
+    { LEVEL("4.0",  8) },
+    { LEVEL("4.1",  9) },
+    { LEVEL("4.2",  10) },
+    { LEVEL("4.3",  11) },
+    { LEVEL("5.0",  12) },
+    { LEVEL("5.1",  13) },
+    { LEVEL("5.2",  14) },
+    { LEVEL("5.3",  15) },
+    { LEVEL("6.0",  16) },
+    { LEVEL("6.1",  17) },
+    { LEVEL("6.2",  18) },
+    { LEVEL("6.3",  19) },
+    { LEVEL("7.0",  20) },
+    { LEVEL("7.1",  21) },
+    { LEVEL("7.2",  22) },
+    { LEVEL("7.3",  23) },
+#undef LEVEL
+    { NULL },
+};
+
+static const FFCodecDefault d3d12va_encode_av1_defaults[] = {
+    { "b",              "0"   },
+    { "bf",             "0"   },
+    { "g",              "120" },
+    { "i_qfactor",      "1"   },
+    { "i_qoffset",      "0"   },
+    { "b_qfactor",      "1"   },
+    { "b_qoffset",      "0"   },
+    { "qmin",           "-1"  },
+    { "qmax",           "-1"  },
+    { "refs",           "0"   },
+    { NULL },
+};
+
+static const AVClass d3d12va_encode_av1_class = {
+    .class_name = "av1_d3d12va",
+    .item_name  = av_default_item_name,
+    .option     = d3d12va_encode_av1_options,
+    .version    = LIBAVUTIL_VERSION_INT,
+};
+
+const FFCodec ff_av1_d3d12va_encoder = {
+    .p.name         = "av1_d3d12va",
+    CODEC_LONG_NAME("D3D12VA av1 encoder"),
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_AV1,
+    .priv_data_size = sizeof(D3D12VAEncodeAV1Context),
+    .init           = &d3d12va_encode_av1_init,
+    FF_CODEC_RECEIVE_PACKET_CB(&ff_d3d12va_encode_receive_packet),
+    .close          = &d3d12va_encode_av1_close,
+    .p.priv_class   = &d3d12va_encode_av1_class,
+    .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_HARDWARE |
+                      AV_CODEC_CAP_DR1 | AV_CODEC_CAP_ENCODER_REORDERED_OPAQUE,
+    .caps_internal  = FF_CODEC_CAP_NOT_INIT_THREADSAFE |
+                      FF_CODEC_CAP_INIT_CLEANUP,
+    .defaults       = d3d12va_encode_av1_defaults,
+    CODEC_PIXFMTS(AV_PIX_FMT_D3D12),
+    .hw_configs     = ff_d3d12va_encode_hw_configs,
+    .p.wrapper_name = "d3d12va",
+};
-- 
2.52.0


From 634f0f0902110b519b6ab2fb1d6f1071d44ad705 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 24 Sep 2025 15:14:29 +0200
Subject: [PATCH 070/304] avfilter/buffersrc: add av_buffersrc_get_status()

There is currently no way for API users to know that a buffersrc is no longer
accepting input, except by trying to feed it a frame and seeing what happens.

Of course, this is not possible if the user does not *have* a frame to feed
but still wishes to know whether the filter is accepting input.

Since passing `frame == NULL` to `av_buffersrc_add_frame()` is already treated
as closing the input, we are left with no choice but to introduce a new
function for this.

We don't explicitly return the result of `ff_outlink_get_status()` to avoid
leaking internal status codes, and instead translate them all to AVERROR_EOF.
---
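
For reviewers, the latching behaviour described above can be modelled outside
libavfilter. The following is a minimal standalone sketch, not FFmpeg code:
`ModelSource`, `MODEL_EOF` and `model_source_get_status()` are hypothetical
stand-ins for the buffer source context, the EOF error code and the new
function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of this is FFmpeg API. */
enum { MODEL_EOF = -1 };

typedef struct ModelSource {
    int eof;               /* latched once the source is known to be closed */
    int downstream_status; /* non-zero once the consumer stops accepting data */
} ModelSource;

/* Returns 0 while input is still accepted, MODEL_EOF once it is not. */
static int model_source_get_status(ModelSource *src)
{
    assert(src != NULL);

    /* Fold any downstream status into the latched flag, so the caller
     * never sees the internal status value itself. */
    if (!src->eof && src->downstream_status)
        src->eof = 1;

    return src->eof ? MODEL_EOF : 0;
}
```

Once the flag is latched, the model keeps reporting `MODEL_EOF` even if the
downstream status is later cleared, which mirrors how `s->eof` is only ever
set, never reset, in the function added by this patch.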
 doc/APIchanges          |  3 +++
 libavfilter/buffersrc.c | 10 ++++++++++
 libavfilter/buffersrc.h |  8 ++++++++
 libavfilter/version.h   |  2 +-
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index 93c6f92704..239137f1f2 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -2,6 +2,9 @@ The last version increases of all libraries were on 2025-03-28
 
 API changes, most recent first:
 
+2025-09-xx - xxxxxxxxxx - lavfi 11.10.100 - buffersrc.h
+  Add av_buffersrc_get_status().
+
 2025-11-18 - xxxxxxxxxx - lavu 60.19.100 - hwcontext_amf.h
   avutil/hwcontext_amf: add lock and unlock for AVAMFDeviceContext.
 
diff --git a/libavfilter/buffersrc.c b/libavfilter/buffersrc.c
index 30f4df83a1..b18d3f24dd 100644
--- a/libavfilter/buffersrc.c
+++ b/libavfilter/buffersrc.c
@@ -286,6 +286,16 @@ int av_buffersrc_close(AVFilterContext *ctx, int64_t pts, unsigned flags)
     return (flags & AV_BUFFERSRC_FLAG_PUSH) ? push_frame(ctx->graph) : 0;
 }
 
+int av_buffersrc_get_status(AVFilterContext *ctx)
+{
+    BufferSourceContext *s = ctx->priv;
+
+    if (!s->eof && ff_outlink_get_status(ctx->outputs[0]))
+        s->eof = 1;
+
+    return s->eof ? AVERROR_EOF : 0;
+}
+
 static av_cold int init_video(AVFilterContext *ctx)
 {
     BufferSourceContext *c = ctx->priv;
diff --git a/libavfilter/buffersrc.h b/libavfilter/buffersrc.h
index 54de1fd1f2..c7225b6752 100644
--- a/libavfilter/buffersrc.h
+++ b/libavfilter/buffersrc.h
@@ -216,6 +216,14 @@ int av_buffersrc_add_frame_flags(AVFilterContext *buffer_src,
  */
 int av_buffersrc_close(AVFilterContext *ctx, int64_t pts, unsigned flags);
 
+/**
+ * Returns 0 if the buffer source still accepts input, or a negative AVERROR
+ * code otherwise. Currently, the only error returned is AVERROR_EOF, which
+ * indicates that the buffer source has been closed, either as a result of
+ * av_buffersrc_close() or because the downstream filter no longer accepts data.
+ */
+int av_buffersrc_get_status(AVFilterContext *ctx);
+
 /**
  * @}
  */
diff --git a/libavfilter/version.h b/libavfilter/version.h
index 77f38cb9b4..4a69d6be98 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -31,7 +31,7 @@
 
 #include "version_major.h"
 
-#define LIBAVFILTER_VERSION_MINOR   9
+#define LIBAVFILTER_VERSION_MINOR  10
 #define LIBAVFILTER_VERSION_MICRO 100
 
 
-- 
2.52.0


From dc1b0f2507a0de35af343e2bd37848cb27edd7a4 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Wed, 24 Sep 2025 15:23:04 +0200
Subject: [PATCH 071/304] fftools/ffmpeg_filter: close all no-longer needed
 inputs

Currently, the thread loop of ffmpeg_filter essentially works like this:

while (1) {
    frame, idx = get_from_decoder();
    err = send_to_filter_graph(frame);
    if (err) { // i.e. EOF
        close_input(idx);
        continue;
    }

    while (filtered_frame = get_filtered_frame())
        send_to_encoder(filtered_frame);
}

The exact details are not 100% correct since the actual control flow is a bit
more complicated as a result of the scheduler, but this is the general flow.

Notably, this can leave a no-longer-needed input permanently open if the filter
graph starts producing an unbounded stream of frames (during the second loop)
*after* it finishes reading from an input, e.g. in a filter graph like
-af atrim,apad.

This patch avoids this issue by querying the status of all filter graph inputs
after each round of reading output frames, and explicitly closing any that were
closed downstream. As a result, information about the filtergraph being closed
can now propagate back upstream, even if the filter graph is no longer
requesting any input frames (i.e. input_idx == fg->nb_inputs).

Fixes: https://trac.ffmpeg.org/ticket/11061
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20457#issuecomment-6208
---
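
The per-round sweep described above can be sketched in isolation. This is a
simplified model under stated assumptions, not the ffmpeg_filter.c code:
`ModelInput` and `model_close_finished_inputs()` are hypothetical names, and
the scheduler notification is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one filter graph input; not an fftools type. */
typedef struct ModelInput {
    int closed_downstream; /* the graph no longer wants data from this input */
    int eof;               /* the upstream side has already been told to stop */
} ModelInput;

/* Run once after each round of reading output frames.
 * Returns the number of inputs that were newly closed in this sweep. */
static size_t model_close_finished_inputs(ModelInput *inputs, size_t nb_inputs)
{
    size_t newly_closed = 0;

    assert(inputs != NULL || nb_inputs == 0);

    for (size_t i = 0; i < nb_inputs; i++) {
        /* Close each input at most once, even if it is swept repeatedly. */
        if (inputs[i].closed_downstream && !inputs[i].eof) {
            inputs[i].eof = 1;
            newly_closed++;
        }
    }

    return newly_closed;
}
```

Guarding on `eof` keeps the sweep idempotent, which is the same reason the
`close_input()` helper in this patch checks `ifp->eof` before notifying the
scheduler.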
 fftools/ffmpeg_filter.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fftools/ffmpeg_filter.c b/fftools/ffmpeg_filter.c
index 9f962e6b8c..d5ae59ec03 100644
--- a/fftools/ffmpeg_filter.c
+++ b/fftools/ffmpeg_filter.c
@@ -2616,6 +2616,16 @@ finish:
     fps->dropped_keyframe |= fps->last_dropped && (frame->flags & AV_FRAME_FLAG_KEY);
 }
 
+static void close_input(InputFilterPriv *ifp)
+{
+    FilterGraphPriv *fgp = fgp_from_fg(ifp->ifilter.graph);
+
+    if (!ifp->eof) {
+        sch_filter_receive_finish(fgp->sch, fgp->sch_idx, ifp->ifilter.index);
+        ifp->eof = 1;
+    }
+}
+
 static int close_output(OutputFilterPriv *ofp, FilterGraphThread *fgt)
 {
     FilterGraphPriv *fgp = fgp_from_fg(ofp->ofilter.graph);
@@ -3343,7 +3353,7 @@ static int filter_thread(void *arg)
         if (ret == AVERROR_EOF) {
             av_log(fg, AV_LOG_VERBOSE, "Input %u no longer accepts new data\n",
                    input_idx);
-            sch_filter_receive_finish(fgp->sch, fgp->sch_idx, input_idx);
+            close_input(ifp);
             continue;
         }
         if (ret < 0)
@@ -3362,6 +3372,13 @@ read_frames:
                    av_err2str(ret));
             goto finish;
         }
+
+        // ensure all inputs no longer accepting data are closed
+        for (int i = 0; fgt.graph && i < fg->nb_inputs; i++) {
+            InputFilterPriv *ifp = ifp_from_ifilter(fg->inputs[i]);
+            if (av_buffersrc_get_status(ifp->ifilter.filter))
+                close_input(ifp);
+        }
     }
 
     for (unsigned i = 0; i < fg->nb_outputs; i++) {
-- 
2.52.0


From a3d30c98217690008383349a06ccd4d90f1dbaed Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Tue, 25 Nov 2025 17:03:27 +0100
Subject: [PATCH 072/304] Changelog: add ProRes Vulkan hwaccel to the list

Forgotten when it got merged.
---
 Changelog | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Changelog b/Changelog
index 4648899ca3..3757eac2e4 100644
--- a/Changelog
+++ b/Changelog
@@ -11,6 +11,7 @@ version <next>:
 - drawvg filter via libcairo
 - ffmpeg CLI tiled HEIF support
 - D3D12 AV1 encoder
+- ProRes Vulkan hwaccel
 
 
 version 8.0:
-- 
2.52.0


From 06785fdf300352305a8dc0799b1c271857f66723 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 13 Nov 2025 12:09:11 +0100
Subject: [PATCH 073/304] hwcontext_vulkan: enable runtime descriptor sizing

We were already using this in places, but it seems validation
layers finally got support to detect it.
---
 libavutil/hwcontext_vulkan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index a6bf9a590b..0408b9c117 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -307,6 +307,7 @@ static void device_features_copy_needed(VulkanDeviceFeatures *dst, VulkanDeviceF
     COPY_VAL(vulkan_1_2.vulkanMemoryModel);
     COPY_VAL(vulkan_1_2.vulkanMemoryModelDeviceScope);
     COPY_VAL(vulkan_1_2.uniformBufferStandardLayout);
+    COPY_VAL(vulkan_1_2.runtimeDescriptorArray);
 
     COPY_VAL(vulkan_1_3.dynamicRendering);
     COPY_VAL(vulkan_1_3.maintenance4);
-- 
2.52.0


From 0c0b3f6ca6bb0818662f6fa7ce019a13e527feb1 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 13:47:15 +0100
Subject: [PATCH 074/304] hwcontext_vulkan: re-enable host image copy extension

We'll slowly start to use it in the code in safe places
rather than globally.
---
 libavutil/hwcontext_vulkan.c | 54 +++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)
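
Note (not part of the commit): the ReBAR check that this patch moves ahead of check_extensions() compares the largest device-local heap with the largest device-local heap that is also host-visible. That is also why the memory-properties query has to run before the extension check. Below is a minimal standalone sketch of the same heuristic. MemType, MEM_DEVICE_LOCAL, MEM_HOST_VISIBLE and has_rebar() are hypothetical stand-ins for illustration, not Vulkan or FFmpeg names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Vulkan memory property bits. */
#define MEM_DEVICE_LOCAL (1u << 0)
#define MEM_HOST_VISIBLE (1u << 1)

/* Hypothetical stand-in for VkMemoryType: flags plus the heap it lives in. */
typedef struct MemType {
    uint32_t flags;
    uint32_t heap_index;
} MemType;

/* Returns 1 when the largest host-visible device-local heap is within
 * 1 kB of the largest device-local heap, i.e. (nearly) all VRAM is mappable. */
static int has_rebar(const MemType *types, int nb_types,
                     const uint64_t *heap_sizes)
{
    uint64_t max_vram = 0, max_visible_vram = 0;

    /* The properties must have been queried before this is called. */
    assert(nb_types > 0);

    for (int i = 0; i < nb_types; i++) {
        uint64_t size = heap_sizes[types[i].heap_index];
        if (!(types[i].flags & MEM_DEVICE_LOCAL))
            continue;
        if (size > max_vram)
            max_vram = size;
        if ((types[i].flags & MEM_HOST_VISIBLE) && size > max_visible_vram)
            max_visible_vram = size;
    }

    return max_vram - max_visible_vram < 1024; /* 1 kB tolerance */
}
```

With an 8 GiB device-local heap and only a 256 MiB host-visible window the function returns 0; when the host-visible type lives in the same 8 GiB heap it returns 1.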

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index 0408b9c117..efee6c72ac 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -154,9 +154,6 @@ typedef struct VulkanDevicePriv {
     /* Disable multiplane images */
     int disable_multiplane;
 
-    /* Disable host image transfer */
-    int disable_host_transfer;
-
     /* Prefer memcpy over dynamic host pointer imports */
     int avoid_host_import;
 
@@ -656,6 +653,7 @@ static const VulkanOptExtension optional_device_exts[] = {
     { VK_KHR_COOPERATIVE_MATRIX_EXTENSION_NAME,               FF_VK_EXT_COOP_MATRIX            },
     { VK_EXT_SHADER_OBJECT_EXTENSION_NAME,                    FF_VK_EXT_SHADER_OBJECT          },
     { VK_KHR_SHADER_SUBGROUP_ROTATE_EXTENSION_NAME,           FF_VK_EXT_SUBGROUP_ROTATE        },
+    { VK_EXT_HOST_IMAGE_COPY_EXTENSION_NAME,                  FF_VK_EXT_HOST_IMAGE_COPY        },
 #ifdef VK_EXT_zero_initialize_device_memory
     { VK_EXT_ZERO_INITIALIZE_DEVICE_MEMORY_EXTENSION_NAME,    FF_VK_EXT_ZERO_INITIALIZE        },
 #endif
@@ -750,6 +748,26 @@ static VkBool32 VKAPI_CALL vk_dbg_callback(VkDebugUtilsMessageSeverityFlagBitsEX
         av_free((void *)props);                                                \
     }
 
+static int vulkan_device_has_rebar(AVHWDeviceContext *ctx)
+{
+    VulkanDevicePriv *p = ctx->hwctx;
+    VkDeviceSize max_vram = 0, max_visible_vram = 0;
+
+    /* Get device memory properties */
+    av_assert0(p->mprops.memoryTypeCount);
+    for (int i = 0; i < p->mprops.memoryTypeCount; i++) {
+        const VkMemoryType type = p->mprops.memoryTypes[i];
+        const VkMemoryHeap heap = p->mprops.memoryHeaps[type.heapIndex];
+        if (!(type.propertyFlags & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT))
+            continue;
+        max_vram = FFMAX(max_vram, heap.size);
+        if (type.propertyFlags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
+            max_visible_vram = FFMAX(max_visible_vram, heap.size);
+    }
+
+    return max_vram - max_visible_vram < 1024; /* 1 kB tolerance */
+}
+
 enum FFVulkanDebugMode {
     FF_VULKAN_DEBUG_NONE = 0,
     /* Standard GPU-assisted validation */
@@ -830,6 +848,11 @@ static int check_extensions(AVHWDeviceContext *ctx, int dev, AVDictionary *opts,
             !strcmp(tstr, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME))
             continue;
 
+        /* Check if the device has ReBAR for host image copies */
+        if (!strcmp(tstr, VK_EXT_HOST_IMAGE_COPY_EXTENSION_NAME) &&
+            !vulkan_device_has_rebar(ctx))
+            continue;
+
         if (dev &&
             ((debug_mode == FF_VULKAN_DEBUG_VALIDATE) ||
              (debug_mode == FF_VULKAN_DEBUG_PRINTF) ||
@@ -1707,25 +1730,6 @@ static void vulkan_device_uninit(AVHWDeviceContext *ctx)
     ff_vk_uninit(&p->vkctx);
 }
 
-static int vulkan_device_has_rebar(AVHWDeviceContext *ctx)
-{
-    VulkanDevicePriv *p = ctx->hwctx;
-    VkDeviceSize max_vram = 0, max_visible_vram = 0;
-
-    /* Get device memory properties */
-    for (int i = 0; i < p->mprops.memoryTypeCount; i++) {
-        const VkMemoryType type = p->mprops.memoryTypes[i];
-        const VkMemoryHeap heap = p->mprops.memoryHeaps[type.heapIndex];
-        if (!(type.propertyFlags & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT))
-            continue;
-        max_vram = FFMAX(max_vram, heap.size);
-        if (type.propertyFlags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
-            max_visible_vram = FFMAX(max_visible_vram, heap.size);
-    }
-
-    return max_vram - max_visible_vram < 1024; /* 1 kB tolerance */
-}
-
 static int vulkan_device_create_internal(AVHWDeviceContext *ctx,
                                          VulkanDeviceSelection *dev_select,
                                          int disable_multiplane,
@@ -1751,6 +1755,9 @@ static int vulkan_device_create_internal(AVHWDeviceContext *ctx,
     if ((err = find_device(ctx, dev_select)))
         goto end;
 
+    /* Get supported memory types */
+    vk->GetPhysicalDeviceMemoryProperties(hwctx->phys_dev, &p->mprops);
+
     /* Find and enable extensions for the physical device */
     if ((err = check_extensions(ctx, 1, opts, &dev_info.ppEnabledExtensionNames,
                                 &dev_info.enabledExtensionCount, debug_mode))) {
@@ -1760,9 +1767,6 @@ static int vulkan_device_create_internal(AVHWDeviceContext *ctx,
         goto end;
     }
 
-    /* Get supported memory types */
-    vk->GetPhysicalDeviceMemoryProperties(hwctx->phys_dev, &p->mprops);
-
     /* Get all supported features for the physical device */
     device_features_init(ctx, &supported_feats);
     vk->GetPhysicalDeviceFeatures2(hwctx->phys_dev, &supported_feats.device);
-- 
2.52.0


From 44a10d409886631fbfd487ba41fd79bc86b5e3df Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 13:48:17 +0100
Subject: [PATCH 075/304] hwcontext_vulkan: disable host image transfers for
 Nvidia devices

Nvidia's binary drivers have a very buggy implementation that is
yet to be fixed.
---
 libavutil/hwcontext_vulkan.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index efee6c72ac..1418c1ea3f 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -2082,9 +2082,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
     /* Re-query device capabilities, in case the device was created externally */
     vk->GetPhysicalDeviceMemoryProperties(hwctx->phys_dev, &p->mprops);
 
-    /* Only use host image transfers if ReBAR is enabled */
-    p->disable_host_transfer = !vulkan_device_has_rebar(ctx);
-
 end:
     av_free(qf_vid);
     av_free(qf);
@@ -2915,19 +2912,14 @@ static int vulkan_frames_init(AVHWFramesContext *hwfc)
             return err;
     }
 
-    /* Nvidia is violating the spec because they thought no one would use this. */
-    if (p->dprops.driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY &&
-        (((fmt->nb_images == 1) && (fmt->vk_planes > 1)) ||
-         (av_pix_fmt_desc_get(hwfc->sw_format)->nb_components == 1)))
-        supported_usage &= ~VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT;
-
     /* Image usage flags */
     hwctx->usage |= supported_usage & (VK_IMAGE_USAGE_TRANSFER_DST_BIT |
                                        VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
                                        VK_IMAGE_USAGE_STORAGE_BIT      |
                                        VK_IMAGE_USAGE_SAMPLED_BIT);
 
-    if ((p->vkctx.extensions & FF_VK_EXT_HOST_IMAGE_COPY) && !p->disable_host_transfer)
+    if (p->vkctx.extensions & FF_VK_EXT_HOST_IMAGE_COPY &&
+        !(p->dprops.driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY))
         hwctx->usage |= supported_usage & VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT;
 
     /* Enables encoding of images, if supported by format and extensions */
@@ -4481,7 +4473,8 @@ static int vulkan_transfer_frame(AVHWFramesContext *hwfc,
     if (swf->width > hwfc->width || swf->height > hwfc->height)
         return AVERROR(EINVAL);
 
-    if (hwctx->usage & VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT)
+    if (hwctx->usage & VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT &&
+        !(p->dprops.driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY))
         return vulkan_transfer_host(hwfc, hwf, swf, upload);
 
     for (int i = 0; i < av_pix_fmt_count_planes(swf->format); i++) {
-- 
2.52.0


From 4c99cc6d509fabb0d5b96f64101f6037f28173ee Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 14:30:21 +0100
Subject: [PATCH 076/304] hwcontext_vulkan: use vkTransitionImageLayoutEXT to
 switch layouts

Falls back to regular submit-based layout switching if unsupported.
---
 libavutil/hwcontext_vulkan.c | 139 ++++++++++++++++++++++++++---------
 1 file changed, 104 insertions(+), 35 deletions(-)
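
Note (not part of the commit): the fallback described in the commit message can be read as "try the host transition, and use the submit-based path when the host path is unavailable or reports ENOTSUP". The sketch below shows one way to structure that dispatch in isolation. SKETCH_ERR(), switch_fn, prepare() and the stub callbacks are hypothetical names for illustration; the real functions take FFmpeg contexts and a PrepMode.

```c
#include <errno.h>

/* Mirrors the usual negative-errno convention; hypothetical helper. */
#define SKETCH_ERR(e) (-(e))

typedef int (*switch_fn)(void *frame);

/* Stub: host transition succeeds. */
static int host_ok(void *frame)
{
    (void)frame;
    return 0;
}

/* Stub: host transition does not support the requested layout. */
static int host_unsupported(void *frame)
{
    (void)frame;
    return SKETCH_ERR(ENOTSUP);
}

/* Stub: submit-based transition; counts how often it was used. */
static int submit_count(void *frame)
{
    ++*(int *)frame;
    return 0;
}

/* Try the host path only when allowed; any result other than ENOTSUP
 * is final. Otherwise fall back to the submit-based path. */
static int prepare(int host_path_allowed, switch_fn host, switch_fn submit,
                   void *frame)
{
    if (host_path_allowed) {
        int err = host(frame);
        if (err != SKETCH_ERR(ENOTSUP))
            return err;
    }
    return submit(frame);
}
```

Keeping the ENOTSUP test inside the host branch means the submit path also runs when the host path is never attempted.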

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index 1418c1ea3f..eaa499aa73 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -2465,7 +2465,42 @@ enum PrepMode {
     PREP_MODE_ENCODING_DPB,
 };
 
-static int prepare_frame(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
+static void switch_new_props(enum PrepMode pmode, VkImageLayout *new_layout,
+                             VkAccessFlags2 *new_access)
+{
+    switch (pmode) {
+    case PREP_MODE_GENERAL:
+        *new_layout = VK_IMAGE_LAYOUT_GENERAL;
+        *new_access = VK_ACCESS_TRANSFER_WRITE_BIT;
+        break;
+    case PREP_MODE_WRITE:
+        *new_layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
+        *new_access = VK_ACCESS_TRANSFER_WRITE_BIT;
+        break;
+    case PREP_MODE_EXTERNAL_IMPORT:
+        *new_layout = VK_IMAGE_LAYOUT_GENERAL;
+        *new_access = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
+        break;
+    case PREP_MODE_EXTERNAL_EXPORT:
+        *new_layout = VK_IMAGE_LAYOUT_GENERAL;
+        *new_access = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
+        break;
+    case PREP_MODE_DECODING_DST:
+        *new_layout = VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR;
+        *new_access = VK_ACCESS_TRANSFER_WRITE_BIT;
+        break;
+    case PREP_MODE_DECODING_DPB:
+        *new_layout = VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR;
+        *new_access = VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
+        break;
+    case PREP_MODE_ENCODING_DPB:
+        *new_layout = VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR;
+        *new_access = VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
+        break;
+    }
+}
+
+static int switch_layout(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
                          AVVkFrame *frame, enum PrepMode pmode)
 {
     int err;
@@ -2474,10 +2509,16 @@ static int prepare_frame(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
     VkImageMemoryBarrier2 img_bar[AV_NUM_DATA_POINTERS];
     int nb_img_bar = 0;
 
-    uint32_t dst_qf = p->nb_img_qfs > 1 ? VK_QUEUE_FAMILY_IGNORED : p->img_qfs[0];
     VkImageLayout new_layout;
     VkAccessFlags2 new_access;
+    switch_new_props(pmode, &new_layout, &new_access);
+
+    uint32_t dst_qf = p->nb_img_qfs > 1 ? VK_QUEUE_FAMILY_IGNORED : p->img_qfs[0];
     VkPipelineStageFlagBits2 src_stage = VK_PIPELINE_STAGE_2_NONE;
+    if (pmode == PREP_MODE_EXTERNAL_EXPORT) {
+        dst_qf     = VK_QUEUE_FAMILY_EXTERNAL_KHR;
+        src_stage  = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT;
+    }
 
     /* This is dirty - but it works. The vulkan.c dependency system doesn't
      * free non-refcounted frames, and non-refcounted hardware frames cannot
@@ -2501,39 +2542,6 @@ static int prepare_frame(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
     if (err < 0)
         return err;
 
-    switch (pmode) {
-    case PREP_MODE_GENERAL:
-        new_layout = VK_IMAGE_LAYOUT_GENERAL;
-        new_access = VK_ACCESS_TRANSFER_WRITE_BIT;
-        break;
-    case PREP_MODE_WRITE:
-        new_layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
-        new_access = VK_ACCESS_TRANSFER_WRITE_BIT;
-        break;
-    case PREP_MODE_EXTERNAL_IMPORT:
-        new_layout = VK_IMAGE_LAYOUT_GENERAL;
-        new_access = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
-        break;
-    case PREP_MODE_EXTERNAL_EXPORT:
-        new_layout = VK_IMAGE_LAYOUT_GENERAL;
-        new_access = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
-        dst_qf     = VK_QUEUE_FAMILY_EXTERNAL_KHR;
-        src_stage  = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT;
-        break;
-    case PREP_MODE_DECODING_DST:
-        new_layout = VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR;
-        new_access = VK_ACCESS_TRANSFER_WRITE_BIT;
-        break;
-    case PREP_MODE_DECODING_DPB:
-        new_layout = VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR;
-        new_access = VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
-        break;
-    case PREP_MODE_ENCODING_DPB:
-        new_layout = VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR;
-        new_access = VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
-        break;
-    }
-
     ff_vk_frame_barrier(&p->vkctx, exec, &tmp_frame, img_bar, &nb_img_bar,
                         src_stage,
                         VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
@@ -2555,6 +2563,67 @@ static int prepare_frame(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
     return 0;
 }
 
+static int switch_layout_host(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
+                              AVVkFrame *frame, enum PrepMode pmode)
+{
+    VkResult ret;
+    VulkanDevicePriv *p = hwfc->device_ctx->hwctx;
+    FFVulkanFunctions *vk = &p->vkctx.vkfn;
+    VkHostImageLayoutTransitionInfo layout_change[AV_NUM_DATA_POINTERS];
+    int nb_images = ff_vk_count_images(frame);
+
+    VkImageLayout new_layout;
+    VkAccessFlags2 new_access;
+    switch_new_props(pmode, &new_layout, &new_access);
+
+    int i;
+    for (i = 0; i < p->vkctx.host_image_props.copyDstLayoutCount; i++) {
+        if (p->vkctx.host_image_props.pCopyDstLayouts[i] == new_layout)
+            break;
+    }
+    if (i == p->vkctx.host_image_props.copyDstLayoutCount)
+        return AVERROR(ENOTSUP);
+
+    for (i = 0; i < nb_images; i++) {
+        layout_change[i] = (VkHostImageLayoutTransitionInfo) {
+            .sType = VK_STRUCTURE_TYPE_HOST_IMAGE_LAYOUT_TRANSITION_INFO,
+            .image = frame->img[i],
+            .oldLayout = frame->layout[i],
+            .newLayout = new_layout,
+            .subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+            .subresourceRange.layerCount = 1,
+            .subresourceRange.levelCount = 1,
+        };
+        frame->layout[i] = new_layout;
+    }
+
+    ret = vk->TransitionImageLayoutEXT(p->vkctx.hwctx->act_dev,
+                                       nb_images, layout_change);
+    if (ret != VK_SUCCESS) {
+        av_log(hwfc, AV_LOG_ERROR, "Unable to prepare frame: %s\n",
+               ff_vk_ret2str(ret));
+        return AVERROR_EXTERNAL;
+    }
+
+    return 0;
+}
+
+static int prepare_frame(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
+                         AVVkFrame *frame, enum PrepMode pmode)
+{
+    int err = AVERROR(ENOTSUP); /* default: fall through to switch_layout() */
+    AVVulkanFramesContext *hwfc_vk = hwfc->hwctx;
+    if (hwfc_vk->usage & VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT &&
+        (pmode != PREP_MODE_EXTERNAL_EXPORT) &&
+        (pmode != PREP_MODE_EXTERNAL_IMPORT))
+        err = switch_layout_host(hwfc, ectx, frame, pmode);
+
+    if (err != AVERROR(ENOTSUP))
+        return err;
+
+    return switch_layout(hwfc, ectx, frame, pmode);
+}
+
 static inline void get_plane_wh(uint32_t *w, uint32_t *h, enum AVPixelFormat format,
                                 int frame_w, int frame_h, int plane)
 {
-- 
2.52.0


From a0b5af5c58b72a38286cf2241598ea10ec9b8a97 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 5 Nov 2025 10:31:18 +0100
Subject: [PATCH 077/304] vulkan/common: add a function to flush/invalidate a
 buffer and use it

Just for convenience.
---
 libavutil/hwcontext_vulkan.c | 45 ++++++++++++------------------------
 libavutil/vulkan.c           | 31 +++++++++++++++++++++++++
 libavutil/vulkan.h           |  7 ++++++
 3 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index eaa499aa73..a2caaa0959 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -4237,29 +4237,10 @@ static int copy_buffer_data(AVHWFramesContext *hwfc, AVBufferRef *buf,
                             AVFrame *swf, VkBufferImageCopy *region,
                             int planes, int upload)
 {
-    VkResult ret;
+    int err;
     VulkanDevicePriv *p = hwfc->device_ctx->hwctx;
-    FFVulkanFunctions *vk = &p->vkctx.vkfn;
-    AVVulkanDeviceContext *hwctx = &p->p;
-
     FFVkBuffer *vkbuf = (FFVkBuffer *)buf->data;
 
-    const VkMappedMemoryRange flush_info = {
-        .sType  = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE,
-        .memory = vkbuf->mem,
-        .size   = VK_WHOLE_SIZE,
-    };
-
-    if (!upload && !(vkbuf->flags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)) {
-        ret = vk->InvalidateMappedMemoryRanges(hwctx->act_dev, 1,
-                                               &flush_info);
-        if (ret != VK_SUCCESS) {
-            av_log(hwfc, AV_LOG_ERROR, "Failed to invalidate buffer data: %s\n",
-                   ff_vk_ret2str(ret));
-            return AVERROR_EXTERNAL;
-        }
-    }
-
     if (upload) {
         for (int i = 0; i < planes; i++)
             av_image_copy_plane(vkbuf->mapped_mem + region[i].bufferOffset,
@@ -4268,7 +4249,21 @@ static int copy_buffer_data(AVHWFramesContext *hwfc, AVBufferRef *buf,
                                 swf->linesize[i],
                                 swf->linesize[i],
                                 region[i].imageExtent.height);
+
+        err = ff_vk_flush_buffer(&p->vkctx, vkbuf, 0, VK_WHOLE_SIZE, 1);
+        if (err != VK_SUCCESS) {
+            av_log(hwfc, AV_LOG_ERROR, "Failed to flush buffer data: %s\n",
+                   av_err2str(err));
+            return AVERROR_EXTERNAL;
+        }
     } else {
+        err = ff_vk_flush_buffer(&p->vkctx, vkbuf, 0, VK_WHOLE_SIZE, 0);
+        if (err != VK_SUCCESS) {
+            av_log(hwfc, AV_LOG_ERROR, "Failed to invalidate buffer data: %s\n",
+                   av_err2str(err));
+            return AVERROR_EXTERNAL;
+        }
+
         for (int i = 0; i < planes; i++)
             av_image_copy_plane(swf->data[i],
                                 swf->linesize[i],
@@ -4278,16 +4273,6 @@ static int copy_buffer_data(AVHWFramesContext *hwfc, AVBufferRef *buf,
                                 region[i].imageExtent.height);
     }
 
-    if (upload && !(vkbuf->flags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)) {
-        ret = vk->FlushMappedMemoryRanges(hwctx->act_dev, 1,
-                                          &flush_info);
-        if (ret != VK_SUCCESS) {
-            av_log(hwfc, AV_LOG_ERROR, "Failed to flush buffer data: %s\n",
-                   ff_vk_ret2str(ret));
-            return AVERROR_EXTERNAL;
-        }
-    }
-
     return 0;
 }
 
diff --git a/libavutil/vulkan.c b/libavutil/vulkan.c
index 54448a32e5..c7f86d524c 100644
--- a/libavutil/vulkan.c
+++ b/libavutil/vulkan.c
@@ -1167,6 +1167,37 @@ int ff_vk_map_buffers(FFVulkanContext *s, FFVkBuffer **buf, uint8_t *mem[],
     return 0;
 }
 
+int ff_vk_flush_buffer(FFVulkanContext *s, FFVkBuffer *buf,
+                       size_t offset, size_t mem_size,
+                       int flush)
+{
+    VkResult ret;
+    FFVulkanFunctions *vk = &s->vkfn;
+
+    if (buf->host_ref || buf->flags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)
+        return 0;
+
+    const VkMappedMemoryRange flush_data = {
+        .sType  = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE,
+        .memory = buf->mem,
+        .offset = offset,
+        .size   = mem_size,
+    };
+
+    if (flush)
+        ret = vk->FlushMappedMemoryRanges(s->hwctx->act_dev, 1, &flush_data);
+    else
+        ret = vk->InvalidateMappedMemoryRanges(s->hwctx->act_dev, 1, &flush_data);
+
+    if (ret != VK_SUCCESS) {
+        av_log(s, AV_LOG_ERROR, "Failed to flush memory: %s\n",
+               ff_vk_ret2str(ret));
+        return AVERROR_EXTERNAL;
+    }
+
+    return 0;
+}
+
 int ff_vk_unmap_buffers(FFVulkanContext *s, FFVkBuffer **buf, int nb_buffers,
                         int flush)
 {
diff --git a/libavutil/vulkan.h b/libavutil/vulkan.h
index e1c9a5792f..bdc20e4645 100644
--- a/libavutil/vulkan.h
+++ b/libavutil/vulkan.h
@@ -523,6 +523,13 @@ int ff_vk_create_buf(FFVulkanContext *s, FFVkBuffer *buf, size_t size,
                      void *pNext, void *alloc_pNext,
                      VkBufferUsageFlags usage, VkMemoryPropertyFlagBits flags);
 
+/**
+ * Flush or invalidate a single buffer, with a given size and offset.
+ */
+int ff_vk_flush_buffer(FFVulkanContext *s, FFVkBuffer *buf,
+                       size_t offset, size_t mem_size,
+                       int flush);
+
 /**
  * Buffer management code.
  */
-- 
2.52.0


From 26b00cfee7290f2c54adf2e5eaed6cb819d5ebe9 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Tue, 18 Nov 2025 12:09:51 +0100
Subject: [PATCH 078/304] vulkan/common: add reverse2 endian reversal macro

---
 libavcodec/vulkan/common.comp | 3 +++
 1 file changed, 3 insertions(+)
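
Note (not part of the commit): assuming unpack8() places the least significant byte in the first component, as the existing reverse4 macro appears to rely on, reverse2 is a 16-bit byte swap. A rough C equivalent of both macros follows; the function names are reused here only for illustration.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value (the .yx swizzle). */
static inline uint16_t reverse2(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value (the .wzyx swizzle). */
static inline uint32_t reverse4(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}
```

Applying either function twice returns the original value.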

diff --git a/libavcodec/vulkan/common.comp b/libavcodec/vulkan/common.comp
index 6825693fa3..eda92ce28d 100644
--- a/libavcodec/vulkan/common.comp
+++ b/libavcodec/vulkan/common.comp
@@ -79,6 +79,9 @@ uint64_t align64(uint64_t src, uint64_t a)
     return src + a - res;
 }
 
+#define reverse2(src) \
+    (pack16(unpack8(uint16_t(src)).yx))
+
 #define reverse4(src) \
     (pack32(unpack8(uint32_t(src)).wzyx))
 
-- 
2.52.0


From 903fdbd2c645dd40f75fc8458d38a5a5231a8c20 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Mon, 24 Nov 2025 22:34:51 +0100
Subject: [PATCH 079/304] vulkan: change ff_vk_frame_barrier access and stage
 type to sync2

Cleans up a compiler warning.
---
 libavutil/vulkan.c | 10 +++++-----
 libavutil/vulkan.h |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/libavutil/vulkan.c b/libavutil/vulkan.c
index c7f86d524c..1a1d7d0f8d 100644
--- a/libavutil/vulkan.c
+++ b/libavutil/vulkan.c
@@ -2043,11 +2043,11 @@ fail:
 
 void ff_vk_frame_barrier(FFVulkanContext *s, FFVkExecContext *e,
                          AVFrame *pic, VkImageMemoryBarrier2 *bar, int *nb_bar,
-                         VkPipelineStageFlags src_stage,
-                         VkPipelineStageFlags dst_stage,
-                         VkAccessFlagBits     new_access,
-                         VkImageLayout        new_layout,
-                         uint32_t             new_qf)
+                         VkPipelineStageFlags2 src_stage,
+                         VkPipelineStageFlags2 dst_stage,
+                         VkAccessFlagBits2     new_access,
+                         VkImageLayout         new_layout,
+                         uint32_t              new_qf)
 {
     int found = -1;
     AVVkFrame *vkf = (AVVkFrame *)pic->data[0];
diff --git a/libavutil/vulkan.h b/libavutil/vulkan.h
index bdc20e4645..868788c2a4 100644
--- a/libavutil/vulkan.h
+++ b/libavutil/vulkan.h
@@ -507,10 +507,10 @@ int ff_vk_create_imageviews(FFVulkanContext *s, FFVkExecContext *e,
 
 void ff_vk_frame_barrier(FFVulkanContext *s, FFVkExecContext *e,
                          AVFrame *pic, VkImageMemoryBarrier2 *bar, int *nb_bar,
-                         VkPipelineStageFlags src_stage,
-                         VkPipelineStageFlags dst_stage,
-                         VkAccessFlagBits     new_access,
-                         VkImageLayout        new_layout,
+                         VkPipelineStageFlags2 src_stage,
+                         VkPipelineStageFlags2 dst_stage,
+                         VkAccessFlagBits2     new_access,
+                         VkImageLayout         new_layout,
                          uint32_t             new_qf);
 
 /**
-- 
2.52.0


From 28737c3cd90112faed653da70c45ad71ed03f381 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 12:59:24 +0100
Subject: [PATCH 080/304] ffv1enc_vulkan: add support for x2bgr10/x2rgb10

---
 libavcodec/ffv1enc.c | 2 ++
 libavutil/vulkan.c   | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c
index 8e5ebe773c..5daa3aa0cd 100644
--- a/libavcodec/ffv1enc.c
+++ b/libavcodec/ffv1enc.c
@@ -914,6 +914,8 @@ av_cold int ff_ffv1_encode_setup_plane_info(AVCodecContext *avctx,
     case AV_PIX_FMT_GBRP9:
         if (!avctx->bits_per_raw_sample)
             s->bits_per_raw_sample = 9;
+    case AV_PIX_FMT_X2BGR10:
+    case AV_PIX_FMT_X2RGB10:
     case AV_PIX_FMT_GBRP10:
     case AV_PIX_FMT_GBRAP10:
         if (!avctx->bits_per_raw_sample && !s->bits_per_raw_sample)
diff --git a/libavutil/vulkan.c b/libavutil/vulkan.c
index 1a1d7d0f8d..c3d8c1d0d9 100644
--- a/libavutil/vulkan.c
+++ b/libavutil/vulkan.c
@@ -1600,6 +1600,12 @@ void ff_vk_set_perm(enum AVPixelFormat pix_fmt, int lut[4], int inv)
         lut[2] = 0;
         lut[3] = 3;
         break;
+    case AV_PIX_FMT_X2BGR10:
+        lut[0] = 0;
+        lut[1] = 2;
+        lut[2] = 1;
+        lut[3] = 3;
+        break;
     default:
         lut[0] = 0;
         lut[1] = 1;
-- 
2.52.0


From 262dc3d79e17dcebdce6f3b8ba81ad0de17761e0 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 13:06:11 +0100
Subject: [PATCH 081/304] vulkan_ffv1: fix swapped colors for x2bgr10

---
 libavcodec/vulkan/ffv1_dec.comp | 3 +++
 libavcodec/vulkan_ffv1.c        | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index eb795dcba4..b726491e98 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -214,6 +214,9 @@ void writeout_rgb(in SliceContext sc, ivec2 sp, int w, int y, bool apply_rct)
 
         if (expectEXT(apply_rct, true))
             pix = transform_sample(pix, sc.slice_rct_coef);
+        else
+            pix = ivec4(pix[fmt_lut[0]], pix[fmt_lut[1]],
+                        pix[fmt_lut[2]], pix[fmt_lut[3]]);
 
         imageStore(dst[0], pos, pix);
         if (planar_rgb != 0) {
diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 1ed9d7dd6c..1aa63c0dd3 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -450,8 +450,6 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     if (sw_format == AV_PIX_FMT_GBRP10 || sw_format == AV_PIX_FMT_GBRP12 ||
         sw_format == AV_PIX_FMT_GBRP14)
         memcpy(pd.fmt_lut, (int [4]) { 2, 1, 0, 3 }, 4*sizeof(int));
-    else if (sw_format == AV_PIX_FMT_X2BGR10)
-        memcpy(pd.fmt_lut, (int [4]) { 0, 2, 1, 3 }, 4*sizeof(int));
     else
         ff_vk_set_perm(sw_format, pd.fmt_lut, 0);
 
-- 
2.52.0


From 8991565f73fd14a8c5038a18f8c6e6726a045a60 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sat, 22 Nov 2025 13:13:43 +0100
Subject: [PATCH 082/304] vulkan_ffv1/prores: remove unnecessary slice buffer
 unref

The slice buffer is already unref'd by ff_vk_decode_free_frame().
---
 libavcodec/vulkan_ffv1.c   | 1 -
 libavcodec/vulkan_prores.c | 2 --
 2 files changed, 3 deletions(-)

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 1aa63c0dd3..4787a06e97 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -1146,7 +1146,6 @@ static void vk_ffv1_free_frame_priv(AVRefStructOpaque _hwctx, void *data)
                    i, status, crc_res);
     }
 
-    av_buffer_unref(&vp->slices_buf);
     av_buffer_unref(&fp->slice_state);
     av_buffer_unref(&fp->slice_offset_buf);
     av_buffer_unref(&fp->slice_status_buf);
diff --git a/libavcodec/vulkan_prores.c b/libavcodec/vulkan_prores.c
index 42e8fa0b06..b252550f7c 100644
--- a/libavcodec/vulkan_prores.c
+++ b/libavcodec/vulkan_prores.c
@@ -573,11 +573,9 @@ static void vk_prores_free_frame_priv(AVRefStructOpaque _hwctx, void *data)
 {
     AVHWDeviceContext    *dev_ctx = _hwctx.nc;
     ProresVulkanDecodePicture *pp = data;
-    FFVulkanDecodePicture *vp = &pp->vp;
 
     ff_vk_decode_free_frame(dev_ctx, &pp->vp);
 
-    av_buffer_unref(&vp->slices_buf);
     av_buffer_unref(&pp->metadata_buf);
 }
 
-- 
2.52.0


From c92833409b1b8c764cf135e470a8f89fb33d740a Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 23 Nov 2025 19:49:01 +0100
Subject: [PATCH 083/304] ffv1dec: call ff_get_format if the EC coding changes

Decoders need to track all state that hwaccels may be interested in,
and trigger a reconfiguration if it changes.
---
 libavcodec/ffv1.h    | 1 +
 libavcodec/ffv1dec.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)
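
Note (not part of the commit): the pattern used here is to keep a "configured" copy of every field the hwaccel depends on and to renegotiate the format whenever any of them differs from the current header. A minimal sketch of that bookkeeping follows. DecState, needs_reconfig() and mark_configured() are hypothetical names; the real code calls get_pixel_format() between the two steps.

```c
/* Hypothetical, trimmed-down view of the decoder state. */
typedef struct DecState {
    int pix_fmt, width, height, ac;            /* parsed from the header */
    int configured_pix_fmt, configured_width;  /* last negotiated values */
    int configured_height, configured_ac;
} DecState;

/* Non-zero if any tracked parameter changed since the last negotiation. */
static int needs_reconfig(const DecState *s)
{
    return s->configured_pix_fmt != s->pix_fmt ||
           s->configured_width   != s->width   ||
           s->configured_height  != s->height  ||
           s->configured_ac      != s->ac;
}

/* Record the current parameters once the format has been renegotiated. */
static void mark_configured(DecState *s)
{
    s->configured_pix_fmt = s->pix_fmt;
    s->configured_width   = s->width;
    s->configured_height  = s->height;
    s->configured_ac      = s->ac;
}
```

Any field added to the comparison also has to be copied in update_thread_context(), as the last hunk of this patch does for configured_ac.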

diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
index adf76b0644..8a48e8e682 100644
--- a/libavcodec/ffv1.h
+++ b/libavcodec/ffv1.h
@@ -140,6 +140,7 @@ typedef struct FFV1Context {
     enum AVPixelFormat pix_fmt;
     enum AVPixelFormat configured_pix_fmt;
     int configured_width, configured_height;
+    int configured_ac;
 
     const AVFrame *cur_enc_frame;
     int plane_count;
diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
index a70cd74af4..630a60595d 100644
--- a/libavcodec/ffv1dec.c
+++ b/libavcodec/ffv1dec.c
@@ -509,13 +509,15 @@ static int read_header(FFV1Context *f, RangeCoder *c)
 
     if (f->configured_pix_fmt != f->pix_fmt ||
         f->configured_width != f->avctx->width ||
-        f->configured_height != f->avctx->height) {
+        f->configured_height != f->avctx->height ||
+        f->configured_ac != f->ac) {
         f->avctx->pix_fmt = get_pixel_format(f);
         if (f->avctx->pix_fmt < 0)
             return AVERROR(EINVAL);
         f->configured_pix_fmt = f->pix_fmt;
         f->configured_width = f->avctx->width;
         f->configured_height = f->avctx->height;
+        f->configured_ac = f->ac;
     }
 
     ff_dlog(f->avctx, "%d %d %d\n",
@@ -921,6 +923,7 @@ static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
     fdst->colorspace          = fsrc->colorspace;
     fdst->pix_fmt             = fsrc->pix_fmt;
     fdst->configured_pix_fmt  = fsrc->configured_pix_fmt;
+    fdst->configured_ac       = fsrc->configured_ac;
 
     fdst->ec                  = fsrc->ec;
     fdst->intra               = fsrc->intra;
-- 
2.52.0


From 16bc07d9243c82a588f679f7bce8a11d77d8a08e Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sat, 22 Nov 2025 22:37:17 +0100
Subject: [PATCH 084/304] vulkan_ffv1: initialize only the necessary shaders on
 init

The decoder will reinit the hwaccel upon pixfmt/dimension changes,
so we can remove the f->use32bit and is_rgb variants of all shaders.

This speeds up init time.
---
 libavcodec/vulkan_ffv1.c | 101 ++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 59 deletions(-)

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 4787a06e97..7d1e0c9ba7 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -59,11 +59,11 @@ typedef struct FFv1VulkanDecodePicture {
 } FFv1VulkanDecodePicture;
 
 typedef struct FFv1VulkanDecodeContext {
-    AVBufferRef *intermediate_frames_ref[2]; /* 16/32 bit */
+    AVBufferRef *intermediate_frames_ref;
 
     FFVulkanShader setup;
-    FFVulkanShader reset[2]; /* AC/Golomb */
-    FFVulkanShader decode[2][2][2]; /* 16/32 bit, AC/Golomb, Normal/RGB */
+    FFVulkanShader reset;
+    FFVulkanShader decode;
 
     FFVkBuffer rangecoder_static_buf;
     FFVkBuffer quant_buf;
@@ -239,7 +239,7 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
         if (!vp->dpb_frame)
             return AVERROR(ENOMEM);
 
-        err = av_hwframe_get_buffer(fv->intermediate_frames_ref[f->use32bit],
+        err = av_hwframe_get_buffer(fv->intermediate_frames_ref,
                                     vp->dpb_frame, 0);
         if (err < 0)
             return err;
@@ -472,7 +472,7 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     }
 
     /* Reset shader */
-    reset_shader = &fv->reset[f->ac == AC_GOLOMB_RICE];
+    reset_shader = &fv->reset;
     ff_vk_shader_update_desc_buffer(&ctx->s, exec, reset_shader,
                                     1, 0, 0,
                                     slice_state,
@@ -525,7 +525,7 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
                     f->plane_count);
 
     /* Decode */
-    decode_shader = &fv->decode[f->use32bit][f->ac == AC_GOLOMB_RICE][is_rgb];
+    decode_shader = &fv->decode;
     ff_vk_shader_update_desc_buffer(&ctx->s, exec, decode_shader,
                                     1, 0, 0,
                                     slice_state,
@@ -821,7 +821,7 @@ static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
                               FFVulkanShader *shd,
                               AVHWFramesContext *dec_frames_ctx,
                               AVHWFramesContext *out_frames_ctx,
-                              int use32bit, int ac, int rgb)
+                              int ac, int rgb)
 {
     int err;
     FFVulkanDescriptorSetBinding *desc_set;
@@ -829,6 +829,7 @@ static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
     uint8_t *spv_data;
     size_t spv_len;
     void *spv_opaque = NULL;
+
     int use_cached_reader = ac != AC_GOLOMB_RICE &&
                             s->driver_props.driverID == VK_DRIVER_ID_MESA_RADV;
 
@@ -877,7 +878,7 @@ static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
 
     RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 2, 1, 0));
 
-    define_shared_code(shd, use32bit);
+    define_shared_code(shd, f->use32bit);
     if (ac == AC_GOLOMB_RICE)
         GLSLD(ff_source_ffv1_vlc_comp);
 
@@ -973,18 +974,11 @@ static void vk_decode_ffv1_uninit(FFVulkanDecodeShared *ctx)
 {
     FFv1VulkanDecodeContext *fv = ctx->sd_ctx;
 
+    av_buffer_unref(&fv->intermediate_frames_ref);
+
     ff_vk_shader_free(&ctx->s, &fv->setup);
-
-    for (int i = 0; i < 2; i++) /* 16/32 bit */
-        av_buffer_unref(&fv->intermediate_frames_ref[i]);
-
-    for (int i = 0; i < 2; i++) /* AC/Golomb */
-        ff_vk_shader_free(&ctx->s, &fv->reset[i]);
-
-    for (int i = 0; i < 2; i++) /* 16/32 bit */
-        for (int j = 0; j < 2; j++) /* AC/Golomb */
-            for (int k = 0; k < 2; k++) /* Normal/RGB */
-                ff_vk_shader_free(&ctx->s, &fv->decode[i][j][k]);
+    ff_vk_shader_free(&ctx->s, &fv->reset);
+    ff_vk_shader_free(&ctx->s, &fv->decode);
 
     ff_vk_free_buf(&ctx->s, &fv->quant_buf);
     ff_vk_free_buf(&ctx->s, &fv->rangecoder_static_buf);
@@ -1029,38 +1023,33 @@ static int vk_decode_ffv1_init(AVCodecContext *avctx)
 
     ctx->sd_ctx_free = &vk_decode_ffv1_uninit;
 
+    AVHWFramesContext *hwfc = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+    AVHWFramesContext *dctx = hwfc;
+    enum AVPixelFormat sw_format = hwfc->sw_format;
+    int is_rgb = !(f->colorspace == 0 && sw_format != AV_PIX_FMT_YA8) &&
+                 !(sw_format == AV_PIX_FMT_YA8);
+
     /* Intermediate frame pool for RCT */
-    for (int i = 0; i < 2; i++) { /* 16/32 bit */
-        RET(init_indirect(avctx, &ctx->s, &fv->intermediate_frames_ref[i],
-                          i ? AV_PIX_FMT_GBRAP32 : AV_PIX_FMT_GBRAP16));
+    if (is_rgb) {
+        RET(init_indirect(avctx, &ctx->s, &fv->intermediate_frames_ref,
+                          f->use32bit ? AV_PIX_FMT_GBRAP32 : AV_PIX_FMT_GBRAP16));
+        dctx = (AVHWFramesContext *)fv->intermediate_frames_ref->data;
     }
 
     /* Setup shader */
     RET(init_setup_shader(f, &ctx->s, &ctx->exec_pool, spv, &fv->setup));
 
-    /* Reset shaders */
-    for (int i = 0; i < 2; i++) { /* AC/Golomb */
-        RET(init_reset_shader(f, &ctx->s, &ctx->exec_pool,
-                              spv, &fv->reset[i], !i ? AC_RANGE_CUSTOM_TAB : 0));
-    }
+    /* Reset shader */
+    RET(init_reset_shader(f, &ctx->s, &ctx->exec_pool,
+                          spv, &fv->reset, f->ac));
 
     /* Decode shaders */
-    for (int i = 0; i < 2; i++) { /* 16/32 bit */
-        for (int j = 0; j < 2; j++) { /* AC/Golomb */
-            for (int k = 0; k < 2; k++) { /* Normal/RGB */
-                AVHWFramesContext *dec_frames_ctx;
-                dec_frames_ctx = k ? (AVHWFramesContext *)fv->intermediate_frames_ref[i]->data :
-                                     (AVHWFramesContext *)avctx->hw_frames_ctx->data;
-                RET(init_decode_shader(f, &ctx->s, &ctx->exec_pool,
-                                       spv, &fv->decode[i][j][k],
-                                       dec_frames_ctx,
-                                       (AVHWFramesContext *)avctx->hw_frames_ctx->data,
-                                       i,
-                                       !j ? AC_RANGE_CUSTOM_TAB : AC_GOLOMB_RICE,
-                                       k));
-            }
-        }
-    }
+    RET(init_decode_shader(f, &ctx->s, &ctx->exec_pool,
+                           spv, &fv->decode,
+                           dctx,
+                           hwfc,
+                           f->ac,
+                           is_rgb));
 
     /* Range coder data */
     RET(ff_ffv1_vk_init_state_transition_data(&ctx->s,
@@ -1090,22 +1079,16 @@ static int vk_decode_ffv1_init(AVCodecContext *avctx)
                                         VK_FORMAT_UNDEFINED));
 
     /* Update decode global descriptors */
-    for (int i = 0; i < 2; i++) { /* 16/32 bit */
-        for (int j = 0; j < 2; j++) { /* AC/Golomb */
-            for (int k = 0; k < 2; k++) { /* Normal/RGB */
-                RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                                    &fv->decode[i][j][k], 0, 0, 0,
-                                                    &fv->rangecoder_static_buf,
-                                                    0, fv->rangecoder_static_buf.size,
-                                                    VK_FORMAT_UNDEFINED));
-                RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                                    &fv->decode[i][j][k], 0, 1, 0,
-                                                    &fv->quant_buf,
-                                                    0, fv->quant_buf.size,
-                                                    VK_FORMAT_UNDEFINED));
-            }
-        }
-    }
+    RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
+                                        &fv->decode, 0, 0, 0,
+                                        &fv->rangecoder_static_buf,
+                                        0, fv->rangecoder_static_buf.size,
+                                        VK_FORMAT_UNDEFINED));
+    RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
+                                        &fv->decode, 0, 1, 0,
+                                        &fv->quant_buf,
+                                        0, fv->quant_buf.size,
+                                        VK_FORMAT_UNDEFINED));
 
 fail:
     spv->uninit(&spv);
-- 
2.52.0


From e0e496b0dc4a840aa3b0dd28c03e7dbcffcf1755 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 23 Nov 2025 20:00:21 +0100
Subject: [PATCH 085/304] proresdec: call ff_get_format if the interlacing
 changes

Decoders need to track all state that hwaccels may be interested in,
and trigger a reconfiguration if it changes.
---
 libavcodec/proresdec.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
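
A minimal sketch of the pattern, assuming a simplified decoder context: snapshot the tracked field before the header parser overwrites it, then compare afterwards. `DemoDecoder`, `DemoHeader` and `demo_apply_header` are hypothetical and only mirror the shape of the change; the real decoder calls `ff_get_format` where the sketch bumps a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for decoder state; not FFmpeg's ProresContext. */
typedef struct DemoDecoder {
    int pix_fmt;
    int width, height;
    int frame_type;     /* 0 = progressive, nonzero = interlaced */
    int reconfig_count; /* times the format negotiation would have run */
} DemoDecoder;

/* Hypothetical parsed frame header. */
typedef struct DemoHeader {
    int pix_fmt, width, height, frame_type;
} DemoHeader;

/* Applies a header and reports whether a hwaccel reinit is needed. */
static bool demo_apply_header(DemoDecoder *d, const DemoHeader *h)
{
    assert(d && h);

    /* Snapshot state before it is overwritten by the new header. */
    int  old_frame_type     = d->frame_type;
    bool pix_fmt_changed    = d->pix_fmt != h->pix_fmt;
    bool dimensions_changed = d->width != h->width || d->height != h->height;

    d->pix_fmt    = h->pix_fmt;
    d->width      = h->width;
    d->height     = h->height;
    d->frame_type = h->frame_type;

    bool need_reconfig = pix_fmt_changed || dimensions_changed ||
                         d->frame_type != old_frame_type;
    if (need_reconfig)
        d->reconfig_count++;
    return need_reconfig;
}
```

Feeding the same header twice triggers one reconfiguration; switching only `frame_type` afterwards triggers a second one, which is the case the format and dimension checks alone would miss.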

diff --git a/libavcodec/proresdec.c b/libavcodec/proresdec.c
index 5c6b505527..b971d17972 100644
--- a/libavcodec/proresdec.c
+++ b/libavcodec/proresdec.c
@@ -189,6 +189,7 @@ static int decode_frame_header(ProresContext *ctx, const uint8_t *buf,
     int version;
     const uint8_t *ptr;
     enum AVPixelFormat pix_fmt;
+    int old_frame_type = ctx->frame_type;
 
     hdr_size = AV_RB16(buf);
     ff_dlog(avctx, "header size %d\n", hdr_size);
@@ -251,7 +252,8 @@ static int decode_frame_header(ProresContext *ctx, const uint8_t *buf,
         }
     }
 
-    if (pix_fmt != ctx->pix_fmt || dimensions_changed) {
+    if (pix_fmt != ctx->pix_fmt || dimensions_changed ||
+        ctx->frame_type != old_frame_type) {
 #define HWACCEL_MAX (CONFIG_PRORES_VIDEOTOOLBOX_HWACCEL + CONFIG_PRORES_VULKAN_HWACCEL)
 #if HWACCEL_MAX
         enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmtp = pix_fmts;
@@ -856,6 +858,7 @@ static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
     ProresContext *cdst = dst->priv_data;
 
     cdst->pix_fmt = csrc->pix_fmt;
+    cdst->frame_type = csrc->frame_type;
 
     return 0;
 }
-- 
2.52.0


From 1c99ceab1fcba2a7bad79e56a00f0733d6c66aa7 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 23 Nov 2025 20:05:47 +0100
Subject: [PATCH 086/304] vulkan_prores: initialize only the necessary shaders
 on init

---
 libavcodec/vulkan_prores.c | 188 +++++++++++++++++--------------------
 1 file changed, 88 insertions(+), 100 deletions(-)

diff --git a/libavcodec/vulkan_prores.c b/libavcodec/vulkan_prores.c
index b252550f7c..0c704c3d1c 100644
--- a/libavcodec/vulkan_prores.c
+++ b/libavcodec/vulkan_prores.c
@@ -47,11 +47,9 @@ typedef struct ProresVulkanDecodePicture {
 } ProresVulkanDecodePicture;
 
 typedef struct ProresVulkanDecodeContext {
-    struct ProresVulkanShaderVariants {
-        FFVulkanShader reset;
-        FFVulkanShader vld;
-        FFVulkanShader idct;
-    } shaders[2]; /* Progressive/interlaced */
+    FFVulkanShader reset;
+    FFVulkanShader vld;
+    FFVulkanShader idct;
 
     AVBufferPool *metadata_pool;
 } ProresVulkanDecodeContext;
@@ -163,7 +161,6 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
 
     ProresVkParameters pd;
     FFVkBuffer *slice_data, *metadata;
-    struct ProresVulkanShaderVariants *shaders;
     VkImageMemoryBarrier2 img_bar[AV_NUM_DATA_POINTERS];
     VkBufferMemoryBarrier2 buf_bar[2];
     int nb_img_bar = 0, nb_buf_bar = 0, err;
@@ -179,8 +176,6 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     slice_data = (FFVkBuffer *)vp->slices_buf->data;
     metadata   = (FFVkBuffer *)pp->metadata_buf->data;
 
-    shaders = &pv->shaders[pr->frame_type != 0];
-
     pd = (ProresVkParameters) {
         .slice_data       = slice_data->address,
         .bitstream_size   = pp->bitstream_size,
@@ -253,14 +248,14 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     nb_img_bar = nb_buf_bar = 0;
 
     /* Reset */
-    ff_vk_shader_update_img_array(&ctx->s, exec, &shaders->reset,
+    ff_vk_shader_update_img_array(&ctx->s, exec, &pv->reset,
                                   pr->frame, vp->view.out,
                                   0, 0,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
 
-    ff_vk_exec_bind_shader(&ctx->s, exec, &shaders->reset);
-    ff_vk_shader_update_push_const(&ctx->s, exec, &shaders->reset,
+    ff_vk_exec_bind_shader(&ctx->s, exec, &pv->reset);
+    ff_vk_shader_update_push_const(&ctx->s, exec, &pv->reset,
                                    VK_SHADER_STAGE_COMPUTE_BIT,
                                    0, sizeof(pd), &pd);
 
@@ -284,24 +279,24 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     nb_img_bar = nb_buf_bar = 0;
 
     /* Entropy decode */
-    ff_vk_shader_update_desc_buffer(&ctx->s, exec, &shaders->vld,
+    ff_vk_shader_update_desc_buffer(&ctx->s, exec, &pv->vld,
                                     0, 0, 0,
                                     metadata, 0,
                                     pp->slice_offsets_sz,
                                     VK_FORMAT_UNDEFINED);
-    ff_vk_shader_update_desc_buffer(&ctx->s, exec, &shaders->vld,
+    ff_vk_shader_update_desc_buffer(&ctx->s, exec, &pv->vld,
                                     0, 1, 0,
                                     metadata, pp->slice_offsets_sz,
                                     pp->mb_params_sz,
                                     VK_FORMAT_UNDEFINED);
-    ff_vk_shader_update_img_array(&ctx->s, exec, &shaders->vld,
+    ff_vk_shader_update_img_array(&ctx->s, exec, &pv->vld,
                                   pr->frame, vp->view.out,
                                   0, 2,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
 
-    ff_vk_exec_bind_shader(&ctx->s, exec, &shaders->vld);
-    ff_vk_shader_update_push_const(&ctx->s, exec, &shaders->vld,
+    ff_vk_exec_bind_shader(&ctx->s, exec, &pv->vld);
+    ff_vk_shader_update_push_const(&ctx->s, exec, &pv->vld,
                                    VK_SHADER_STAGE_COMPUTE_BIT,
                                    0, sizeof(pd), &pd);
 
@@ -341,19 +336,19 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     nb_img_bar = nb_buf_bar = 0;
 
     /* Inverse transform */
-    ff_vk_shader_update_desc_buffer(&ctx->s, exec, &shaders->idct,
+    ff_vk_shader_update_desc_buffer(&ctx->s, exec, &pv->idct,
                                     0, 0, 0,
                                     metadata, pp->slice_offsets_sz,
                                     pp->mb_params_sz,
                                     VK_FORMAT_UNDEFINED);
-    ff_vk_shader_update_img_array(&ctx->s, exec, &shaders->idct,
+    ff_vk_shader_update_img_array(&ctx->s, exec, &pv->idct,
                                   pr->frame, vp->view.out,
                                   0, 1,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
 
-    ff_vk_exec_bind_shader(&ctx->s, exec, &shaders->idct);
-    ff_vk_shader_update_push_const(&ctx->s, exec, &shaders->idct,
+    ff_vk_exec_bind_shader(&ctx->s, exec, &pv->idct);
+    ff_vk_shader_update_push_const(&ctx->s, exec, &pv->idct,
                                    VK_SHADER_STAGE_COMPUTE_BIT,
                                    0, sizeof(pd), &pd);
 
@@ -439,13 +434,10 @@ fail:
 static void vk_decode_prores_uninit(FFVulkanDecodeShared *ctx)
 {
     ProresVulkanDecodeContext *pv = ctx->sd_ctx;
-    int i;
 
-    for (i = 0; i < FF_ARRAY_ELEMS(pv->shaders); ++i) {
-        ff_vk_shader_free(&ctx->s, &pv->shaders[i].reset);
-        ff_vk_shader_free(&ctx->s, &pv->shaders[i].vld);
-        ff_vk_shader_free(&ctx->s, &pv->shaders[i].idct);
-    }
+    ff_vk_shader_free(&ctx->s, &pv->reset);
+    ff_vk_shader_free(&ctx->s, &pv->vld);
+    ff_vk_shader_free(&ctx->s, &pv->idct);
 
     av_buffer_pool_uninit(&pv->metadata_pool);
 
@@ -456,12 +448,13 @@ static int vk_decode_prores_init(AVCodecContext *avctx)
 {
     FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
     FFVulkanDecodeShared  *ctx = NULL;
+    ProresContext          *pr = avctx->priv_data;
 
     AVHWFramesContext *out_frames_ctx;
     ProresVulkanDecodeContext *pv;
     FFVkSPIRVCompiler *spv;
     FFVulkanDescriptorSetBinding *desc_set;
-    int max_num_mbs, i, err;
+    int max_num_mbs, err;
 
     max_num_mbs = (avctx->coded_width >> 4) * (avctx->coded_height >> 4);
 
@@ -486,80 +479,75 @@ static int vk_decode_prores_init(AVCodecContext *avctx)
 
     ctx->sd_ctx_free = vk_decode_prores_uninit;
 
-    for (i = 0; i < FF_ARRAY_ELEMS(pv->shaders); ++i) { /* Progressive/interlaced */
-        struct ProresVulkanShaderVariants *shaders = &pv->shaders[i];
+    desc_set = (FFVulkanDescriptorSetBinding []) {
+        {
+            .name       = "dst",
+            .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
+            .dimensions = 2,
+            .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
+            FF_VK_REP_NATIVE),
+            .mem_quali  = "writeonly",
+            .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
+            .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
+        },
+    };
+    RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &pv->reset,
+                    "prores_dec_reset", "main", desc_set, 1,
+                    ff_source_prores_reset_comp, 0x080801, pr->frame_type != 0));
+    desc_set = (FFVulkanDescriptorSetBinding []) {
+        {
+            .name        = "slice_offsets_buf",
+            .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
+            .mem_quali   = "readonly",
+            .buf_content = "uint32_t slice_offsets",
+            .buf_elems   = max_num_mbs + 1,
+        },
+        {
+            .name        = "quant_idx_buf",
+            .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
+            .mem_quali   = "writeonly",
+            .buf_content = "uint8_t quant_idx",
+            .buf_elems   = max_num_mbs,
+        },
+        {
+            .name       = "dst",
+            .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
+            .dimensions = 2,
+            .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
+            FF_VK_REP_NATIVE),
+            .mem_quali  = "writeonly",
+            .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
+            .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
+        },
+    };
+    RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &pv->vld,
+                    "prores_dec_vld", "main", desc_set, 3,
+                    ff_source_prores_vld_comp, 0x080801, pr->frame_type != 0));
 
-        desc_set = (FFVulkanDescriptorSetBinding []) {
-            {
-                .name       = "dst",
-                .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
-                .dimensions = 2,
-                .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
-                                                   FF_VK_REP_NATIVE),
-                .mem_quali  = "writeonly",
-                .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
-                .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
-            },
-        };
-        RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &shaders->reset,
-                        "prores_dec_reset", "main", desc_set, 1,
-                        ff_source_prores_reset_comp, 0x080801, i));
-
-        desc_set = (FFVulkanDescriptorSetBinding []) {
-            {
-                .name        = "slice_offsets_buf",
-                .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-                .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-                .mem_quali   = "readonly",
-                .buf_content = "uint32_t slice_offsets",
-                .buf_elems   = max_num_mbs + 1,
-            },
-            {
-                .name        = "quant_idx_buf",
-                .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-                .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-                .mem_quali   = "writeonly",
-                .buf_content = "uint8_t quant_idx",
-                .buf_elems   = max_num_mbs,
-            },
-            {
-                .name       = "dst",
-                .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
-                .dimensions = 2,
-                .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
-                                                   FF_VK_REP_NATIVE),
-                .mem_quali  = "writeonly",
-                .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
-                .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
-            },
-        };
-        RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &shaders->vld,
-                        "prores_dec_vld", "main", desc_set, 3,
-                        ff_source_prores_vld_comp, 0x080801, i));
-
-        desc_set = (FFVulkanDescriptorSetBinding []) {
-            {
-                .name        = "quant_idx_buf",
-                .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-                .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-                .mem_quali   = "readonly",
-                .buf_content = "uint8_t quant_idx",
-                .buf_elems   = max_num_mbs,
-            },
-            {
-                .name       = "dst",
-                .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
-                .dimensions = 2,
-                .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
-                                                   FF_VK_REP_NATIVE),
-                .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
-                .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
-            },
-        };
-        RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &shaders->idct,
-                        "prores_dec_idct", "main", desc_set, 2,
-                        ff_source_prores_idct_comp, 0x200201, i));
-    }
+    desc_set = (FFVulkanDescriptorSetBinding []) {
+        {
+            .name        = "quant_idx_buf",
+            .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
+            .mem_quali   = "readonly",
+            .buf_content = "uint8_t quant_idx",
+            .buf_elems   = max_num_mbs,
+        },
+        {
+            .name       = "dst",
+            .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
+            .dimensions = 2,
+            .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
+            FF_VK_REP_NATIVE),
+            .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
+            .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
+        },
+    };
+    RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &pv->idct,
+                    "prores_dec_idct", "main", desc_set, 2,
+                    ff_source_prores_idct_comp, 0x200201, pr->frame_type != 0));
 
     err = 0;
 
-- 
2.52.0


From 5d651f4ba25063c89d4d51b058a814163e62fb0d Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 23 Nov 2025 20:11:58 +0100
Subject: [PATCH 087/304] prores_raw: call ff_get_format if the version changes

---
 libavcodec/prores_raw.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libavcodec/prores_raw.c b/libavcodec/prores_raw.c
index 7017480336..01f1bbd2fb 100644
--- a/libavcodec/prores_raw.c
+++ b/libavcodec/prores_raw.c
@@ -327,8 +327,8 @@ static int decode_frame(AVCodecContext *avctx,
                         AVFrame *frame, int *got_frame_ptr,
                         AVPacket *avpkt)
 {
-    int ret, dimensions_changed = 0;
     ProResRAWContext *s = avctx->priv_data;
+    int ret, dimensions_changed = 0, old_version = s->version;
     DECLARE_ALIGNED(32, uint8_t, qmat)[64];
     memset(qmat, 1, 64);
 
@@ -395,7 +395,8 @@ static int decode_frame(AVCodecContext *avctx,
     avctx->coded_height = FFALIGN(h, 16);
 
     enum AVPixelFormat pix_fmt = AV_PIX_FMT_BAYER_RGGB16;
-    if (pix_fmt != s->pix_fmt || dimensions_changed) {
+    if (pix_fmt != s->pix_fmt || dimensions_changed ||
+        s->version != old_version) {
         s->pix_fmt = pix_fmt;
 
         ret = get_pixel_format(avctx, pix_fmt);
@@ -525,6 +526,7 @@ static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
     ProResRAWContext *rdst = dst->priv_data;
 
     rdst->pix_fmt = rsrc->pix_fmt;
+    rdst->version = rsrc->version;
 
     return 0;
 }
-- 
2.52.0


From 0ad9bca0983aff9d4a0e7f8b58b9313bb5aff745 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 23 Nov 2025 20:10:11 +0100
Subject: [PATCH 088/304] vulkan_prores_raw: initialize only the necessary
 shaders on init

---
 libavcodec/vulkan_prores_raw.c | 41 ++++++++++++++++------------------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/libavcodec/vulkan_prores_raw.c b/libavcodec/vulkan_prores_raw.c
index 7a1f97a640..78bbddc614 100644
--- a/libavcodec/vulkan_prores_raw.c
+++ b/libavcodec/vulkan_prores_raw.c
@@ -42,7 +42,7 @@ typedef struct ProResRAWVulkanDecodePicture {
 } ProResRAWVulkanDecodePicture;
 
 typedef struct ProResRAWVulkanDecodeContext {
-    FFVulkanShader decode[2];
+    FFVulkanShader decode;
 
     AVBufferPool *tile_data_pool;
 
@@ -187,7 +187,7 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
     });
     nb_img_bar = 0;
 
-    FFVulkanShader *decode_shader = &prv->decode[prr->version];
+    FFVulkanShader *decode_shader = &prv->decode;
     ff_vk_shader_update_img_array(&ctx->s, exec, decode_shader,
                                   prr->frame, vp->view.out,
                                   0, 0,
@@ -346,8 +346,7 @@ static void vk_decode_prores_raw_uninit(FFVulkanDecodeShared *ctx)
 {
     ProResRAWVulkanDecodeContext *fv = ctx->sd_ctx;
 
-    ff_vk_shader_free(&ctx->s, &fv->decode[0]);
-    ff_vk_shader_free(&ctx->s, &fv->decode[1]);
+    ff_vk_shader_free(&ctx->s, &fv->decode);
 
     ff_vk_free_buf(&ctx->s, &fv->uniform_buf);
 
@@ -359,8 +358,8 @@ static void vk_decode_prores_raw_uninit(FFVulkanDecodeShared *ctx)
 static int vk_decode_prores_raw_init(AVCodecContext *avctx)
 {
     int err;
-    ProResRAWContext *prr = avctx->priv_data;
     FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
+    ProResRAWContext *prr = avctx->priv_data;
 
     FFVkSPIRVCompiler *spv = ff_vk_spirv_init();
     if (!spv) {
@@ -382,8 +381,8 @@ static int vk_decode_prores_raw_init(AVCodecContext *avctx)
     ctx->sd_ctx_free = &vk_decode_prores_raw_uninit;
 
     /* Setup decode shader */
-    RET(init_decode_shader(prr, &ctx->s, &ctx->exec_pool, spv, &prv->decode[0], 0));
-    RET(init_decode_shader(prr, &ctx->s, &ctx->exec_pool, spv, &prv->decode[1], 1));
+    RET(init_decode_shader(prr, &ctx->s, &ctx->exec_pool, spv, &prv->decode,
+                           prr->version));
 
     /* Size in bytes of each codebook table */
     size_t cb_size[5] = {
@@ -447,24 +446,22 @@ static int vk_decode_prores_raw_init(AVCodecContext *avctx)
     RET(ff_vk_unmap_buffer(&ctx->s, &prv->uniform_buf, 1));
 
     /* Done; update descriptors */
-    for (int i = 0; i < 2; i++) {
+    RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
+                                        &prv->decode, 1, 0, 0,
+                                        &prv->uniform_buf,
+                                        0, 64*sizeof(float),
+                                        VK_FORMAT_UNDEFINED));
+    RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
+                                        &prv->decode, 1, 1, 0,
+                                        &prv->uniform_buf,
+                                        64*sizeof(float), 64*sizeof(uint8_t),
+                                        VK_FORMAT_UNDEFINED));
+    for (int j = 0; j < 4; j++)
         RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                            &prv->decode[i], 1, 0, 0,
+                                            &prv->decode, 1, 2 + j, 0,
                                             &prv->uniform_buf,
-                                            0, 64*sizeof(float),
+                                            cb_offset[j], cb_size[j],
                                             VK_FORMAT_UNDEFINED));
-        RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                            &prv->decode[i], 1, 1, 0,
-                                            &prv->uniform_buf,
-                                            64*sizeof(float), 64*sizeof(uint8_t),
-                                            VK_FORMAT_UNDEFINED));
-        for (int j = 0; j < 4; j++)
-            RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                                &prv->decode[i], 1, 2 + j, 0,
-                                                &prv->uniform_buf,
-                                                cb_offset[j], cb_size[j],
-                                                VK_FORMAT_UNDEFINED));
-    }
 
 fail:
     spv->uninit(&spv);
-- 
2.52.0


From aa0274b75c2e4b867475ede158ad6027e62db53e Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 12 Nov 2025 15:04:38 +0100
Subject: [PATCH 089/304] vulkan_prores_raw: split up decoding and DCT

This commit optimizes the Vulkan decoder by splitting up decoding
from iDCT, and merging the few tables needed directly into the shader.

The speedup on Intel is 10x.
---
 libavcodec/vulkan/Makefile               |   3 +-
 libavcodec/vulkan/prores_raw.comp        | 347 -----------------------
 libavcodec/vulkan/prores_raw_decode.comp | 232 +++++++++++++++
 libavcodec/vulkan/prores_raw_idct.comp   | 168 +++++++++++
 libavcodec/vulkan_prores_raw.c           | 299 +++++++++----------
 5 files changed, 534 insertions(+), 515 deletions(-)
 delete mode 100644 libavcodec/vulkan/prores_raw.comp
 create mode 100644 libavcodec/vulkan/prores_raw_decode.comp
 create mode 100644 libavcodec/vulkan/prores_raw_idct.comp
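
The removed shader runs the 8-point transform in two passes, with the first pass writing its output transposed so that the second pass can again read rows. A CPU sketch of that separable structure follows, assuming a plain orthonormal IDCT rather than the scaled factorization and the +0.5 bias used in the shader; all `demo_*` names are hypothetical.

```c
#include <assert.h>

/* cos(j*pi/16) for j = 0..8; other angles follow by symmetry. */
static const float demo_cos_tab[9] = {
    1.0f, 0.98078528f, 0.92387953f, 0.83146961f, 0.70710678f,
    0.55557023f, 0.38268343f, 0.19509032f, 0.0f,
};

static float demo_cos_pi16(int j)
{
    j %= 32;                 /* period of cos(j*pi/16) */
    if (j > 16)
        j = 32 - j;          /* cos(2*pi - x) = cos(x) */
    return j > 8 ? -demo_cos_tab[16 - j] : demo_cos_tab[j];
}

/* One pass: 1-D 8-point IDCT on each row of src, written transposed. */
static void demo_idct8_pass(float src[8][8], float dst[8][8])
{
    for (int row = 0; row < 8; row++) {
        for (int n = 0; n < 8; n++) {
            float acc = 0.0f;
            for (int k = 0; k < 8; k++) {
                /* Orthonormal scaling: sqrt(1/8) for DC, sqrt(2/8) otherwise. */
                float scale = k ? 0.5f : 0.35355339f;
                acc += scale * src[row][k] * demo_cos_pi16((2 * n + 1) * k);
            }
            dst[n][row] = acc; /* transposed store */
        }
    }
}

/* 2-D IDCT as two identical passes; the two transposes cancel out. */
static void demo_idct8x8(float in[8][8], float out[8][8])
{
    float tmp[8][8];
    assert(in != out);
    demo_idct8_pass(in, tmp);
    demo_idct8_pass(tmp, out);
}
```

Because both passes are the same row routine, each row can be handled by a separate invocation, which is what the `ROW_ID` indexing in the shader does.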

diff --git a/libavcodec/vulkan/Makefile b/libavcodec/vulkan/Makefile
index ec3015fee6..16a4116ef1 100644
--- a/libavcodec/vulkan/Makefile
+++ b/libavcodec/vulkan/Makefile
@@ -15,7 +15,8 @@ OBJS-$(CONFIG_FFV1_VULKAN_HWACCEL)  +=  vulkan/common.o \
 					vulkan/ffv1_dec_setup.o vulkan/ffv1_dec.o
 
 OBJS-$(CONFIG_PRORES_RAW_VULKAN_HWACCEL) += vulkan/common.o \
-                                            vulkan/prores_raw.o
+                                            vulkan/prores_raw_decode.o \
+                                            vulkan/prores_raw_idct.o
 
 OBJS-$(CONFIG_PRORES_VULKAN_HWACCEL) += vulkan/common.o \
                                         vulkan/prores_reset.o \
diff --git a/libavcodec/vulkan/prores_raw.comp b/libavcodec/vulkan/prores_raw.comp
deleted file mode 100644
index 89eece3c7e..0000000000
--- a/libavcodec/vulkan/prores_raw.comp
+++ /dev/null
@@ -1,347 +0,0 @@
-/*
- * ProRes RAW decoder
- *
- * Copyright (c) 2025 Lynne <dev@lynne.ee>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#define I16(x) (int16_t(x))
-
-#define COMP_ID (gl_LocalInvocationID.z)
-#define BLOCK_ID (gl_LocalInvocationID.y)
-#define ROW_ID (gl_LocalInvocationID.x)
-
-GetBitContext gb;
-shared float btemp[gl_WorkGroupSize.z][16][64] = { };
-shared float block[gl_WorkGroupSize.z][16][64];
-
-void idct8_horiz(const uint row_id)
-{
-    float t0, t1, t2, t3, t4, t5, t6, t7, u8;
-    float u0, u1, u2, u3, u4, u5, u6, u7;
-
-    /* Input */
-    t0 = block[COMP_ID][BLOCK_ID][8*row_id + 0];
-    u4 = block[COMP_ID][BLOCK_ID][8*row_id + 1];
-    t2 = block[COMP_ID][BLOCK_ID][8*row_id + 2];
-    u6 = block[COMP_ID][BLOCK_ID][8*row_id + 3];
-    t1 = block[COMP_ID][BLOCK_ID][8*row_id + 4];
-    u5 = block[COMP_ID][BLOCK_ID][8*row_id + 5];
-    t3 = block[COMP_ID][BLOCK_ID][8*row_id + 6];
-    u7 = block[COMP_ID][BLOCK_ID][8*row_id + 7];
-
-    /* Embedded scaled inverse 4-point Type-II DCT */
-    u0 = t0 + t1;
-    u1 = t0 - t1;
-    u3 = t2 + t3;
-    u2 = (t2 - t3)*(1.4142135623730950488016887242097f) - u3;
-    t0 = u0 + u3;
-    t3 = u0 - u3;
-    t1 = u1 + u2;
-    t2 = u1 - u2;
-
-    /* Embedded scaled inverse 4-point Type-IV DST */
-    t5 = u5 + u6;
-    t6 = u5 - u6;
-    t7 = u4 + u7;
-    t4 = u4 - u7;
-    u7 = t7 + t5;
-    u5 = (t7 - t5)*(1.4142135623730950488016887242097f);
-    u8 = (t4 + t6)*(1.8477590650225735122563663787936f);
-    u4 = u8 - t4*(1.0823922002923939687994464107328f);
-    u6 = u8 - t6*(2.6131259297527530557132863468544f);
-    t7 = u7;
-    t6 = t7 - u6;
-    t5 = t6 + u5;
-    t4 = t5 - u4;
-
-    /* Butterflies */
-    u0 = t0 + t7;
-    u7 = t0 - t7;
-    u6 = t1 + t6;
-    u1 = t1 - t6;
-    u2 = t2 + t5;
-    u5 = t2 - t5;
-    u4 = t3 + t4;
-    u3 = t3 - t4;
-
-    /* Output */
-    btemp[COMP_ID][BLOCK_ID][0*8 + row_id] = u0;
-    btemp[COMP_ID][BLOCK_ID][1*8 + row_id] = u1;
-    btemp[COMP_ID][BLOCK_ID][2*8 + row_id] = u2;
-    btemp[COMP_ID][BLOCK_ID][3*8 + row_id] = u3;
-    btemp[COMP_ID][BLOCK_ID][4*8 + row_id] = u4;
-    btemp[COMP_ID][BLOCK_ID][5*8 + row_id] = u5;
-    btemp[COMP_ID][BLOCK_ID][6*8 + row_id] = u6;
-    btemp[COMP_ID][BLOCK_ID][7*8 + row_id] = u7;
-}
-
-void idct8_vert(const uint row_id)
-{
-    float t0, t1, t2, t3, t4, t5, t6, t7, u8;
-    float u0, u1, u2, u3, u4, u5, u6, u7;
-
-    /* Input */
-    t0 = btemp[COMP_ID][BLOCK_ID][8*row_id + 0] + 0.5f; // NOTE
-    u4 = btemp[COMP_ID][BLOCK_ID][8*row_id + 1];
-    t2 = btemp[COMP_ID][BLOCK_ID][8*row_id + 2];
-    u6 = btemp[COMP_ID][BLOCK_ID][8*row_id + 3];
-    t1 = btemp[COMP_ID][BLOCK_ID][8*row_id + 4];
-    u5 = btemp[COMP_ID][BLOCK_ID][8*row_id + 5];
-    t3 = btemp[COMP_ID][BLOCK_ID][8*row_id + 6];
-    u7 = btemp[COMP_ID][BLOCK_ID][8*row_id + 7];
-
-    /* Embedded scaled inverse 4-point Type-II DCT */
-    u0 = t0 + t1;
-    u1 = t0 - t1;
-    u3 = t2 + t3;
-    u2 = (t2 - t3)*(1.4142135623730950488016887242097f) - u3;
-    t0 = u0 + u3;
-    t3 = u0 - u3;
-    t1 = u1 + u2;
-    t2 = u1 - u2;
-
-    /* Embedded scaled inverse 4-point Type-IV DST */
-    t5 = u5 + u6;
-    t6 = u5 - u6;
-    t7 = u4 + u7;
-    t4 = u4 - u7;
-    u7 = t7 + t5;
-    u5 = (t7 - t5)*(1.4142135623730950488016887242097f);
-    u8 = (t4 + t6)*(1.8477590650225735122563663787936f);
-    u4 = u8 - t4*(1.0823922002923939687994464107328f);
-    u6 = u8 - t6*(2.6131259297527530557132863468544f);
-    t7 = u7;
-    t6 = t7 - u6;
-    t5 = t6 + u5;
-    t4 = t5 - u4;
-
-    /* Butterflies */
-    u0 = t0 + t7;
-    u7 = t0 - t7;
-    u6 = t1 + t6;
-    u1 = t1 - t6;
-    u2 = t2 + t5;
-    u5 = t2 - t5;
-    u4 = t3 + t4;
-    u3 = t3 - t4;
-
-    /* Output */
-    block[COMP_ID][BLOCK_ID][0*8 + row_id] = u0;
-    block[COMP_ID][BLOCK_ID][1*8 + row_id] = u1;
-    block[COMP_ID][BLOCK_ID][2*8 + row_id] = u2;
-    block[COMP_ID][BLOCK_ID][3*8 + row_id] = u3;
-    block[COMP_ID][BLOCK_ID][4*8 + row_id] = u4;
-    block[COMP_ID][BLOCK_ID][5*8 + row_id] = u5;
-    block[COMP_ID][BLOCK_ID][6*8 + row_id] = u6;
-    block[COMP_ID][BLOCK_ID][7*8 + row_id] = u7;
-}
-
-int16_t get_value(int16_t codebook)
-{
-    const int16_t switch_bits = codebook >> 8;
-    const int16_t rice_order  = codebook & I16(0xf);
-    const int16_t exp_order   = (codebook >> 4) & I16(0xf);
-
-    uint32_t b = show_bits(gb, 32);
-    if (expectEXT(b == 0, false))
-        return I16(0);
-    int16_t q = I16(31) - I16(findMSB(b));
-
-    if ((b & 0x80000000) != 0) {
-        skip_bits(gb, 1 + rice_order);
-        return I16((b & 0x7FFFFFFF) >> (31 - rice_order));
-    }
-
-    if (q <= switch_bits) {
-        skip_bits(gb, q + rice_order + 1);
-        return I16((q << rice_order) +
-                   (((b << (q + 1)) >> 1) >> (31 - rice_order)));
-    }
-
-    int16_t bits = exp_order + (q << 1) - switch_bits;
-    skip_bits(gb, bits);
-    return I16((b >> (32 - bits)) +
-               ((switch_bits + 1) << rice_order) -
-               (1 << exp_order));
-}
-
-#define TODCCODEBOOK(x) ((x + 1) >> 1)
-
-void read_dc_vals(const uint nb_blocks)
-{
-    int16_t dc, dc_add;
-    int16_t prev_dc = I16(0), sign = I16(0);
-
-    /* Special handling for first block */
-    dc = get_value(I16(700));
-    prev_dc = (dc >> 1) ^ -(dc & I16(1));
-    btemp[COMP_ID][0][0] = prev_dc;
-
-    for (uint n = 1; n < nb_blocks; n++) {
-        if (expectEXT(left_bits(gb) <= 0, false))
-            break;
-
-        uint8_t dc_codebook;
-        if ((n & 15) == 1)
-            dc_codebook = uint8_t(100);
-        else
-            dc_codebook = dc_cb[min(TODCCODEBOOK(dc), 13 - 1)];
-
-        dc = get_value(dc_codebook);
-
-        sign = sign ^ dc & int16_t(1);
-        dc_add = (-sign ^ I16(TODCCODEBOOK(dc))) + sign;
-        sign = I16(dc_add < 0);
-        prev_dc += dc_add;
-
-        btemp[COMP_ID][n][0] = prev_dc;
-    }
-}
-
-void read_ac_vals(const uint nb_blocks)
-{
-    const uint nb_codes = nb_blocks << 6;
-    const uint log2_nb_blocks = findMSB(nb_blocks);
-    const uint block_mask = (1 << log2_nb_blocks) - 1;
-
-    int16_t ac, rn, ln;
-    int16_t ac_codebook = I16(49);
-    int16_t rn_codebook = I16( 0);
-    int16_t ln_codebook = I16(66);
-    int16_t sign;
-    int16_t val;
-
-    for (uint n = nb_blocks; n <= nb_codes;) {
-        if (expectEXT(left_bits(gb) <= 0, false))
-            break;
-
-        ln = get_value(ln_codebook);
-        for (uint i = 0; i < ln; i++) {
-            if (expectEXT(left_bits(gb) <= 0, false))
-                break;
-
-            if (expectEXT(n >= nb_codes, false))
-                break;
-
-            ac = get_value(ac_codebook);
-            ac_codebook = ac_cb[min(ac, 95 - 1)];
-            sign = -int16_t(get_bit(gb));
-
-            val = ((ac + I16(1)) ^ sign) - sign;
-            btemp[COMP_ID][n & block_mask][n >> log2_nb_blocks] = val;
-
-            n++;
-        }
-
-        if (expectEXT(n >= nb_codes, false))
-            break;
-
-        rn = get_value(rn_codebook);
-        rn_codebook = rn_cb[min(rn, 28 - 1)];
-
-        n += rn + 1;
-        if (expectEXT(n >= nb_codes, false))
-            break;
-
-        if (expectEXT(left_bits(gb) <= 0, false))
-            break;
-
-        ac = get_value(ac_codebook);
-        sign = -int16_t(get_bit(gb));
-
-        val = ((ac + I16(1)) ^ sign) - sign;
-        btemp[COMP_ID][n & block_mask][n >> log2_nb_blocks] = val;
-
-        ac_codebook = ac_cb[min(ac, 95 - 1)];
-        ln_codebook = ln_cb[min(ac, 15 - 1)];
-
-        n++;
-    }
-}
-
-void main(void)
-{
-    const uint tile_idx = gl_WorkGroupID.y*gl_NumWorkGroups.x + gl_WorkGroupID.x;
-    TileData td = tile_data[tile_idx];
-
-    if (expectEXT(td.pos.x >= frame_size.x, false))
-        return;
-
-    uint64_t pkt_offset = uint64_t(pkt_data) + td.offset;
-    u8vec2buf hdr_data = u8vec2buf(pkt_offset);
-    float qscale = float(pack16(hdr_data[0].v.yx)) / 2.0f;
-
-    ivec4 size = ivec4(td.size,
-                       pack16(hdr_data[2].v.yx),
-                       pack16(hdr_data[1].v.yx),
-                       pack16(hdr_data[3].v.yx));
-    size[0] = size[0] - size[1] - size[2] - size[3] - 8;
-    if (expectEXT(size[0] < 0, false))
-        return;
-
-    const ivec2 offs = td.pos + ivec2(COMP_ID & 1, COMP_ID >> 1);
-    const uint w = min(tile_size.x, frame_size.x - td.pos.x) / 2;
-    const uint nb_blocks = w / 8;
-
-    const ivec4 comp_offset = ivec4(size[2] + size[1] + size[3],
-                                    size[2],
-                                    0,
-                                    size[2] + size[1]);
-
-    if (BLOCK_ID == 0 && ROW_ID == 0) {
-        init_get_bits(gb, u8buf(pkt_offset + 8 + comp_offset[COMP_ID]),
-                      size[COMP_ID]);
-        read_dc_vals(nb_blocks);
-        read_ac_vals(nb_blocks);
-    }
-
-    barrier();
-
-    [[unroll]]
-    for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x)
-        block[COMP_ID][BLOCK_ID][i] = (btemp[COMP_ID][BLOCK_ID][scan[i]] / 16384.0) *
-                                      (float(qmat[i]) / 295.0) *
-                                      idct_8x8_scales[i] * qscale;
-
-    barrier();
-
-#ifdef PARALLEL_ROWS
-    idct8_horiz(ROW_ID);
-
-    barrier();
-
-    idct8_vert(ROW_ID);
-#else
-    for (uint j = 0; j < 8; j++)
-        idct8_horiz(j);
-
-    barrier();
-
-    for (uint j = 0; j < 8; j++)
-        idct8_vert(j);
-#endif
-
-    barrier();
-
-    [[unroll]]
-    for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x)
-         imageStore(dst,
-                    offs + 2*ivec2(BLOCK_ID*8 + (i & 7), i >> 3),
-                    vec4(block[COMP_ID][BLOCK_ID][i]));
-}
diff --git a/libavcodec/vulkan/prores_raw_decode.comp b/libavcodec/vulkan/prores_raw_decode.comp
new file mode 100644
index 0000000000..08447e3961
--- /dev/null
+++ b/libavcodec/vulkan/prores_raw_decode.comp
@@ -0,0 +1,232 @@
+/*
+ * ProRes RAW decoder
+ *
+ * Copyright (c) 2025 Lynne <dev@lynne.ee>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#define U8(x) (uint8_t(x))
+#define I16(x) (int16_t(x))
+
+#define COMP_ID (gl_LocalInvocationID.x)
+
+GetBitContext gb;
+
+#define DC_CB_MAX 12
+const uint8_t dc_cb[DC_CB_MAX + 1] = {
+    U8(16), U8(33), U8(50), U8(51), U8(51), U8(51),
+    U8(68), U8(68), U8(68), U8(68), U8(68), U8(68), U8(118)
+};
+
+#define AC_CB_MAX 94
+const int16_t ac_cb[AC_CB_MAX + 1] = {
+    I16(  0), I16(529), I16(273), I16(273), I16(546), I16(546),
+    I16(546), I16(290), I16(290), I16(290), I16(563), I16(563),
+    I16(563), I16(563), I16(563), I16(563), I16(563), I16(563),
+    I16(307), I16(307), I16(580), I16(580), I16(580), I16(580),
+    I16(580), I16(580), I16(580), I16(580), I16(580), I16(580),
+    I16(580), I16(580), I16(580), I16(580), I16(580), I16(580),
+    I16(580), I16(580), I16(580), I16(580), I16(580), I16(580),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(853), I16(853),
+    I16(853), I16(853), I16(853), I16(853), I16(358)
+};
+
+#define RN_CB_MAX 27
+const int16_t rn_cb[RN_CB_MAX + 1] = {
+    I16(512), I16(256), I16(  0), I16(  0), I16(529), I16(529), I16(273),
+    I16(273), I16( 17), I16( 17), I16( 33), I16( 33), I16(546), I16( 34),
+    I16( 34), I16( 34), I16( 34), I16( 34), I16( 34), I16( 34), I16( 34),
+    I16( 34), I16( 34), I16( 34), I16( 34), I16( 50), I16( 50), I16( 68),
+};
+
+#define LN_CB_MAX 14
+const int16_t ln_cb[LN_CB_MAX + 1] = {
+    I16( 256), I16( 273), I16( 546), I16( 546), I16( 290), I16( 290), I16( 1075),
+    I16(1075), I16( 563), I16( 563), I16( 563), I16( 563), I16( 563), I16( 563),
+    I16( 51)
+};
+
+int16_t get_value(int16_t codebook)
+{
+    const int16_t switch_bits = codebook >> 8;
+    const int16_t rice_order  = codebook & I16(0xf);
+    const int16_t exp_order   = (codebook >> 4) & I16(0xf);
+
+    uint32_t b = show_bits(gb, 32);
+    if (expectEXT(b == 0, false))
+        return I16(0);
+    int16_t q = I16(31) - I16(findMSB(b));
+
+    if ((b & 0x80000000) != 0) {
+        skip_bits(gb, 1 + rice_order);
+        return I16((b & 0x7FFFFFFF) >> (31 - rice_order));
+    }
+
+    if (q <= switch_bits) {
+        skip_bits(gb, q + rice_order + 1);
+        return I16((q << rice_order) +
+                   (((b << (q + 1)) >> 1) >> (31 - rice_order)));
+    }
+
+    int16_t bits = exp_order + (q << 1) - switch_bits;
+    skip_bits(gb, bits);
+    return I16((b >> (32 - bits)) +
+               ((switch_bits + 1) << rice_order) -
+               (1 << exp_order));
+}
+
+#define TODCCODEBOOK(x) ((x + 1) >> 1)
+
+void store_val(ivec2 offs, int blk, int c, int16_t v)
+{
+    imageStore(dst, offs + 2*ivec2(blk*8 + (c & 7), c >> 3), ivec4(v));
+}
+
+void read_dc_vals(ivec2 offs, int nb_blocks)
+{
+    int16_t dc, dc_add;
+    int16_t prev_dc = I16(0), sign = I16(0);
+
+    /* Special handling for first block */
+    dc = get_value(I16(700));
+    prev_dc = (dc >> 1) ^ -(dc & I16(1));
+    store_val(offs, 0, 0, prev_dc);
+
+    for (int n = 1; n < nb_blocks; n++) {
+        if (expectEXT(left_bits(gb) <= 0, false))
+            break;
+
+        uint8_t dc_codebook;
+        if ((n & 15) == 1)
+            dc_codebook = uint8_t(100);
+        else
+            dc_codebook = dc_cb[min(TODCCODEBOOK(dc), DC_CB_MAX)];
+
+        dc = get_value(dc_codebook);
+
+        sign = sign ^ (dc & int16_t(1));
+        dc_add = (-sign ^ I16(TODCCODEBOOK(dc))) + sign;
+        sign = I16(dc_add < 0);
+        prev_dc += dc_add;
+
+        store_val(offs, n, 0, prev_dc);
+    }
+}
+
+void read_ac_vals(ivec2 offs, int nb_blocks)
+{
+    const int nb_codes = nb_blocks << 6;
+    const int log2_nb_blocks = findMSB(nb_blocks);
+    const int block_mask = (1 << log2_nb_blocks) - 1;
+
+    int16_t ac, rn, ln;
+    int16_t ac_codebook = I16(49);
+    int16_t rn_codebook = I16( 0);
+    int16_t ln_codebook = I16(66);
+    int16_t sign;
+    int16_t val;
+
+    for (int n = nb_blocks; n <= nb_codes;) {
+        if (expectEXT(left_bits(gb) <= 0, false))
+            break;
+
+        ln = get_value(ln_codebook);
+        for (int i = 0; i < ln; i++) {
+            if (expectEXT(left_bits(gb) <= 0, false))
+                break;
+
+            if (expectEXT(n >= nb_codes, false))
+                break;
+
+            ac = get_value(ac_codebook);
+            ac_codebook = ac_cb[min(ac, AC_CB_MAX)];
+            sign = -int16_t(get_bit(gb));
+
+            val = ((ac + I16(1)) ^ sign) - sign;
+            store_val(offs, n & block_mask, n >> log2_nb_blocks, val);
+
+            n++;
+        }
+
+        if (expectEXT(n >= nb_codes, false))
+            break;
+
+        rn = get_value(rn_codebook);
+        rn_codebook = rn_cb[min(rn, RN_CB_MAX)];
+
+        n += rn + 1;
+        if (expectEXT(n >= nb_codes, false))
+            break;
+
+        if (expectEXT(left_bits(gb) <= 0, false))
+            break;
+
+        ac = get_value(ac_codebook);
+        sign = -int16_t(get_bit(gb));
+
+        val = ((ac + I16(1)) ^ sign) - sign;
+        store_val(offs, n & block_mask, n >> log2_nb_blocks, val);
+
+        ac_codebook = ac_cb[min(ac, AC_CB_MAX)];
+        ln_codebook = ln_cb[min(ac, LN_CB_MAX)];
+
+        n++;
+    }
+}
+
+void main(void)
+{
+    const uint tile_idx = gl_WorkGroupID.y*gl_NumWorkGroups.x + gl_WorkGroupID.x;
+    TileData td = tile_data[tile_idx];
+
+    if (expectEXT(td.pos.x >= frame_size.x, false))
+        return;
+
+    uint64_t pkt_offset = uint64_t(pkt_data) + td.offset;
+    u8vec2buf hdr_data = u8vec2buf(pkt_offset);
+
+    ivec4 size = ivec4(td.size,
+                       pack16(hdr_data[2].v.yx),
+                       pack16(hdr_data[1].v.yx),
+                       pack16(hdr_data[3].v.yx));
+    size[0] = size[0] - size[1] - size[2] - size[3] - 8;
+    if (expectEXT(size[0] < 0, false))
+        return;
+
+    const ivec2 offs = td.pos + ivec2(COMP_ID & 1, COMP_ID >> 1);
+    const int w = min(tile_size.x, frame_size.x - td.pos.x) / 2;
+    const int nb_blocks = w / 8;
+
+    const ivec4 comp_offset = ivec4(size[2] + size[1] + size[3],
+                                    size[2],
+                                    0,
+                                    size[2] + size[1]);
+
+    init_get_bits(gb, u8buf(pkt_offset + 8 + comp_offset[COMP_ID]),
+                  size[COMP_ID]);
+
+    read_dc_vals(offs, nb_blocks);
+    read_ac_vals(offs, nb_blocks);
+}
diff --git a/libavcodec/vulkan/prores_raw_idct.comp b/libavcodec/vulkan/prores_raw_idct.comp
new file mode 100644
index 0000000000..e692792301
--- /dev/null
+++ b/libavcodec/vulkan/prores_raw_idct.comp
@@ -0,0 +1,168 @@
+/*
+ * ProRes RAW decoder
+ *
+ * Copyright (c) 2025 Lynne <dev@lynne.ee>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#define COMP_ID (gl_LocalInvocationID.z)
+#define BLOCK_ID (gl_LocalInvocationID.y)
+#define ROW_ID (gl_LocalInvocationID.x)
+
+shared float blocks[16][4*64];
+
+const ivec2 scan[64] = {
+    ivec2( 0,  0), ivec2( 4,  0), ivec2( 0,  2), ivec2( 4,  2),
+    ivec2( 0,  8), ivec2( 4,  8), ivec2( 6,  8), ivec2( 2, 10),
+    ivec2( 2,  0), ivec2( 6,  0), ivec2( 2,  2), ivec2( 6,  2),
+    ivec2( 2,  8), ivec2( 8,  8), ivec2( 0, 10), ivec2( 4, 10),
+    ivec2( 8,  0), ivec2(12,  0), ivec2( 8,  2), ivec2(12,  2),
+    ivec2(10,  8), ivec2(14,  8), ivec2( 6, 10), ivec2( 2, 12),
+    ivec2(10,  0), ivec2(14,  0), ivec2(10,  2), ivec2(14,  2),
+    ivec2(12,  8), ivec2( 8, 10), ivec2( 0, 12), ivec2( 4, 12),
+    ivec2( 0,  4), ivec2( 4,  4), ivec2( 6,  4), ivec2( 2,  6),
+    ivec2(10, 10), ivec2(14, 10), ivec2( 6, 12), ivec2( 2, 14),
+    ivec2( 2,  4), ivec2( 8,  4), ivec2( 0,  6), ivec2( 4,  6),
+    ivec2(12, 10), ivec2( 8, 12), ivec2( 0, 14), ivec2( 4, 14),
+    ivec2(10,  4), ivec2(14,  4), ivec2( 6,  6), ivec2(12,  6),
+    ivec2(10, 12), ivec2(14, 12), ivec2( 6, 14), ivec2(12, 14),
+    ivec2(12,  4), ivec2( 8,  6), ivec2(10,  6), ivec2(14,  6),
+    ivec2(12, 12), ivec2( 8, 14), ivec2(10, 14), ivec2(14, 14),
+};
+
+const float idct_scale[64] = {
+    0.1250000000000000, 0.1733799806652684, 0.1633203706095471, 0.1469844503024199,
+    0.1250000000000000, 0.0982118697983878, 0.0676495125182746, 0.0344874224103679,
+    0.1733799806652684, 0.2404849415639108, 0.2265318615882219, 0.2038732892122293,
+    0.1733799806652684, 0.1362237766939547, 0.0938325693794663, 0.0478354290456362,
+    0.1633203706095471, 0.2265318615882219, 0.2133883476483184, 0.1920444391778541,
+    0.1633203706095471, 0.1283199917898342, 0.0883883476483185, 0.0450599888754343,
+    0.1469844503024199, 0.2038732892122293, 0.1920444391778541, 0.1728354290456362,
+    0.1469844503024199, 0.1154849415639109, 0.0795474112858021, 0.0405529186026822,
+    0.1250000000000000, 0.1733799806652684, 0.1633203706095471, 0.1469844503024199,
+    0.1250000000000000, 0.0982118697983878, 0.0676495125182746, 0.0344874224103679,
+    0.0982118697983878, 0.1362237766939547, 0.1283199917898342, 0.1154849415639109,
+    0.0982118697983878, 0.0771645709543638, 0.0531518809229535, 0.0270965939155924,
+    0.0676495125182746, 0.0938325693794663, 0.0883883476483185, 0.0795474112858021,
+    0.0676495125182746, 0.0531518809229535, 0.0366116523516816, 0.0186644585125857,
+    0.0344874224103679, 0.0478354290456362, 0.0450599888754343, 0.0405529186026822,
+    0.0344874224103679, 0.0270965939155924, 0.0186644585125857, 0.0095150584360892,
+};
+
+void idct8(uint block, uint offset, uint stride)
+{
+    float t0, t1, t2, t3, t4, t5, t6, t7, u8;
+    float u0, u1, u2, u3, u4, u5, u6, u7;
+
+    /* Input */
+    t0 = blocks[block][0*stride + offset];
+    u4 = blocks[block][1*stride + offset];
+    t2 = blocks[block][2*stride + offset];
+    u6 = blocks[block][3*stride + offset];
+    t1 = blocks[block][4*stride + offset];
+    u5 = blocks[block][5*stride + offset];
+    t3 = blocks[block][6*stride + offset];
+    u7 = blocks[block][7*stride + offset];
+
+    /* Embedded scaled inverse 4-point Type-II DCT */
+    u0 = t0 + t1;
+    u1 = t0 - t1;
+    u3 = t2 + t3;
+    u2 = (t2 - t3)*(1.4142135623730950488016887242097f) - u3;
+    t0 = u0 + u3;
+    t3 = u0 - u3;
+    t1 = u1 + u2;
+    t2 = u1 - u2;
+
+    /* Embedded scaled inverse 4-point Type-IV DST */
+    t5 = u5 + u6;
+    t6 = u5 - u6;
+    t7 = u4 + u7;
+    t4 = u4 - u7;
+    u7 = t7 + t5;
+    u5 = (t7 - t5)*(1.4142135623730950488016887242097f);
+    u8 = (t4 + t6)*(1.8477590650225735122563663787936f);
+    u4 = u8 - t4*(1.0823922002923939687994464107328f);
+    u6 = u8 - t6*(2.6131259297527530557132863468544f);
+    t7 = u7;
+    t6 = t7 - u6;
+    t5 = t6 + u5;
+    t4 = t5 - u4;
+
+    /* Butterflies */
+    u0 = t0 + t7;
+    u7 = t0 - t7;
+    u6 = t1 + t6;
+    u1 = t1 - t6;
+    u2 = t2 + t5;
+    u5 = t2 - t5;
+    u4 = t3 + t4;
+    u3 = t3 - t4;
+
+    /* Output */
+    blocks[block][0*stride + offset] = u0;
+    blocks[block][1*stride + offset] = u1;
+    blocks[block][2*stride + offset] = u2;
+    blocks[block][3*stride + offset] = u3;
+    blocks[block][4*stride + offset] = u4;
+    blocks[block][5*stride + offset] = u5;
+    blocks[block][6*stride + offset] = u6;
+    blocks[block][7*stride + offset] = u7;
+}
+
+void main(void)
+{
+    const uint tile_idx = gl_WorkGroupID.y*gl_NumWorkGroups.x + gl_WorkGroupID.x;
+    TileData td = tile_data[tile_idx];
+
+    if (expectEXT(td.pos.x >= frame_size.x, false))
+        return;
+
+    uint64_t pkt_offset = uint64_t(pkt_data) + td.offset;
+    u8vec2buf hdr_data = u8vec2buf(pkt_offset);
+    float qscale = float(pack16(hdr_data[0].v.yx)) / 2.0f;
+
+    const ivec2 offs = td.pos + ivec2(COMP_ID & 1, COMP_ID >> 1);
+    const uint w = min(tile_size.x, frame_size.x - td.pos.x) / 2;
+    const uint nb_blocks = w / 8;
+
+    [[unroll]]
+    for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x) {
+        int16_t v = int16_t(imageLoad(dst, offs + 2*ivec2(BLOCK_ID*8, 0) + scan[i])[0]);
+        blocks[BLOCK_ID][COMP_ID*64 + i] = (v / 16384.0) *
+                                           (float(qmat[i]) / 295.0) *
+                                           idct_scale[i] * qscale;
+    }
+
+    barrier();
+    idct8(BLOCK_ID, COMP_ID*64 + ROW_ID*8, 1);
+
+    blocks[BLOCK_ID][COMP_ID*64 + ROW_ID] += 0.5; /* bias the first input of each second-pass transform */
+
+    barrier();
+    idct8(BLOCK_ID, COMP_ID*64 + ROW_ID, 8);
+
+    barrier();
+    [[unroll]]
+    for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x) {
+        int v = int(round(blocks[BLOCK_ID][COMP_ID*64 + i]*16384.0));
+        imageStore(dst,
+                   offs + 2*ivec2(BLOCK_ID*8 + (i & 7), i >> 3),
+                   ivec4(v));
+    }
+}
diff --git a/libavcodec/vulkan_prores_raw.c b/libavcodec/vulkan_prores_raw.c
index 78bbddc614..3b6d617eec 100644
--- a/libavcodec/vulkan_prores_raw.c
+++ b/libavcodec/vulkan_prores_raw.c
@@ -26,7 +26,8 @@
 #include "libavutil/mem.h"
 
 extern const char *ff_source_common_comp;
-extern const char *ff_source_prores_raw_comp;
+extern const char *ff_source_prores_raw_decode_comp;
+extern const char *ff_source_prores_raw_idct_comp;
 
 const FFVulkanDecodeDescriptor ff_vk_dec_prores_raw_desc = {
     .codec_id         = AV_CODEC_ID_PRORES_RAW,
@@ -43,17 +44,16 @@ typedef struct ProResRAWVulkanDecodePicture {
 
 typedef struct ProResRAWVulkanDecodeContext {
     FFVulkanShader decode;
+    FFVulkanShader idct;
 
     AVBufferPool *tile_data_pool;
-
-    FFVkBuffer uniform_buf;
 } ProResRAWVulkanDecodeContext;
 
 typedef struct DecodePushData {
     VkDeviceAddress tile_data;
     VkDeviceAddress pkt_data;
-    uint32_t frame_size[2];
-    uint32_t tile_size[2];
+    int32_t frame_size[2];
+    int32_t tile_size[2];
     uint8_t  qmat[64];
 } DecodePushData;
 
@@ -97,7 +97,7 @@ static int vk_prores_raw_start_frame(AVCodecContext          *avctx,
 
     /* Prepare frame to be used */
     err = ff_vk_decode_prepare_frame_sdr(dec, prr->frame, vp, 1,
-                                         FF_VK_REP_FLOAT, 0);
+                                         FF_VK_REP_INT, 0);
     if (err < 0)
         return err;
 
@@ -173,9 +173,13 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &vp->slices_buf, 1, 0));
     vp->slices_buf = NULL;
 
+    AVVkFrame *vkf = (AVVkFrame *)prr->frame->data[0];
+    vkf->layout[0] = VK_IMAGE_LAYOUT_UNDEFINED;
+    vkf->access[0] = VK_ACCESS_2_NONE;
+
     ff_vk_frame_barrier(&ctx->s, exec, prr->frame, img_bar, &nb_img_bar,
                         VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
-                        VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                        VK_PIPELINE_STAGE_2_CLEAR_BIT,
                         VK_ACCESS_2_TRANSFER_WRITE_BIT,
                         VK_IMAGE_LAYOUT_GENERAL,
                         VK_QUEUE_FAMILY_IGNORED);
@@ -187,6 +191,29 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
     });
     nb_img_bar = 0;
 
+    vk->CmdClearColorImage(exec->buf, vkf->img[0],
+                           VK_IMAGE_LAYOUT_GENERAL,
+                           &((VkClearColorValue) { 0 }),
+                           1, &((VkImageSubresourceRange) {
+                               .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+                               .levelCount = 1,
+                               .layerCount = 1,
+                           }));
+
+    ff_vk_frame_barrier(&ctx->s, exec, prr->frame, img_bar, &nb_img_bar,
+                        VK_PIPELINE_STAGE_2_CLEAR_BIT,
+                        VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
+                        VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
+                        VK_IMAGE_LAYOUT_GENERAL,
+                        VK_QUEUE_FAMILY_IGNORED);
+
+    vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
+        .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
+        .pImageMemoryBarriers = img_bar,
+        .imageMemoryBarrierCount = nb_img_bar,
+    });
+    nb_img_bar = 0;
+
     FFVulkanShader *decode_shader = &prv->decode;
     ff_vk_shader_update_img_array(&ctx->s, exec, decode_shader,
                                   prr->frame, vp->view.out,
@@ -212,6 +239,34 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
 
     vk->CmdDispatch(exec->buf, prr->nb_tw, prr->nb_th, 1);
 
+    ff_vk_frame_barrier(&ctx->s, exec, prr->frame, img_bar, &nb_img_bar,
+                        VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
+                        VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
+                        VK_ACCESS_2_SHADER_STORAGE_READ_BIT |
+                        VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
+                        VK_IMAGE_LAYOUT_GENERAL,
+                        VK_QUEUE_FAMILY_IGNORED);
+
+    FFVulkanShader *idct_shader = &prv->idct;
+    ff_vk_shader_update_img_array(&ctx->s, exec, idct_shader,
+                                  prr->frame, vp->view.out,
+                                  0, 0,
+                                  VK_IMAGE_LAYOUT_GENERAL,
+                                  VK_NULL_HANDLE);
+    ff_vk_exec_bind_shader(&ctx->s, exec, idct_shader);
+    ff_vk_shader_update_push_const(&ctx->s, exec, idct_shader,
+                                   VK_SHADER_STAGE_COMPUTE_BIT,
+                                   0, sizeof(pd_decode), &pd_decode);
+
+    vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
+        .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
+        .pImageMemoryBarriers = img_bar,
+        .imageMemoryBarrierCount = nb_img_bar,
+    });
+    nb_img_bar = 0;
+
+    vk->CmdDispatch(exec->buf, prr->nb_tw, prr->nb_th, 1);
+
     err = ff_vk_exec_submit(&ctx->s, exec);
     if (err < 0)
         return err;
@@ -220,34 +275,11 @@ fail:
     return 0;
 }
 
-static int init_decode_shader(ProResRAWContext *prr, FFVulkanContext *s,
-                              FFVkExecPool *pool, FFVkSPIRVCompiler *spv,
-                              FFVulkanShader *shd, int version)
+static int add_common_data(AVCodecContext *avctx, FFVulkanContext *s,
+                           FFVulkanShader *shd, int writeonly)
 {
-    int err;
-    FFVulkanDescriptorSetBinding *desc_set;
-    int parallel_rows = 1;
-
-    uint8_t *spv_data;
-    size_t spv_len;
-    void *spv_opaque = NULL;
-
-    if (s->props.properties.limits.maxComputeWorkGroupInvocations < 512 ||
-        s->props.properties.deviceType == VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU)
-        parallel_rows = 0;
-
-    RET(ff_vk_shader_init(s, shd, "prores_raw",
-                          VK_SHADER_STAGE_COMPUTE_BIT,
-                          (const char *[]) { "GL_EXT_buffer_reference",
-                                             "GL_EXT_buffer_reference2",
-                                             "GL_EXT_null_initializer" }, 3,
-                          parallel_rows ? 8 : 1 /* 8x8 transforms, 8-point width */,
-                          version == 0 ? 8 : 16 /* Horizontal blocks */,
-                          4 /* Components */,
-                          0));
-
-    if (parallel_rows)
-        GLSLC(0, #define PARALLEL_ROWS                                               );
+    AVHWFramesContext *dec_frames_ctx;
+    dec_frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
 
     /* Common codec header */
     GLSLD(ff_source_common_comp);
@@ -261,73 +293,84 @@ static int init_decode_shader(ProResRAWContext *prr, FFVulkanContext *s,
     GLSLC(0, layout(push_constant, scalar) uniform pushConstants {                   );
     GLSLC(1,    TileData tile_data;                                                  );
     GLSLC(1,    u8buf pkt_data;                                                      );
-    GLSLC(1,    uvec2 frame_size;                                                    );
-    GLSLC(1,    uvec2 tile_size;                                                     );
+    GLSLC(1,    ivec2 frame_size;                                                    );
+    GLSLC(1,    ivec2 tile_size;                                                     );
     GLSLC(1,    uint8_t qmat[64];                                                    );
     GLSLC(0, };                                                                      );
     GLSLC(0,                                                                         );
     ff_vk_shader_add_push_const(shd, 0, sizeof(DecodePushData),
                                 VK_SHADER_STAGE_COMPUTE_BIT);
 
+    FFVulkanDescriptorSetBinding *desc_set;
     desc_set = (FFVulkanDescriptorSetBinding []) {
         {
             .name       = "dst",
             .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
-            .mem_layout = "r16",
-            .mem_quali  = "writeonly",
+            .mem_layout = ff_vk_shader_rep_fmt(dec_frames_ctx->sw_format,
+                                               FF_VK_REP_INT),
+            .mem_quali  = writeonly ? "writeonly" : NULL,
             .dimensions = 2,
             .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
         },
     };
-    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 1, 0, 0));
 
-    desc_set = (FFVulkanDescriptorSetBinding []) {
-        {
-            .name        = "dct_scale_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "float idct_8x8_scales[64];",
-        },
-        {
-            .name        = "scan_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "uint8_t scan[64];",
-        },
-        {
-            .name        = "dc_cb_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "uint8_t dc_cb[13];",
-        },
-        {
-            .name        = "ac_cb_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "int16_t ac_cb[95];",
-        },
-        {
-            .name        = "rn_cb_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "int16_t rn_cb[28];",
-        },
-        {
-            .name        = "ln_cb_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "int16_t ln_cb[15];",
-        },
-    };
-    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 6, 1, 0));
+    return ff_vk_shader_add_descriptor_set(s, shd, desc_set, 1, 0, 0);
+}
 
-    GLSLD(ff_source_prores_raw_comp);
+static int init_decode_shader(AVCodecContext *avctx, FFVulkanContext *s,
+                              FFVkExecPool *pool, FFVkSPIRVCompiler *spv,
+                              FFVulkanShader *shd, int version)
+{
+    int err;
+    uint8_t *spv_data;
+    size_t spv_len;
+    void *spv_opaque = NULL;
+
+    RET(ff_vk_shader_init(s, shd, "prores_raw",
+                          VK_SHADER_STAGE_COMPUTE_BIT,
+                          (const char *[]) { "GL_EXT_buffer_reference",
+                                             "GL_EXT_buffer_reference2",
+                                             "GL_EXT_null_initializer" }, 3,
+                          4, 1, 1, 0));
+
+    RET(add_common_data(avctx, s, shd, 1));
+
+    GLSLD(ff_source_prores_raw_decode_comp);
+
+    RET(spv->compile_shader(s, spv, shd, &spv_data, &spv_len, "main",
+                            &spv_opaque));
+    RET(ff_vk_shader_link(s, shd, spv_data, spv_len, "main"));
+
+    RET(ff_vk_shader_register_exec(s, pool, shd));
+
+fail:
+    if (spv_opaque)
+        spv->free_shader(spv, &spv_opaque);
+
+    return err;
+}
+
+static int init_idct_shader(AVCodecContext *avctx, FFVulkanContext *s,
+                            FFVkExecPool *pool, FFVkSPIRVCompiler *spv,
+                            FFVulkanShader *shd, int version)
+{
+    int err;
+    uint8_t *spv_data;
+    size_t spv_len;
+    void *spv_opaque = NULL;
+
+    RET(ff_vk_shader_init(s, shd, "prores_raw",
+                          VK_SHADER_STAGE_COMPUTE_BIT,
+                          (const char *[]) { "GL_EXT_buffer_reference",
+                                             "GL_EXT_buffer_reference2" }, 2,
+                          8,
+                          version == 0 ? 8 : 16 /* Horizontal blocks */,
+                          4 /* Components */,
+                          0));
+
+    RET(add_common_data(avctx, s, shd, 0));
+
+    GLSLD(ff_source_prores_raw_idct_comp);
 
     RET(spv->compile_shader(s, spv, shd, &spv_data, &spv_len, "main",
                             &spv_opaque));
@@ -347,8 +390,7 @@ static void vk_decode_prores_raw_uninit(FFVulkanDecodeShared *ctx)
     ProResRAWVulkanDecodeContext *fv = ctx->sd_ctx;
 
     ff_vk_shader_free(&ctx->s, &fv->decode);
-
-    ff_vk_free_buf(&ctx->s, &fv->uniform_buf);
+    ff_vk_shader_free(&ctx->s, &fv->idct);
 
     av_buffer_pool_uninit(&fv->tile_data_pool);
 
@@ -381,87 +423,10 @@ static int vk_decode_prores_raw_init(AVCodecContext *avctx)
     ctx->sd_ctx_free = &vk_decode_prores_raw_uninit;
 
     /* Setup decode shader */
-    RET(init_decode_shader(prr, &ctx->s, &ctx->exec_pool, spv, &prv->decode,
+    RET(init_decode_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &prv->decode,
                            prr->version));
-
-    /* Size in bytes of each codebook table */
-    size_t cb_size[5] = {
-        13*sizeof(uint8_t),
-        95*sizeof(int16_t),
-        28*sizeof(int16_t),
-        15*sizeof(int16_t),
-    };
-
-    /* Offset of each codebook table */
-    size_t cb_offset[5];
-    size_t ua = ctx->s.props.properties.limits.minUniformBufferOffsetAlignment;
-    cb_offset[0] = 64*sizeof(float) + 64*sizeof(uint8_t);
-    cb_offset[1] = cb_offset[0] + FFALIGN(cb_size[0], ua);
-    cb_offset[2] = cb_offset[1] + FFALIGN(cb_size[1], ua);
-    cb_offset[3] = cb_offset[2] + FFALIGN(cb_size[2], ua);
-    cb_offset[4] = cb_offset[3] + FFALIGN(cb_size[3], ua);
-
-    RET(ff_vk_create_buf(&ctx->s, &prv->uniform_buf,
-                         64*sizeof(float) + 64*sizeof(uint8_t) + cb_offset[4] + 256,
-                         NULL, NULL,
-                         VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT |
-                         VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT,
-                         VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-                         VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT));
-
-    uint8_t *uniform_buf;
-    RET(ff_vk_map_buffer(&ctx->s, &prv->uniform_buf, &uniform_buf, 0));
-
-    /* DCT scales */
-    float *dct_scale_buf = (float *)uniform_buf;
-    double idct_8_scales[8] = {
-        cos(4.0*M_PI/16.0) / 2.0,
-        cos(1.0*M_PI/16.0) / 2.0,
-        cos(2.0*M_PI/16.0) / 2.0,
-        cos(3.0*M_PI/16.0) / 2.0,
-        cos(4.0*M_PI/16.0) / 2.0,
-        cos(5.0*M_PI/16.0) / 2.0,
-        cos(6.0*M_PI/16.0) / 2.0,
-        cos(7.0*M_PI/16.0) / 2.0,
-    };
-    for (int i = 0; i < 64; i++)
-        dct_scale_buf[i] = (float)(idct_8_scales[i >> 3] *
-                                   idct_8_scales[i  & 7]);
-
-    /* Scan table */
-    uint8_t *scan_buf = uniform_buf + 64*sizeof(float);
-    for (int i = 0; i < 64; i++)
-        scan_buf[prr->scan[i]] = i;
-
-    /* Codebooks */
-    memcpy(uniform_buf + cb_offset[0], ff_prores_raw_dc_cb,
-           sizeof(ff_prores_raw_dc_cb));
-    memcpy(uniform_buf + cb_offset[1], ff_prores_raw_ac_cb,
-           sizeof(ff_prores_raw_ac_cb));
-    memcpy(uniform_buf + cb_offset[2], ff_prores_raw_rn_cb,
-           sizeof(ff_prores_raw_rn_cb));
-    memcpy(uniform_buf + cb_offset[3], ff_prores_raw_ln_cb,
-           sizeof(ff_prores_raw_ln_cb));
-
-    RET(ff_vk_unmap_buffer(&ctx->s, &prv->uniform_buf, 1));
-
-    /* Done; update descriptors */
-    RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                        &prv->decode, 1, 0, 0,
-                                        &prv->uniform_buf,
-                                        0, 64*sizeof(float),
-                                        VK_FORMAT_UNDEFINED));
-    RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                        &prv->decode, 1, 1, 0,
-                                        &prv->uniform_buf,
-                                        64*sizeof(float), 64*sizeof(uint8_t),
-                                        VK_FORMAT_UNDEFINED));
-    for (int j = 0; j < 4; j++)
-        RET(ff_vk_shader_update_desc_buffer(&ctx->s, &ctx->exec_pool.contexts[0],
-                                            &prv->decode, 1, 2 + j, 0,
-                                            &prv->uniform_buf,
-                                            cb_offset[j], cb_size[j],
-                                            VK_FORMAT_UNDEFINED));
+    RET(init_idct_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &prv->idct,
+                         prr->version));
 
 fail:
     spv->uninit(&spv);
-- 
2.52.0


From e6bf824dc5a72dcacb9845a25b026ff75577859b Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 13 Nov 2025 12:02:22 +0100
Subject: [PATCH 090/304] vulkan_prores_raw: use regular descriptors for tile
 data instead of BDA

Regular descriptors are faster.
---
 libavcodec/vulkan_prores_raw.c | 47 +++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/libavcodec/vulkan_prores_raw.c b/libavcodec/vulkan_prores_raw.c
index 3b6d617eec..bd3c8b6e46 100644
--- a/libavcodec/vulkan_prores_raw.c
+++ b/libavcodec/vulkan_prores_raw.c
@@ -38,7 +38,7 @@ const FFVulkanDecodeDescriptor ff_vk_dec_prores_raw_desc = {
 typedef struct ProResRAWVulkanDecodePicture {
     FFVulkanDecodePicture vp;
 
-    AVBufferRef *tile_data;
+    AVBufferRef *frame_data_buf;
     uint32_t nb_tiles;
 } ProResRAWVulkanDecodePicture;
 
@@ -46,11 +46,10 @@ typedef struct ProResRAWVulkanDecodeContext {
     FFVulkanShader decode;
     FFVulkanShader idct;
 
-    AVBufferPool *tile_data_pool;
+    AVBufferPool *frame_data_pool;
 } ProResRAWVulkanDecodeContext;
 
 typedef struct DecodePushData {
-    VkDeviceAddress tile_data;
     VkDeviceAddress pkt_data;
     int32_t frame_size[2];
     int32_t tile_size[2];
@@ -85,8 +84,8 @@ static int vk_prores_raw_start_frame(AVCodecContext          *avctx,
                               VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
 
     /* Allocate tile data */
-    err = ff_vk_get_pooled_buffer(&ctx->s, &prv->tile_data_pool,
-                                  &pp->tile_data,
+    err = ff_vk_get_pooled_buffer(&ctx->s, &prv->frame_data_pool,
+                                  &pp->frame_data_buf,
                                   VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                                   VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
                                   NULL, prr->nb_tiles*sizeof(TileData),
@@ -113,8 +112,8 @@ static int vk_prores_raw_decode_slice(AVCodecContext *avctx,
     ProResRAWVulkanDecodePicture *pp = prr->hwaccel_picture_private;
     FFVulkanDecodePicture *vp = &pp->vp;
 
-    FFVkBuffer *tile_data_buf = (FFVkBuffer *)pp->tile_data->data;
-    TileData *td = (TileData *)tile_data_buf->mapped_mem;
+    FFVkBuffer *frame_data_buf = (FFVkBuffer *)pp->frame_data_buf->data;
+    TileData *td = (TileData *)frame_data_buf->mapped_mem;
     FFVkBuffer *slices_buf = vp->slices_buf ? (FFVkBuffer *)vp->slices_buf->data : NULL;
 
     td[pp->nb_tiles].pos[0] = prr->tiles[pp->nb_tiles].x;
@@ -150,7 +149,7 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
     FFVulkanDecodePicture *vp = &pp->vp;
 
     FFVkBuffer *slices_buf = (FFVkBuffer *)vp->slices_buf->data;
-    FFVkBuffer *tile_data = (FFVkBuffer *)pp->tile_data->data;
+    FFVkBuffer *frame_data_buf = (FFVkBuffer *)pp->frame_data_buf->data;
 
     VkImageMemoryBarrier2 img_bar[8];
     int nb_img_bar = 0;
@@ -168,8 +167,8 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
     if (err < 0)
         return err;
 
-    RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &pp->tile_data, 1, 0));
-    pp->tile_data = NULL;
+    RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &pp->frame_data_buf, 1, 0));
+    pp->frame_data_buf = NULL;
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &vp->slices_buf, 1, 0));
     vp->slices_buf = NULL;
 
@@ -220,12 +219,16 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
                                   0, 0,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
+    ff_vk_shader_update_desc_buffer(&ctx->s, exec, decode_shader,
+                                    0, 1, 0,
+                                    frame_data_buf,
+                                    0, prr->nb_tiles*sizeof(TileData),
+                                    VK_FORMAT_UNDEFINED);
 
     ff_vk_exec_bind_shader(&ctx->s, exec, decode_shader);
 
     /* Update push data */
     DecodePushData pd_decode = (DecodePushData) {
-        .tile_data = tile_data->address,
         .pkt_data = slices_buf->address,
         .frame_size[0] = avctx->width,
         .frame_size[1] = avctx->height,
@@ -253,6 +256,11 @@ static int vk_prores_raw_end_frame(AVCodecContext *avctx)
                                   0, 0,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
+    ff_vk_shader_update_desc_buffer(&ctx->s, exec, idct_shader,
+                                    0, 1, 0,
+                                    frame_data_buf,
+                                    0, prr->nb_tiles*sizeof(TileData),
+                                    VK_FORMAT_UNDEFINED);
     ff_vk_exec_bind_shader(&ctx->s, exec, idct_shader);
     ff_vk_shader_update_push_const(&ctx->s, exec, idct_shader,
                                    VK_SHADER_STAGE_COMPUTE_BIT,
@@ -284,14 +292,13 @@ static int add_common_data(AVCodecContext *avctx, FFVulkanContext *s,
     /* Common codec header */
     GLSLD(ff_source_common_comp);
 
-    GLSLC(0, layout(buffer_reference, buffer_reference_align = 16) buffer TileData { );
+    GLSLC(0, struct TileData {                                                       );
     GLSLC(1,    ivec2 pos;                                                           );
     GLSLC(1,    uint offset;                                                         );
     GLSLC(1,    uint size;                                                           );
     GLSLC(0, };                                                                      );
     GLSLC(0,                                                                         );
     GLSLC(0, layout(push_constant, scalar) uniform pushConstants {                   );
-    GLSLC(1,    TileData tile_data;                                                  );
     GLSLC(1,    u8buf pkt_data;                                                      );
     GLSLC(1,    ivec2 frame_size;                                                    );
     GLSLC(1,    ivec2 tile_size;                                                     );
@@ -312,9 +319,17 @@ static int add_common_data(AVCodecContext *avctx, FFVulkanContext *s,
             .dimensions = 2,
             .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
         },
+        {
+            .name        = "frame_data_buf",
+            .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
+            .mem_layout  = "scalar",
+            .mem_quali   = "readonly",
+            .buf_content = "TileData tile_data[];",
+        },
     };
 
-    return ff_vk_shader_add_descriptor_set(s, shd, desc_set, 1, 0, 0);
+    return ff_vk_shader_add_descriptor_set(s, shd, desc_set, 2, 0, 0);
 }
 
 static int init_decode_shader(AVCodecContext *avctx, FFVulkanContext *s,
@@ -392,7 +407,7 @@ static void vk_decode_prores_raw_uninit(FFVulkanDecodeShared *ctx)
     ff_vk_shader_free(&ctx->s, &fv->decode);
     ff_vk_shader_free(&ctx->s, &fv->idct);
 
-    av_buffer_pool_uninit(&fv->tile_data_pool);
+    av_buffer_pool_uninit(&fv->frame_data_pool);
 
     av_freep(&fv);
 }
@@ -442,6 +457,8 @@ static void vk_prores_raw_free_frame_priv(AVRefStructOpaque _hwctx, void *data)
     FFVulkanDecodePicture *vp = &pp->vp;
 
     ff_vk_decode_free_frame(dev_ctx, vp);
+
+    av_buffer_unref(&pp->frame_data_buf);
 }
 
 const FFHWAccel ff_prores_raw_vulkan_hwaccel = {
-- 
2.52.0


From bfa59d5a9d09784d20153f0381469c5c6fa3ff21 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Mon, 24 Nov 2025 19:14:34 +0100
Subject: [PATCH 091/304] vulkan_prores_raw: fix dynamically non-uniform
 accesses to pushconsts

The Vulkan spec requires that all accesses to push data are uniform for
all invocations (e.g. can't be based on gl_WorkGroupID or gl_LocalInvocationID).
---
 libavcodec/vulkan/prores_raw_idct.comp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vulkan/prores_raw_idct.comp b/libavcodec/vulkan/prores_raw_idct.comp
index e692792301..d069a6e3c4 100644
--- a/libavcodec/vulkan/prores_raw_idct.comp
+++ b/libavcodec/vulkan/prores_raw_idct.comp
@@ -141,11 +141,14 @@ void main(void)
     const uint w = min(tile_size.x, frame_size.x - td.pos.x) / 2;
     const uint nb_blocks = w / 8;
 
+    /* We have to do non-uniform access, so copy it */
+    uint8_t qmat_buf[64] = qmat;
+
     [[unroll]]
     for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x) {
         int16_t v = int16_t(imageLoad(dst, offs + 2*ivec2(BLOCK_ID*8, 0) + scan[i])[0]);
         blocks[BLOCK_ID][COMP_ID*64 + i] = (v / 16384.0) *
-                                           (float(qmat[i]) / 295.0) *
+                                           (float(qmat_buf[i]) / 295.0) *
                                            idct_scale[i] * qscale;
     }
 
-- 
2.52.0


From 4a468abd3cdba4927bd239123b884b43cec3199a Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sat, 15 Nov 2025 14:15:19 +0100
Subject: [PATCH 092/304] vulkan_prores_raw: read the header length rather than
 assuming it's 8

In all known samples, it is equal to 8.
---
 libavcodec/vulkan/prores_raw_decode.comp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
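
The size arithmetic this patch touches can be sketched on the CPU side as
below. This is only an illustration under stated assumptions, not the
decoder's actual code: `rb16` and `tile_component_sizes` are hypothetical
names, and the big-endian byte order of the 16-bit fields is assumed rather
than confirmed by the diff. The field positions and the `>> 3` on the first
byte follow the shader lines in the hunk.

```c
#include <stdint.h>

/* Hypothetical helper: read a 16-bit field, assuming big-endian order. */
static int rb16(const uint8_t *p)
{
    return (p[0] << 8) | p[1];
}

/*
 * Split a tile of tile_size bytes into four component sizes.
 * Mirrors the shader: size[1], size[2] and size[3] come from header
 * bytes 4-5, 2-3 and 6-7, and size[0] is whatever remains once the
 * header itself is subtracted.
 * The caller must guarantee at least 8 readable bytes at hdr.
 * Returns the header length, or -1 if the sizes do not fit in the tile.
 */
static int tile_component_sizes(const uint8_t *hdr, int tile_size, int size[4])
{
    int header_len = hdr[0] >> 3; /* upper 5 bits of the first byte */

    size[1] = rb16(hdr + 4);
    size[2] = rb16(hdr + 2);
    size[3] = rb16(hdr + 6);
    size[0] = tile_size - size[1] - size[2] - size[3] - header_len;

    return size[0] < 0 ? -1 : header_len;
}
```

With a first byte of 0x40 the header length comes out as 8, which matches
the value the commit message reports for all known samples.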

diff --git a/libavcodec/vulkan/prores_raw_decode.comp b/libavcodec/vulkan/prores_raw_decode.comp
index 08447e3961..069aef638c 100644
--- a/libavcodec/vulkan/prores_raw_decode.comp
+++ b/libavcodec/vulkan/prores_raw_decode.comp
@@ -206,12 +206,13 @@ void main(void)
 
     uint64_t pkt_offset = uint64_t(pkt_data) + td.offset;
     u8vec2buf hdr_data = u8vec2buf(pkt_offset);
+    int header_len = hdr_data[0].v.x >> 3;
 
     ivec4 size = ivec4(td.size,
                        pack16(hdr_data[2].v.yx),
                        pack16(hdr_data[1].v.yx),
                        pack16(hdr_data[3].v.yx));
-    size[0] = size[0] - size[1] - size[2] - size[3] - 8;
+    size[0] = size[0] - size[1] - size[2] - size[3] - header_len;
     if (expectEXT(size[0] < 0, false))
         return;
 
@@ -224,7 +225,7 @@ void main(void)
                                     0,
                                     size[2] + size[1]);
 
-    init_get_bits(gb, u8buf(pkt_offset + 8 + comp_offset[COMP_ID]),
+    init_get_bits(gb, u8buf(pkt_offset + header_len + comp_offset[COMP_ID]),
                   size[COMP_ID]);
 
     read_dc_vals(offs, nb_blocks);
-- 
2.52.0


From c3af60698256b7d9cdd23777483d53f0dde3ed85 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Tue, 25 Nov 2025 17:20:10 +0100
Subject: [PATCH 093/304] vulkan_prores_raw: use the native image
 representation

It lets us easily keep the software and hardware decoders in sync, by
removing the abstraction through which the Vulkan layer changed the
values written.
---
 libavcodec/vulkan/prores_raw_decode.comp |  3 ++-
 libavcodec/vulkan/prores_raw_idct.comp   | 15 +++++++++------
 libavcodec/vulkan_prores_raw.c           |  4 ++--
 3 files changed, 13 insertions(+), 9 deletions(-)
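
The value handling the shaders switch to can be sketched in C as follows.
This is a minimal illustration, assuming the IDCT output is a normalized
float; the helper names are hypothetical, and the add-0.5-and-truncate
rounding only approximates GLSL's round(). The 16-bit masking, the sign
extension on read-back, the 0..4095 clamp and the shift by 4 follow the
lines in the hunks.

```c
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed two's-complement value. */
static int sign_extend16(unsigned v)
{
    return (int)((v & 0xFFFF) ^ 0x8000) - 0x8000;
}

/* What store_val() now writes: the coefficient's low 16 bits only. */
static unsigned store_coeff(int16_t v)
{
    return (unsigned)v & 0xFFFF;
}

/*
 * Map a normalized IDCT output to a 12-bit sample held in the top
 * bits of a 16-bit word: scale by 4095, clamp to 0..4095, shift by 4.
 */
static unsigned pack_sample12(float x)
{
    int v;

    if (x <= 0.0f)
        v = 0;
    else if (x >= 1.0f)
        v = 4095;
    else
        v = (int)(x * 4095.0f + 0.5f); /* approximate round() */

    return (unsigned)v << 4;
}
```

A negative coefficient such as -2 is stored as 0xFFFE and recovered
unchanged by `sign_extend16`, which is why the IDCT pass sign-extends the
value it loads.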

diff --git a/libavcodec/vulkan/prores_raw_decode.comp b/libavcodec/vulkan/prores_raw_decode.comp
index 069aef638c..809d787466 100644
--- a/libavcodec/vulkan/prores_raw_decode.comp
+++ b/libavcodec/vulkan/prores_raw_decode.comp
@@ -101,7 +101,8 @@ int16_t get_value(int16_t codebook)
 
 void store_val(ivec2 offs, int blk, int c, int16_t v)
 {
-    imageStore(dst, offs + 2*ivec2(blk*8 + (c & 7), c >> 3), ivec4(v));
+    imageStore(dst, offs + 2*ivec2(blk*8 + (c & 7), c >> 3),
+               ivec4(v & 0xFFFF));
 }
 
 void read_dc_vals(ivec2 offs, int nb_blocks)
diff --git a/libavcodec/vulkan/prores_raw_idct.comp b/libavcodec/vulkan/prores_raw_idct.comp
index d069a6e3c4..29ddf3b9e8 100644
--- a/libavcodec/vulkan/prores_raw_idct.comp
+++ b/libavcodec/vulkan/prores_raw_idct.comp
@@ -135,7 +135,7 @@ void main(void)
 
     uint64_t pkt_offset = uint64_t(pkt_data) + td.offset;
     u8vec2buf hdr_data = u8vec2buf(pkt_offset);
-    float qscale = float(pack16(hdr_data[0].v.yx)) / 2.0f;
+    int qscale = pack16(hdr_data[0].v.yx);
 
     const ivec2 offs = td.pos + ivec2(COMP_ID & 1, COMP_ID >> 1);
     const uint w = min(tile_size.x, frame_size.x - td.pos.x) / 2;
@@ -146,10 +146,11 @@ void main(void)
 
     [[unroll]]
     for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x) {
-        int16_t v = int16_t(imageLoad(dst, offs + 2*ivec2(BLOCK_ID*8, 0) + scan[i])[0]);
-        blocks[BLOCK_ID][COMP_ID*64 + i] = (v / 16384.0) *
-                                           (float(qmat_buf[i]) / 295.0) *
-                                           idct_scale[i] * qscale;
+        int v = int(imageLoad(dst, offs + 2*ivec2(BLOCK_ID*8, 0) + scan[i])[0]);
+        float vf = float(sign_extend(v, 16)) / 32768.0;
+        vf *= qmat_buf[i] * qscale;
+        blocks[BLOCK_ID][COMP_ID*64 + i] = (vf / (64*4.56)) *
+                                           idct_scale[i];
     }
 
     barrier();
@@ -163,7 +164,9 @@ void main(void)
     barrier();
     [[unroll]]
     for (uint i = gl_LocalInvocationID.x; i < 64; i += gl_WorkGroupSize.x) {
-        int v = int(round(blocks[BLOCK_ID][COMP_ID*64 + i]*16384.0));
+        int v = int(round(blocks[BLOCK_ID][COMP_ID*64 + i]*4095.0));
+        v = clamp(v, 0, 4095);
+        v <<= 4;
         imageStore(dst,
                    offs + 2*ivec2(BLOCK_ID*8 + (i & 7), i >> 3),
                    ivec4(v));
diff --git a/libavcodec/vulkan_prores_raw.c b/libavcodec/vulkan_prores_raw.c
index bd3c8b6e46..cd025d25e6 100644
--- a/libavcodec/vulkan_prores_raw.c
+++ b/libavcodec/vulkan_prores_raw.c
@@ -96,7 +96,7 @@ static int vk_prores_raw_start_frame(AVCodecContext          *avctx,
 
     /* Prepare frame to be used */
     err = ff_vk_decode_prepare_frame_sdr(dec, prr->frame, vp, 1,
-                                         FF_VK_REP_INT, 0);
+                                         FF_VK_REP_NATIVE, 0);
     if (err < 0)
         return err;
 
@@ -314,7 +314,7 @@ static int add_common_data(AVCodecContext *avctx, FFVulkanContext *s,
             .name       = "dst",
             .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
             .mem_layout = ff_vk_shader_rep_fmt(dec_frames_ctx->sw_format,
-                                               FF_VK_REP_INT),
+                                               FF_VK_REP_NATIVE),
             .mem_quali  = writeonly ? "writeonly" : NULL,
             .dimensions = 2,
             .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
-- 
2.52.0


From b8a90b74ba43e88b7c1099c56306b4b373124206 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 9 Nov 2025 10:52:20 +0100
Subject: [PATCH 094/304] dpx: add a context

This simply adds a context with 4 fields to enable hardware unpacking.
---
 libavcodec/dpx.c | 141 ++++++++++++++++++++++++-----------------------
 libavcodec/dpx.h |  13 +++++
 2 files changed, 86 insertions(+), 68 deletions(-)
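
For reference, the row-stride rules that this patch moves into the context
can be sketched as a standalone function. This is an illustration only:
`dpx_row_stride` is a hypothetical name, it collapses every error path to
-1 (the decoder returns distinct error codes), and it does no overflow
checking. The per-depth arithmetic follows the switch in the hunk, before
the later 4-byte scanline alignment step.

```c
/*
 * Bytes per row before scanline alignment.
 * Returns -1 for depths or packings that the decoder's switch rejects.
 */
static int dpx_row_stride(int width, int components, int bits, int packing)
{
    int stride;

    switch (bits) {
    case 8:
        return width * components;
    case 10:
        if (!packing)
            return -1;                    /* packing to 32 bits required */
        return (width * components + 2) / 3 * 4; /* 3 samples per word */
    case 12:
        stride = width * components;
        if (packing)
            return stride * 2;            /* one sample per 16-bit word */
        stride *= 3;                      /* count in 4-bit nibbles */
        stride = (stride + 7) / 8 * 8;    /* round up to a multiple of 8 */
        return stride / 2;                /* nibbles to bytes */
    case 16:
        return 2 * width * components;
    case 32:
        return 4 * width * components;
    default:
        return -1;
    }
}
```

For a 1920-pixel-wide, 3-component, 10-bit packed image this gives 7680
bytes per row, since three 10-bit samples share each 32-bit word.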

diff --git a/libavcodec/dpx.c b/libavcodec/dpx.c
index 1b1ada316a..c8981bbf3a 100644
--- a/libavcodec/dpx.c
+++ b/libavcodec/dpx.c
@@ -121,6 +121,8 @@ static uint16_t read12in32(const uint8_t **ptr, uint32_t *lbuf,
 static int decode_frame(AVCodecContext *avctx, AVFrame *p,
                         int *got_frame, AVPacket *avpkt)
 {
+    DPXDecContext *dpx = avctx->priv_data;
+
     const uint8_t *buf = avpkt->data;
     int buf_size       = avpkt->size;
     uint8_t *ptr[AV_NUM_DATA_POINTERS];
@@ -129,11 +131,11 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     char input_device[33] = { 0 };
 
     unsigned int offset;
-    int magic_num, endian;
-    int x, y, stride, i, j, ret;
-    int w, h, bits_per_color, descriptor, elements, packing;
+    int magic_num;
+    int x, y, i, j, ret;
+    int w, h, descriptor;
     int yuv, color_trc, color_spec;
-    int encoding, need_align = 0, unpadded_10bit = 0;
+    int encoding;
 
     unsigned int rgbBuffer = 0;
     int n_datum = 0;
@@ -149,15 +151,15 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     /* Check if the files "magic number" is "SDPX" which means it uses
      * big-endian or XPDS which is for little-endian files */
     if (magic_num == AV_RL32("SDPX")) {
-        endian = 0;
+        dpx->endian = 0;
     } else if (magic_num == AV_RB32("SDPX")) {
-        endian = 1;
+        dpx->endian = 1;
     } else {
         av_log(avctx, AV_LOG_ERROR, "DPX marker not found\n");
         return AVERROR_INVALIDDATA;
     }
 
-    offset = read32(&buf, endian);
+    offset = read32(&buf, dpx->endian);
     if (avpkt->size <= offset) {
         av_log(avctx, AV_LOG_ERROR, "Invalid data start offset\n");
         return AVERROR_INVALIDDATA;
@@ -174,7 +176,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
 
     // Check encryption
     buf = avpkt->data + 660;
-    ret = read32(&buf, endian);
+    ret = read32(&buf, dpx->endian);
     if (ret != 0xFFFFFFFF) {
         avpriv_report_missing_feature(avctx, "Encryption");
         av_log(avctx, AV_LOG_WARNING, "The image is encrypted and may "
@@ -183,8 +185,8 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
 
     // Need to end in 0x304 offset from start of file
     buf = avpkt->data + 0x304;
-    w = read32(&buf, endian);
-    h = read32(&buf, endian);
+    w = read32(&buf, dpx->endian);
+    h = read32(&buf, dpx->endian);
 
     if ((ret = ff_set_dimensions(avctx, w, h)) < 0)
         return ret;
@@ -197,23 +199,22 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
 
     // Need to end in 0x323 to read the bits per color
     buf += 3;
-    avctx->bits_per_raw_sample =
-    bits_per_color = buf[0];
+    avctx->bits_per_raw_sample = buf[0];
     buf++;
-    packing = read16(&buf, endian);
-    encoding = read16(&buf, endian);
+    dpx->packing = read16(&buf, dpx->endian);
+    encoding = read16(&buf, dpx->endian);
 
     if (encoding) {
         avpriv_report_missing_feature(avctx, "Encoding %d", encoding);
         return AVERROR_PATCHWELCOME;
     }
 
-    if (bits_per_color > 31)
+    if (avctx->bits_per_raw_sample > 31)
         return AVERROR_INVALIDDATA;
 
     buf += 820;
-    avctx->sample_aspect_ratio.num = read32(&buf, endian);
-    avctx->sample_aspect_ratio.den = read32(&buf, endian);
+    avctx->sample_aspect_ratio.num = read32(&buf, dpx->endian);
+    avctx->sample_aspect_ratio.den = read32(&buf, dpx->endian);
     if (avctx->sample_aspect_ratio.num > 0 && avctx->sample_aspect_ratio.den > 0)
         av_reduce(&avctx->sample_aspect_ratio.num, &avctx->sample_aspect_ratio.den,
                    avctx->sample_aspect_ratio.num,  avctx->sample_aspect_ratio.den,
@@ -224,7 +225,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     /* preferred frame rate from Motion-picture film header */
     if (offset >= 1724 + 4) {
         buf = avpkt->data + 1724;
-        i = read32(&buf, endian);
+        i = read32(&buf, dpx->endian);
         if(i && i != 0xFFFFFFFF) {
             AVRational q = av_d2q(av_int2float(i), 4096);
             if (q.num > 0 && q.den > 0)
@@ -236,7 +237,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     if (offset >= 1940 + 4 &&
         !(avctx->framerate.num && avctx->framerate.den)) {
         buf = avpkt->data + 1940;
-        i = read32(&buf, endian);
+        i = read32(&buf, dpx->endian);
         if(i && i != 0xFFFFFFFF) {
             AVRational q = av_d2q(av_int2float(i), 4096);
             if (q.num > 0 && q.den > 0)
@@ -253,7 +254,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
         buf = avpkt->data + 1920;
         // read32 to native endian, av_bswap32 to opposite of native for
         // compatibility with av_timecode_make_smpte_tc_string2 etc
-        tc = av_bswap32(read32(&buf, endian));
+        tc = av_bswap32(read32(&buf, dpx->endian));
 
         if (i != 0xFFFFFFFF) {
             AVFrameSideData *tcside;
@@ -277,21 +278,21 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     /* color range from television header */
     if (offset >= 1964 + 4) {
         buf = avpkt->data + 1952;
-        i = read32(&buf, endian);
+        i = read32(&buf, dpx->endian);
 
         buf = avpkt->data + 1964;
-        j = read32(&buf, endian);
+        j = read32(&buf, dpx->endian);
 
         if (i != 0xFFFFFFFF && j != 0xFFFFFFFF) {
             float minCV, maxCV;
             minCV = av_int2float(i);
             maxCV = av_int2float(j);
-            if (bits_per_color >= 1 &&
-                minCV == 0.0f && maxCV == ((1U<<bits_per_color) - 1)) {
+            if (avctx->bits_per_raw_sample >= 1 &&
+                minCV == 0.0f && maxCV == ((1U<<avctx->bits_per_raw_sample) - 1)) {
                 avctx->color_range = AVCOL_RANGE_JPEG;
-            } else if (bits_per_color >= 8 &&
-                       minCV == (1  <<(bits_per_color - 4)) &&
-                       maxCV == (235<<(bits_per_color - 8))) {
+            } else if (avctx->bits_per_raw_sample >= 8 &&
+                       minCV == (1  <<(avctx->bits_per_raw_sample - 4)) &&
+                       maxCV == (235<<(avctx->bits_per_raw_sample - 8))) {
                 avctx->color_range = AVCOL_RANGE_MPEG;
             }
         }
@@ -303,28 +304,28 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     case 3:  // B
     case 4:  // A
     case 6:  // Y
-        elements = 1;
+        dpx->components = 1;
         yuv = 1;
         break;
     case 50: // RGB
-        elements = 3;
+        dpx->components = 3;
         yuv = 0;
         break;
     case 52: // ABGR
     case 51: // RGBA
-        elements = 4;
+        dpx->components = 4;
         yuv = 0;
         break;
     case 100: // UYVY422
-        elements = 2;
+        dpx->components = 2;
         yuv = 1;
         break;
     case 102: // UYV444
-        elements = 3;
+        dpx->components = 3;
         yuv = 1;
         break;
     case 103: // UYVA4444
-        elements = 4;
+        dpx->components = 4;
         yuv = 1;
         break;
     default:
@@ -332,40 +333,40 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
         return AVERROR_PATCHWELCOME;
     }
 
-    switch (bits_per_color) {
+    switch (avctx->bits_per_raw_sample) {
     case 8:
-        stride = avctx->width * elements;
+        dpx->stride = avctx->width * dpx->components;
         break;
     case 10:
-        if (!packing) {
+        if (!dpx->packing) {
             av_log(avctx, AV_LOG_ERROR, "Packing to 32bit required\n");
             return -1;
         }
-        stride = (avctx->width * elements + 2) / 3 * 4;
+        dpx->stride = (avctx->width * dpx->components + 2) / 3 * 4;
         break;
     case 12:
-        stride = avctx->width * elements;
-        if (packing) {
-            stride *= 2;
+        dpx->stride = avctx->width * dpx->components;
+        if (dpx->packing) {
+            dpx->stride *= 2;
         } else {
-            stride *= 3;
-            if (stride % 8) {
-                stride /= 8;
-                stride++;
-                stride *= 8;
+            dpx->stride *= 3;
+            if (dpx->stride % 8) {
+                dpx->stride /= 8;
+                dpx->stride++;
+                dpx->stride *= 8;
             }
-            stride /= 2;
+            dpx->stride /= 2;
         }
         break;
     case 16:
-        stride = 2 * avctx->width * elements;
+        dpx->stride = 2 * avctx->width * dpx->components;
         break;
     case 32:
-        stride = 4 * avctx->width * elements;
+        dpx->stride = 4 * avctx->width * dpx->components;
         break;
     case 1:
     case 64:
-        avpriv_report_missing_feature(avctx, "Depth %d", bits_per_color);
+        avpriv_report_missing_feature(avctx, "Depth %d", avctx->bits_per_raw_sample);
         return AVERROR_PATCHWELCOME;
     default:
         return AVERROR_INVALIDDATA;
@@ -458,8 +459,8 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     // Some devices do not pad 10bit samples to whole 32bit words per row
     if (!memcmp(input_device, "Scanity", 7) ||
         !memcmp(creator, "Lasergraphics Inc.", 18)) {
-        if (bits_per_color == 10)
-            unpadded_10bit = 1;
+        if (avctx->bits_per_raw_sample == 10)
+            dpx->unpadded_10bit = 1;
     }
 
     // Table 3c: Runs will always break at scan line boundaries. Packing
@@ -467,24 +468,24 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     // Unfortunately, the encoder produced invalid files, so attempt
     // to detect it
     // Also handle special case with unpadded content
-    need_align = FFALIGN(stride, 4);
-    if (need_align*avctx->height + (int64_t)offset > avpkt->size &&
-        (!unpadded_10bit || (avctx->width * avctx->height * elements + 2) / 3 * 4 + (int64_t)offset > avpkt->size)) {
+    dpx->need_align = FFALIGN(dpx->stride, 4);
+    if (dpx->need_align*avctx->height + (int64_t)offset > avpkt->size &&
+        (!dpx->unpadded_10bit || (avctx->width * avctx->height * dpx->components + 2) / 3 * 4 + (int64_t)offset > avpkt->size)) {
         // Alignment seems unappliable, try without
-        if (stride*avctx->height + (int64_t)offset > avpkt->size || unpadded_10bit) {
+        if (dpx->stride*avctx->height + (int64_t)offset > avpkt->size || dpx->unpadded_10bit) {
             av_log(avctx, AV_LOG_ERROR, "Overread buffer. Invalid header?\n");
             return AVERROR_INVALIDDATA;
         } else {
             av_log(avctx, AV_LOG_INFO, "Decoding DPX without scanline "
                    "alignment.\n");
-            need_align = 0;
+            dpx->need_align = 0;
         }
     } else {
-        need_align -= stride;
-        stride = FFALIGN(stride, 4);
+        dpx->need_align -= dpx->stride;
+        dpx->stride = FFALIGN(dpx->stride, 4);
     }
 
-    switch (1000 * descriptor + 10 * bits_per_color + endian) {
+    switch (1000 * descriptor + 10 * avctx->bits_per_raw_sample + dpx->endian) {
     case 1081:
     case 1080:
     case 2081:
@@ -588,7 +589,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
         break;
     default:
         av_log(avctx, AV_LOG_ERROR, "Unsupported format %d\n",
-               1000 * descriptor + 10 * bits_per_color + endian);
+               1000 * descriptor + 10 * avctx->bits_per_raw_sample + dpx->endian);
         return AVERROR_PATCHWELCOME;
     }
 
@@ -599,18 +600,21 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
 
     // Move pointer to offset from start of file
     buf =  avpkt->data + offset;
+    dpx->frame = p;
 
+    int elements = dpx->components;
+    int endian = dpx->endian;
     for (i=0; i<AV_NUM_DATA_POINTERS; i++)
         ptr[i] = p->data[i];
 
-    switch (bits_per_color) {
+    switch (avctx->bits_per_raw_sample) {
     case 10:
         for (x = 0; x < avctx->height; x++) {
             uint16_t *dst[4] = {(uint16_t*)ptr[0],
                                 (uint16_t*)ptr[1],
                                 (uint16_t*)ptr[2],
                                 (uint16_t*)ptr[3]};
-            int shift = elements > 1 ? packing == 1 ? 22 : 20 : packing == 1 ? 2 : 0;
+            int shift = elements > 1 ? dpx->packing == 1 ? 22 : 20 : dpx->packing == 1 ? 2 : 0;
             for (y = 0; y < avctx->width; y++) {
                 if (elements >= 3)
                     *dst[2]++ = read10in32(&buf, &rgbBuffer,
@@ -629,7 +633,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
                     read10in32(&buf, &rgbBuffer,
                                &n_datum, endian, shift);
             }
-            if (!unpadded_10bit)
+            if (!dpx->unpadded_10bit)
                 n_datum = 0;
             for (i = 0; i < elements; i++)
                 ptr[i] += p->linesize[i];
@@ -641,9 +645,9 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
                                 (uint16_t*)ptr[1],
                                 (uint16_t*)ptr[2],
                                 (uint16_t*)ptr[3]};
-            int shift = packing == 1 ? 4 : 0;
+            int shift = dpx->packing == 1 ? 4 : 0;
             for (y = 0; y < avctx->width; y++) {
-                if (packing) {
+                if (dpx->packing) {
                     if (elements >= 3)
                         *dst[2]++ = read16(&buf, endian) >> shift & 0xFFF;
                     *dst[0]++ = read16(&buf, endian) >> shift & 0xFFF;
@@ -669,13 +673,13 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
             for (i = 0; i < elements; i++)
                 ptr[i] += p->linesize[i];
             // Jump to next aligned position
-            buf += need_align;
+            buf += dpx->need_align;
         }
         break;
     case 32:
         if (elements == 1) {
             av_image_copy_plane(ptr[0], p->linesize[0],
-                                buf, stride,
+                                buf, dpx->stride,
                                 elements * avctx->width * 4, avctx->height);
         } else {
             for (y = 0; y < avctx->height; y++) {
@@ -722,7 +726,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
             }
         } else {
         av_image_copy_plane(ptr[0], p->linesize[0],
-                            buf, stride,
+                            buf, dpx->stride,
                             elements * avctx->width, avctx->height);
         }
         break;
@@ -736,6 +740,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
 const FFCodec ff_dpx_decoder = {
     .p.name         = "dpx",
     CODEC_LONG_NAME("DPX (Digital Picture Exchange) image"),
+    .priv_data_size = sizeof(DPXDecContext),
     .p.type         = AVMEDIA_TYPE_VIDEO,
     .p.id           = AV_CODEC_ID_DPX,
     FF_CODEC_DECODE_CB(decode_frame),
diff --git a/libavcodec/dpx.h b/libavcodec/dpx.h
index 800c651e5a..35e8aa690f 100644
--- a/libavcodec/dpx.h
+++ b/libavcodec/dpx.h
@@ -22,6 +22,8 @@
 #ifndef AVCODEC_DPX_H
 #define AVCODEC_DPX_H
 
+#include "libavutil/frame.h"
+
 enum DPX_TRC {
     DPX_TRC_USER_DEFINED       = 0,
     DPX_TRC_PRINTING_DENSITY   = 1,
@@ -54,4 +56,15 @@ enum DPX_COL_SPEC {
     /* 12 = N/A */
 };
 
+typedef struct DPXDecContext {
+    AVFrame *frame;
+
+    int packing;
+    int stride;
+    int endian;
+    int components;
+    int unpadded_10bit;
+    int need_align;
+} DPXDecContext;
+
 #endif /* AVCODEC_DPX_H */
-- 
2.52.0


From c104a90cdea7e04989d5028977c500495dc5a90e Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 9 Nov 2025 11:11:54 +0100
Subject: [PATCH 095/304] dpxdec: move data parsing into a separate function

---
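Note for reviewers: unpack_frame() below is a straight move of the existing loops. For anyone unfamiliar with the 10-bit path, here is a minimal sketch of what the multi-component `packing == 1` case (shift of 22) amounts to for a single word, assuming the two padding bits sit in bits 1..0. The helper name `unpack10_msb_first` is hypothetical and is not part of this patch; it ignores byte order and the `n_datum` carry-over state that read10in32() keeps between calls.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: split one host-order 32-bit
 * word into three 10-bit samples. Assumes the samples occupy bits 31..22,
 * 21..12 and 11..2, with two padding bits in bits 1..0. */
static void unpack10_msb_first(uint32_t word, uint16_t out[3])
{
    out[0] = (word >> 22) & 0x3FF; /* first sample in the word  */
    out[1] = (word >> 12) & 0x3FF; /* second sample in the word */
    out[2] = (word >>  2) & 0x3FF; /* third sample in the word  */
}
```

In the three-component loop of unpack_frame() the samples are stored in the order dst[2], dst[0], dst[1], so out[0] would land in plane 2, out[1] in plane 0 and out[2] in plane 1.
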
 libavcodec/dpx.c | 275 ++++++++++++++++++++++++-----------------------
 1 file changed, 141 insertions(+), 134 deletions(-)

diff --git a/libavcodec/dpx.c b/libavcodec/dpx.c
index c8981bbf3a..96d4647389 100644
--- a/libavcodec/dpx.c
+++ b/libavcodec/dpx.c
@@ -118,6 +118,145 @@ static uint16_t read12in32(const uint8_t **ptr, uint32_t *lbuf,
     }
 }
 
+static void unpack_frame(AVCodecContext *avctx, AVFrame *p, const uint8_t *buf,
+                         int elements, int endian)
+{
+    int i, x, y;
+    DPXDecContext *dpx = avctx->priv_data;
+
+    uint8_t *ptr[AV_NUM_DATA_POINTERS];
+    unsigned int rgbBuffer = 0;
+    int n_datum = 0;
+
+    for (i=0; i<AV_NUM_DATA_POINTERS; i++)
+        ptr[i] = p->data[i];
+
+    switch (avctx->bits_per_raw_sample) {
+    case 10:
+        for (x = 0; x < avctx->height; x++) {
+            uint16_t *dst[4] = {(uint16_t*)ptr[0],
+                                (uint16_t*)ptr[1],
+                                (uint16_t*)ptr[2],
+                                (uint16_t*)ptr[3]};
+            int shift = elements > 1 ? dpx->packing == 1 ? 22 : 20 : dpx->packing == 1 ? 2 : 0;
+            for (y = 0; y < avctx->width; y++) {
+                if (elements >= 3)
+                    *dst[2]++ = read10in32(&buf, &rgbBuffer,
+                                           &n_datum, endian, shift);
+                if (elements == 1)
+                    *dst[0]++ = read10in32_gray(&buf, &rgbBuffer,
+                                                &n_datum, endian, shift);
+                else
+                    *dst[0]++ = read10in32(&buf, &rgbBuffer,
+                                           &n_datum, endian, shift);
+                if (elements >= 2)
+                    *dst[1]++ = read10in32(&buf, &rgbBuffer,
+                                           &n_datum, endian, shift);
+                if (elements == 4)
+                    *dst[3]++ =
+                    read10in32(&buf, &rgbBuffer,
+                               &n_datum, endian, shift);
+            }
+            if (!dpx->unpadded_10bit)
+                n_datum = 0;
+            for (i = 0; i < elements; i++)
+                ptr[i] += p->linesize[i];
+        }
+        break;
+    case 12:
+        for (x = 0; x < avctx->height; x++) {
+            uint16_t *dst[4] = {(uint16_t*)ptr[0],
+                                (uint16_t*)ptr[1],
+                                (uint16_t*)ptr[2],
+                                (uint16_t*)ptr[3]};
+            int shift = dpx->packing == 1 ? 4 : 0;
+            for (y = 0; y < avctx->width; y++) {
+                if (dpx->packing) {
+                    if (elements >= 3)
+                        *dst[2]++ = read16(&buf, endian) >> shift & 0xFFF;
+                    *dst[0]++ = read16(&buf, endian) >> shift & 0xFFF;
+                    if (elements >= 2)
+                        *dst[1]++ = read16(&buf, endian) >> shift & 0xFFF;
+                    if (elements == 4)
+                        *dst[3]++ = read16(&buf, endian) >> shift & 0xFFF;
+                } else {
+                    if (elements >= 3)
+                        *dst[2]++ = read12in32(&buf, &rgbBuffer,
+                                               &n_datum, endian);
+                    *dst[0]++ = read12in32(&buf, &rgbBuffer,
+                                           &n_datum, endian);
+                    if (elements >= 2)
+                        *dst[1]++ = read12in32(&buf, &rgbBuffer,
+                                               &n_datum, endian);
+                    if (elements == 4)
+                        *dst[3]++ = read12in32(&buf, &rgbBuffer,
+                                               &n_datum, endian);
+                }
+            }
+            n_datum = 0;
+            for (i = 0; i < elements; i++)
+                ptr[i] += p->linesize[i];
+            // Jump to next aligned position
+            buf += dpx->need_align;
+        }
+        break;
+    case 32:
+        if (elements == 1) {
+            av_image_copy_plane(ptr[0], p->linesize[0],
+                                buf, dpx->stride,
+                                elements * avctx->width * 4, avctx->height);
+        } else {
+            for (y = 0; y < avctx->height; y++) {
+                ptr[0] = p->data[0] + y * p->linesize[0];
+                ptr[1] = p->data[1] + y * p->linesize[1];
+                ptr[2] = p->data[2] + y * p->linesize[2];
+                ptr[3] = p->data[3] + y * p->linesize[3];
+                for (x = 0; x < avctx->width; x++) {
+                    AV_WN32(ptr[2], AV_RN32(buf));
+                    AV_WN32(ptr[0], AV_RN32(buf + 4));
+                    AV_WN32(ptr[1], AV_RN32(buf + 8));
+                    if (avctx->pix_fmt == AV_PIX_FMT_GBRAPF32BE ||
+                        avctx->pix_fmt == AV_PIX_FMT_GBRAPF32LE) {
+                        AV_WN32(ptr[3], AV_RN32(buf + 12));
+                        buf += 4;
+                        ptr[3] += 4;
+                    }
+
+                    buf += 12;
+                    ptr[2] += 4;
+                    ptr[0] += 4;
+                    ptr[1] += 4;
+                }
+            }
+        }
+        break;
+    case 16:
+        elements *= 2;
+    case 8:
+        if (   avctx->pix_fmt == AV_PIX_FMT_YUVA444P
+            || avctx->pix_fmt == AV_PIX_FMT_YUV444P) {
+            for (x = 0; x < avctx->height; x++) {
+                ptr[0] = p->data[0] + x * p->linesize[0];
+                ptr[1] = p->data[1] + x * p->linesize[1];
+                ptr[2] = p->data[2] + x * p->linesize[2];
+                ptr[3] = p->data[3] + x * p->linesize[3];
+                for (y = 0; y < avctx->width; y++) {
+                    *ptr[1]++ = *buf++;
+                    *ptr[0]++ = *buf++;
+                    *ptr[2]++ = *buf++;
+                    if (avctx->pix_fmt == AV_PIX_FMT_YUVA444P)
+                        *ptr[3]++ = *buf++;
+                }
+            }
+        } else {
+        av_image_copy_plane(ptr[0], p->linesize[0],
+                            buf, dpx->stride,
+                            elements * avctx->width, avctx->height);
+        }
+        break;
+    }
+}
+
 static int decode_frame(AVCodecContext *avctx, AVFrame *p,
                         int *got_frame, AVPacket *avpkt)
 {
@@ -125,21 +264,17 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
 
     const uint8_t *buf = avpkt->data;
     int buf_size       = avpkt->size;
-    uint8_t *ptr[AV_NUM_DATA_POINTERS];
     uint32_t header_version, version = 0;
     char creator[101] = { 0 };
     char input_device[33] = { 0 };
 
     unsigned int offset;
     int magic_num;
-    int x, y, i, j, ret;
+    int i, j, ret;
     int w, h, descriptor;
     int yuv, color_trc, color_spec;
     int encoding;
 
-    unsigned int rgbBuffer = 0;
-    int n_datum = 0;
-
     if (avpkt->size <= 1634) {
         av_log(avctx, AV_LOG_ERROR, "Packet too small for DPX header\n");
         return AVERROR_INVALIDDATA;
@@ -602,135 +737,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     buf =  avpkt->data + offset;
     dpx->frame = p;
 
-    int elements = dpx->components;
-    int endian = dpx->endian;
-    for (i=0; i<AV_NUM_DATA_POINTERS; i++)
-        ptr[i] = p->data[i];
-
-    switch (avctx->bits_per_raw_sample) {
-    case 10:
-        for (x = 0; x < avctx->height; x++) {
-            uint16_t *dst[4] = {(uint16_t*)ptr[0],
-                                (uint16_t*)ptr[1],
-                                (uint16_t*)ptr[2],
-                                (uint16_t*)ptr[3]};
-            int shift = elements > 1 ? dpx->packing == 1 ? 22 : 20 : dpx->packing == 1 ? 2 : 0;
-            for (y = 0; y < avctx->width; y++) {
-                if (elements >= 3)
-                    *dst[2]++ = read10in32(&buf, &rgbBuffer,
-                                           &n_datum, endian, shift);
-                if (elements == 1)
-                    *dst[0]++ = read10in32_gray(&buf, &rgbBuffer,
-                                                &n_datum, endian, shift);
-                else
-                    *dst[0]++ = read10in32(&buf, &rgbBuffer,
-                                           &n_datum, endian, shift);
-                if (elements >= 2)
-                    *dst[1]++ = read10in32(&buf, &rgbBuffer,
-                                           &n_datum, endian, shift);
-                if (elements == 4)
-                    *dst[3]++ =
-                    read10in32(&buf, &rgbBuffer,
-                               &n_datum, endian, shift);
-            }
-            if (!dpx->unpadded_10bit)
-                n_datum = 0;
-            for (i = 0; i < elements; i++)
-                ptr[i] += p->linesize[i];
-        }
-        break;
-    case 12:
-        for (x = 0; x < avctx->height; x++) {
-            uint16_t *dst[4] = {(uint16_t*)ptr[0],
-                                (uint16_t*)ptr[1],
-                                (uint16_t*)ptr[2],
-                                (uint16_t*)ptr[3]};
-            int shift = dpx->packing == 1 ? 4 : 0;
-            for (y = 0; y < avctx->width; y++) {
-                if (dpx->packing) {
-                    if (elements >= 3)
-                        *dst[2]++ = read16(&buf, endian) >> shift & 0xFFF;
-                    *dst[0]++ = read16(&buf, endian) >> shift & 0xFFF;
-                    if (elements >= 2)
-                        *dst[1]++ = read16(&buf, endian) >> shift & 0xFFF;
-                    if (elements == 4)
-                        *dst[3]++ = read16(&buf, endian) >> shift & 0xFFF;
-                } else {
-                    if (elements >= 3)
-                        *dst[2]++ = read12in32(&buf, &rgbBuffer,
-                                               &n_datum, endian);
-                    *dst[0]++ = read12in32(&buf, &rgbBuffer,
-                                           &n_datum, endian);
-                    if (elements >= 2)
-                        *dst[1]++ = read12in32(&buf, &rgbBuffer,
-                                               &n_datum, endian);
-                    if (elements == 4)
-                        *dst[3]++ = read12in32(&buf, &rgbBuffer,
-                                               &n_datum, endian);
-                }
-            }
-            n_datum = 0;
-            for (i = 0; i < elements; i++)
-                ptr[i] += p->linesize[i];
-            // Jump to next aligned position
-            buf += dpx->need_align;
-        }
-        break;
-    case 32:
-        if (elements == 1) {
-            av_image_copy_plane(ptr[0], p->linesize[0],
-                                buf, dpx->stride,
-                                elements * avctx->width * 4, avctx->height);
-        } else {
-            for (y = 0; y < avctx->height; y++) {
-                ptr[0] = p->data[0] + y * p->linesize[0];
-                ptr[1] = p->data[1] + y * p->linesize[1];
-                ptr[2] = p->data[2] + y * p->linesize[2];
-                ptr[3] = p->data[3] + y * p->linesize[3];
-                for (x = 0; x < avctx->width; x++) {
-                    AV_WN32(ptr[2], AV_RN32(buf));
-                    AV_WN32(ptr[0], AV_RN32(buf + 4));
-                    AV_WN32(ptr[1], AV_RN32(buf + 8));
-                    if (avctx->pix_fmt == AV_PIX_FMT_GBRAPF32BE ||
-                        avctx->pix_fmt == AV_PIX_FMT_GBRAPF32LE) {
-                        AV_WN32(ptr[3], AV_RN32(buf + 12));
-                        buf += 4;
-                        ptr[3] += 4;
-                    }
-
-                    buf += 12;
-                    ptr[2] += 4;
-                    ptr[0] += 4;
-                    ptr[1] += 4;
-                }
-            }
-        }
-        break;
-    case 16:
-        elements *= 2;
-    case 8:
-        if (   avctx->pix_fmt == AV_PIX_FMT_YUVA444P
-            || avctx->pix_fmt == AV_PIX_FMT_YUV444P) {
-            for (x = 0; x < avctx->height; x++) {
-                ptr[0] = p->data[0] + x * p->linesize[0];
-                ptr[1] = p->data[1] + x * p->linesize[1];
-                ptr[2] = p->data[2] + x * p->linesize[2];
-                ptr[3] = p->data[3] + x * p->linesize[3];
-                for (y = 0; y < avctx->width; y++) {
-                    *ptr[1]++ = *buf++;
-                    *ptr[0]++ = *buf++;
-                    *ptr[2]++ = *buf++;
-                    if (avctx->pix_fmt == AV_PIX_FMT_YUVA444P)
-                        *ptr[3]++ = *buf++;
-                }
-            }
-        } else {
-        av_image_copy_plane(ptr[0], p->linesize[0],
-                            buf, dpx->stride,
-                            elements * avctx->width, avctx->height);
-        }
-        break;
-    }
+    unpack_frame(avctx, p, buf, dpx->components, dpx->endian);
 
     *got_frame = 1;
 
-- 
2.52.0


From e0b5c800ea599b9748b59a22442c325bee4afc36 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 29 Oct 2025 14:24:10 +0100
Subject: [PATCH 096/304] dpxdec: add hardware decoding hooks

---
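Note for reviewers: the large switch touched below is keyed on `1000 * descriptor + 10 * avctx->bits_per_raw_sample + dpx->endian`. A small sketch of that arithmetic may help when reading the case labels. The helper name `dpx_format_key` is hypothetical and does not exist in dpx.c; that the last digit is 1 for big-endian data is inferred from cases 6161 (GRAY16BE) and 6160 (GRAY16LE).

```c
/* Hypothetical helper mirroring the switch key expression in decode_frame().
 * The last decimal digit carries the endian flag, the two or three digits
 * before it the bit depth, and the leading digits the DPX descriptor. */
static int dpx_format_key(int descriptor, int bits_per_sample, int big_endian)
{
    return 1000 * descriptor + 10 * bits_per_sample + big_endian;
}
```

For example, descriptor 50 at 10 bits gives 50100 or 50101, both of which select AV_PIX_FMT_GBRP10, while descriptor 51 at 32 bits with the endian flag set gives 51321, which selects AV_PIX_FMT_GBRAPF32BE.
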
 libavcodec/dpx.c | 142 ++++++++++++++++++++++++++++++++++++++---------
 libavcodec/dpx.h |   3 +
 2 files changed, 118 insertions(+), 27 deletions(-)

diff --git a/libavcodec/dpx.c b/libavcodec/dpx.c
index 96d4647389..47efcb7572 100644
--- a/libavcodec/dpx.c
+++ b/libavcodec/dpx.c
@@ -29,6 +29,11 @@
 #include "decode.h"
 #include "dpx.h"
 
+#include "thread.h"
+#include "hwconfig.h"
+#include "hwaccel_internal.h"
+#include "config_components.h"
+
 static unsigned int read16(const uint8_t **ptr, int is_big)
 {
     unsigned int temp;
@@ -257,11 +262,26 @@ static void unpack_frame(AVCodecContext *avctx, AVFrame *p, const uint8_t *buf,
     }
 }
 
+static enum AVPixelFormat get_pixel_format(AVCodecContext *avctx,
+                                           enum AVPixelFormat pix_fmt)
+{
+    enum AVPixelFormat pix_fmts[] = {
+#if CONFIG_DPX_VULKAN_HWACCEL
+        AV_PIX_FMT_VULKAN,
+#endif
+        pix_fmt,
+        AV_PIX_FMT_NONE,
+    };
+
+    return ff_get_format(avctx, pix_fmts);
+}
+
 static int decode_frame(AVCodecContext *avctx, AVFrame *p,
                         int *got_frame, AVPacket *avpkt)
 {
     DPXDecContext *dpx = avctx->priv_data;
 
+    enum AVPixelFormat pix_fmt;
     const uint8_t *buf = avpkt->data;
     int buf_size       = avpkt->size;
     uint32_t header_version, version = 0;
@@ -631,96 +651,96 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     case 4080:
     case 6081:
     case 6080:
-        avctx->pix_fmt = AV_PIX_FMT_GRAY8;
+        pix_fmt = AV_PIX_FMT_GRAY8;
         break;
     case 6121:
     case 6120:
-        avctx->pix_fmt = AV_PIX_FMT_GRAY12;
+        pix_fmt = AV_PIX_FMT_GRAY12;
         break;
     case 1320:
     case 2320:
     case 3320:
     case 4320:
     case 6320:
-        avctx->pix_fmt = AV_PIX_FMT_GRAYF32LE;
+        pix_fmt = AV_PIX_FMT_GRAYF32LE;
         break;
     case 1321:
     case 2321:
     case 3321:
     case 4321:
     case 6321:
-        avctx->pix_fmt = AV_PIX_FMT_GRAYF32BE;
+        pix_fmt = AV_PIX_FMT_GRAYF32BE;
         break;
     case 50081:
     case 50080:
-        avctx->pix_fmt = AV_PIX_FMT_RGB24;
+        pix_fmt = AV_PIX_FMT_RGB24;
         break;
     case 52081:
     case 52080:
-        avctx->pix_fmt = AV_PIX_FMT_ABGR;
+        pix_fmt = AV_PIX_FMT_ABGR;
         break;
     case 51081:
     case 51080:
-        avctx->pix_fmt = AV_PIX_FMT_RGBA;
+        pix_fmt = AV_PIX_FMT_RGBA;
         break;
     case 50100:
     case 50101:
-        avctx->pix_fmt = AV_PIX_FMT_GBRP10;
+        pix_fmt = AV_PIX_FMT_GBRP10;
         break;
     case 51100:
     case 51101:
-        avctx->pix_fmt = AV_PIX_FMT_GBRAP10;
+        pix_fmt = AV_PIX_FMT_GBRAP10;
         break;
     case 50120:
     case 50121:
-        avctx->pix_fmt = AV_PIX_FMT_GBRP12;
+        pix_fmt = AV_PIX_FMT_GBRP12;
         break;
     case 51120:
     case 51121:
-        avctx->pix_fmt = AV_PIX_FMT_GBRAP12;
+        pix_fmt = AV_PIX_FMT_GBRAP12;
         break;
     case 6100:
     case 6101:
-        avctx->pix_fmt = AV_PIX_FMT_GRAY10;
+        pix_fmt = AV_PIX_FMT_GRAY10;
         break;
     case 6161:
-        avctx->pix_fmt = AV_PIX_FMT_GRAY16BE;
+        pix_fmt = AV_PIX_FMT_GRAY16BE;
         break;
     case 6160:
-        avctx->pix_fmt = AV_PIX_FMT_GRAY16LE;
+        pix_fmt = AV_PIX_FMT_GRAY16LE;
         break;
     case 50161:
-        avctx->pix_fmt = AV_PIX_FMT_RGB48BE;
+        pix_fmt = AV_PIX_FMT_RGB48BE;
         break;
     case 50160:
-        avctx->pix_fmt = AV_PIX_FMT_RGB48LE;
+        pix_fmt = AV_PIX_FMT_RGB48LE;
         break;
     case 51161:
-        avctx->pix_fmt = AV_PIX_FMT_RGBA64BE;
+        pix_fmt = AV_PIX_FMT_RGBA64BE;
         break;
     case 51160:
-        avctx->pix_fmt = AV_PIX_FMT_RGBA64LE;
+        pix_fmt = AV_PIX_FMT_RGBA64LE;
         break;
     case 50320:
-        avctx->pix_fmt = AV_PIX_FMT_GBRPF32LE;
+        pix_fmt = AV_PIX_FMT_GBRPF32LE;
         break;
     case 50321:
-        avctx->pix_fmt = AV_PIX_FMT_GBRPF32BE;
+        pix_fmt = AV_PIX_FMT_GBRPF32BE;
         break;
     case 51320:
-        avctx->pix_fmt = AV_PIX_FMT_GBRAPF32LE;
+        pix_fmt = AV_PIX_FMT_GBRAPF32LE;
         break;
     case 51321:
-        avctx->pix_fmt = AV_PIX_FMT_GBRAPF32BE;
+        pix_fmt = AV_PIX_FMT_GBRAPF32BE;
         break;
     case 100081:
-        avctx->pix_fmt = AV_PIX_FMT_UYVY422;
+        pix_fmt = AV_PIX_FMT_UYVY422;
         break;
     case 102081:
-        avctx->pix_fmt = AV_PIX_FMT_YUV444P;
+        pix_fmt = AV_PIX_FMT_YUV444P;
         break;
     case 103081:
-        avctx->pix_fmt = AV_PIX_FMT_YUVA444P;
+        pix_fmt = AV_PIX_FMT_YUVA444P;
         break;
     default:
         av_log(avctx, AV_LOG_ERROR, "Unsupported format %d\n",
@@ -728,6 +748,16 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
         return AVERROR_PATCHWELCOME;
     }
 
+    if (pix_fmt != dpx->pix_fmt) {
+        dpx->pix_fmt = pix_fmt;
+
+        ret = get_pixel_format(avctx, pix_fmt);
+        if (ret < 0)
+            return ret;
+
+        avctx->pix_fmt = ret;
+    }
+
     ff_set_sar(avctx, avctx->sample_aspect_ratio);
 
     if ((ret = ff_get_buffer(avctx, p, 0)) < 0)
@@ -737,13 +767,65 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     buf =  avpkt->data + offset;
     dpx->frame = p;
 
-    unpack_frame(avctx, p, buf, dpx->components, dpx->endian);
+    /* Start */
+    if (avctx->hwaccel) {
+        const FFHWAccel *hwaccel = ffhwaccel(avctx->hwaccel);
+
+        ret = ff_hwaccel_frame_priv_alloc(avctx, &dpx->hwaccel_picture_private);
+        if (ret < 0)
+            return ret;
+
+        ret = hwaccel->start_frame(avctx, avpkt->buf, buf, avpkt->size - offset);
+        if (ret < 0)
+            return ret;
+
+        ret = hwaccel->decode_slice(avctx, buf, avpkt->size - offset);
+        if (ret < 0)
+            return ret;
+
+        ret = hwaccel->end_frame(avctx);
+        if (ret < 0)
+            return ret;
+
+        av_refstruct_unref(&dpx->hwaccel_picture_private);
+    } else {
+        unpack_frame(avctx, p, buf, dpx->components, dpx->endian);
+    }
+
+    p->pict_type = AV_PICTURE_TYPE_I;
+    p->flags    |= AV_FRAME_FLAG_KEY;
 
     *got_frame = 1;
 
     return buf_size;
 }
 
+#if HAVE_THREADS
+static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
+{
+    DPXDecContext *ssrc = src->priv_data;
+    DPXDecContext *sdst = dst->priv_data;
+
+    sdst->pix_fmt = ssrc->pix_fmt;
+
+    return 0;
+}
+#endif
+
+static av_cold int decode_end(AVCodecContext *avctx)
+{
+    DPXDecContext *dpx = avctx->priv_data;
+    av_refstruct_unref(&dpx->hwaccel_picture_private);
+    return 0;
+}
+
+static av_cold int decode_init(AVCodecContext *avctx)
+{
+    DPXDecContext *dpx = avctx->priv_data;
+    dpx->pix_fmt = AV_PIX_FMT_NONE;
+    return 0;
+}
+
 const FFCodec ff_dpx_decoder = {
     .p.name         = "dpx",
     CODEC_LONG_NAME("DPX (Digital Picture Exchange) image"),
@@ -751,5 +833,11 @@ const FFCodec ff_dpx_decoder = {
     .p.type         = AVMEDIA_TYPE_VIDEO,
     .p.id           = AV_CODEC_ID_DPX,
     FF_CODEC_DECODE_CB(decode_frame),
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .init           = decode_init,
+    .close          = decode_end,
+    UPDATE_THREAD_CONTEXT(update_thread_context),
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
+    .hw_configs     = (const AVCodecHWConfigInternal *const []) {
+        NULL
+    },
 };
diff --git a/libavcodec/dpx.h b/libavcodec/dpx.h
index 35e8aa690f..c9d95af1f1 100644
--- a/libavcodec/dpx.h
+++ b/libavcodec/dpx.h
@@ -23,6 +23,7 @@
 #define AVCODEC_DPX_H
 
 #include "libavutil/frame.h"
+#include "libavutil/pixfmt.h"
 
 enum DPX_TRC {
     DPX_TRC_USER_DEFINED       = 0,
@@ -58,6 +59,8 @@ enum DPX_COL_SPEC {
 
 typedef struct DPXDecContext {
     AVFrame *frame;
+    void *hwaccel_picture_private;
+    enum AVPixelFormat pix_fmt;
 
     int packing;
     int stride;
-- 
2.52.0


From a9954303b1d3b4fa229dbc2baa2be41a1f233447 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 29 Oct 2025 15:27:47 +0100
Subject: [PATCH 097/304] dpxdec: add a Vulkan hwaccel

---
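Note for reviewers: dpx_copy.comp addresses the input as a flat array of samples and, in the multi-plane path, routes component i to image `fmt_lut[i]`. The following is a CPU-side sketch of that indexing only, written in C for illustration. The names `dpx_sample_offset` and `plane_for_component` are hypothetical; the formula is copied from the shader and, as written there, assumes rows are stored without padding between them.

```c
#include <stddef.h>

/* Same values as fmt_lut in dpx_copy.comp: component 0 goes to plane 2,
 * component 1 to plane 0, component 2 to plane 1, component 3 to plane 3. */
static const int plane_for_component[4] = { 2, 0, 1, 3 };

/* Hypothetical helper: index of the first sample of pixel (x, y) in a
 * tightly packed, interleaved buffer with `components` samples per pixel. */
static size_t dpx_sample_offset(int x, int y, int width, int components)
{
    return ((size_t)y * (size_t)width + (size_t)x) * (size_t)components;
}
```

The lookup order matches the CPU path in unpack_frame(), where the first sample of each pixel is written through dst[2], the second through dst[0] and the third through dst[1].
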
 configure                         |   2 +
 libavcodec/Makefile               |   1 +
 libavcodec/dpx.c                  |   5 +
 libavcodec/hwaccels.h             |   1 +
 libavcodec/vulkan/Makefile        |   4 +
 libavcodec/vulkan/dpx_copy.comp   |  51 ++++
 libavcodec/vulkan/dpx_unpack.comp |  83 ++++++
 libavcodec/vulkan_decode.c        |  16 +
 libavcodec/vulkan_dpx.c           | 475 ++++++++++++++++++++++++++++++
 9 files changed, 638 insertions(+)
 create mode 100644 libavcodec/vulkan/dpx_copy.comp
 create mode 100644 libavcodec/vulkan/dpx_unpack.comp
 create mode 100644 libavcodec/vulkan_dpx.c

diff --git a/configure b/configure
index 7ef50095a3..fd6f602e1d 100755
--- a/configure
+++ b/configure
@@ -3262,6 +3262,8 @@ av1_videotoolbox_hwaccel_deps="videotoolbox"
 av1_videotoolbox_hwaccel_select="av1_decoder"
 av1_vulkan_hwaccel_deps="vulkan"
 av1_vulkan_hwaccel_select="av1_decoder"
+dpx_vulkan_hwaccel_deps="vulkan spirv_compiler"
+dpx_vulkan_hwaccel_select="dpx_decoder"
 ffv1_vulkan_hwaccel_deps="vulkan spirv_compiler"
 ffv1_vulkan_hwaccel_select="ffv1_decoder"
 h263_vaapi_hwaccel_deps="vaapi"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index fba9f0aff0..45c8237181 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1050,6 +1050,7 @@ OBJS-$(CONFIG_AV1_VAAPI_HWACCEL)          += vaapi_av1.o
 OBJS-$(CONFIG_AV1_VDPAU_HWACCEL)          += vdpau_av1.o
 OBJS-$(CONFIG_AV1_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_av1.o
 OBJS-$(CONFIG_AV1_VULKAN_HWACCEL)         += vulkan_decode.o vulkan_av1.o
+OBJS-$(CONFIG_DPX_VULKAN_HWACCEL)         += vulkan_decode.o vulkan_dpx.o
 OBJS-$(CONFIG_FFV1_VULKAN_HWACCEL)        += vulkan_decode.o ffv1_vulkan.o vulkan_ffv1.o
 OBJS-$(CONFIG_H263_VAAPI_HWACCEL)         += vaapi_mpeg4.o
 OBJS-$(CONFIG_H263_VIDEOTOOLBOX_HWACCEL)  += videotoolbox.o
diff --git a/libavcodec/dpx.c b/libavcodec/dpx.c
index 47efcb7572..7355b50f7a 100644
--- a/libavcodec/dpx.c
+++ b/libavcodec/dpx.c
@@ -837,7 +837,12 @@ const FFCodec ff_dpx_decoder = {
     .close          = decode_end,
     UPDATE_THREAD_CONTEXT(update_thread_context),
     .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
+    .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP |
+                      FF_CODEC_CAP_SKIP_FRAME_FILL_PARAM,
     .hw_configs     = (const AVCodecHWConfigInternal *const []) {
+#if CONFIG_DPX_VULKAN_HWACCEL
+        HWACCEL_VULKAN(dpx),
+#endif
         NULL
     },
 };
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 638a7bfb1d..3de191288a 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -28,6 +28,7 @@ extern const struct FFHWAccel ff_av1_vaapi_hwaccel;
 extern const struct FFHWAccel ff_av1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_av1_videotoolbox_hwaccel;
 extern const struct FFHWAccel ff_av1_vulkan_hwaccel;
+extern const struct FFHWAccel ff_dpx_vulkan_hwaccel;
 extern const struct FFHWAccel ff_ffv1_vulkan_hwaccel;
 extern const struct FFHWAccel ff_h263_vaapi_hwaccel;
 extern const struct FFHWAccel ff_h263_videotoolbox_hwaccel;
diff --git a/libavcodec/vulkan/Makefile b/libavcodec/vulkan/Makefile
index 16a4116ef1..26e8e147c2 100644
--- a/libavcodec/vulkan/Makefile
+++ b/libavcodec/vulkan/Makefile
@@ -23,6 +23,10 @@ OBJS-$(CONFIG_PRORES_VULKAN_HWACCEL) += vulkan/common.o \
                                         vulkan/prores_vld.o \
                                         vulkan/prores_idct.o
 
+OBJS-$(CONFIG_DPX_VULKAN_HWACCEL) += vulkan/common.o \
+                                     vulkan/dpx_unpack.o \
+                                     vulkan/dpx_copy.o
+
 VULKAN = $(subst $(SRC_PATH)/,,$(wildcard $(SRC_PATH)/libavcodec/vulkan/*.comp))
 .SECONDARY: $(VULKAN:.comp=.c)
 libavcodec/vulkan/%.c: TAG = VULKAN
diff --git a/libavcodec/vulkan/dpx_copy.comp b/libavcodec/vulkan/dpx_copy.comp
new file mode 100644
index 0000000000..da0a11db93
--- /dev/null
+++ b/libavcodec/vulkan/dpx_copy.comp
@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) 2025 Lynne <dev@lynne.ee>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+TYPE read_data(uint off)
+{
+#ifdef BIG_ENDIAN
+    return TYPE_REVERSE(data[off]);
+#else
+    return data[off];
+#endif
+}
+
+void main(void)
+{
+    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
+    if (!IS_WITHIN(pos, imageSize(dst[0])))
+        return;
+
+    uint offs = (pos.y*imageSize(dst[0]).x + pos.x)*COMPONENTS;
+#if NB_IMAGES == 1
+    TYPE_VEC val;
+    for (int i = 0; i < COMPONENTS; i++)
+        val[i] = read_data(offs + i);
+    val >>= SHIFT;
+    imageStore(dst[0], pos, val);
+#else
+    const ivec4 fmt_lut = ivec4(2, 0, 1, 3);
+    for (int i = 0; i < COMPONENTS; i++) {
+        TYPE val = read_data(offs + i);
+        val >>= SHIFT;
+        imageStore(dst[fmt_lut[i]], pos, TYPE_VEC(val));
+    }
+#endif
+}
diff --git a/libavcodec/vulkan/dpx_unpack.comp b/libavcodec/vulkan/dpx_unpack.comp
new file mode 100644
index 0000000000..5a44de87bf
--- /dev/null
+++ b/libavcodec/vulkan/dpx_unpack.comp
@@ -0,0 +1,83 @@
+/*
+ * Copyright (c) 2025 Lynne <dev@lynne.ee>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+uint32_t read_data(uint off)
+{
+#ifdef BIG_ENDIAN
+    return reverse4(data[off]);
+#else
+    return data[off];
+#endif
+}
+
+#ifdef PACKED_10BIT
+i16vec4 parse_packed_in_32(ivec2 pos, int stride)
+{
+    uint32_t d = read_data(pos.y*stride + pos.x);
+    i16vec4 v;
+    d = d << 10 | d >> 22 & 0x3FFFFF;
+    v[0] = int16_t(d & 0x3FF);
+    d = d << 10 | d >> 22 & 0x3FFFFF;
+    v[1] = int16_t(d & 0x3FF);
+    d = d << 10 | d >> 22 & 0x3FFFFF;
+    v[2] = int16_t(d & 0x3FF);
+    v[3] = int16_t(0);
+    return v;
+}
+#else
+i16vec4 parse_packed_in_32(ivec2 pos, int stride)
+{
+    uint line_off = pos.y*(stride*BITS_PER_COMP*COMPONENTS +
+                           (need_align << 3));
+    uint pix_off = pos.x*BITS_PER_COMP*COMPONENTS;
+
+    uint off = (line_off + pix_off >> 5);
+    uint bit = pix_off & 0x1f;
+
+    uint32_t d0 = read_data(off + 0);
+    uint32_t d1 = read_data(off + 1);
+
+    uint64_t combined = (uint64_t(d1) << 32) | d0;
+    combined >>= bit;
+
+    return i16vec4(combined,
+                   combined >> (BITS_PER_COMP*1),
+                   combined >> (BITS_PER_COMP*2),
+                   combined >> (BITS_PER_COMP*3)) &
+           int16_t((1 << BITS_PER_COMP) - 1);
+}
+#endif
+
+void main(void)
+{
+    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
+    if (!IS_WITHIN(pos, imageSize(dst[0])))
+        return;
+
+    i16vec4 p = parse_packed_in_32(pos, imageSize(dst[0]).x);
+
+#if NB_IMAGES == 1
+    imageStore(dst[0], pos, p);
+#else
+    const ivec4 fmt_lut = COMPONENTS == 1 ? ivec4(0) : ivec4(2, 0, 1, 3);
+    for (uint i = 0; i < COMPONENTS; i++)
+        imageStore(dst[fmt_lut[i]], pos, i16vec4(p[i]));
+#endif
+}
diff --git a/libavcodec/vulkan_decode.c b/libavcodec/vulkan_decode.c
index d22ccc21aa..d6f6ec8c3b 100644
--- a/libavcodec/vulkan_decode.c
+++ b/libavcodec/vulkan_decode.c
@@ -26,6 +26,7 @@
 
 #define DECODER_IS_SDR(codec_id) \
     (((codec_id) == AV_CODEC_ID_FFV1) || \
+     ((codec_id) == AV_CODEC_ID_DPX) || \
      ((codec_id) == AV_CODEC_ID_PRORES_RAW) || \
      ((codec_id) == AV_CODEC_ID_PRORES))
 
@@ -50,6 +51,9 @@ extern const FFVulkanDecodeDescriptor ff_vk_dec_prores_raw_desc;
 #if CONFIG_PRORES_VULKAN_HWACCEL
 extern const FFVulkanDecodeDescriptor ff_vk_dec_prores_desc;
 #endif
+#if CONFIG_DPX_VULKAN_HWACCEL
+extern const FFVulkanDecodeDescriptor ff_vk_dec_dpx_desc;
+#endif
 
 static const FFVulkanDecodeDescriptor *dec_descs[] = {
 #if CONFIG_H264_VULKAN_HWACCEL
@@ -73,6 +77,9 @@ static const FFVulkanDecodeDescriptor *dec_descs[] = {
 #if CONFIG_PRORES_VULKAN_HWACCEL
     &ff_vk_dec_prores_desc,
 #endif
+#if CONFIG_DPX_VULKAN_HWACCEL
+    &ff_vk_dec_dpx_desc,
+#endif
 };
 
 typedef struct FFVulkanDecodeProfileData {
@@ -1117,10 +1124,19 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
             /* This should be more efficient for downloading and using */
             frames_ctx->sw_format = AV_PIX_FMT_RGBA64;
             break;
+        case AV_PIX_FMT_RGB48LE:
+        case AV_PIX_FMT_RGB48BE: /* DPX outputs RGB48BE, so we need both */
+            /* Almost nothing supports native 3-component RGB */
+            frames_ctx->sw_format = AV_PIX_FMT_GBRP16;
+            break;
+        case AV_PIX_FMT_RGBA64BE: /* DPX again, fix for little-endian systems */
+            frames_ctx->sw_format = AV_PIX_FMT_RGBA64;
+            break;
         case AV_PIX_FMT_GBRP10:
             /* This saves memory bandwidth when downloading */
             frames_ctx->sw_format = AV_PIX_FMT_X2BGR10;
             break;
+        case AV_PIX_FMT_RGB24:
         case AV_PIX_FMT_BGR0:
             /* mpv has issues with bgr0 mapping, so just remap it */
             frames_ctx->sw_format = AV_PIX_FMT_RGB0;
diff --git a/libavcodec/vulkan_dpx.c b/libavcodec/vulkan_dpx.c
new file mode 100644
index 0000000000..1af417fdf7
--- /dev/null
+++ b/libavcodec/vulkan_dpx.c
@@ -0,0 +1,475 @@
+/*
+ * Copyright (c) 2025 Lynne <dev@lynne.ee>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "vulkan_decode.h"
+#include "hwaccel_internal.h"
+
+#include "dpx.h"
+#include "libavutil/vulkan_spirv.h"
+#include "libavutil/mem.h"
+
+extern const char *ff_source_common_comp;
+extern const char *ff_source_dpx_unpack_comp;
+extern const char *ff_source_dpx_copy_comp;
+
+const FFVulkanDecodeDescriptor ff_vk_dec_dpx_desc = {
+    .codec_id         = AV_CODEC_ID_DPX,
+    .decode_extension = FF_VK_EXT_PUSH_DESCRIPTOR,
+    .queue_flags      = VK_QUEUE_COMPUTE_BIT,
+};
+
+typedef struct DPXVulkanDecodePicture {
+    FFVulkanDecodePicture vp;
+} DPXVulkanDecodePicture;
+
+typedef struct DPXVulkanDecodeContext {
+    FFVulkanShader shader;
+    AVBufferPool *frame_data_pool;
+} DPXVulkanDecodeContext;
+
+typedef struct DecodePushData {
+    int stride;
+    int need_align;
+    int padded_10bit;
+} DecodePushData;
+
+static int host_upoad_image(AVCodecContext *avctx,
+                            FFVulkanDecodeContext *dec, DPXDecContext *dpx,
+                            const uint8_t *src, uint32_t size)
+{
+    int err;
+    VkImage temp;
+
+    FFVulkanDecodeShared *ctx = dec->shared_ctx;
+    DPXVulkanDecodeContext *dxv = ctx->sd_ctx;
+    VkPhysicalDeviceLimits *limits = &ctx->s.props.properties.limits;
+    FFVulkanFunctions *vk = &ctx->s.vkfn;
+
+    DPXVulkanDecodePicture *pp = dpx->hwaccel_picture_private;
+    FFVulkanDecodePicture *vp = &pp->vp;
+
+    int unpack = (avctx->bits_per_raw_sample == 12 && !dpx->packing) ||
+                 avctx->bits_per_raw_sample == 10;
+    if (unpack)
+        return 0;
+
+    VkImageCreateInfo create_info = {
+        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
+        .imageType = VK_IMAGE_TYPE_2D,
+        .format = avctx->bits_per_raw_sample == 8 ? VK_FORMAT_R8_UINT :
+                  avctx->bits_per_raw_sample == 32 ? VK_FORMAT_R32_UINT :
+                                                     VK_FORMAT_R16_UINT,
+        .extent.width = dpx->frame->width*dpx->components,
+        .extent.height = dpx->frame->height,
+        .extent.depth = 1,
+        .mipLevels = 1,
+        .arrayLayers = 1,
+        .tiling = VK_IMAGE_TILING_LINEAR,
+        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
+        .usage = VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT,
+        .samples = VK_SAMPLE_COUNT_1_BIT,
+        .pQueueFamilyIndices = &ctx->qf[0].idx,
+        .queueFamilyIndexCount = 1,
+        .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
+    };
+
+    if (create_info.extent.width >= limits->maxImageDimension2D ||
+        create_info.extent.height >= limits->maxImageDimension2D)
+        return 0;
+
+    vk->CreateImage(ctx->s.hwctx->act_dev, &create_info, ctx->s.hwctx->alloc,
+                    &temp);
+
+    err = ff_vk_get_pooled_buffer(&ctx->s, &dxv->frame_data_pool,
+                                  &vp->slices_buf,
+                                  VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
+                                      VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
+                                  NULL, size,
+                                  VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
+    if (err < 0)
+        return err;
+
+    FFVkBuffer *vkb = (FFVkBuffer *)vp->slices_buf->data;
+    VkBindImageMemoryInfo bind_info = {
+        .sType = VK_STRUCTURE_TYPE_BIND_IMAGE_MEMORY_INFO,
+        .image = temp,
+        .memory = vkb->mem,
+    };
+    vk->BindImageMemory2(ctx->s.hwctx->act_dev, 1, &bind_info);
+
+    VkHostImageLayoutTransitionInfo layout_change = {
+        .sType = VK_STRUCTURE_TYPE_HOST_IMAGE_LAYOUT_TRANSITION_INFO,
+        .image = temp,
+        .oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
+        .newLayout = VK_IMAGE_LAYOUT_GENERAL,
+        .subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+        .subresourceRange.layerCount = 1,
+        .subresourceRange.levelCount = 1,
+    };
+    vk->TransitionImageLayoutEXT(ctx->s.hwctx->act_dev, 1, &layout_change);
+
+    VkMemoryToImageCopy copy_region = {
+        .sType = VK_STRUCTURE_TYPE_MEMORY_TO_IMAGE_COPY,
+        .pHostPointer = src,
+        .imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+        .imageSubresource.layerCount = 1,
+        .imageExtent = (VkExtent3D){ dpx->frame->width*dpx->components,
+                                     dpx->frame->height,
+                                     1 },
+    };
+    VkCopyMemoryToImageInfo copy_info = {
+        .sType = VK_STRUCTURE_TYPE_COPY_MEMORY_TO_IMAGE_INFO,
+        .flags = VK_HOST_IMAGE_COPY_MEMCPY_BIT_EXT,
+        .dstImage = temp,
+        .dstImageLayout = VK_IMAGE_LAYOUT_GENERAL,
+        .regionCount = 1,
+        .pRegions = &copy_region,
+    };
+    vk->CopyMemoryToImageEXT(ctx->s.hwctx->act_dev, &copy_info);
+
+    vk->DestroyImage(ctx->s.hwctx->act_dev, temp, ctx->s.hwctx->alloc);
+
+    return 0;
+}
+
+static int vk_dpx_start_frame(AVCodecContext          *avctx,
+                              const AVBufferRef       *buffer_ref,
+                              av_unused const uint8_t *buffer,
+                              av_unused uint32_t       size)
+{
+    int err;
+    FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
+    FFVulkanDecodeShared *ctx = dec->shared_ctx;
+    DPXDecContext *dpx = avctx->priv_data;
+
+    DPXVulkanDecodePicture *pp = dpx->hwaccel_picture_private;
+    FFVulkanDecodePicture *vp = &pp->vp;
+
+    if (ctx->s.extensions & FF_VK_EXT_HOST_IMAGE_COPY)
+        host_upoad_image(avctx, dec, dpx, buffer, size);
+
+    /* Host map the frame data if supported */
+    if (!vp->slices_buf &&
+        ctx->s.extensions & FF_VK_EXT_EXTERNAL_HOST_MEMORY)
+        ff_vk_host_map_buffer(&ctx->s, &vp->slices_buf, (uint8_t *)buffer,
+                              buffer_ref,
+                              VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
+                              VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
+
+    /* Prepare frame to be used */
+    err = ff_vk_decode_prepare_frame_sdr(dec, dpx->frame, vp, 1,
+                                         FF_VK_REP_NATIVE, 0);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int vk_dpx_decode_slice(AVCodecContext *avctx,
+                               const uint8_t  *data,
+                               uint32_t        size)
+{
+    DPXDecContext *dpx = avctx->priv_data;
+
+    DPXVulkanDecodePicture *pp = dpx->hwaccel_picture_private;
+    FFVulkanDecodePicture *vp = &pp->vp;
+
+    if (!vp->slices_buf) {
+        int err = ff_vk_decode_add_slice(avctx, vp, data, size, 0,
+                                         NULL, NULL);
+        if (err < 0)
+            return err;
+    }
+
+    return 0;
+}
+
+static int vk_dpx_end_frame(AVCodecContext *avctx)
+{
+    int err;
+    FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
+    FFVulkanDecodeShared *ctx = dec->shared_ctx;
+    FFVulkanFunctions *vk = &ctx->s.vkfn;
+
+    DPXDecContext *dpx = avctx->priv_data;
+    DPXVulkanDecodeContext *dxv = ctx->sd_ctx;
+
+    DPXVulkanDecodePicture *pp = dpx->hwaccel_picture_private;
+    FFVulkanDecodePicture *vp = &pp->vp;
+
+    FFVkBuffer *slices_buf = (FFVkBuffer *)vp->slices_buf->data;
+
+    VkImageMemoryBarrier2 img_bar[8];
+    int nb_img_bar = 0;
+
+    FFVkExecContext *exec = ff_vk_exec_get(&ctx->s, &ctx->exec_pool);
+    ff_vk_exec_start(&ctx->s, exec);
+
+    /* Prepare deps */
+    RET(ff_vk_exec_add_dep_frame(&ctx->s, exec, dpx->frame,
+                                 VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                                 VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT));
+
+    err = ff_vk_exec_mirror_sem_value(&ctx->s, exec, &vp->sem, &vp->sem_value,
+                                      dpx->frame);
+    if (err < 0)
+        return err;
+
+    RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &vp->slices_buf, 1, 0));
+    vp->slices_buf = NULL;
+
+    AVVkFrame *vkf = (AVVkFrame *)dpx->frame->data[0];
+    for (int i = 0; i < 4; i++) {
+        vkf->layout[i] = VK_IMAGE_LAYOUT_UNDEFINED;
+        vkf->access[i] = VK_ACCESS_2_NONE;
+    }
+
+    ff_vk_frame_barrier(&ctx->s, exec, dpx->frame, img_bar, &nb_img_bar,
+                        VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                        VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
+                        VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
+                        VK_IMAGE_LAYOUT_GENERAL,
+                        VK_QUEUE_FAMILY_IGNORED);
+
+    vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
+        .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
+        .pImageMemoryBarriers = img_bar,
+        .imageMemoryBarrierCount = nb_img_bar,
+    });
+    nb_img_bar = 0;
+
+    FFVulkanShader *shd = &dxv->shader;
+    ff_vk_shader_update_img_array(&ctx->s, exec, shd,
+                                  dpx->frame, vp->view.out,
+                                  0, 0,
+                                  VK_IMAGE_LAYOUT_GENERAL,
+                                  VK_NULL_HANDLE);
+    ff_vk_shader_update_desc_buffer(&ctx->s, exec, shd,
+                                    0, 1, 0,
+                                    slices_buf,
+                                    0, slices_buf->size,
+                                    VK_FORMAT_UNDEFINED);
+
+    ff_vk_exec_bind_shader(&ctx->s, exec, shd);
+
+    /* Update push data */
+    DecodePushData pd = (DecodePushData) {
+        .stride = dpx->stride,
+        .need_align = dpx->need_align,
+        .padded_10bit = !dpx->unpadded_10bit,
+    };
+
+    ff_vk_shader_update_push_const(&ctx->s, exec, shd,
+                                   VK_SHADER_STAGE_COMPUTE_BIT,
+                                   0, sizeof(pd), &pd);
+
+    vk->CmdDispatch(exec->buf,
+                    FFALIGN(dpx->frame->width,  shd->lg_size[0])/shd->lg_size[0],
+                    FFALIGN(dpx->frame->height, shd->lg_size[1])/shd->lg_size[1],
+                    1);
+
+    err = ff_vk_exec_submit(&ctx->s, exec);
+    if (err < 0)
+        return err;
+
+fail:
+    return 0;
+}
+
+static int init_shader(AVCodecContext *avctx, FFVulkanContext *s,
+                       FFVkExecPool *pool, FFVkSPIRVCompiler *spv,
+                       FFVulkanShader *shd, int bits)
+{
+    int err;
+    DPXDecContext *dpx = avctx->priv_data;
+    FFVulkanDescriptorSetBinding *desc_set;
+    AVHWFramesContext *dec_frames_ctx;
+    dec_frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+    int planes = av_pix_fmt_count_planes(dec_frames_ctx->sw_format);
+
+    uint8_t *spv_data;
+    size_t spv_len;
+    void *spv_opaque = NULL;
+
+    RET(ff_vk_shader_init(s, shd, "dpx",
+                          VK_SHADER_STAGE_COMPUTE_BIT,
+                          (const char *[]) { "GL_EXT_buffer_reference",
+                                             "GL_EXT_buffer_reference2" }, 2,
+                          512, 1, 1,
+                          0));
+
+    /* Common codec header */
+    GLSLD(ff_source_common_comp);
+
+    GLSLC(0, layout(push_constant, scalar) uniform pushConstants {            );
+    GLSLC(1,     int stride;                                                  );
+    GLSLC(1,     int need_align;                                              );
+    GLSLC(1,     int padded_10bit;                                            );
+    GLSLC(0, };                                                               );
+    GLSLC(0,                                                                  );
+    ff_vk_shader_add_push_const(shd, 0, sizeof(DecodePushData),
+                                VK_SHADER_STAGE_COMPUTE_BIT);
+
+    int unpack = (avctx->bits_per_raw_sample == 12 && !dpx->packing) ||
+                 avctx->bits_per_raw_sample == 10;
+
+    desc_set = (FFVulkanDescriptorSetBinding []) {
+        {
+            .name       = "dst",
+            .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
+            .dimensions = 2,
+            .mem_quali  = "writeonly",
+            .mem_layout = ff_vk_shader_rep_fmt(dec_frames_ctx->sw_format,
+                                               FF_VK_REP_NATIVE),
+            .elems      = planes,
+            .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
+        },
+        {
+            .name        = "data_buf",
+            .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
+            .mem_quali   = "readonly",
+            .buf_content = (unpack || bits == 32) ? "uint32_t data[];" :
+                           bits == 8 ? "uint8_t data[];" : "uint16_t data[];",
+        },
+    };
+    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 2, 0, 0));
+
+    if (dpx->endian)
+        GLSLC(0, #define BIG_ENDIAN                                           );
+    GLSLF(0, #define COMPONENTS (%i)                          ,dpx->components);
+    GLSLF(0, #define BITS_PER_COMP (%i)                                  ,bits);
+    GLSLF(0, #define NB_IMAGES (%i)                                    ,planes);
+    if (unpack) {
+        if (bits == 10)
+            GLSLC(0, #define PACKED_10BIT                                     );
+        GLSLD(ff_source_dpx_unpack_comp);
+    } else {
+        GLSLF(0, #define SHIFT (%i)                   ,FFALIGN(bits, 8) - bits);
+        GLSLF(0, #define TYPE uint%i_t                       ,FFALIGN(bits, 8));
+        GLSLF(0, #define TYPE_VEC u%ivec4                    ,FFALIGN(bits, 8));
+        GLSLF(0, #define TYPE_REVERSE(x) (reverse%i(x)),    FFALIGN(bits, 8)/8);
+        GLSLD(ff_source_dpx_copy_comp);
+    }
+
+    RET(spv->compile_shader(s, spv, shd, &spv_data, &spv_len, "main",
+                            &spv_opaque));
+    RET(ff_vk_shader_link(s, shd, spv_data, spv_len, "main"));
+
+    RET(ff_vk_shader_register_exec(s, pool, shd));
+
+fail:
+    if (spv_opaque)
+        spv->free_shader(spv, &spv_opaque);
+
+    return err;
+}
+
+static void vk_decode_dpx_uninit(FFVulkanDecodeShared *ctx)
+{
+    DPXVulkanDecodeContext *fv = ctx->sd_ctx;
+
+    ff_vk_shader_free(&ctx->s, &fv->shader);
+
+    av_buffer_pool_uninit(&fv->frame_data_pool);
+
+    av_freep(&fv);
+}
+
+static int vk_decode_dpx_init(AVCodecContext *avctx)
+{
+    int err;
+    DPXDecContext *dpx = avctx->priv_data;
+    FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
+
+    switch (dpx->pix_fmt) {
+    case AV_PIX_FMT_GRAY10:
+    case AV_PIX_FMT_GRAY12:
+    case AV_PIX_FMT_GBRAP10:
+    case AV_PIX_FMT_GBRAP12:
+    case AV_PIX_FMT_UYVY422:
+    case AV_PIX_FMT_YUV444P:
+    case AV_PIX_FMT_YUVA444P:
+        return AVERROR(ENOTSUP);
+    case AV_PIX_FMT_GBRP10:
+        if (dpx->unpadded_10bit)
+            return AVERROR(ENOTSUP);
+    /* fallthrough */
+    default:
+        break;
+    }
+
+    FFVkSPIRVCompiler *spv = ff_vk_spirv_init();
+    if (!spv) {
+        av_log(avctx, AV_LOG_ERROR, "Unable to initialize SPIR-V compiler!\n");
+        return AVERROR_EXTERNAL;
+    }
+
+    err = ff_vk_decode_init(avctx);
+    if (err < 0)
+        return err;
+
+    FFVulkanDecodeShared *ctx = dec->shared_ctx;
+    DPXVulkanDecodeContext *dxv = ctx->sd_ctx = av_mallocz(sizeof(*dxv));
+    if (!dxv) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    ctx->sd_ctx_free = &vk_decode_dpx_uninit;
+
+    RET(init_shader(avctx, &ctx->s, &ctx->exec_pool,
+                    spv, &dxv->shader, avctx->bits_per_raw_sample));
+
+fail:
+    spv->uninit(&spv);
+
+    return err;
+}
+
+static void vk_dpx_free_frame_priv(AVRefStructOpaque _hwctx, void *data)
+{
+    AVHWDeviceContext *dev_ctx = _hwctx.nc;
+
+    DPXVulkanDecodePicture *pp = data;
+    FFVulkanDecodePicture *vp = &pp->vp;
+
+    ff_vk_decode_free_frame(dev_ctx, vp);
+}
+
+const FFHWAccel ff_dpx_vulkan_hwaccel = {
+    .p.name                = "dpx_vulkan",
+    .p.type                = AVMEDIA_TYPE_VIDEO,
+    .p.id                  = AV_CODEC_ID_DPX,
+    .p.pix_fmt             = AV_PIX_FMT_VULKAN,
+    .start_frame           = &vk_dpx_start_frame,
+    .decode_slice          = &vk_dpx_decode_slice,
+    .end_frame             = &vk_dpx_end_frame,
+    .free_frame_priv       = &vk_dpx_free_frame_priv,
+    .frame_priv_data_size  = sizeof(DPXVulkanDecodePicture),
+    .init                  = &vk_decode_dpx_init,
+    .update_thread_context = &ff_vk_update_thread_context,
+    .decode_params         = &ff_vk_params_invalidate,
+    .flush                 = &ff_vk_decode_flush,
+    .uninit                = &ff_vk_decode_uninit,
+    .frame_params          = &ff_vk_frame_params,
+    .priv_data_size        = sizeof(FFVulkanDecodeContext),
+    .caps_internal         = HWACCEL_CAP_ASYNC_SAFE | HWACCEL_CAP_THREAD_SAFE,
+};
-- 
2.52.0


From 80fbc06d8762db57190353cfc0b01ef13eed64d4 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Tue, 25 Nov 2025 17:04:06 +0100
Subject: [PATCH 098/304] Changelog: bump lavc minor and add entry for the DPX
 Vulkan hwaccel

---
 Changelog            | 1 +
 libavcodec/version.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
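
Note (not part of the commit): the hwaccel announced by this entry unpacks
10-bit DPX data in dpx_unpack.comp by rotating each 32-bit word left by 10
bits three times and masking the low 10 bits after each rotation. Below is a
minimal C sketch of that step, assuming three components sit in the upper 30
bits of the word and the two lowest bits are padding. The helper names
rotl10 and unpack_10bit_word are hypothetical and do not exist in FFmpeg.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by 10 bits. This mirrors the shader expression
 * d = d << 10 | d >> 22 & 0x3FFFFF, where the mask is redundant for an
 * unsigned 32-bit value. */
static uint32_t rotl10(uint32_t d)
{
    return (d << 10) | (d >> 22);
}

/* Unpack three 10-bit components from one word (assumed layout: components
 * in the upper 30 bits, 2 padding bits at the bottom). out[3] is zeroed,
 * like the fourth lane of the shader's i16vec4. */
static void unpack_10bit_word(uint32_t d, uint16_t out[4])
{
    for (int i = 0; i < 3; i++) {
        d = rotl10(d);                  /* next component moves into bits 0..9 */
        out[i] = (uint16_t)(d & 0x3FF); /* keep only those 10 bits */
    }
    out[3] = 0;
}
```

Calling unpack_10bit_word() on (0x3FF << 22) | (0x155 << 12) | (0x2AA << 2)
should yield 0x3FF, 0x155, 0x2AA, 0 in that order.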

diff --git a/Changelog b/Changelog
index 3757eac2e4..e4f7e123d4 100644
--- a/Changelog
+++ b/Changelog
@@ -12,6 +12,7 @@ version <next>:
 - ffmpeg CLI tiled HEIF support
 - D3D12 AV1 encoder
 - ProRes Vulkan hwaccel
+- DPX Vulkan hwaccel
 
 
 version 8.0:
diff --git a/libavcodec/version.h b/libavcodec/version.h
index 4b7ec515fe..d0980e28de 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -29,7 +29,7 @@
 
 #include "version_major.h"
 
-#define LIBAVCODEC_VERSION_MINOR  19
+#define LIBAVCODEC_VERSION_MINOR  20
 #define LIBAVCODEC_VERSION_MICRO 100
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
-- 
2.52.0


From 3cdda6dc13e6922c5523cc9e5f260321af021222 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Tue, 25 Nov 2025 10:26:27 -0300
Subject: [PATCH 099/304] avformat/iamf_parse: ensure each layout in a
 scalable channel representation has an increasing number of channels

Fixes issue #21013

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/iamf_parse.c | 3 +++
 1 file changed, 3 insertions(+)
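
Note (not part of the commit): the added check rejects a layer whose channel
count is not strictly greater than that of the previous layer. A minimal
standalone sketch of the same rule, assuming the per-layer channel counts are
already available as a plain array; layers_strictly_increasing is a
hypothetical helper, not an FFmpeg function.

```c
#include <stddef.h>

/* Returns 1 if every layer has strictly more channels than the layer
 * before it, 0 otherwise. A single layer (or none) is trivially valid. */
static int layers_strictly_increasing(const int *nb_channels, size_t nb_layers)
{
    for (size_t i = 1; i < nb_layers; i++)
        if (nb_channels[i] <= nb_channels[i - 1])
            return 0; /* the parser would return AVERROR_INVALIDDATA here */
    return 1;
}
```

With this rule, channel counts such as { 2, 6, 8 } pass, while { 2, 2 } and
{ 6, 2 } are rejected.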

diff --git a/libavformat/iamf_parse.c b/libavformat/iamf_parse.c
index 50acd7cf5a..737e3f7404 100644
--- a/libavformat/iamf_parse.c
+++ b/libavformat/iamf_parse.c
@@ -406,6 +406,9 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
                                                           .nb_channels = substream_count +
                                                                          coupled_substream_count };
 
+        if (i && ch_layout.nb_channels <= audio_element->element->layers[i-1]->ch_layout.nb_channels)
+            return AVERROR_INVALIDDATA;
+
         for (int j = 0; j < substream_count; j++) {
             IAMFSubStream *substream = &audio_element->substreams[k++];
 
-- 
2.52.0


From 06a02bc2d477420425c89509445fa3fe3c73023a Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Tue, 25 Nov 2025 12:42:30 -0300
Subject: [PATCH 100/304] avformat/iamf_parse: ensure the stream count in a
 scalable channel representation is equal to the audio element's stream count

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/iamf_parse.c | 3 +++
 1 file changed, 3 insertions(+)
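
Note (not part of the commit): in the parser, k counts the substreams consumed
while walking the layers, and the new check compares it with the audio
element's declared substream count. A minimal sketch of that comparison,
assuming the per-layer substream counts are given as an array;
substream_total_matches is a hypothetical helper, not an FFmpeg function.

```c
#include <stddef.h>

/* Sums the substreams consumed by each layer and compares the total with
 * the count declared by the audio element. Returns 1 on a match. */
static int substream_total_matches(const unsigned *per_layer, size_t nb_layers,
                                   unsigned declared)
{
    unsigned consumed = 0; /* plays the role of k in the parser */

    for (size_t i = 0; i < nb_layers; i++)
        consumed += per_layer[i];

    return consumed == declared; /* a mismatch maps to AVERROR_INVALIDDATA */
}
```

For example, layers consuming { 1, 2, 1 } substreams match a declared count
of 4 and fail against a declared count of 5.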

diff --git a/libavformat/iamf_parse.c b/libavformat/iamf_parse.c
index 737e3f7404..597d800be0 100644
--- a/libavformat/iamf_parse.c
+++ b/libavformat/iamf_parse.c
@@ -480,6 +480,9 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
             av_channel_layout_copy(&layer->ch_layout, &ch_layout);
     }
 
+    if (k != audio_element->nb_substreams)
+        return AVERROR_INVALIDDATA;
+
     return 0;
 }
 
-- 
2.52.0


From fe3b7116b2fa923e1303576acbf2b76d15ca2b64 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 18:32:25 +0100
Subject: [PATCH 101/304] vulkan_dpx: fix "upoad" typo

---
 libavcodec/vulkan_dpx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/vulkan_dpx.c b/libavcodec/vulkan_dpx.c
index 1af417fdf7..465ed17194 100644
--- a/libavcodec/vulkan_dpx.c
+++ b/libavcodec/vulkan_dpx.c
@@ -50,9 +50,9 @@ typedef struct DecodePushData {
     int padded_10bit;
 } DecodePushData;
 
-static int host_upoad_image(AVCodecContext *avctx,
-                            FFVulkanDecodeContext *dec, DPXDecContext *dpx,
-                            const uint8_t *src, uint32_t size)
+static int host_upload_image(AVCodecContext *avctx,
+                             FFVulkanDecodeContext *dec, DPXDecContext *dpx,
+                             const uint8_t *src, uint32_t size)
 {
     int err;
     VkImage temp;
@@ -163,7 +163,7 @@ static int vk_dpx_start_frame(AVCodecContext          *avctx,
     FFVulkanDecodePicture *vp = &pp->vp;
 
     if (ctx->s.extensions & FF_VK_EXT_HOST_IMAGE_COPY)
-        host_upoad_image(avctx, dec, dpx, buffer, size);
+        host_upload_image(avctx, dec, dpx, buffer, size);
 
     /* Host map the frame data if supported */
     if (!vp->slices_buf &&
-- 
2.52.0


From dc6179845c6ec5f2a0b4c4fef351f9e647c24168 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 26 Nov 2025 18:33:10 +0100
Subject: [PATCH 102/304] vulkan_dpx: use host visible allocation for host
 image copy buffer

Fixes black screen on Nvidia.
---
 libavcodec/vulkan_dpx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vulkan_dpx.c b/libavcodec/vulkan_dpx.c
index 465ed17194..8978b78db4 100644
--- a/libavcodec/vulkan_dpx.c
+++ b/libavcodec/vulkan_dpx.c
@@ -102,7 +102,8 @@ static int host_upload_image(AVCodecContext *avctx,
                                   VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                                       VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
                                   NULL, size,
-                                  VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
+                                  VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+                                  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
     if (err < 0)
         return err;
 
-- 
2.52.0


From d263057d104ac660f8d16e4f1a65c5bf5d45fbdb Mon Sep 17 00:00:00 2001
From: Timo Rothenpieler <timo@rothenpieler.org>
Date: Tue, 25 Nov 2025 14:14:47 +0100
Subject: [PATCH 103/304] forgejo/workflows: make test shared/static mode more
 human readable

---
 .forgejo/workflows/test.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.forgejo/workflows/test.yml b/.forgejo/workflows/test.yml
index 1b2aaaa3b4..799d2c356a 100644
--- a/.forgejo/workflows/test.yml
+++ b/.forgejo/workflows/test.yml
@@ -10,14 +10,14 @@ jobs:
       fail-fast: false
       matrix:
         runner: [linux-aarch64]
-        shared: ['false']
+        shared: ['static']
         bits: ['64']
         include:
           - runner: linux-amd64
-            shared: 'false'
+            shared: 'static'
             bits: '32'
           - runner: linux-amd64
-            shared: 'true'
+            shared: 'shared'
             bits: '64'
     runs-on: ${{ matrix.runner }}
     steps:
@@ -27,7 +27,7 @@ jobs:
         run: |
           ./configure --enable-gpl --enable-nonfree --enable-memory-poisoning --assert-level=2 \
               $([ "${{ matrix.bits }}" != "32" ] || echo --arch=x86_32 --extra-cflags=-m32 --extra-cxxflags=-m32 --extra-ldflags=-m32) \
-              $([ "${{ matrix.shared }}" != "true" ] || echo --enable-shared --disable-static) \
+              $([ "${{ matrix.shared }}" != "shared" ] || echo --enable-shared --disable-static) \
               || CFGRES=$? && CFGRES=$?
           cat ffbuild/config.log
           exit $CFGRES
-- 
2.52.0


From 9c2b7b9c474bff001e77e69b100aee76e11a274d Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 27 Nov 2025 03:10:42 +0100
Subject: [PATCH 104/304] vulkan_dpx: fix compilation with older headers

Fixes #21028
---
 libavcodec/vulkan_dpx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/vulkan_dpx.c b/libavcodec/vulkan_dpx.c
index 8978b78db4..54774f2424 100644
--- a/libavcodec/vulkan_dpx.c
+++ b/libavcodec/vulkan_dpx.c
@@ -137,7 +137,7 @@ static int host_upload_image(AVCodecContext *avctx,
     };
     VkCopyMemoryToImageInfo copy_info = {
         .sType = VK_STRUCTURE_TYPE_COPY_MEMORY_TO_IMAGE_INFO,
-        .flags = VK_HOST_IMAGE_COPY_MEMCPY_BIT_EXT,
+        .flags = VK_HOST_IMAGE_COPY_MEMCPY_EXT,
         .dstImage = temp,
         .dstImageLayout = VK_IMAGE_LAYOUT_GENERAL,
         .regionCount = 1,
-- 
2.52.0


From b8c6f38bcc57ad157d4e642955e545c378ffbf7c Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Wed, 5 Nov 2025 11:30:37 +0800
Subject: [PATCH 105/304] configure: cleanup rkmpp check

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
---
 configure | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/configure b/configure
index fd6f602e1d..7dc99d53ea 100755
--- a/configure
+++ b/configure
@@ -7366,8 +7366,7 @@ enabled openssl           && { { check_pkg_config openssl "openssl >= 3.0.0" ope
                                check_lib openssl openssl/ssl.h DTLS_get_data_mtu -lssl -lcrypto -lws2_32 -lgdi32 ||
                                die "ERROR: openssl (>= 1.1.1) not found"; }
 enabled pocketsphinx      && require_pkg_config pocketsphinx pocketsphinx pocketsphinx/pocketsphinx.h ps_init
-enabled rkmpp             && { require_pkg_config rkmpp rockchip_mpp  rockchip/rk_mpi.h mpp_create &&
-                               require_pkg_config rockchip_mpp "rockchip_mpp >= 1.3.7" rockchip/rk_mpi.h mpp_create &&
+enabled rkmpp             && { require_pkg_config rkmpp "rockchip_mpp >= 1.3.7" rockchip/rk_mpi.h mpp_create &&
                                { enabled libdrm ||
                                  die "ERROR: rkmpp requires --enable-libdrm"; }
                              }
-- 
2.52.0


From c78d33b0d6e31b88a5e4d2bc328d8a0dc5eaf9b3 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Tue, 7 Oct 2025 22:15:50 +0800
Subject: [PATCH 106/304] avcodec/rkmppenc: add h264/hevc rkmpp encoder

Bump rockchip_mpp to 1.3.8.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
---
 Changelog              |   1 +
 configure              |   4 +-
 libavcodec/Makefile    |   2 +
 libavcodec/allcodecs.c |   2 +
 libavcodec/rkmppenc.c  | 584 +++++++++++++++++++++++++++++++++++++++++
 libavcodec/version.h   |   2 +-
 6 files changed, 593 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/rkmppenc.c

diff --git a/Changelog b/Changelog
index e4f7e123d4..cda59ebc90 100644
--- a/Changelog
+++ b/Changelog
@@ -13,6 +13,7 @@ version <next>:
 - D3D12 AV1 encoder
 - ProRes Vulkan hwaccel
 - DPX Vulkan hwaccel
+- Rockchip H.264/HEVC hardware encoder
 
 
 version 8.0:
diff --git a/configure b/configure
index 7dc99d53ea..7202cbc57a 100755
--- a/configure
+++ b/configure
@@ -3484,6 +3484,7 @@ h264_qsv_decoder_select="h264_mp4toannexb_bsf qsvdec"
 h264_qsv_encoder_select="atsc_a53 qsvenc"
 h264_rkmpp_decoder_deps="rkmpp"
 h264_rkmpp_decoder_select="h264_mp4toannexb_bsf"
+h264_rkmpp_encoder_deps="rkmpp"
 h264_vaapi_encoder_select="atsc_a53 cbs_h264 vaapi_encode"
 h264_vulkan_encoder_select="atsc_a53 cbs_h264 vulkan_encode"
 h264_v4l2m2m_decoder_deps="v4l2_m2m h264_v4l2_m2m"
@@ -3508,6 +3509,7 @@ hevc_qsv_decoder_select="hevc_mp4toannexb_bsf qsvdec"
 hevc_qsv_encoder_select="hevcparse qsvenc"
 hevc_rkmpp_decoder_deps="rkmpp"
 hevc_rkmpp_decoder_select="hevc_mp4toannexb_bsf"
+hevc_rkmpp_encoder_deps="rkmpp"
 hevc_vaapi_encoder_deps="VAEncPictureParameterBufferHEVC"
 hevc_vaapi_encoder_select="atsc_a53 cbs_h265 vaapi_encode"
 hevc_vulkan_encoder_select="atsc_a53 cbs_h265 vulkan_encode"
@@ -7366,7 +7368,7 @@ enabled openssl           && { { check_pkg_config openssl "openssl >= 3.0.0" ope
                                check_lib openssl openssl/ssl.h DTLS_get_data_mtu -lssl -lcrypto -lws2_32 -lgdi32 ||
                                die "ERROR: openssl (>= 1.1.1) not found"; }
 enabled pocketsphinx      && require_pkg_config pocketsphinx pocketsphinx pocketsphinx/pocketsphinx.h ps_init
-enabled rkmpp             && { require_pkg_config rkmpp "rockchip_mpp >= 1.3.7" rockchip/rk_mpi.h mpp_create &&
+enabled rkmpp             && { require_pkg_config rkmpp "rockchip_mpp >= 1.3.8" "rockchip/rk_mpi.h rockchip/mpp_buffer.h" "mpp_create mpp_buffer_sync_begin_f" &&
                                { enabled libdrm ||
                                  die "ERROR: rkmpp requires --enable-libdrm"; }
                              }
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 45c8237181..49d696017d 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -446,6 +446,7 @@ OBJS-$(CONFIG_H264_OMX_ENCODER)        += omx.o
 OBJS-$(CONFIG_H264_QSV_DECODER)        += qsvdec.o
 OBJS-$(CONFIG_H264_QSV_ENCODER)        += qsvenc_h264.o
 OBJS-$(CONFIG_H264_RKMPP_DECODER)      += rkmppdec.o
+OBJS-$(CONFIG_H264_RKMPP_ENCODER)      += rkmppenc.o
 OBJS-$(CONFIG_H264_VAAPI_ENCODER)      += vaapi_encode_h264.o h264_levels.o \
                                           h2645data.o hw_base_encode_h264.o
 OBJS-$(CONFIG_H264_VULKAN_ENCODER)     += vulkan_encode.o vulkan_encode_h264.o \
@@ -475,6 +476,7 @@ OBJS-$(CONFIG_HEVC_OH_ENCODER)         += ohcodec.o ohenc.o
 OBJS-$(CONFIG_HEVC_QSV_DECODER)        += qsvdec.o
 OBJS-$(CONFIG_HEVC_QSV_ENCODER)        += qsvenc_hevc.o hevc/ps_enc.o
 OBJS-$(CONFIG_HEVC_RKMPP_DECODER)      += rkmppdec.o
+OBJS-$(CONFIG_HEVC_RKMPP_ENCODER)      += rkmppenc.o
 OBJS-$(CONFIG_HEVC_VAAPI_ENCODER)      += vaapi_encode_h265.o h265_profile_level.o \
                                           h2645data.o hw_base_encode_h265.o
 OBJS-$(CONFIG_HEVC_VULKAN_ENCODER)     += vulkan_encode.o vulkan_encode_h265.o \
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index cce23d1541..a335e2bb82 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -156,11 +156,13 @@ extern const FFCodec ff_h264_mediacodec_encoder;
 extern const FFCodec ff_h264_mmal_decoder;
 extern const FFCodec ff_h264_qsv_decoder;
 extern const FFCodec ff_h264_rkmpp_decoder;
+extern const FFCodec ff_h264_rkmpp_encoder;
 extern const FFCodec ff_hap_encoder;
 extern const FFCodec ff_hap_decoder;
 extern const FFCodec ff_hevc_decoder;
 extern const FFCodec ff_hevc_qsv_decoder;
 extern const FFCodec ff_hevc_rkmpp_decoder;
+extern const FFCodec ff_hevc_rkmpp_encoder;
 extern const FFCodec ff_hevc_v4l2m2m_decoder;
 extern const FFCodec ff_hnm4_video_decoder;
 extern const FFCodec ff_hq_hqa_decoder;
diff --git a/libavcodec/rkmppenc.c b/libavcodec/rkmppenc.c
new file mode 100644
index 0000000000..09501f0bca
--- /dev/null
+++ b/libavcodec/rkmppenc.c
@@ -0,0 +1,584 @@
+/*
+ * RockChip MPP Video Encoder
+ *
+ * This file is part of FFmpeg.
+ *
+ * Copyright (c) 2025 Zhao Zhili <quinkblack@foxmail.com>
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config_components.h"
+
+#include <assert.h>
+#include <stdbool.h>
+
+#include <rockchip/mpp_frame.h>
+#include <rockchip/mpp_packet.h>
+#include <rockchip/rk_mpi.h>
+
+#include "libavutil/avassert.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_drm.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/log.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+
+#include "avcodec.h"
+#include "codec_internal.h"
+#include "encode.h"
+#include "hwconfig.h"
+
+#define RKMPP_TIME_BASE AV_TIME_BASE_Q
+#define RKMPP_ALIGN_SIZE 16
+
+typedef struct RKMPPEncoderContext {
+    const AVClass *av_class;
+
+    MppCtx enc;
+    MppApi *mpi;
+    MppEncCfg cfg;
+    AVFrame *frame;
+
+    MppFrameFormat pix_fmt;
+    int mpp_stride;
+    int mpp_height;
+    // Used when pix_fmt isn't a hardware pixel format
+    MppBufferGroup buf_group;
+    MppBuffer frame_buf;
+
+    MppEncRcMode rc_mode;
+    bool eof_sent;
+} RKMPPEncoderContext;
+
+static const enum AVPixelFormat rkmpp_pix_fmts[] = {
+    AV_PIX_FMT_DRM_PRIME,
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_NONE
+};
+
+static av_cold int rkmpp_close_encoder(AVCodecContext *avctx)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+
+    if (ctx->enc) {
+        ctx->mpi->reset(ctx->enc);
+        mpp_destroy(ctx->enc);
+        ctx->enc = NULL;
+    }
+
+    if (ctx->cfg) {
+        mpp_enc_cfg_deinit(ctx->cfg);
+        ctx->cfg = NULL;
+    }
+
+    if (ctx->frame_buf) {
+        mpp_buffer_put(ctx->frame_buf);
+        ctx->frame_buf = NULL;
+    }
+
+    if (ctx->buf_group) {
+        mpp_buffer_group_put(ctx->buf_group);
+        ctx->buf_group = NULL;
+    }
+
+    av_frame_free(&ctx->frame);
+
+    return 0;
+}
+
+static int rkmpp_create_frame_buf(AVCodecContext *avctx)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+
+    ctx->frame = av_frame_alloc();
+    if (!ctx->frame)
+        return AVERROR(ENOMEM);
+
+    if (avctx->pix_fmt == AV_PIX_FMT_DRM_PRIME)
+        return 0;
+
+    int ret = mpp_buffer_group_get_internal(&ctx->buf_group,
+                MPP_BUFFER_TYPE_DRM | MPP_BUFFER_FLAGS_CACHABLE);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create buffer group, %d\n",
+               ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    int n = av_image_get_buffer_size(avctx->pix_fmt, ctx->mpp_stride,
+                                     ctx->mpp_height, 1);
+    if (n < 0)
+        return n;
+    ret = mpp_buffer_get(ctx->buf_group, &ctx->frame_buf, n);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to get frame buffer, %d\n",
+               ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    return 0;
+}
+
+static int rkmpp_export_extradata(AVCodecContext *avctx)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+    MppEncHeaderMode mode = (avctx->flags & AV_CODEC_FLAG_GLOBAL_HEADER) ?
+            MPP_ENC_HEADER_MODE_DEFAULT : MPP_ENC_HEADER_MODE_EACH_IDR;
+
+    int ret = ctx->mpi->control(ctx->enc, MPP_ENC_SET_HEADER_MODE, &mode);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set header mode: %d\n", ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    if (!(avctx->flags & AV_CODEC_FLAG_GLOBAL_HEADER))
+        return 0;
+
+    size_t size = 4096;
+    avctx->extradata = av_mallocz(size + AV_INPUT_BUFFER_PADDING_SIZE);
+    if (!avctx->extradata)
+        return AVERROR(ENOMEM);
+
+    MppPacket packet = NULL;
+    mpp_packet_init(&packet, avctx->extradata, size);
+    mpp_packet_set_length(packet, 0);
+    ret = ctx->mpi->control(ctx->enc, MPP_ENC_GET_HDR_SYNC, packet);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to get header: %d\n", ret);
+        ret = AVERROR_EXTERNAL;
+        goto out;
+    }
+
+    avctx->extradata_size = mpp_packet_get_length(packet);
+    if (avctx->extradata_size == 0 || avctx->extradata_size > size) {
+        av_log(avctx, AV_LOG_ERROR, "Invalid extradata size %d\n",
+               avctx->extradata_size);
+        ret = AVERROR_EXTERNAL;
+        goto out;
+    }
+
+    ret = 0;
+out:
+    mpp_packet_deinit(&packet);
+
+    return ret;
+}
+
+static av_cold int rkmpp_init_encoder(AVCodecContext *avctx)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+    int ret;
+
+    MppCodingType codectype;
+    switch (avctx->codec_id) {
+    case AV_CODEC_ID_H264:
+        codectype = MPP_VIDEO_CodingAVC;
+        break;
+    case AV_CODEC_ID_HEVC:
+        codectype = MPP_VIDEO_CodingHEVC;
+        break;
+    default:
+        av_unreachable("Invalid codec_id");
+    }
+
+    ret = mpp_check_support_format(MPP_CTX_ENC, codectype);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "The device doesn't support %s\n",
+                avcodec_get_name(avctx->codec_id));
+        return AVERROR_EXTERNAL;
+    }
+
+    ret = mpp_create(&ctx->enc, &ctx->mpi);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create MPP context (%d).\n", ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    ret = mpp_init(ctx->enc, MPP_CTX_ENC, codectype);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to initialize MPP context (%d).\n", ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    ret = mpp_enc_cfg_init(&ctx->cfg);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to initialize config (%d).\n", ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    MppEncCfg cfg = ctx->cfg;
+    ret = ctx->mpi->control(ctx->enc, MPP_ENC_GET_CFG, cfg);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to get encoder config: %d\n", ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    mpp_enc_cfg_set_s32(cfg, "prep:width", avctx->width);
+    mpp_enc_cfg_set_s32(cfg, "prep:height", avctx->height);
+    ctx->mpp_stride = FFALIGN(avctx->width, RKMPP_ALIGN_SIZE);
+    ctx->mpp_height = FFALIGN(avctx->height, RKMPP_ALIGN_SIZE);
+    mpp_enc_cfg_set_s32(cfg, "prep:hor_stride", ctx->mpp_stride);
+    mpp_enc_cfg_set_s32(cfg, "prep:ver_stride", ctx->mpp_height);
+
+    if (avctx->pix_fmt == AV_PIX_FMT_DRM_PRIME || avctx->pix_fmt == AV_PIX_FMT_NV12)
+        ctx->pix_fmt = MPP_FMT_YUV420SP;
+    else if (avctx->pix_fmt == AV_PIX_FMT_YUV420P)
+        ctx->pix_fmt = MPP_FMT_YUV420P;
+    else // Can only happen during development
+        return AVERROR_BUG;
+    mpp_enc_cfg_set_s32(cfg, "prep:format", ctx->pix_fmt);
+
+    if (avctx->colorspace != AVCOL_SPC_UNSPECIFIED)
+        mpp_enc_cfg_set_s32(cfg, "prep:colorspace", avctx->colorspace);
+    if (avctx->color_primaries != AVCOL_PRI_UNSPECIFIED)
+        mpp_enc_cfg_set_s32(cfg, "prep:colorprim", avctx->color_primaries);
+    if (avctx->color_trc != AVCOL_TRC_UNSPECIFIED)
+        mpp_enc_cfg_set_s32(cfg, "prep:colortrc", avctx->color_trc);
+    static_assert((int)AVCOL_RANGE_MPEG == (int)MPP_FRAME_RANGE_MPEG &&
+          (int)AVCOL_RANGE_JPEG == (int)MPP_FRAME_RANGE_JPEG &&
+          (int)AVCOL_RANGE_UNSPECIFIED == (int) MPP_FRAME_RANGE_UNSPECIFIED,
+          "MppFrameColorRange not equal to AVColorRange");
+    mpp_enc_cfg_set_s32(cfg, "prep:colorrange", avctx->color_range);
+
+    /* These two options sound like variable frame rate from the doc, but they
+     * are not. When they are false, bitrate control is based on frame numbers
+     * and framerate. But when they are true, bitrate control is based on wall
+     * clock time, not based on frame timestamps, which makes these options
+     * almost useless, except in certain rare realtime cases.
+     */
+    mpp_enc_cfg_set_s32(cfg, "rc:fps_in_flex", 0);
+    mpp_enc_cfg_set_s32(cfg, "rc:fps_out_flex", 0);
+    if (avctx->framerate.den > 0 && avctx->framerate.num > 0) {
+        mpp_enc_cfg_set_s32(cfg, "rc:fps_in_num", avctx->framerate.num);
+        mpp_enc_cfg_set_s32(cfg, "rc:fps_in_denom", avctx->framerate.den);
+        mpp_enc_cfg_set_s32(cfg, "rc:fps_out_num", avctx->framerate.num);
+        mpp_enc_cfg_set_s32(cfg, "rc:fps_out_denom", avctx->framerate.den);
+    }
+
+    if (avctx->gop_size >= 0)
+        mpp_enc_cfg_set_s32(cfg, "rc:gop", avctx->gop_size);
+
+    mpp_enc_cfg_set_u32(cfg, "rc:mode", ctx->rc_mode);
+    if (avctx->bit_rate > 0) {
+        mpp_enc_cfg_set_s32(cfg, "rc:bps_target", avctx->bit_rate);
+        if (avctx->rc_buffer_size >= avctx->bit_rate) {
+            int seconds = round((double)avctx->rc_buffer_size / avctx->bit_rate);
+            // 60 is the upper bound from the doc
+            seconds = FFMIN(seconds, 60);
+            mpp_enc_cfg_set_s32(cfg, "rc:stats_time", seconds);
+        }
+    }
+    if (avctx->rc_max_rate > 0)
+        mpp_enc_cfg_set_s32(cfg, "rc:bps_max", avctx->rc_max_rate);
+    if (avctx->rc_min_rate > 0)
+        mpp_enc_cfg_set_s32(cfg, "rc:bps_min", avctx->rc_min_rate);
+
+    mpp_enc_cfg_set_u32(cfg, "rc:drop_mode", MPP_ENC_RC_DROP_FRM_DISABLED);
+
+    ret = ctx->mpi->control(ctx->enc, MPP_ENC_SET_CFG, cfg);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set config: %d\n", ret);
+        return AVERROR_EXTERNAL;
+    }
+
+    ret = rkmpp_create_frame_buf(avctx);
+    if (ret < 0)
+        return ret;
+
+    ret = rkmpp_export_extradata(avctx);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+static int rkmpp_output_pkt(AVCodecContext *avctx, AVPacket *pkt, MppPacket packet)
+{
+    if (mpp_packet_get_eos(packet)) {
+        av_log(avctx, AV_LOG_INFO, "Receive eos packet\n");
+        return AVERROR_EOF;
+    }
+
+    size_t size = mpp_packet_get_length(packet);
+    void *data = mpp_packet_get_pos(packet);
+
+    if (!size || !data) {
+        av_log(avctx, AV_LOG_ERROR, "Encoder return empty packet\n");
+        return AVERROR_EXTERNAL;
+    }
+
+    int ret = ff_get_encode_buffer(avctx, pkt, size, 0);
+    if (ret < 0)
+        return ret;
+    memcpy(pkt->data, data, size);
+
+    int64_t pts = mpp_packet_get_pts(packet);
+    int64_t dts = mpp_packet_get_dts(packet);
+
+    pkt->pts = av_rescale_q(pts, RKMPP_TIME_BASE, avctx->time_base);
+    /* dts is always zero currently, since rkmpp copies dts from MppFrame to
+     * MppPacket, and we don't set dts for MppFrame (it makes no sense for
+     * an encoder). The rkmpp encoder doesn't support reordering, so we can
+     * just set dts to pts.
+     *
+     * TODO: remove this workaround once rkmpp has fixed the issue.
+     */
+    if (dts)
+        pkt->dts = av_rescale_q(dts, RKMPP_TIME_BASE, avctx->time_base);
+    else
+        pkt->dts = pkt->pts;
+
+    MppMeta meta = mpp_packet_get_meta(packet);
+    if (!meta) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to get meta from mpp packet\n");
+        return AVERROR_EXTERNAL;
+    }
+
+    int key_frame = 0;
+    ret = mpp_meta_get_s32(meta, KEY_OUTPUT_INTRA, &key_frame);
+    if (ret != MPP_OK) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to get key frame info\n");
+        return AVERROR_EXTERNAL;
+    }
+
+    if (key_frame)
+        pkt->flags |= AV_PKT_FLAG_KEY;
+
+    return 0;
+}
+
+static int rkmpp_set_hw_frame(AVCodecContext *avctx, MppFrame frame)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+    AVBufferRef *hw_ref = ctx->frame->hw_frames_ctx;
+    int ret;
+
+    if (!hw_ref)
+        return AVERROR(EINVAL);
+
+    AVHWFramesContext *hwframes = (AVHWFramesContext *)hw_ref->data;
+    if (hwframes->sw_format != AV_PIX_FMT_NV12)
+        return AVERROR(EINVAL);
+
+
+    const AVDRMFrameDescriptor *desc = (AVDRMFrameDescriptor *)ctx->frame->data[0];
+    const AVDRMLayerDescriptor *layer = &desc->layers[0];
+
+    int stride = layer->planes[0].pitch;
+    int vertical = layer->planes[1].offset / stride;
+    if (stride != ctx->mpp_stride || vertical != ctx->mpp_height) {
+        // Update stride info
+        ctx->mpp_stride = stride;
+        ctx->mpp_height = vertical;
+        mpp_enc_cfg_set_s32(ctx->cfg, "prep:hor_stride", ctx->mpp_stride);
+        mpp_enc_cfg_set_s32(ctx->cfg, "prep:ver_stride", ctx->mpp_height);
+        ret = ctx->mpi->control(ctx->enc, MPP_ENC_SET_CFG, ctx->cfg);
+        if (ret != MPP_OK) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set config: %d\n", ret);
+            return AVERROR_EXTERNAL;
+        }
+    }
+    mpp_frame_set_hor_stride(frame, stride);
+    mpp_frame_set_ver_stride(frame, vertical);
+
+    MppBuffer buffer = {0};
+    MppBufferInfo info = {
+        .type = MPP_BUFFER_TYPE_DRM,
+        .size = desc->objects[0].size,
+        .fd = desc->objects[0].fd,
+    };
+    ret = mpp_buffer_import(&buffer, &info);
+    if (ret != MPP_OK)
+        return AVERROR_EXTERNAL;
+
+    mpp_frame_set_buffer(frame, buffer);
+    mpp_buffer_put(buffer);
+
+    return 0;
+}
+
+static int rkmpp_set_sw_frame(AVCodecContext *avctx, MppFrame frame)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+    AVFrame *f = ctx->frame;
+
+    mpp_buffer_sync_begin(ctx->frame_buf);
+    void *buf = mpp_buffer_get_ptr(ctx->frame_buf);
+
+    uint8_t *dst[4] = {NULL};
+    int dst_linesizes[4] = {0};
+    int ret = av_image_fill_linesizes(dst_linesizes, f->format, ctx->mpp_stride);
+    if (ret < 0)
+        goto out;
+    ret = av_image_fill_pointers(dst, f->format, ctx->mpp_height, buf,
+                                 dst_linesizes);
+    if (ret < 0)
+        goto out;
+
+    av_image_copy2(dst, dst_linesizes, f->data, f->linesize,
+                   f->format, f->width, f->height);
+    mpp_frame_set_hor_stride(frame, ctx->mpp_stride);
+    mpp_frame_set_ver_stride(frame, ctx->mpp_height);
+
+    ret = 0;
+
+out:
+    mpp_buffer_sync_end(ctx->frame_buf);
+    if (!ret)
+        mpp_frame_set_buffer(frame, ctx->frame_buf);
+
+    return ret;
+}
+
+static int rkmpp_send_frame(AVCodecContext *avctx)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+    MppFrame frame = NULL;
+    int ret = 0;
+
+    ret = mpp_frame_init(&frame);
+    if (ret != MPP_OK) {
+        ret = AVERROR_EXTERNAL;
+        goto out;
+    }
+
+    if (ctx->frame->buf[0]) {
+        if (ctx->frame->format == AV_PIX_FMT_DRM_PRIME)
+            ret = rkmpp_set_hw_frame(avctx, frame);
+        else
+            ret = rkmpp_set_sw_frame(avctx, frame);
+
+        if (ret < 0)
+            goto out;
+
+        mpp_frame_set_fmt(frame, ctx->pix_fmt);
+        mpp_frame_set_width(frame, ctx->frame->width);
+        mpp_frame_set_height(frame, ctx->frame->height);
+        mpp_frame_set_pts(frame, av_rescale_q(ctx->frame->pts,
+                        avctx->time_base, RKMPP_TIME_BASE));
+    } else {
+        mpp_frame_set_buffer(frame, NULL);
+        mpp_frame_set_eos(frame, 1);
+    }
+
+    ret = ctx->mpi->encode_put_frame(ctx->enc, frame);
+    if (ret != MPP_OK)
+        ret = AVERROR_EXTERNAL;
+
+out:
+    if (frame)
+        mpp_frame_deinit(&frame);
+
+    return ret;
+}
+
+static int rkmpp_receive(AVCodecContext *avctx, AVPacket *pkt)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+
+    while (true) {
+        MppPacket packet = NULL;
+        int ret = ctx->mpi->encode_get_packet(ctx->enc, &packet);
+
+        if (ret == MPP_OK && packet) {
+            ret = rkmpp_output_pkt(avctx, pkt, packet);
+            mpp_packet_deinit(&packet);
+            return ret;
+        }
+
+        if (ctx->eof_sent)
+            continue;
+
+        if (!ctx->frame->buf[0]) {
+            ret = ff_encode_get_frame(avctx, ctx->frame);
+            if (ret < 0 && ret != AVERROR_EOF)
+                return ret;
+        }
+
+        ret = rkmpp_send_frame(avctx);
+        if (ret < 0)
+            return ret;
+
+        if (!ctx->frame->buf[0])
+            ctx->eof_sent = true;
+        else
+            av_frame_unref(ctx->frame);
+    }
+}
+
+static av_cold void rkmpp_flush(AVCodecContext *avctx)
+{
+    RKMPPEncoderContext *ctx = avctx->priv_data;
+    ctx->mpi->reset(ctx->enc);
+    ctx->eof_sent = false;
+}
+
+static const AVCodecHWConfigInternal *const rkmpp_hw_configs[] = {
+    HW_CONFIG_ENCODER_FRAMES(DRM_PRIME, DRM),
+    NULL
+};
+
+#define OFFSET(x) offsetof(RKMPPEncoderContext, x)
+#define VE AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_ENCODING_PARAM
+static const AVOption rkmpp_options[] = {
+    {"rc", "rate-control mode",
+        OFFSET(rc_mode), AV_OPT_TYPE_INT,  { .i64 = MPP_ENC_RC_MODE_VBR }, MPP_ENC_RC_MODE_VBR, INT_MAX, VE, .unit = "rc"},
+        {"vbr", "Variable bitrate mode",
+            0, AV_OPT_TYPE_CONST, {.i64 = MPP_ENC_RC_MODE_VBR}, 0, 0, VE, .unit = "rc"},
+        {"cbr", "Constant bitrate mode",
+            0, AV_OPT_TYPE_CONST, {.i64 = MPP_ENC_RC_MODE_CBR}, 0, 0, VE, .unit = "rc"},
+        {"avbr", "Adaptive bit rate mode",
+            0, AV_OPT_TYPE_CONST, {.i64 = MPP_ENC_RC_MODE_AVBR}, 0, 0, VE, .unit = "rc"},
+    {NULL},
+};
+
+static const AVClass rkmpp_enc_class = {
+    .class_name = "rkmpp_enc",
+    .item_name = av_default_item_name,
+    .version = LIBAVUTIL_VERSION_INT,
+    .option = rkmpp_options,
+};
+
+#define RKMPP_ENC(NAME, ID) \
+    const FFCodec ff_##NAME##_rkmpp_encoder = { \
+        .p.name         = #NAME "_rkmpp", \
+        CODEC_LONG_NAME(#NAME " (rkmpp)"), \
+        .p.type         = AVMEDIA_TYPE_VIDEO, \
+        .p.id           = ID, \
+        .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_DELAY | \
+                          AV_CODEC_CAP_HARDWARE | AV_CODEC_CAP_ENCODER_FLUSH, \
+        .priv_data_size = sizeof(RKMPPEncoderContext), \
+        CODEC_PIXFMTS_ARRAY(rkmpp_pix_fmts), \
+        .color_ranges   = AVCOL_RANGE_MPEG | AVCOL_RANGE_JPEG, \
+        .init           = rkmpp_init_encoder, \
+        FF_CODEC_RECEIVE_PACKET_CB(rkmpp_receive), \
+        .close          = rkmpp_close_encoder, \
+        .flush          = rkmpp_flush, \
+        .p.priv_class   = &rkmpp_enc_class, \
+        .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP, \
+        .p.wrapper_name = "rkmpp", \
+        .hw_configs     = rkmpp_hw_configs, \
+    };
+
+#if CONFIG_H264_RKMPP_ENCODER
+RKMPP_ENC(h264, AV_CODEC_ID_H264)
+#endif
+
+#if CONFIG_HEVC_RKMPP_ENCODER
+RKMPP_ENC(hevc, AV_CODEC_ID_HEVC)
+#endif
diff --git a/libavcodec/version.h b/libavcodec/version.h
index d0980e28de..da6f3a84ac 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -29,7 +29,7 @@
 
 #include "version_major.h"
 
-#define LIBAVCODEC_VERSION_MINOR  20
+#define LIBAVCODEC_VERSION_MINOR  21
 #define LIBAVCODEC_VERSION_MICRO 100
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
-- 
2.52.0


From 62625650ee809b70eab9b9d7153426390ba4d804 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 21:02:11 +0100
Subject: [PATCH 107/304] avcodec/vp3: Sync VLCs once during init, fix crash

6c7a344b65cb7476d1575cb1504e3a53bcbc83e7 made the VLCs shared between
threads and did so in a way that was designed to support stream
reconfigurations, so that the structure containing the VLCs was
synced in update_thread_context. The idea was that the currently
active VLCs would just be passed along between threads.

Yet this was broken by 5acbdd2264d3b90dc11369f9e031e762f260882e:
Before this commit, submit_packet() was a no-op during flushing
for VP3, as it is a no-delay decoder, so it won't produce any output
during flushing. This meant that prev_thread in pthread_frame.c
contained the last dst thread that update_thread_context()
was called for (so that these VLCs could be passed along between
threads). Yet after said commit, submit_packet was no longer
a no-op during flushing and changed prev_thread in such a way
that it did not need to contain any VLCs at all*. When flushing,
prev_thread is used to pass the current state to the first worker
thread which is the one that is used to restart decoding.
It could therefore happen that, when decoding restarted after flushing,
the decoding thread no longer contained the VLCs at all, leading to
a crash (this scenario was never anticipated and must not happen
at all).

There is a simple, easily backportable fix given that we do not
support stream reconfigurations (yet) when using frame threading:
Don't sync the VLCs in update_thread_context(), instead do it once
during init.

This fixes forgejo issue #20346 and trac issue #11592.

(I don't know why 5acbdd2264d3b90dc11369f9e031e762f260882e
changed submit_packet() to no longer be a no-op when draining
no-delay decoders.)

*: The exact condition for the crash is nb_threads > 2*nb_frames.

Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/vp3.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index 0613254c14..ab60dee098 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -47,7 +47,6 @@
 #include "decode.h"
 #include "get_bits.h"
 #include "hpeldsp.h"
-#include "internal.h"
 #include "jpegquanttables.h"
 #include "mathops.h"
 #include "progressframe.h"
@@ -2459,7 +2458,7 @@ static av_cold int vp3_decode_init(AVCodecContext *avctx)
         }
     }
 
-    if (!avctx->internal->is_copy) {
+    if (ff_thread_sync_ref(avctx, offsetof(Vp3DecodeContext, coeff_vlc)) != FF_THREAD_IS_COPY) {
         CoeffVLCs *vlcs = av_refstruct_alloc_ext(sizeof(*s->coeff_vlc), 0,
                                                  NULL, free_vlc_tables);
         if (!vlcs)
@@ -2528,8 +2527,6 @@ static int vp3_update_thread_context(AVCodecContext *dst, const AVCodecContext *
     const Vp3DecodeContext *s1 = src->priv_data;
     int qps_changed = 0;
 
-    av_refstruct_replace(&s->coeff_vlc, s1->coeff_vlc);
-
     // copy previous frame data
     ref_frames(s, s1);
     if (!s1->current_frame.f ||
-- 
2.52.0


From 1a31bc730726e6538b18f7742b43927b449e2dab Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 20:19:42 +0100
Subject: [PATCH 108/304] avcodec/vp3: Move last_qps from context to stack

Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/vp3.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index ab60dee098..9a766ecb41 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -217,7 +217,6 @@ typedef struct Vp3DecodeContext {
 
     int qps[3];
     int nqps;
-    int last_qps[3];
 
     int superblock_count;
     int y_superblock_width;
@@ -2551,7 +2550,6 @@ static int vp3_update_thread_context(AVCodecContext *dst, const AVCodecContext *
 
         if (qps_changed) {
             memcpy(s->qps,      s1->qps,      sizeof(s->qps));
-            memcpy(s->last_qps, s1->last_qps, sizeof(s->last_qps));
             s->nqps = s1->nqps;
         }
     }
@@ -2618,8 +2616,10 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     }
     if (!s->theora)
         skip_bits(&gb, 1);
+
+    int last_qps[3];
     for (int i = 0; i < 3; i++)
-        s->last_qps[i] = s->qps[i];
+        last_qps[i] = s->qps[i];
 
     s->nqps = 0;
     do {
@@ -2636,13 +2636,13 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
                           avctx->skip_loop_filter >= (s->keyframe ? AVDISCARD_ALL
                                                                   : AVDISCARD_NONKEY);
 
-    if (s->qps[0] != s->last_qps[0])
+    if (s->qps[0] != last_qps[0])
         init_loop_filter(s);
 
     for (int i = 0; i < s->nqps; i++)
         // reinit all dequantizers if the first one changed, because
         // the DC of the first quantizer must be used for all matrices
-        if (s->qps[i] != s->last_qps[i] || s->qps[0] != s->last_qps[0])
+        if (s->qps[i] != last_qps[i] || s->qps[0] != last_qps[0])
             init_dequantizer(s, i);
 
     if (avctx->skip_frame >= AVDISCARD_NONKEY && !s->keyframe)
-- 
2.52.0


From c950f782c015757dfe631664e23183786aee9054 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 20:31:26 +0100
Subject: [PATCH 109/304] avcodec/vp3: Remove always-false checks

The dimensions are only set at two places: theora_decode_header()
and vp3_decode_init(). These functions are called during init
and during dimension changes, but the latter is only supported
(and attempted) when frame threading is not active. This implies that
the dimensions of the various worker threads in
vp3_update_thread_context() always coincide, so that these checks
are dead and can be removed.

(These checks would of course need to be removed when support
for dimension changes during frame threading is implemented;
and in any case, a dimension change is not an error.)

Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/vp3.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index 9a766ecb41..60908c877f 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -2528,10 +2528,8 @@ static int vp3_update_thread_context(AVCodecContext *dst, const AVCodecContext *
 
     // copy previous frame data
     ref_frames(s, s1);
-    if (!s1->current_frame.f ||
-        s->width != s1->width || s->height != s1->height) {
+    if (!s1->current_frame.f)
         return -1;
-    }
 
     if (s != s1) {
         s->keyframe = s1->keyframe;
-- 
2.52.0


From aea5b5d0e2cf925f03107bbbb8b5b204b3d4d835 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 01:18:10 +0100
Subject: [PATCH 110/304] avcodec/vp3: Redo updating frames

VP3's frame management is actually simple: It has three frame slots:
current, last and golden. After having decoded the current frame,
the old last frame will be freed and replaced by the current frame.
If the current frame is a keyframe, it also takes over the golden slot.

The VP3 decoder handled this as follows: In single-threaded mode,
the above procedure was carried out (on success). Doing so with
frame-threading is impossible, as it would lead to data races.
Instead vp3_update_thread_context() created new references
to these frames and then carried out said procedure.

This means that vp3_update_thread_context() is not just a "dumb"
function that only copies certain fields from src to dst; instead
it actually processes them. E.g. trying to copy the decoding state
from A to B and then from B to C (with no decode_frame call in between)
will not be equivalent to copying from A to C, as both current and last
frames will be blank in the first case.

This commit changes this: Because last_frame won't be needed after
decoding, no reference to it will be created in
vp3_update_thread_context(); instead it is now always unreferenced
after decoding it (even on error). Replacing last_frame with the new
frame is now always performed when the new frame is allocated.
Replacing the golden frame is now done earlier, namely in decode_frame()
before ff_thread_finish_setup(), so that update_thread_context only
has to reference current frame and golden frame. Being dumb means
that update_thread_context also no longer checks whether the current
frame is valid, so that it can no longer error out.

This unifies the single- and multi-threaded codepaths; it can lead
to changes in output in single threaded mode: When erroring out,
the current frame would be discarded and not be put into one
of the reference slots at all in single-threaded mode. The new
code meanwhile does everything as the frame-threaded code already did
in order to reduce discrepancies between the two. It would be possible
to keep the old single-threaded behavior (one would need to postpone
replacing the golden frame to the end of vp3_decode_frame and would
need to swap the current frame and the last frame on error,
unreferencing the former).

Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
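Note (illustration only, not part of the patch): the slot handling
described in the commit message can be sketched roughly as below. This is
a simplified model under stated assumptions: `Frame`, `Slots`,
`slot_unref()`, `slot_replace()`, `begin_frame()` and `end_frame()` are
hypothetical names invented here, and a plain counter stands in for
FFmpeg's ProgressFrame reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted frame; not FFmpeg's ProgressFrame. */
typedef struct Frame {
    int refcount;
    int id;
} Frame;

/* The three slots named in the commit message. */
typedef struct Slots {
    Frame *current, *last, *golden;
} Slots;

/* Drop the reference held by a slot and blank the slot. */
static void slot_unref(Frame **slot)
{
    if (*slot) {
        assert((*slot)->refcount > 0);
        (*slot)->refcount--;
        *slot = NULL;
    }
}

/* Make dst reference the same frame as src. */
static void slot_replace(Frame **dst, Frame *src)
{
    if (*dst == src)
        return;
    slot_unref(dst);
    if (src)
        src->refcount++;
    *dst = src;
}

/* Start of decoding: the new frame is allocated into the (blank) last
 * slot and then swapped with current, so the old current becomes last.
 * On a keyframe the new frame also takes over the golden slot. */
static void begin_frame(Slots *s, Frame *fresh, int keyframe)
{
    Frame *tmp;

    assert(!s->last);       /* last was dropped when the previous frame ended */
    fresh->refcount = 1;
    s->last = fresh;

    tmp        = s->last;   /* swap last and current */
    s->last    = s->current;
    s->current = tmp;

    if (keyframe)
        slot_replace(&s->golden, s->current);
}

/* End of decoding, on success and on error alike: last is no longer needed. */
static void end_frame(Slots *s)
{
    slot_unref(&s->last);
}
```

In this model a thread-context update would only have to copy `current`
and `golden`, because `last` is always blank between two decode calls.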
 libavcodec/vp3.c | 37 ++++++++-----------------------------
 1 file changed, 8 insertions(+), 29 deletions(-)

diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index 60908c877f..7ce54967e1 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -2499,25 +2499,11 @@ static av_cold int vp3_decode_init(AVCodecContext *avctx)
     return allocate_tables(avctx);
 }
 
-/// Release and shuffle frames after decode finishes
-static void update_frames(AVCodecContext *avctx)
-{
-    Vp3DecodeContext *s = avctx->priv_data;
-
-    if (s->keyframe)
-        ff_progress_frame_replace(&s->golden_frame, &s->current_frame);
-
-    /* shuffle frames */
-    ff_progress_frame_unref(&s->last_frame);
-    FFSWAP(ProgressFrame, s->last_frame, s->current_frame);
-}
-
 #if HAVE_THREADS
 static void ref_frames(Vp3DecodeContext *dst, const Vp3DecodeContext *src)
 {
     ff_progress_frame_replace(&dst->current_frame, &src->current_frame);
     ff_progress_frame_replace(&dst->golden_frame,  &src->golden_frame);
-    ff_progress_frame_replace(&dst->last_frame,    &src->last_frame);
 }
 
 static int vp3_update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
@@ -2528,12 +2514,8 @@ static int vp3_update_thread_context(AVCodecContext *dst, const AVCodecContext *
 
     // copy previous frame data
     ref_frames(s, s1);
-    if (!s1->current_frame.f)
-        return -1;
 
     if (s != s1) {
-        s->keyframe = s1->keyframe;
-
         // copy qscale data if necessary
         for (int i = 0; i < 3; i++) {
             if (s->qps[i] != s1->qps[1]) {
@@ -2551,8 +2533,6 @@ static int vp3_update_thread_context(AVCodecContext *dst, const AVCodecContext *
             s->nqps = s1->nqps;
         }
     }
-
-    update_frames(dst);
     return 0;
 }
 #endif
@@ -2646,14 +2626,14 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     if (avctx->skip_frame >= AVDISCARD_NONKEY && !s->keyframe)
         return buf_size;
 
-    ff_progress_frame_unref(&s->current_frame);
-    ret = ff_progress_frame_get_buffer(avctx, &s->current_frame,
+    ret = ff_progress_frame_get_buffer(avctx, &s->last_frame,
                                        AV_GET_BUFFER_FLAG_REF);
     if (ret < 0) {
         // Don't goto error here, as one can't report progress on or
         // unref a non-existent frame.
         return ret;
     }
+    FFSWAP(ProgressFrame, s->last_frame, s->current_frame);
     s->current_frame.f->pict_type = s->keyframe ? AV_PICTURE_TYPE_I
                                                 : AV_PICTURE_TYPE_P;
     if (s->keyframe)
@@ -2678,7 +2658,8 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
 #if !CONFIG_VP4_DECODER
                 if (version >= 2) {
                     av_log(avctx, AV_LOG_ERROR, "This build does not support decoding VP4.\n");
-                    return AVERROR_DECODER_NOT_FOUND;
+                    ret = AVERROR_DECODER_NOT_FOUND;
+                    goto error;
                 }
 #endif
                 s->version = version;
@@ -2716,6 +2697,7 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
             }
 #endif
         }
+        ff_progress_frame_replace(&s->golden_frame, &s->current_frame);
     } else {
         if (!s->golden_frame.f) {
             av_log(s->avctx, AV_LOG_WARNING,
@@ -2793,6 +2775,8 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
         }
     vp3_draw_horiz_band(s, s->height);
 
+    ff_progress_frame_unref(&s->last_frame);
+
     /* output frame, offset as needed */
     if ((ret = av_frame_ref(frame, s->current_frame.f)) < 0)
         return ret;
@@ -2804,16 +2788,11 @@ static int vp3_decode_frame(AVCodecContext *avctx, AVFrame *frame,
 
     *got_frame = 1;
 
-    if (!HAVE_THREADS || !(s->avctx->active_thread_type & FF_THREAD_FRAME))
-        update_frames(avctx);
-
     return buf_size;
 
 error:
     ff_progress_frame_report(&s->current_frame, INT_MAX);
-
-    if (!HAVE_THREADS || !(s->avctx->active_thread_type & FF_THREAD_FRAME))
-        av_frame_unref(s->current_frame.f);
+    ff_progress_frame_unref(&s->last_frame);
 
     return ret;
 }
-- 
2.52.0


From 05cfe3f250a9b8657a9296b27fd6947079b539ad Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 00:19:58 +0100
Subject: [PATCH 111/304] avcodec/arm/vp6dsp: Remove VP6 edge filter functions

Forgotten in 160ebe0a8d780f6db7c18e824d8ec6f437da33a2.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/arm/Makefile          |   2 -
 libavcodec/arm/vp6dsp_init_arm.c |  39 ----------
 libavcodec/arm/vp6dsp_neon.S     | 121 -------------------------------
 libavcodec/vp56dsp.c             |   4 +-
 libavcodec/vp56dsp.h             |   1 -
 5 files changed, 1 insertion(+), 166 deletions(-)
 delete mode 100644 libavcodec/arm/vp6dsp_init_arm.c
 delete mode 100644 libavcodec/arm/vp6dsp_neon.S

diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
index 811b364195..e32a0bf49f 100644
--- a/libavcodec/arm/Makefile
+++ b/libavcodec/arm/Makefile
@@ -42,7 +42,6 @@ OBJS-$(CONFIG_RV40_DECODER)            += arm/rv40dsp_init_arm.o
 OBJS-$(CONFIG_SBC_ENCODER)             += arm/sbcdsp_init_arm.o
 OBJS-$(CONFIG_TRUEHD_DECODER)          += arm/mlpdsp_init_arm.o
 OBJS-$(CONFIG_VORBIS_DECODER)          += arm/vorbisdsp_init_arm.o
-OBJS-$(CONFIG_VP6_DECODER)             += arm/vp6dsp_init_arm.o
 OBJS-$(CONFIG_VP9_DECODER)             += arm/vp9dsp_init_10bpp_arm.o   \
                                           arm/vp9dsp_init_12bpp_arm.o   \
                                           arm/vp9dsp_init_arm.o
@@ -139,7 +138,6 @@ NEON-OBJS-$(CONFIG_RV40_DECODER)       += arm/rv34dsp_neon.o            \
                                           arm/rv40dsp_neon.o
 NEON-OBJS-$(CONFIG_SBC_ENCODER)        += arm/sbcdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER)     += arm/vorbisdsp_neon.o
-NEON-OBJS-$(CONFIG_VP6_DECODER)        += arm/vp6dsp_neon.o
 NEON-OBJS-$(CONFIG_VP9_DECODER)        += arm/vp9itxfm_16bpp_neon.o     \
                                           arm/vp9itxfm_neon.o           \
                                           arm/vp9lpf_16bpp_neon.o       \
diff --git a/libavcodec/arm/vp6dsp_init_arm.c b/libavcodec/arm/vp6dsp_init_arm.c
deleted file mode 100644
index a59d61278c..0000000000
--- a/libavcodec/arm/vp6dsp_init_arm.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Copyright (c) 2010 Mans Rullgard <mans@mansr.com>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include <stdint.h>
-
-#include "libavutil/attributes.h"
-#include "libavutil/arm/cpu.h"
-
-#include "libavcodec/vp56dsp.h"
-
-void ff_vp6_edge_filter_hor_neon(uint8_t *yuv, ptrdiff_t stride, int t);
-void ff_vp6_edge_filter_ver_neon(uint8_t *yuv, ptrdiff_t stride, int t);
-
-av_cold void ff_vp6dsp_init_arm(VP56DSPContext *s)
-{
-    int cpu_flags = av_get_cpu_flags();
-
-    if (have_neon(cpu_flags)) {
-        s->edge_filter_hor = ff_vp6_edge_filter_hor_neon;
-        s->edge_filter_ver = ff_vp6_edge_filter_ver_neon;
-    }
-}
diff --git a/libavcodec/arm/vp6dsp_neon.S b/libavcodec/arm/vp6dsp_neon.S
deleted file mode 100644
index 03dd28d1cb..0000000000
--- a/libavcodec/arm/vp6dsp_neon.S
+++ /dev/null
@@ -1,121 +0,0 @@
-/*
- * Copyright (c) 2010 Mans Rullgard <mans@mansr.com>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/arm/asm.S"
-
-.macro  vp6_edge_filter
-        vdup.16         q3,  r2                 @ t
-        vmov.i16        q13, #1
-        vsubl.u8        q0,  d20, d18           @ p[   0] - p[-s]
-        vsubl.u8        q1,  d16, d22           @ p[-2*s] - p[ s]
-        vsubl.u8        q14, d21, d19
-        vsubl.u8        q15, d17, d23
-        vadd.i16        q2,  q0,  q0            @ 2*(p[0]-p[-s])
-        vadd.i16        d29, d28, d28
-        vadd.i16        q0,  q0,  q1            @    p[0]-p[-s]  + p[-2*s]-p[s]
-        vadd.i16        d28, d28, d30
-        vadd.i16        q0,  q0,  q2            @ 3*(p[0]-p[-s]) + p[-2*s]-p[s]
-        vadd.i16        d28, d28, d29
-        vrshr.s16       q0,  q0,  #3            @ v
-        vrshr.s16       d28, d28, #3
-        vsub.i16        q8,  q3,  q13           @ t-1
-        vabs.s16        q1,  q0                 @ V
-        vshr.s16        q2,  q0,  #15           @ s
-        vabs.s16        d30, d28
-        vshr.s16        d29, d28, #15
-        vsub.i16        q12, q1,  q3            @ V-t
-        vsub.i16        d31, d30, d6
-        vsub.i16        q12, q12, q13           @ V-t-1
-        vsub.i16        d31, d31, d26
-        vcge.u16        q12, q12, q8            @ V-t-1 >= t-1
-        vcge.u16        d31, d31, d16
-        vadd.i16        q13, q3,  q3            @ 2*t
-        vadd.i16        d16, d6,  d6
-        vsub.i16        q13, q13, q1            @ 2*t - V
-        vsub.i16        d16, d16, d30
-        vadd.i16        q13, q13, q2            @ += s
-        vadd.i16        d16, d16, d29
-        veor            q13, q13, q2            @ ^= s
-        veor            d16, d16, d29
-        vbif            q0,  q13, q12
-        vbif            d28, d16, d31
-        vmovl.u8        q1,  d20
-        vmovl.u8        q15, d21
-        vaddw.u8        q2,  q0,  d18
-        vaddw.u8        q3,  q14, d19
-        vsub.i16        q1,  q1,  q0
-        vsub.i16        d30, d30, d28
-        vqmovun.s16     d18, q2
-        vqmovun.s16     d19, q3
-        vqmovun.s16     d20, q1
-        vqmovun.s16     d21, q15
-.endm
-
-function ff_vp6_edge_filter_ver_neon, export=1
-        sub             r0,  r0,  r1,  lsl #1
-        vld1.8          {q8},     [r0], r1      @ p[-2*s]
-        vld1.8          {q9},     [r0], r1      @ p[-s]
-        vld1.8          {q10},    [r0], r1      @ p[0]
-        vld1.8          {q11},    [r0]          @ p[s]
-        vp6_edge_filter
-        sub             r0,  r0,  r1,  lsl #1
-        sub             r1,  r1,  #8
-        vst1.8          {d18},    [r0]!
-        vst1.32         {d19[0]}, [r0], r1
-        vst1.8          {d20},    [r0]!
-        vst1.32         {d21[0]}, [r0]
-        bx              lr
-endfunc
-
-function ff_vp6_edge_filter_hor_neon, export=1
-        sub             r3,  r0,  #1
-        sub             r0,  r0,  #2
-        vld1.32         {d16[0]}, [r0], r1
-        vld1.32         {d18[0]}, [r0], r1
-        vld1.32         {d20[0]}, [r0], r1
-        vld1.32         {d22[0]}, [r0], r1
-        vld1.32         {d16[1]}, [r0], r1
-        vld1.32         {d18[1]}, [r0], r1
-        vld1.32         {d20[1]}, [r0], r1
-        vld1.32         {d22[1]}, [r0], r1
-        vld1.32         {d17[0]}, [r0], r1
-        vld1.32         {d19[0]}, [r0], r1
-        vld1.32         {d21[0]}, [r0], r1
-        vld1.32         {d23[0]}, [r0], r1
-        vtrn.8          q8,  q9
-        vtrn.8          q10, q11
-        vtrn.16         q8,  q10
-        vtrn.16         q9,  q11
-        vp6_edge_filter
-        vtrn.8          q9,  q10
-        vst1.16         {d18[0]}, [r3], r1
-        vst1.16         {d20[0]}, [r3], r1
-        vst1.16         {d18[1]}, [r3], r1
-        vst1.16         {d20[1]}, [r3], r1
-        vst1.16         {d18[2]}, [r3], r1
-        vst1.16         {d20[2]}, [r3], r1
-        vst1.16         {d18[3]}, [r3], r1
-        vst1.16         {d20[3]}, [r3], r1
-        vst1.16         {d19[0]}, [r3], r1
-        vst1.16         {d21[0]}, [r3], r1
-        vst1.16         {d19[1]}, [r3], r1
-        vst1.16         {d21[1]}, [r3], r1
-        bx              lr
-endfunc
diff --git a/libavcodec/vp56dsp.c b/libavcodec/vp56dsp.c
index a668712384..1ff67b1c87 100644
--- a/libavcodec/vp56dsp.c
+++ b/libavcodec/vp56dsp.c
@@ -77,9 +77,7 @@ av_cold void ff_vp6dsp_init(VP56DSPContext *s)
 {
     s->vp6_filter_diag4 = ff_vp6_filter_diag4_c;
 
-#if ARCH_ARM
-    ff_vp6dsp_init_arm(s);
-#elif ARCH_X86
+#if ARCH_X86
     ff_vp6dsp_init_x86(s);
 #endif
 }
diff --git a/libavcodec/vp56dsp.h b/libavcodec/vp56dsp.h
index e35e232ea3..f2cbb41a1e 100644
--- a/libavcodec/vp56dsp.h
+++ b/libavcodec/vp56dsp.h
@@ -38,7 +38,6 @@ void ff_vp6_filter_diag4_c(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 void ff_vp5dsp_init(VP56DSPContext *s);
 void ff_vp6dsp_init(VP56DSPContext *s);
 
-void ff_vp6dsp_init_arm(VP56DSPContext *s);
 void ff_vp6dsp_init_x86(VP56DSPContext *s);
 
 #endif /* AVCODEC_VP56DSP_H */
-- 
2.52.0


From abe4d8ad55ab6296efbaf98f24b1d4132c448f5d Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 00:28:32 +0100
Subject: [PATCH 112/304] avcodec/vp56: Fix indentation

Forgotten in 160ebe0a8d780f6db7c18e824d8ec6f437da33a2.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/vp56.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/vp56.c b/libavcodec/vp56.c
index dc3ae70c66..0ddf7c985c 100644
--- a/libavcodec/vp56.c
+++ b/libavcodec/vp56.c
@@ -325,9 +325,9 @@ static void vp56_deblock_filter(VP56Context *s, uint8_t *yuv,
                                 ptrdiff_t stride, int dx, int dy)
 {
     if (s->avctx->codec->id == AV_CODEC_ID_VP5) {
-    int t = ff_vp56_filter_threshold[s->quantizer];
-    if (dx)  s->vp56dsp.edge_filter_hor(yuv +         10-dx , stride, t);
-    if (dy)  s->vp56dsp.edge_filter_ver(yuv + stride*(10-dy), stride, t);
+        int t = ff_vp56_filter_threshold[s->quantizer];
+        if (dx)  s->vp56dsp.edge_filter_hor(yuv +         10-dx , stride, t);
+        if (dy)  s->vp56dsp.edge_filter_ver(yuv + stride*(10-dy), stride, t);
     } else {
         int * bounding_values = s->bounding_values_array + 127;
         if (dx)
-- 
2.52.0


From 3f76d568488febbfba91d510c8c1108096be9fd3 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 00:44:47 +0100
Subject: [PATCH 113/304] avcodec/vp56dsp: Separate VP5DSP and VP6DSP

They don't have anything in common since
160ebe0a8d780f6db7c18e824d8ec6f437da33a2.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
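Note (illustration only, not part of the patch): since a decoder instance
is either VP5 or VP6, the two DSP contexts can share storage. A minimal
sketch of that layout follows; every name in it (`Vp5DspSketch`,
`Vp6DspSketch`, `DecoderSketch`, `decoder_init()`, `decoder_filter()`) is
hypothetical, the function pointers are toy placeholders rather than the
real filters, and the anonymous union assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature contexts; the real ones are declared in vp56dsp.h. */
typedef struct Vp5DspSketch {
    int (*edge_strength)(int t);
} Vp5DspSketch;

typedef struct Vp6DspSketch {
    int (*diag_weight)(int x, int y);
} Vp6DspSketch;

enum SketchCodec { SKETCH_VP5, SKETCH_VP6 };

typedef struct DecoderSketch {
    enum SketchCodec codec;
    union {                     /* anonymous union (C11): one member is live */
        Vp5DspSketch vp5dsp;
        Vp6DspSketch vp6dsp;
    };
} DecoderSketch;

/* Toy placeholder functions, not the real edge or diagonal filters. */
static int vp5_edge_strength(int t)      { return 2 * t; }
static int vp6_diag_weight(int x, int y) { return x + y; }

/* Initialise only the member that matches the codec. */
static void decoder_init(DecoderSketch *d, enum SketchCodec codec)
{
    d->codec = codec;
    if (codec == SKETCH_VP5)
        d->vp5dsp.edge_strength = vp5_edge_strength;
    else
        d->vp6dsp.diag_weight = vp6_diag_weight;
}

/* Dispatch through whichever member was initialised. */
static int decoder_filter(const DecoderSketch *d, int a, int b)
{
    if (d->codec == SKETCH_VP5) {
        assert(d->vp5dsp.edge_strength);
        return d->vp5dsp.edge_strength(a);
    }
    assert(d->vp6dsp.diag_weight);
    return d->vp6dsp.diag_weight(a, b);
}
```

The codec tag decides which union member may be read, which mirrors how
the decoder checks the codec id before calling into either context.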
 configure                          |  5 ++---
 libavcodec/Makefile                |  4 ++--
 libavcodec/vp5.c                   |  2 +-
 libavcodec/vp56.c                  |  4 ++--
 libavcodec/vp56.h                  |  5 ++++-
 libavcodec/vp56dsp.h               | 15 +++++++--------
 libavcodec/{vp56dsp.c => vp5dsp.c} | 17 +----------------
 libavcodec/vp6.c                   |  4 ++--
 libavcodec/vp6dsp.c                | 13 +++++++++++--
 libavcodec/x86/vp6dsp_init.c       |  2 +-
 10 files changed, 33 insertions(+), 38 deletions(-)
 rename libavcodec/{vp56dsp.c => vp5dsp.c} (87%)

diff --git a/configure b/configure
index 7202cbc57a..a7b0c18d9c 100755
--- a/configure
+++ b/configure
@@ -2701,7 +2701,6 @@ CONFIG_EXTRA="
     vc1dsp
     videodsp
     vp3dsp
-    vp56dsp
     vp8dsp
     vulkan_encode
     vvc_sei
@@ -3197,8 +3196,8 @@ vc1image_decoder_select="vc1_decoder"
 vorbis_encoder_select="audio_frame_queue"
 vp3_decoder_select="hpeldsp vp3dsp videodsp"
 vp4_decoder_select="vp3_decoder"
-vp5_decoder_select="h264chroma hpeldsp videodsp vp3dsp vp56dsp"
-vp6_decoder_select="h264chroma hpeldsp huffman videodsp vp3dsp vp56dsp"
+vp5_decoder_select="h264chroma hpeldsp videodsp vp3dsp"
+vp6_decoder_select="h264chroma hpeldsp huffman videodsp vp3dsp"
 vp6a_decoder_select="vp6_decoder"
 vp6f_decoder_select="vp6_decoder"
 vp7_decoder_select="h264pred videodsp vp8dsp"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 49d696017d..40e68116e8 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -182,7 +182,6 @@ OBJS-$(CONFIG_AV1_AMF_DECODER)         += amfdec.o
 OBJS-$(CONFIG_VC1DSP)                  += vc1dsp.o
 OBJS-$(CONFIG_VIDEODSP)                += videodsp.o
 OBJS-$(CONFIG_VP3DSP)                  += vp3dsp.o
-OBJS-$(CONFIG_VP56DSP)                 += vp56dsp.o
 OBJS-$(CONFIG_VP8DSP)                  += vp8dsp.o
 OBJS-$(CONFIG_V4L2_M2M)                += v4l2_m2m.o v4l2_context.o v4l2_buffers.o v4l2_fmt.o
 OBJS-$(CONFIG_WMA_FREQS)               += wma_freqs.o
@@ -806,7 +805,8 @@ OBJS-$(CONFIG_VORBIS_DECODER)          += vorbisdec.o vorbisdsp.o vorbis.o \
 OBJS-$(CONFIG_VORBIS_ENCODER)          += vorbisenc.o vorbis.o \
                                           vorbis_data.o
 OBJS-$(CONFIG_VP3_DECODER)             += vp3.o jpegquanttables.o
-OBJS-$(CONFIG_VP5_DECODER)             += vp5.o vp56.o vp56data.o vpx_rac.o
+OBJS-$(CONFIG_VP5_DECODER)             += vp5.o vp56.o vp56data.o \
+                                          vp5dsp.o vpx_rac.o
 OBJS-$(CONFIG_VP6_DECODER)             += vp6.o vp56.o vp56data.o \
                                           vp6dsp.o vpx_rac.o
 OBJS-$(CONFIG_VP7_DECODER)             += vp8.o vp8data.o vpx_rac.o
diff --git a/libavcodec/vp5.c b/libavcodec/vp5.c
index 77b479471b..98b8cf41f2 100644
--- a/libavcodec/vp5.c
+++ b/libavcodec/vp5.c
@@ -285,7 +285,7 @@ static av_cold int vp5_decode_init(AVCodecContext *avctx)
 
     if ((ret = ff_vp56_init_context(avctx, s, 1, 0)) < 0)
         return ret;
-    ff_vp5dsp_init(&s->vp56dsp);
+    ff_vp5dsp_init(&s->vp5dsp);
     s->vp56_coord_div = vp5_coord_div;
     s->parse_vector_adjustment = vp5_parse_vector_adjustment;
     s->parse_coeff = vp5_parse_coeff;
diff --git a/libavcodec/vp56.c b/libavcodec/vp56.c
index 0ddf7c985c..0d13d7a276 100644
--- a/libavcodec/vp56.c
+++ b/libavcodec/vp56.c
@@ -326,8 +326,8 @@ static void vp56_deblock_filter(VP56Context *s, uint8_t *yuv,
 {
     if (s->avctx->codec->id == AV_CODEC_ID_VP5) {
         int t = ff_vp56_filter_threshold[s->quantizer];
-        if (dx)  s->vp56dsp.edge_filter_hor(yuv +         10-dx , stride, t);
-        if (dy)  s->vp56dsp.edge_filter_ver(yuv + stride*(10-dy), stride, t);
+        if (dx)  s->vp5dsp.edge_filter_hor(yuv +         10-dx , stride, t);
+        if (dy)  s->vp5dsp.edge_filter_ver(yuv + stride*(10-dy), stride, t);
     } else {
         int * bounding_values = s->bounding_values_array + 127;
         if (dx)
diff --git a/libavcodec/vp56.h b/libavcodec/vp56.h
index af46e2f188..6610fc2892 100644
--- a/libavcodec/vp56.h
+++ b/libavcodec/vp56.h
@@ -118,7 +118,10 @@ struct vp56_context {
     HpelDSPContext hdsp;
     VideoDSPContext vdsp;
     VP3DSPContext vp3dsp;
-    VP56DSPContext vp56dsp;
+    union {
+        VP5DSPContext vp5dsp;
+        VP6DSPContext vp6dsp;
+    };
     uint8_t idct_scantable[64];
     AVFrame *frames[4];
     uint8_t *edge_emu_buffer_alloc;
diff --git a/libavcodec/vp56dsp.h b/libavcodec/vp56dsp.h
index f2cbb41a1e..692fd0c8ac 100644
--- a/libavcodec/vp56dsp.h
+++ b/libavcodec/vp56dsp.h
@@ -24,20 +24,19 @@
 #include <stddef.h>
 #include <stdint.h>
 
-typedef struct VP56DSPContext {
+typedef struct VP5DSPContext {
     void (*edge_filter_hor)(uint8_t *yuv, ptrdiff_t stride, int t);
     void (*edge_filter_ver)(uint8_t *yuv, ptrdiff_t stride, int t);
+} VP5DSPContext;
 
+typedef struct VP6DSPContext {
     void (*vp6_filter_diag4)(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
                              const int16_t *h_weights,const int16_t *v_weights);
-} VP56DSPContext;
+} VP6DSPContext;
 
-void ff_vp6_filter_diag4_c(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
-                           const int16_t *h_weights, const int16_t *v_weights);
+void ff_vp5dsp_init(VP5DSPContext *s);
 
-void ff_vp5dsp_init(VP56DSPContext *s);
-void ff_vp6dsp_init(VP56DSPContext *s);
-
-void ff_vp6dsp_init_x86(VP56DSPContext *s);
+void ff_vp6dsp_init(VP6DSPContext *s);
+void ff_vp6dsp_init_x86(VP6DSPContext *s);
 
 #endif /* AVCODEC_VP56DSP_H */
diff --git a/libavcodec/vp56dsp.c b/libavcodec/vp5dsp.c
similarity index 87%
rename from libavcodec/vp56dsp.c
rename to libavcodec/vp5dsp.c
index 1ff67b1c87..a06c2cfd5f 100644
--- a/libavcodec/vp56dsp.c
+++ b/libavcodec/vp5dsp.c
@@ -21,8 +21,6 @@
 
 #include <stdint.h>
 
-#include "config.h"
-#include "config_components.h"
 #include "libavutil/attributes.h"
 #include "vp56dsp.h"
 #include "libavutil/common.h"
@@ -43,7 +41,6 @@ static void pfx ## _edge_filter_ ## suf(uint8_t *yuv, ptrdiff_t stride, \
     }                                                                   \
 }
 
-#if CONFIG_VP5_DECODER
 /* Gives very similar result than the vp6 version except in a few cases */
 static int vp5_adjust(int v, int t)
 {
@@ -65,20 +62,8 @@ static int vp5_adjust(int v, int t)
 VP56_EDGE_FILTER(vp5, hor, 1, stride)
 VP56_EDGE_FILTER(vp5, ver, stride, 1)
 
-av_cold void ff_vp5dsp_init(VP56DSPContext *s)
+av_cold void ff_vp5dsp_init(VP5DSPContext *s)
 {
     s->edge_filter_hor = vp5_edge_filter_hor;
     s->edge_filter_ver = vp5_edge_filter_ver;
 }
-#endif /* CONFIG_VP5_DECODER */
-
-#if CONFIG_VP6_DECODER
-av_cold void ff_vp6dsp_init(VP56DSPContext *s)
-{
-    s->vp6_filter_diag4 = ff_vp6_filter_diag4_c;
-
-#if ARCH_X86
-    ff_vp6dsp_init_x86(s);
-#endif
-}
-#endif /* CONFIG_VP6_DECODER */
diff --git a/libavcodec/vp6.c b/libavcodec/vp6.c
index 48ff9da818..3f4bd42d07 100644
--- a/libavcodec/vp6.c
+++ b/libavcodec/vp6.c
@@ -641,7 +641,7 @@ static void vp6_filter(VP56Context *s, uint8_t *dst, uint8_t *src,
             vp6_filter_hv4(dst, src+offset1, stride, stride,
                            vp6_block_copy_filter[select][y8]);
         } else {
-            s->vp56dsp.vp6_filter_diag4(dst, src+offset1+((mv.x^mv.y)>>31), stride,
+            s->vp6dsp.vp6_filter_diag4(dst, src+offset1+((mv.x^mv.y)>>31), stride,
                              vp6_block_copy_filter[select][x8],
                              vp6_block_copy_filter[select][y8]);
         }
@@ -661,7 +661,7 @@ static av_cold int vp6_decode_init_context(AVCodecContext *avctx,
     if (ret < 0)
         return ret;
 
-    ff_vp6dsp_init(&s->vp56dsp);
+    ff_vp6dsp_init(&s->vp6dsp);
 
     s->deblock_filtering = 0;
     s->vp56_coord_div = vp6_coord_div;
diff --git a/libavcodec/vp6dsp.c b/libavcodec/vp6dsp.c
index f7f6856330..76c4983960 100644
--- a/libavcodec/vp6dsp.c
+++ b/libavcodec/vp6dsp.c
@@ -27,8 +27,8 @@
 #include "vp56dsp.h"
 
 
-void ff_vp6_filter_diag4_c(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
-                           const int16_t *h_weights, const int16_t *v_weights)
+static void vp6_filter_diag4_c(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
+                               const int16_t *h_weights, const int16_t *v_weights)
 {
     int x, y;
     int tmp[8*11];
@@ -59,3 +59,12 @@ void ff_vp6_filter_diag4_c(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
         t += 8;
     }
 }
+
+av_cold void ff_vp6dsp_init(VP6DSPContext *s)
+{
+    s->vp6_filter_diag4 = vp6_filter_diag4_c;
+
+#if ARCH_X86
+    ff_vp6dsp_init_x86(s);
+#endif
+}
diff --git a/libavcodec/x86/vp6dsp_init.c b/libavcodec/x86/vp6dsp_init.c
index 83d45ec36c..07e3becaec 100644
--- a/libavcodec/x86/vp6dsp_init.c
+++ b/libavcodec/x86/vp6dsp_init.c
@@ -28,7 +28,7 @@
 void ff_vp6_filter_diag4_sse2(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
                               const int16_t *h_weights,const int16_t *v_weights);
 
-av_cold void ff_vp6dsp_init_x86(VP56DSPContext *c)
+av_cold void ff_vp6dsp_init_x86(VP6DSPContext *c)
 {
     int cpu_flags = av_get_cpu_flags();
 
-- 
2.52.0


From 52a7aa66b2deb4e2cc7817a22d72d2a59ff9f337 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 01:00:58 +0100
Subject: [PATCH 114/304] avcodec/vp6dsp: Constify source in vp6_filter_diag4

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/vp56dsp.h         | 2 +-
 libavcodec/vp6dsp.c          | 2 +-
 libavcodec/x86/vp6dsp_init.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/vp56dsp.h b/libavcodec/vp56dsp.h
index 692fd0c8ac..3981de4015 100644
--- a/libavcodec/vp56dsp.h
+++ b/libavcodec/vp56dsp.h
@@ -30,7 +30,7 @@ typedef struct VP5DSPContext {
 } VP5DSPContext;
 
 typedef struct VP6DSPContext {
-    void (*vp6_filter_diag4)(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
+    void (*vp6_filter_diag4)(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                              const int16_t *h_weights,const int16_t *v_weights);
 } VP6DSPContext;
 
diff --git a/libavcodec/vp6dsp.c b/libavcodec/vp6dsp.c
index 76c4983960..bdaa054307 100644
--- a/libavcodec/vp6dsp.c
+++ b/libavcodec/vp6dsp.c
@@ -27,7 +27,7 @@
 #include "vp56dsp.h"
 
 
-static void vp6_filter_diag4_c(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
+static void vp6_filter_diag4_c(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                                const int16_t *h_weights, const int16_t *v_weights)
 {
     int x, y;
diff --git a/libavcodec/x86/vp6dsp_init.c b/libavcodec/x86/vp6dsp_init.c
index 07e3becaec..db9a95767e 100644
--- a/libavcodec/x86/vp6dsp_init.c
+++ b/libavcodec/x86/vp6dsp_init.c
@@ -25,7 +25,7 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/vp56dsp.h"
 
-void ff_vp6_filter_diag4_sse2(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
+void ff_vp6_filter_diag4_sse2(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                               const int16_t *h_weights,const int16_t *v_weights);
 
 av_cold void ff_vp6dsp_init_x86(VP6DSPContext *c)
-- 
2.52.0


From 23e08ef2f97acb359c151297af812cc9de434d02 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 10:53:41 +0100
Subject: [PATCH 115/304] tests/checkasm: Test VP6DSP

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/vp6dsp.c   | 93 +++++++++++++++++++++++++++++++++++++++
 tests/fate/checkasm.mak   |  1 +
 5 files changed, 99 insertions(+)
 create mode 100644 tests/checkasm/vp6dsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6636bc7774..3762c0d83b 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -50,6 +50,7 @@ AVCODECOBJS-$(CONFIG_UTVIDEO_DECODER)   += utvideodsp.o
 AVCODECOBJS-$(CONFIG_V210_DECODER)      += v210dec.o
 AVCODECOBJS-$(CONFIG_V210_ENCODER)      += v210enc.o
 AVCODECOBJS-$(CONFIG_VORBIS_DECODER)    += vorbisdsp.o
+AVCODECOBJS-$(CONFIG_VP6_DECODER)       += vp6dsp.o
 AVCODECOBJS-$(CONFIG_VP9_DECODER)       += vp9dsp.o
 AVCODECOBJS-$(CONFIG_VVC_DECODER)       += vvc_alf.o vvc_mc.o vvc_sao.o
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 20d8f19757..8c64684fa3 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -254,6 +254,9 @@ static const struct {
     #if CONFIG_VP3DSP
         { "vp3dsp", checkasm_check_vp3dsp },
     #endif
+    #if CONFIG_VP6_DECODER
+        { "vp6dsp", checkasm_check_vp6dsp },
+    #endif
     #if CONFIG_VP8DSP
         { "vp8dsp", checkasm_check_vp8dsp },
     #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 45cd23cac4..bd33aba263 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -154,6 +154,7 @@ void checkasm_check_vf_hflip(void);
 void checkasm_check_vf_threshold(void);
 void checkasm_check_vf_sobel(void);
 void checkasm_check_vp3dsp(void);
+void checkasm_check_vp6dsp(void);
 void checkasm_check_vp8dsp(void);
 void checkasm_check_vp9dsp(void);
 void checkasm_check_videodsp(void);
diff --git a/tests/checkasm/vp6dsp.c b/tests/checkasm/vp6dsp.c
new file mode 100644
index 0000000000..a5f1c9c2fc
--- /dev/null
+++ b/tests/checkasm/vp6dsp.c
@@ -0,0 +1,93 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+
+#include "checkasm.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/macros.h"
+#include "libavutil/mem_internal.h"
+#include "libavcodec/vp6data.h"
+#include "libavcodec/vp56dsp.h"
+
+#define randomize_buffer(buf)                                   \
+    do {                                                        \
+        for (size_t k = 0; k < (sizeof(buf) & ~3); k += 4)      \
+            AV_WN32A(buf + k, rnd());                           \
+        for (size_t k = sizeof(buf) & ~3; k < sizeof(buf); ++k) \
+            buf[k] = rnd();                                     \
+    } while (0)
+
+
+void checkasm_check_vp6dsp(void)
+{
+    enum {
+        BLOCK_SIZE_1D  = 8,
+        SRC_ROWS_ABOVE = 1,
+        SRC_ROWS_BELOW = 2,
+        SRC_COLS_LEFT  = 1,
+        SRC_COLS_RIGHT = 2,
+        SRC_ROWS       = SRC_ROWS_ABOVE + BLOCK_SIZE_1D + SRC_ROWS_BELOW,
+        SRC_ROW_SIZE   = SRC_COLS_LEFT  + BLOCK_SIZE_1D + SRC_COLS_RIGHT,
+        MAX_STRIDE     = 64,    ///< arbitrary
+        SRC_BUF_SIZE   = (SRC_ROWS - 1) * MAX_STRIDE + SRC_ROW_SIZE + 7 /* to vary misalignment */,
+        DST_BUF_SIZE   = (BLOCK_SIZE_1D - 1) * MAX_STRIDE + BLOCK_SIZE_1D,
+    };
+    VP6DSPContext vp6dsp;
+
+    ff_vp6dsp_init(&vp6dsp);
+
+    declare_func(void, uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
+                       const int16_t *h_weights, const int16_t *v_weights);
+
+    if (check_func(vp6dsp.vp6_filter_diag4, "filter_diag4")) {
+        DECLARE_ALIGNED(8, uint8_t, dstbuf_ref)[DST_BUF_SIZE];
+        DECLARE_ALIGNED(8, uint8_t, dstbuf_new)[DST_BUF_SIZE];
+        DECLARE_ALIGNED(8, uint8_t, srcbuf)[SRC_BUF_SIZE];
+
+        randomize_buffer(dstbuf_ref);
+        randomize_buffer(srcbuf);
+        memcpy(dstbuf_new, dstbuf_ref, sizeof(dstbuf_new));
+
+        ptrdiff_t  stride = (rnd() % (MAX_STRIDE / 16) + 1) * 16;
+        const uint8_t *src = srcbuf + SRC_COLS_LEFT + rnd() % 8U;
+        uint8_t *dst_new = dstbuf_new, *dst_ref = dstbuf_ref;
+
+        if (rnd() & 1) {
+            dst_new += (BLOCK_SIZE_1D - 1) * stride;
+            dst_ref += (BLOCK_SIZE_1D - 1) * stride;
+            src     += (SRC_ROWS - 1) * stride;
+            stride  *= -1;
+        }
+        src += SRC_ROWS_ABOVE * stride;
+
+        unsigned select = rnd() % FF_ARRAY_ELEMS(vp6_block_copy_filter);
+        unsigned x8 = 1 + rnd() % (FF_ARRAY_ELEMS(vp6_block_copy_filter[0]) - 1);
+        unsigned y8 = 1 + rnd() % (FF_ARRAY_ELEMS(vp6_block_copy_filter[0]) - 1);
+        const int16_t *h_weights = vp6_block_copy_filter[select][x8];
+        const int16_t *v_weights = vp6_block_copy_filter[select][y8];
+
+        call_ref(dst_ref, src, stride, h_weights, v_weights);
+        call_new(dst_new, src, stride, h_weights, v_weights);
+        if (memcmp(dstbuf_new, dstbuf_ref, sizeof(dstbuf_new)))
+            fail();
+        bench_new(dst_new, src, stride, h_weights, v_weights);
+    }
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 2be880c8db..f182efde46 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -76,6 +76,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp                                 \
                 fate-checkasm-videodsp                                  \
                 fate-checkasm-vorbisdsp                                 \
                 fate-checkasm-vp3dsp                                    \
+                fate-checkasm-vp6dsp                                    \
                 fate-checkasm-vp8dsp                                    \
                 fate-checkasm-vp9dsp                                    \
                 fate-checkasm-vvc_alf                                   \
-- 
2.52.0


From 43c0b352fbcf6c1556c662e4f55fcee9c0c45a51 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 10:57:39 +0100
Subject: [PATCH 116/304] avcodec/x86/vp6dsp: Fix outdated comment

Forgotten in 6cb3ee80b3b58d692a722fb38ee05f170ae8b0d2.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp6dsp.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/x86/vp6dsp.asm b/libavcodec/x86/vp6dsp.asm
index 0106541734..61336f6465 100644
--- a/libavcodec/x86/vp6dsp.asm
+++ b/libavcodec/x86/vp6dsp.asm
@@ -1,5 +1,5 @@
 ;******************************************************************************
-;* MMX/SSE2-optimized functions for the VP6 decoder
+;* SSE2-optimized functions for the VP6 decoder
 ;* Copyright (C) 2009  Sebastien Lucas <sebastien.lucas@gmail.com>
 ;* Copyright (C) 2009  Zuxy Meng <zuxy.meng@gmail.com>
 ;*
-- 
2.52.0


From d6cbf12fa7ccbb37f862971ed4b9297ae410d1a1 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 11:15:15 +0100
Subject: [PATCH 117/304] avcodec/x86/vp6dsp: Don't align the stack manually

On most systems (in particular all of x64), the stack is already
guaranteed to be sufficiently aligned. So just request the stack space
through cglobal's stack size argument and let x86inc align the stack
where that is actually necessary.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp6dsp.asm | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/libavcodec/x86/vp6dsp.asm b/libavcodec/x86/vp6dsp.asm
index 61336f6465..a9340ed05b 100644
--- a/libavcodec/x86/vp6dsp.asm
+++ b/libavcodec/x86/vp6dsp.asm
@@ -62,11 +62,7 @@ SECTION .text
 ; void ff_vp6_filter_diag4_<opt>(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 ;                                const int16_t h_weight[4], const int16_t v_weights[4])
 INIT_XMM sse2
-cglobal vp6_filter_diag4, 5, 7, 8
-    mov          r5, rsp         ; backup stack pointer
-    and         rsp, ~(mmsize-1) ; align stack
-    sub         rsp, 8*11
-
+cglobal vp6_filter_diag4, 5, 6, 8, -8*11
     sub          r1, r2
 
     pxor         m7, m7
@@ -74,25 +70,24 @@ cglobal vp6_filter_diag4, 5, 7, 8
     SPLAT4REGS
 
     mov          r3, rsp
-    mov          r6, 11
+    mov         r5d, 11
 .nextrow:
     DIAG4        r1, -1, 0, 1, 2, r3
     add          r3, 8
     add          r1, r2
-    dec          r6
+    dec         r5d
     jnz .nextrow
 
     movq         m3, [r4]
     SPLAT4REGS
 
     lea          r3, [rsp+8]
-    mov          r6, 8
+    mov         r1d, 8
 .nextcol:
     DIAG4        r3, -8, 0, 8, 16, r0
     add          r3, 8
     add          r0, r2
-    dec          r6
+    dec         r1d
     jnz .nextcol
 
-    mov         rsp, r5          ; restore stack pointer
     RET
-- 
2.52.0


From dbb7fd8cbe5cd79ca35fe7a00f8793f96a206ac6 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 11:22:45 +0100
Subject: [PATCH 118/304] avcodec/x86/vp6dsp: Simplify splatting

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp6dsp.asm | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/libavcodec/x86/vp6dsp.asm b/libavcodec/x86/vp6dsp.asm
index a9340ed05b..83b26d03cd 100644
--- a/libavcodec/x86/vp6dsp.asm
+++ b/libavcodec/x86/vp6dsp.asm
@@ -49,14 +49,11 @@ SECTION .text
 %endmacro
 
 %macro SPLAT4REGS 0
-    pshuflw      m4, m3, 0x0
-    pshuflw      m5, m3, 0x55
-    pshuflw      m6, m3, 0xAA
-    pshuflw      m3, m3, 0xFF
-    punpcklqdq   m4, m4
-    punpcklqdq   m5, m5
-    punpcklqdq   m6, m6
-    punpcklqdq   m3, m3
+    punpcklwd    m3, m3
+    pshufd       m4, m3, 0x0
+    pshufd       m5, m3, 0x55
+    pshufd       m6, m3, 0xAA
+    pshufd       m3, m3, 0xFF
 %endmacro
 
 ; void ff_vp6_filter_diag4_<opt>(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
-- 
2.52.0


From dda4740a5d2f4f08affcc4d91b32ebe0f14aea26 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 11:27:16 +0100
Subject: [PATCH 119/304] avcodec/x86/vp6dsp: Avoid saturated addition

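Only the two middle coefficients are large enough for their products
to overflow when summed, so the rounding constant 64 can be added to
the first partial sum with a plain paddw.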

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp6dsp.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/x86/vp6dsp.asm b/libavcodec/x86/vp6dsp.asm
index 83b26d03cd..b9b562f84f 100644
--- a/libavcodec/x86/vp6dsp.asm
+++ b/libavcodec/x86/vp6dsp.asm
@@ -38,11 +38,11 @@ SECTION .text
     movq          m2, [%1+%5]
     punpcklbw     m1, m7
     punpcklbw     m2, m7
+    paddw         m0, [pw_64]    ; Add 64
     pmullw        m1, m6         ; src[x+8 ] * biweight [2]
     pmullw        m2, m3         ; src[x+16] * biweight [3]
     paddw         m1, m2
     paddsw        m0, m1
-    paddsw        m0, [pw_64]    ; Add 64
     psraw         m0, 7
     packuswb      m0, m0
     movq        [%6], m0
-- 
2.52.0


From 119a19e25828a6924827e0813467e0ee93e3cba1 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 25 Nov 2025 11:46:40 +0100
Subject: [PATCH 120/304] avcodec/x86/vp6dsp: Avoid packing+unpacking

Store the intermediate values as words, clipped to the 0..255 range
instead.

Old benchmarks:
filter_diag4_c:                                        353.4 ( 1.00x)
filter_diag4_sse2:                                      57.5 ( 6.15x)

New benchmarks:
filter_diag4_c:                                        350.6 ( 1.00x)
filter_diag4_sse2:                                      55.1 ( 6.36x)

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp6dsp.asm | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/libavcodec/x86/vp6dsp.asm b/libavcodec/x86/vp6dsp.asm
index b9b562f84f..1f7443db69 100644
--- a/libavcodec/x86/vp6dsp.asm
+++ b/libavcodec/x86/vp6dsp.asm
@@ -26,26 +26,41 @@ cextern pw_64
 
 SECTION .text
 
-%macro DIAG4 6
+%macro DIAG4 7
+%if %7
+    mova          m0, [%1+%2]
+    mova          m1, [%1+%3]
+%else
     movq          m0, [%1+%2]
     movq          m1, [%1+%3]
     punpcklbw     m0, m7
     punpcklbw     m1, m7
+%endif
     pmullw        m0, m4         ; src[x-8 ] * biweight [0]
     pmullw        m1, m5         ; src[x   ] * biweight [1]
     paddw         m0, m1
+%if %7
+    mova          m1, [%1+%4]
+    mova          m2, [%1+%5]
+%else
     movq          m1, [%1+%4]
     movq          m2, [%1+%5]
     punpcklbw     m1, m7
     punpcklbw     m2, m7
+%endif
     paddw         m0, [pw_64]    ; Add 64
     pmullw        m1, m6         ; src[x+8 ] * biweight [2]
     pmullw        m2, m3         ; src[x+16] * biweight [3]
     paddw         m1, m2
     paddsw        m0, m1
     psraw         m0, 7
+%if %7
     packuswb      m0, m0
     movq        [%6], m0
+%else
+    pmaxsw        m0, m7         ; clip to 0-255 range
+    mova        [%6], m0
+%endif
 %endmacro
 
 %macro SPLAT4REGS 0
@@ -59,7 +74,7 @@ SECTION .text
 ; void ff_vp6_filter_diag4_<opt>(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 ;                                const int16_t h_weight[4], const int16_t v_weights[4])
 INIT_XMM sse2
-cglobal vp6_filter_diag4, 5, 6, 8, -8*11
+cglobal vp6_filter_diag4, 5, 6, 8, -16*11
     sub          r1, r2
 
     pxor         m7, m7
@@ -69,8 +84,8 @@ cglobal vp6_filter_diag4, 5, 6, 8, -8*11
     mov          r3, rsp
     mov         r5d, 11
 .nextrow:
-    DIAG4        r1, -1, 0, 1, 2, r3
-    add          r3, 8
+    DIAG4        r1, -1, 0, 1, 2, r3, 0
+    add          r3, 16
     add          r1, r2
     dec         r5d
     jnz .nextrow
@@ -78,11 +93,11 @@ cglobal vp6_filter_diag4, 5, 6, 8, -8*11
     movq         m3, [r4]
     SPLAT4REGS
 
-    lea          r3, [rsp+8]
+    lea          r3, [rsp+16]
     mov         r1d, 8
 .nextcol:
-    DIAG4        r3, -8, 0, 8, 16, r0
-    add          r3, 8
+    DIAG4        r3, -16, 0, 16, 32, r0, 1
+    add          r3, 16
     add          r0, r2
     dec         r1d
     jnz .nextcol
-- 
2.52.0


From 0e9463b653adb02dc3589666a5bc215c56d2b4fc Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 17:45:36 +0100
Subject: [PATCH 121/304] avcodec/cbs_apv: Use
 ff_cbs_{read,write}_simple_unsigned()

Avoids the range checks, which are redundant for fields whose valid
range is the full bit width, and makes the calls cheaper.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/cbs_apv.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/libavcodec/cbs_apv.c b/libavcodec/cbs_apv.c
index a1e546c92e..7a06244c69 100644
--- a/libavcodec/cbs_apv.c
+++ b/libavcodec/cbs_apv.c
@@ -67,8 +67,6 @@ static void cbs_apv_derive_tile_info(CodedBitstreamContext *ctx,
 
 #define u(width, name, range_min, range_max) \
     xu(width, name, current->name, range_min, range_max, 0, )
-#define ub(width, name) \
-    xu(width, name, current->name, 0, MAX_UINT_BITS(width), 0, )
 #define us(width, name, range_min, range_max, subs, ...) \
     xu(width, name, current->name, range_min, range_max,  subs, __VA_ARGS__)
 #define ubs(width, name, subs, ...) \
@@ -86,6 +84,12 @@ static void cbs_apv_derive_tile_info(CodedBitstreamContext *ctx,
 #define RWContext GetBitContext
 #define FUNC(name) cbs_apv_read_ ## name
 
+#define ub(width, name) do { \
+        uint32_t value; \
+        CHECK(CBS_FUNC(read_simple_unsigned)(ctx, rw, width, #name, \
+                                             &value)); \
+        current->name = value; \
+    } while (0)
 #define xu(width, name, var, range_min, range_max, subs, ...) do { \
         uint32_t value; \
         CHECK(CBS_FUNC(read_unsigned)(ctx, rw, width, #name, \
@@ -106,6 +110,7 @@ static void cbs_apv_derive_tile_info(CodedBitstreamContext *ctx,
 #undef READWRITE
 #undef RWContext
 #undef FUNC
+#undef ub
 #undef xu
 #undef infer
 #undef byte_alignment
@@ -117,6 +122,11 @@ static void cbs_apv_derive_tile_info(CodedBitstreamContext *ctx,
 #define RWContext PutBitContext
 #define FUNC(name) cbs_apv_write_ ## name
 
+#define ub(width, name) do { \
+        uint32_t value = current->name; \
+        CHECK(CBS_FUNC(write_simple_unsigned)(ctx, rw, width, #name, \
+                                              value)); \
+    } while (0)
 #define xu(width, name, var, range_min, range_max, subs, ...) do { \
         uint32_t value = var; \
         CHECK(CBS_FUNC(write_unsigned)(ctx, rw, width, #name, \
@@ -142,6 +152,7 @@ static void cbs_apv_derive_tile_info(CodedBitstreamContext *ctx,
 #undef READWRITE
 #undef RWContext
 #undef FUNC
+#undef ub
 #undef xu
 #undef infer
 #undef byte_alignment
-- 
2.52.0


From 7d8bb1bc6ade4be612a3cb320d01c58c7e41f30b Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Tue, 25 Nov 2025 13:00:34 +0800
Subject: [PATCH 122/304] avformat/mov: fix crash when stsz_sample_size is zero
 and sample_sizes is null

Co-Authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
---
 libavformat/mov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index ee09478aef..009ddfec80 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -5164,7 +5164,8 @@ static int sanity_checks(void *log_obj, MOVStreamContext *sc, int index)
 {
     if ((sc->chunk_count && (!sc->stts_count || !sc->stsc_count ||
                             (!sc->sample_size && !sc->sample_count))) ||
-        (!sc->chunk_count && sc->sample_count)) {
+        (sc->sample_count && (!sc->chunk_count ||
+                             (!sc->sample_size && !sc->sample_sizes)))) {
         av_log(log_obj, AV_LOG_ERROR, "stream %d, missing mandatory atoms, broken header\n",
                index);
         return 1;
-- 
2.52.0


From d1ba4d6d373a09e260b7b759c4a5ed2268ab6906 Mon Sep 17 00:00:00 2001
From: mux47 <mux100@protonmail.com>
Date: Tue, 18 Nov 2025 16:43:49 +0000
Subject: [PATCH 123/304] libavcodec/opus/parser: Fix spurious 'Error parsing
 Opus packet header'

When PARSER_FLAG_COMPLETE_FRAMES is set, opus_parse() calls
set_frame_duration even on flush (buf_size==0), which triggers
a spurious "Error parsing Opus packet header" at EOF.

Match streaming-path behavior by skipping duration parsing on empty buffers.
Fixes #20954
---
 libavcodec/opus/parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/opus/parser.c b/libavcodec/opus/parser.c
index ae3d66592c..bab0e50412 100644
--- a/libavcodec/opus/parser.c
+++ b/libavcodec/opus/parser.c
@@ -189,7 +189,7 @@ static int opus_parse(AVCodecParserContext *ctx, AVCodecContext *avctx,
     if (ctx->flags & PARSER_FLAG_COMPLETE_FRAMES) {
         next = buf_size;
 
-        if (set_frame_duration(ctx, avctx, buf, buf_size) < 0)
+        if (buf_size && set_frame_duration(ctx, avctx, buf, buf_size) < 0)
             goto fail;
     } else {
         next = opus_find_frame_end(ctx, avctx, buf, buf_size, &header_len);
-- 
2.52.0


From e560a6607096de3ea452d8bccf43b4c94c93b4fa Mon Sep 17 00:00:00 2001
From: Frank Plowman <post@frankplowman.com>
Date: Sun, 19 Oct 2025 12:28:00 +0100
Subject: [PATCH 124/304] lavc/vvc: Ensure seq_decode is always updated with
 SPS

seq_decode is used to ensure that a picture and all of its reference
pictures use the same SPS. Any time the SPS changes, seq_decode should
be incremented. Prior to this patch, seq_decode was incremented in
frame_context_setup, which is called after the SPS is potentially
changed in decode_sps. Should the decoder encounter an error between
changing the SPS and incrementing seq_decode, the SPS could be modified
while seq_decode was not incremented, which could lead to invalid
reference pictures and various downstream issues. By instead updating
seq_decode in the parameter set code (decode_sps), we ensure seq_decode
and the SPS are always updated in tandem.
---
 libavcodec/vvc/dec.c | 1 -
 libavcodec/vvc/ps.c  | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vvc/dec.c b/libavcodec/vvc/dec.c
index bfe330e028..028f34b491 100644
--- a/libavcodec/vvc/dec.c
+++ b/libavcodec/vvc/dec.c
@@ -726,7 +726,6 @@ static int frame_context_setup(VVCFrameContext *fc, VVCContext *s)
     }
 
     if (IS_IDR(s)) {
-        s->seq_decode = (s->seq_decode + 1) & 0xff;
         ff_vvc_clear_refs(fc);
     }
 
diff --git a/libavcodec/vvc/ps.c b/libavcodec/vvc/ps.c
index 8faee1d826..a591851238 100644
--- a/libavcodec/vvc/ps.c
+++ b/libavcodec/vvc/ps.c
@@ -279,12 +279,14 @@ fail:
 
 static int decode_sps(VVCParamSets *ps, AVCodecContext *c, const H266RawSPS *rsps, int is_clvss)
 {
+    VVCContext *s           = c->priv_data;
     const int sps_id        = rsps->sps_seq_parameter_set_id;
     const VVCSPS *old_sps   = ps->sps_list[sps_id];
     const VVCSPS *sps;
 
     if (is_clvss) {
         ps->sps_id_used = 0;
+        s->seq_decode = (s->seq_decode + 1) & 0xff;
     }
 
     if (old_sps) {
-- 
2.52.0


From 1286815a0280abfcdf5d7af2188b9d0e7d208651 Mon Sep 17 00:00:00 2001
From: Thomas Gritzan <phygon@gmx.de>
Date: Mon, 24 Nov 2025 00:26:33 +0100
Subject: [PATCH 125/304] libavdevice/decklink: Implement QueryInterface to
 support newer driver

Playback to a DeckLink device with a newer version of the
DeckLink SDK (14.3) stalls because the driver code calls
IDeckLinkVideoFrame::QueryInterface, which ffmpeg only stubs
out by returning E_NOINTERFACE.
This patch implements decklink_frame::QueryInterface,
so that playback works with both older (12.x) and
newer (>= 14.3) drivers.

Note: The patch still does not allow the code to compile
with DeckLink SDK 14.3 or newer, as the API has changed.
---
 configure                    |  4 ++--
 libavdevice/decklink_enc.cpp | 26 +++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index a7b0c18d9c..883539e361 100755
--- a/configure
+++ b/configure
@@ -7405,8 +7405,8 @@ fi
 if enabled decklink; then
     case $target_os in
         mingw32*|mingw64*|win32|win64)
-            decklink_outdev_extralibs="$decklink_outdev_extralibs -lole32 -loleaut32"
-            decklink_indev_extralibs="$decklink_indev_extralibs -lole32 -loleaut32"
+            decklink_outdev_extralibs="$decklink_outdev_extralibs -lole32 -luuid -loleaut32"
+            decklink_indev_extralibs="$decklink_indev_extralibs -lole32 -luuid -loleaut32"
             ;;
     esac
 fi
diff --git a/libavdevice/decklink_enc.cpp b/libavdevice/decklink_enc.cpp
index cb8f91730e..195f005c17 100644
--- a/libavdevice/decklink_enc.cpp
+++ b/libavdevice/decklink_enc.cpp
@@ -47,6 +47,16 @@ extern "C" {
 #include "libklvanc/pixels.h"
 #endif
 
+#ifdef _WIN32
+#include <guiddef.h>
+#else
+/* There is no guiddef.h in Linux builds, so we provide our own IsEqualIID() */
+static bool IsEqualIID(REFIID riid1, REFIID riid2)
+{
+    return memcmp(&riid1, &riid2, sizeof(REFIID)) == 0;
+}
+#endif
+
 /* DeckLink callback class declaration */
 class decklink_frame : public IDeckLinkVideoFrame
 {
@@ -111,7 +121,21 @@ public:
         _ancillary->AddRef();
         return S_OK;
     }
-    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID iid, LPVOID *ppv) { return E_NOINTERFACE; }
+    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, LPVOID *ppv)
+    {
+        if (IsEqualIID(riid, IID_IUnknown)) {
+            *ppv = static_cast<IUnknown*>(this);
+        } else if (IsEqualIID(riid, IID_IDeckLinkVideoFrame)) {
+            *ppv = static_cast<IDeckLinkVideoFrame*>(this);
+        } else {
+            *ppv = NULL;
+            return E_NOINTERFACE;
+        }
+
+        AddRef();
+        return S_OK;
+    }
+
     virtual ULONG   STDMETHODCALLTYPE AddRef(void)                            { return ++_refs; }
     virtual ULONG   STDMETHODCALLTYPE Release(void)
     {
-- 
2.52.0


From ac9a0f62e915117e2727a47e92c0eb93e5e69607 Mon Sep 17 00:00:00 2001
From: Diego de Souza <ddesouza@nvidia.com>
Date: Wed, 12 Nov 2025 20:08:45 +0100
Subject: [PATCH 126/304] avutil/hwcontext_cuda: Expands pixel formats support

Add support for additional pixel formats in the CUDA hardware context:
- Planar formats (yuv420p10, yuv422p, yuv422p10, yuv444p10)
- Semiplanar formats (nv16, p210, p216)

Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
---
 libavutil/hwcontext_cuda.c | 4 ++++
 libavutil/version.h        | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 10d3399537..b0b65b2446 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -50,6 +50,10 @@ static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_P016,
     AV_PIX_FMT_P210,
     AV_PIX_FMT_P216,
+    AV_PIX_FMT_YUV422P,
+    AV_PIX_FMT_YUV420P10,
+    AV_PIX_FMT_YUV422P10,
+    AV_PIX_FMT_YUV444P10,
     AV_PIX_FMT_YUV444P10MSB,
     AV_PIX_FMT_YUV444P12MSB,
     AV_PIX_FMT_YUV444P16,
diff --git a/libavutil/version.h b/libavutil/version.h
index db250d5c9e..d058e94425 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -80,7 +80,7 @@
 
 #define LIBAVUTIL_VERSION_MAJOR  60
 #define LIBAVUTIL_VERSION_MINOR  19
-#define LIBAVUTIL_VERSION_MICRO 100
+#define LIBAVUTIL_VERSION_MICRO 101
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
                                                LIBAVUTIL_VERSION_MINOR, \
-- 
2.52.0


From be724c2d1c7962d7df1b2be3080858d045dd3ac0 Mon Sep 17 00:00:00 2001
From: Diego de Souza <ddesouza@nvidia.com>
Date: Thu, 13 Nov 2025 09:49:45 +0100
Subject: [PATCH 127/304] avfilter/hwupload_cuda: Expands pixel formats support

Add support for uploading additional pixel formats to NVIDIA GPUs:
- Planar formats (yuv420p10, yuv422p, yuv422p10, yuv444p10)
- Semiplanar formats (nv16, p210, p216)

Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
---
 libavfilter/vf_hwupload_cuda.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavfilter/vf_hwupload_cuda.c b/libavfilter/vf_hwupload_cuda.c
index b505f8b298..34f959ca50 100644
--- a/libavfilter/vf_hwupload_cuda.c
+++ b/libavfilter/vf_hwupload_cuda.c
@@ -59,9 +59,9 @@ static int cudaupload_query_formats(const AVFilterContext *ctx,
     int ret;
 
     static const enum AVPixelFormat input_pix_fmts[] = {
-        AV_PIX_FMT_NV12, AV_PIX_FMT_YUV420P, AV_PIX_FMT_YUVA420P, AV_PIX_FMT_YUV444P,
-        AV_PIX_FMT_P010, AV_PIX_FMT_P016, AV_PIX_FMT_YUV444P16,
-        AV_PIX_FMT_0RGB32, AV_PIX_FMT_0BGR32,
+        AV_PIX_FMT_NV12, AV_PIX_FMT_YUV420P, AV_PIX_FMT_YUVA420P, AV_PIX_FMT_NV16, AV_PIX_FMT_YUV422P, AV_PIX_FMT_YUV444P,
+        AV_PIX_FMT_P010, AV_PIX_FMT_P016, AV_PIX_FMT_P210, AV_PIX_FMT_P216, AV_PIX_FMT_YUV420P10, AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUV444P10, AV_PIX_FMT_YUV444P16,
+        AV_PIX_FMT_0RGB32, AV_PIX_FMT_0BGR32, AV_PIX_FMT_RGB32, AV_PIX_FMT_BGR32,
 #if CONFIG_VULKAN
         AV_PIX_FMT_VULKAN,
 #endif
-- 
2.52.0


From 7a56795da461e56ec00d2359c196ac35fc2ab860 Mon Sep 17 00:00:00 2001
From: Diego de Souza <ddesouza@nvidia.com>
Date: Tue, 18 Nov 2025 17:16:43 +0100
Subject: [PATCH 128/304] avfilter/scale_cuda: Add support for 4:2:2 chroma
 subsampling

The supported YUV pixel formats are now grouped into planar and
semiplanar kernel classes per bit depth, so formats that differ only
in chroma subsampling share the same kernels. This reduces the number
of CUDA kernels needed to cover all pixel formats.

This patch:
1. Adds support for YUV 4:2:2 planar and semiplanar formats:
        yuv422p, yuv422p10, nv16, p210, p216
2. Implements new conversion structures and kernel definitions
        for planar and semiplanar formats

Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
---
 libavfilter/Makefile         |    1 +
 libavfilter/version.h        |    2 +-
 libavfilter/vf_scale_cuda.c  |   90 ++-
 libavfilter/vf_scale_cuda.cu | 1147 +++++++++++++++++-----------------
 libavfilter/vf_scale_cuda.h  |   22 +
 5 files changed, 672 insertions(+), 590 deletions(-)

diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index d56a458e45..67814c0d77 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -678,6 +678,7 @@ SKIPHEADERS-$(CONFIG_QSVVPP)                 += qsvvpp.h stack_internal.h
 SKIPHEADERS-$(CONFIG_OPENCL)                 += opencl.h
 SKIPHEADERS-$(CONFIG_VAAPI)                  += vaapi_vpp.h stack_internal.h
 SKIPHEADERS-$(CONFIG_VULKAN)                 += vulkan_filter.h
+SKIPHEADERS-$(CONFIG_SCALE_CUDA_FILTER)      += vf_scale_cuda.h
 
 TOOLS     = graph2dot
 TESTPROGS = drawutils filtfmts formats integral
diff --git a/libavfilter/version.h b/libavfilter/version.h
index 4a69d6be98..776321d1fc 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -32,7 +32,7 @@
 #include "version_major.h"
 
 #define LIBAVFILTER_VERSION_MINOR  10
-#define LIBAVFILTER_VERSION_MICRO 100
+#define LIBAVFILTER_VERSION_MICRO 101
 
 
 #define LIBAVFILTER_VERSION_INT AV_VERSION_INT(LIBAVFILTER_VERSION_MAJOR, \
diff --git a/libavfilter/vf_scale_cuda.c b/libavfilter/vf_scale_cuda.c
index 88a6e20610..5fd757161b 100644
--- a/libavfilter/vf_scale_cuda.c
+++ b/libavfilter/vf_scale_cuda.c
@@ -39,17 +39,29 @@
 #include "cuda/load_helper.h"
 #include "vf_scale_cuda.h"
 
-static const enum AVPixelFormat supported_formats[] = {
-    AV_PIX_FMT_YUV420P,
-    AV_PIX_FMT_NV12,
-    AV_PIX_FMT_YUV444P,
-    AV_PIX_FMT_P010,
-    AV_PIX_FMT_P016,
-    AV_PIX_FMT_YUV444P16,
-    AV_PIX_FMT_0RGB32,
-    AV_PIX_FMT_0BGR32,
-    AV_PIX_FMT_RGB32,
-    AV_PIX_FMT_BGR32,
+struct format_entry {
+    enum AVPixelFormat format;
+    char name[13];
+};
+
+static const struct format_entry supported_formats[] = {
+    {AV_PIX_FMT_YUV420P,  "planar8"},
+    {AV_PIX_FMT_YUV422P,  "planar8"},
+    {AV_PIX_FMT_YUV444P,  "planar8"},
+    {AV_PIX_FMT_YUV420P10,"planar10"},
+    {AV_PIX_FMT_YUV422P10,"planar10"},
+    {AV_PIX_FMT_YUV444P10,"planar10"},
+    {AV_PIX_FMT_YUV444P16,"planar16"},
+    {AV_PIX_FMT_NV12,     "semiplanar8"},
+    {AV_PIX_FMT_NV16,     "semiplanar8"},
+    {AV_PIX_FMT_P010,     "semiplanar10"},
+    {AV_PIX_FMT_P210,     "semiplanar10"},
+    {AV_PIX_FMT_P016,     "semiplanar16"},
+    {AV_PIX_FMT_P216,     "semiplanar16"},
+    {AV_PIX_FMT_0RGB32,   "bgr0"},
+    {AV_PIX_FMT_0BGR32,   "rgb0"},
+    {AV_PIX_FMT_RGB32,    "bgra"},
+    {AV_PIX_FMT_BGR32,    "rgba"},
 };
 
 #define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) )
@@ -184,14 +196,20 @@ fail:
 
 static int format_is_supported(enum AVPixelFormat fmt)
 {
-    int i;
-
-    for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
-        if (supported_formats[i] == fmt)
+    for (int i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
+        if (supported_formats[i].format == fmt)
             return 1;
     return 0;
 }
 
+static const char* get_format_name(enum AVPixelFormat fmt)
+{
+    for (int i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
+        if (supported_formats[i].format == fmt)
+            return supported_formats[i].name;
+    return NULL;
+}
+
 static av_cold void set_format_info(AVFilterContext *ctx, enum AVPixelFormat in_format, enum AVPixelFormat out_format)
 {
     CUDAScaleContext *s = ctx->priv;
@@ -284,8 +302,8 @@ static av_cold int cudascale_load_functions(AVFilterContext *ctx)
     char buf[128];
     int ret;
 
-    const char *in_fmt_name = av_get_pix_fmt_name(s->in_fmt);
-    const char *out_fmt_name = av_get_pix_fmt_name(s->out_fmt);
+    const char *in_fmt_name = get_format_name(s->in_fmt);
+    const char *out_fmt_name = get_format_name(s->out_fmt);
 
     const char *function_infix = "";
 
@@ -335,11 +353,13 @@ static av_cold int cudascale_load_functions(AVFilterContext *ctx)
         ret = AVERROR(ENOSYS);
         goto fail;
     }
+    av_log(ctx, AV_LOG_DEBUG, "Luma filter: %s (%s -> %s)\n", buf, av_get_pix_fmt_name(s->in_fmt), av_get_pix_fmt_name(s->out_fmt));
 
     snprintf(buf, sizeof(buf), "Subsample_%s_%s_%s_uv", function_infix, in_fmt_name, out_fmt_name);
     ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_uv, s->cu_module, buf));
     if (ret < 0)
         goto fail;
+    av_log(ctx, AV_LOG_DEBUG, "Chroma filter: %s (%s -> %s)\n", buf, av_get_pix_fmt_name(s->in_fmt), av_get_pix_fmt_name(s->out_fmt));
 
 fail:
     CHECK_CU(cu->cuCtxPopCurrent(&dummy));
@@ -416,26 +436,35 @@ fail:
 
 static int call_resize_kernel(AVFilterContext *ctx, CUfunction func,
                               CUtexObject src_tex[4], int src_left, int src_top, int src_width, int src_height,
-                              AVFrame *out_frame, int dst_width, int dst_height, int dst_pitch)
+                              AVFrame *out_frame, int dst_width, int dst_height, int dst_pitch, int mpeg_range)
 {
     CUDAScaleContext *s = ctx->priv;
     CudaFunctions *cu = s->hwctx->internal->cuda_dl;
 
-    CUdeviceptr dst_devptr[4] = {
-        (CUdeviceptr)out_frame->data[0], (CUdeviceptr)out_frame->data[1],
-        (CUdeviceptr)out_frame->data[2], (CUdeviceptr)out_frame->data[3]
+    CUDAScaleKernelParams params = {
+        .src_tex = {src_tex[0], src_tex[1], src_tex[2], src_tex[3]},
+        .dst = {
+            (CUdeviceptr)out_frame->data[0],
+            (CUdeviceptr)out_frame->data[1],
+            (CUdeviceptr)out_frame->data[2],
+            (CUdeviceptr)out_frame->data[3]
+        },
+        .dst_width = dst_width,
+        .dst_height = dst_height,
+        .dst_pitch = dst_pitch,
+        .src_left = src_left,
+        .src_top = src_top,
+        .src_width = src_width,
+        .src_height = src_height,
+        .param = s->param,
+        .mpeg_range = mpeg_range
     };
 
-    void *args_uchar[] = {
-        &src_tex[0], &src_tex[1], &src_tex[2], &src_tex[3],
-        &dst_devptr[0], &dst_devptr[1], &dst_devptr[2], &dst_devptr[3],
-        &dst_width, &dst_height, &dst_pitch,
-        &src_left, &src_top, &src_width, &src_height, &s->param
-    };
+    void *args[] = { &params };
 
     return CHECK_CU(cu->cuLaunchKernel(func,
                                        DIV_UP(dst_width, BLOCKX), DIV_UP(dst_height, BLOCKY), 1,
-                                       BLOCKX, BLOCKY, 1, 0, s->cu_stream, args_uchar, NULL));
+                                       BLOCKX, BLOCKY, 1, 0, s->cu_stream, args, NULL));
 }
 
 static int scalecuda_resize(AVFilterContext *ctx,
@@ -445,6 +474,7 @@ static int scalecuda_resize(AVFilterContext *ctx,
     CudaFunctions *cu = s->hwctx->internal->cuda_dl;
     CUcontext dummy, cuda_ctx = s->hwctx->cuda_ctx;
     int i, ret;
+    int mpeg_range = in->color_range != AVCOL_RANGE_JPEG;
 
     CUtexObject tex[4] = { 0, 0, 0, 0 };
 
@@ -489,7 +519,7 @@ static int scalecuda_resize(AVFilterContext *ctx,
     // scale primary plane(s). Usually Y (and A), or single plane of RGB frames.
     ret = call_resize_kernel(ctx, s->cu_func,
                              tex, in->crop_left, in->crop_top, crop_width, crop_height,
-                             out, out->width, out->height, out->linesize[0]);
+                             out, out->width, out->height, out->linesize[0], mpeg_range);
     if (ret < 0)
         goto exit;
 
@@ -503,7 +533,7 @@ static int scalecuda_resize(AVFilterContext *ctx,
                                  out,
                                  AV_CEIL_RSHIFT(out->width, s->out_desc->log2_chroma_w),
                                  AV_CEIL_RSHIFT(out->height, s->out_desc->log2_chroma_h),
-                                 out->linesize[1]);
+                                 out->linesize[1], mpeg_range);
         if (ret < 0)
             goto exit;
     }
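
Note on the `mpeg_range` flag passed to the kernels: the device helpers in the `vf_scale_cuda.cu` diff that follows use it to choose between a plain left shift (limited/MPEG range) and bit replication (full/JPEG range) when widening samples. The sketch below is a host-side plain-C port of three of those helpers, written only to make the arithmetic checkable on a CPU. The function names mirror the patch, but the `uint8_t`/`uint16_t` types and the `check_range_endpoints` helper are assumptions for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit -> MSB-aligned 16-bit container (P010/P016 style layout).
 * Limited range: plain shift, so 16 -> 4096 and 235 -> 60160.
 * Full range: replicate the byte into the low bits so 255 -> 0xFFFF,
 * then mask off the bits the target depth does not carry. */
static uint16_t conv_8to16(uint8_t in, uint16_t mask, int mpeg_range)
{
    uint16_t shifted = (uint16_t)(in << 8);
    return mpeg_range ? shifted : (uint16_t)((shifted | in) & mask);
}

/* 8-bit -> LSB-aligned 10-bit (planar yuv4xxp10 style layout). */
static uint16_t conv_8to10pl(uint8_t in, int mpeg_range)
{
    uint16_t shifted = (uint16_t)(in << 2);
    return mpeg_range ? shifted : (uint16_t)(shifted | (in >> 6));
}

/* LSB-aligned 10-bit -> 16-bit. */
static uint16_t conv_10to16pl(uint16_t in, int mpeg_range)
{
    uint16_t shifted = (uint16_t)(in << 6);
    return mpeg_range ? shifted : (uint16_t)(shifted | (in >> 4));
}

/* Hypothetical self-check: full-range white must map to the maximum
 * code value, limited-range white must scale by an exact power of two. */
static void check_range_endpoints(void)
{
    assert(conv_8to10pl(255, 0) == 1023);
    assert(conv_8to10pl(235, 1) == 940);
    assert(conv_10to16pl(1023, 0) == 65535);
    assert(conv_10to16pl(940, 1) == 60160);
}
```

Limited-range code values scale by exactly 2^(n-8) between bit depths (16 becomes 64, 235 becomes 940), so the shift alone is exact there. Full-range values need 255 to reach 1023 or 65535, which bit replication provides without a multiply.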
diff --git a/libavfilter/vf_scale_cuda.cu b/libavfilter/vf_scale_cuda.cu
index 271b55cd5d..d674c0885a 100644
--- a/libavfilter/vf_scale_cuda.cu
+++ b/libavfilter/vf_scale_cuda.cu
@@ -35,9 +35,16 @@ using subsample_function_t = T (*)(cudaTextureObject_t tex, int xo, int yo,
 static const ushort mask_10bit = 0xFFC0;
 static const ushort mask_16bit = 0xFFFF;
 
-static inline __device__ ushort conv_8to16(uchar in, ushort mask)
+static inline __device__ ushort conv_8to16(uchar in, ushort mask, int mpeg_range)
 {
-    return ((ushort)in | ((ushort)in << 8)) & mask;
+    ushort shifted = (ushort)in << 8;
+    return mpeg_range ? shifted : ((shifted | ((ushort)in)) & mask);
+}
+
+static inline __device__ ushort conv_8to10pl(uchar in, int mpeg_range)
+{
+    ushort shifted = (ushort)in << 2;
+    return mpeg_range ? shifted : (shifted | ((ushort)in >> 6));
 }
 
 static inline __device__ uchar conv_16to8(ushort in)
@@ -50,9 +57,21 @@ static inline __device__ uchar conv_10to8(ushort in)
     return in >> 8;
 }
 
-static inline __device__ ushort conv_10to16(ushort in)
+static inline __device__ uchar conv_10to8pl(ushort in)
 {
-    return in | (in >> 10);
+    return in >> 2;
+}
+
+static inline __device__ ushort conv_10to16(ushort in, int mpeg_range)
+{
+    ushort shifted = (in >> 10);
+    return mpeg_range ? in : (in | shifted);
+}
+
+static inline __device__ ushort conv_10to16pl(ushort in, int mpeg_range)
+{
+    ushort shifted = (in << 6);
+    return mpeg_range ? shifted : (shifted | (in >> 4));
 }
 
 static inline __device__ ushort conv_16to10(ushort in)
@@ -60,12 +79,18 @@ static inline __device__ ushort conv_16to10(ushort in)
     return in & mask_10bit;
 }
 
+static inline __device__ ushort conv_16to10pl(ushort in)
+{
+    return in >> 6;
+}
+
 #define DEF_F(N, T) \
     template<subsample_function_t<in_T> subsample_func_y,                                      \
              subsample_function_t<in_T_uv> subsample_func_uv>                                  \
     __device__ static inline void N(cudaTextureObject_t src_tex[4], T *dst[4], int xo, int yo, \
                                     int dst_width, int dst_height, int dst_pitch,              \
-                                    int src_left, int src_top, int src_width, int src_height, float param)
+                                    int src_left, int src_top, int src_width, int src_height,  \
+                                    float param, int mpeg_range)
 
 #define SUB_F(m, plane) \
     subsample_func_##m(src_tex[plane], xo, yo, \
@@ -81,9 +106,9 @@ static inline __device__ ushort conv_16to10(ushort in)
 #define DEFAULT_DST(n) \
     dst[n][yo*FIXED_PITCH+xo]
 
-// yuv420p->X
+// planar8->X
 
-struct Convert_yuv420p_yuv420p
+struct Convert_planar8_planar8
 {
     static const int in_bit_depth = 8;
     typedef uchar in_T;
@@ -103,7 +128,47 @@ struct Convert_yuv420p_yuv420p
     }
 };
 
-struct Convert_yuv420p_nv12
+struct Convert_planar8_planar10
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to10pl(SUB_F(y, 0), mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = conv_8to10pl(SUB_F(uv, 1), mpeg_range);
+        DEFAULT_DST(2) = conv_8to10pl(SUB_F(uv, 2), mpeg_range);
+    }
+};
+
+struct Convert_planar8_planar16
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit, mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = conv_8to16(SUB_F(uv, 1), mask_16bit, mpeg_range);
+        DEFAULT_DST(2) = conv_8to16(SUB_F(uv, 2), mask_16bit, mpeg_range);
+    }
+};
+
+struct Convert_planar8_semiplanar8
 {
     static const int in_bit_depth = 8;
     typedef uchar in_T;
@@ -125,14 +190,82 @@ struct Convert_yuv420p_nv12
     }
 };
 
-struct Convert_yuv420p_yuv444p
+struct Convert_planar8_semiplanar10
 {
     static const int in_bit_depth = 8;
     typedef uchar in_T;
     typedef uchar in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_10bit, mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_ushort2(
+            conv_8to16(SUB_F(uv, 1), mask_10bit, mpeg_range),
+            conv_8to16(SUB_F(uv, 2), mask_10bit, mpeg_range)
+        );
+    }
+};
+
+struct Convert_planar8_semiplanar16
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit, mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_ushort2(
+            conv_8to16(SUB_F(uv, 1), mask_16bit, mpeg_range),
+            conv_8to16(SUB_F(uv, 2), mask_16bit, mpeg_range)
+        );
+    }
+};
+
+
+
+// planar10->X
+
+struct Convert_planar10_planar8
+{
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
     typedef uchar out_T;
     typedef uchar out_T_uv;
 
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_10to8pl(SUB_F(y, 0));
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = conv_10to8pl(SUB_F(uv, 1));
+        DEFAULT_DST(2) = conv_10to8pl(SUB_F(uv, 2));
+    }
+};
+
+struct Convert_planar10_planar10
+{
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
     DEF_F(Convert, out_T)
     {
         DEFAULT_DST(0) = SUB_F(y, 0);
@@ -145,249 +278,227 @@ struct Convert_yuv420p_yuv444p
     }
 };
 
-struct Convert_yuv420p_p010le
+struct Convert_planar10_planar16
 {
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_10bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_ushort2(
-            conv_8to16(SUB_F(uv, 1), mask_10bit),
-            conv_8to16(SUB_F(uv, 2), mask_10bit)
-        );
-    }
-};
-
-struct Convert_yuv420p_p016le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_ushort2(
-            conv_8to16(SUB_F(uv, 1), mask_16bit),
-            conv_8to16(SUB_F(uv, 2), mask_16bit)
-        );
-    }
-};
-
-struct Convert_yuv420p_yuv444p16le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
     typedef ushort out_T;
     typedef ushort out_T_uv;
 
     DEF_F(Convert, out_T)
     {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit);
+        DEFAULT_DST(0) = conv_10to16pl(SUB_F(y, 0), mpeg_range);
     }
 
     DEF_F(Convert_uv, out_T_uv)
     {
-        DEFAULT_DST(1) = conv_8to16(SUB_F(uv, 1), mask_16bit);
-        DEFAULT_DST(2) = conv_8to16(SUB_F(uv, 2), mask_16bit);
+        DEFAULT_DST(1) = conv_10to16pl(SUB_F(uv, 1), mpeg_range);
+        DEFAULT_DST(2) = conv_10to16pl(SUB_F(uv, 2), mpeg_range);
     }
 };
 
-// nv12->X
-
-struct Convert_nv12_yuv420p
+struct Convert_planar10_semiplanar8
 {
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar2 in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = res.x;
-        DEFAULT_DST(2) = res.y;
-    }
-};
-
-struct Convert_nv12_nv12
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar2 in_T_uv;
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
     typedef uchar out_T;
     typedef uchar2 out_T_uv;
 
     DEF_F(Convert, out_T)
     {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = SUB_F(uv, 1);
-    }
-};
-
-struct Convert_nv12_yuv444p
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar2 in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = res.x;
-        DEFAULT_DST(2) = res.y;
-    }
-};
-
-struct Convert_nv12_p010le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar2 in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_10bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = make_ushort2(
-            conv_8to16(res.x, mask_10bit),
-            conv_8to16(res.y, mask_10bit)
-        );
-    }
-};
-
-struct Convert_nv12_p016le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar2 in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = make_ushort2(
-            conv_8to16(res.x, mask_16bit),
-            conv_8to16(res.y, mask_16bit)
-        );
-    }
-};
-
-struct Convert_nv12_yuv444p16le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar2 in_T_uv;
-    typedef ushort out_T;
-    typedef ushort out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = conv_8to16(res.x, mask_16bit);
-        DEFAULT_DST(2) = conv_8to16(res.y, mask_16bit);
-    }
-};
-
-// yuv444p->X
-
-struct Convert_yuv444p_yuv420p
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = SUB_F(uv, 1);
-        DEFAULT_DST(2) = SUB_F(uv, 2);
-    }
-};
-
-struct Convert_yuv444p_nv12
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
-    typedef uchar out_T;
-    typedef uchar2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
+        DEFAULT_DST(0) = conv_10to8pl(SUB_F(y, 0));
     }
 
     DEF_F(Convert_uv, out_T_uv)
     {
         DEFAULT_DST(1) = make_uchar2(
+            conv_10to8pl(SUB_F(uv, 1)),
+            conv_10to8pl(SUB_F(uv, 2))
+        );
+    }
+};
+
+struct Convert_planar10_semiplanar10
+{
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = (SUB_F(y, 0) << 6);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_ushort2(
+            (SUB_F(uv, 1) << 6),
+            (SUB_F(uv, 2) << 6)
+        );
+    }
+};
+
+struct Convert_planar10_semiplanar16
+{
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_10to16pl(SUB_F(y, 0), mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_ushort2(
+            conv_10to16pl(SUB_F(uv, 1), mpeg_range),
+            conv_10to16pl(SUB_F(uv, 2), mpeg_range)
+        );
+    }
+};
+
+// planar16->X
+
+struct Convert_planar16_planar8
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef uchar out_T;
+    typedef uchar out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_16to8(SUB_F(y, 0));
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = conv_16to8(SUB_F(uv, 1));
+        DEFAULT_DST(2) = conv_16to8(SUB_F(uv, 2));
+    }
+};
+
+struct Convert_planar16_planar10
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_16to10pl(SUB_F(y, 0));
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = conv_16to10pl(SUB_F(uv, 1));
+        DEFAULT_DST(2) = conv_16to10pl(SUB_F(uv, 2));
+    }
+};
+
+struct Convert_planar16_planar16
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = SUB_F(y, 0);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = SUB_F(uv, 1);
+        DEFAULT_DST(2) = SUB_F(uv, 2);
+    }
+};
+
+struct Convert_planar16_semiplanar8
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef uchar out_T;
+    typedef uchar2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_16to8(SUB_F(y, 0));
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_uchar2(
+            conv_16to8(SUB_F(uv, 1)),
+            conv_16to8(SUB_F(uv, 2))
+        );
+    }
+};
+
+struct Convert_planar16_semiplanar10
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_16to10(SUB_F(y, 0));
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_ushort2(
+            conv_16to10(SUB_F(uv, 1)),
+            conv_16to10(SUB_F(uv, 2))
+        );
+    }
+};
+
+struct Convert_planar16_semiplanar16
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = SUB_F(y, 0);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = make_ushort2(
             SUB_F(uv, 1),
             SUB_F(uv, 2)
         );
     }
 };
 
-struct Convert_yuv444p_yuv444p
+// semiplanar8->X
+
+struct Convert_semiplanar8_planar8
 {
     static const int in_bit_depth = 8;
     typedef uchar in_T;
-    typedef uchar in_T_uv;
+    typedef uchar2 in_T_uv;
     typedef uchar out_T;
     typedef uchar out_T_uv;
 
@@ -398,78 +509,122 @@ struct Convert_yuv444p_yuv444p
 
     DEF_F(Convert_uv, out_T_uv)
     {
-        DEFAULT_DST(1) = SUB_F(uv, 1);
-        DEFAULT_DST(2) = SUB_F(uv, 2);
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = res.x;
+        DEFAULT_DST(2) = res.y;
     }
 };
 
-struct Convert_yuv444p_p010le
+struct Convert_semiplanar8_planar10
 {
     static const int in_bit_depth = 8;
     typedef uchar in_T;
-    typedef uchar in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_10bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_ushort2(
-            conv_8to16(SUB_F(uv, 1), mask_10bit),
-            conv_8to16(SUB_F(uv, 2), mask_10bit)
-        );
-    }
-};
-
-struct Convert_yuv444p_p016le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_ushort2(
-            conv_8to16(SUB_F(uv, 1), mask_16bit),
-            conv_8to16(SUB_F(uv, 2), mask_16bit)
-        );
-    }
-};
-
-struct Convert_yuv444p_yuv444p16le
-{
-    static const int in_bit_depth = 8;
-    typedef uchar in_T;
-    typedef uchar in_T_uv;
+    typedef uchar2 in_T_uv;
     typedef ushort out_T;
     typedef ushort out_T_uv;
 
     DEF_F(Convert, out_T)
     {
-        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit);
+        DEFAULT_DST(0) = conv_8to10pl(SUB_F(y, 0), mpeg_range);
     }
 
     DEF_F(Convert_uv, out_T_uv)
     {
-        DEFAULT_DST(1) = conv_8to16(SUB_F(uv, 1), mask_16bit);
-        DEFAULT_DST(2) = conv_8to16(SUB_F(uv, 2), mask_16bit);
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = conv_8to10pl(res.x, mpeg_range);
+        DEFAULT_DST(2) = conv_8to10pl(res.y, mpeg_range);
     }
 };
 
-// p010le->X
+struct Convert_semiplanar8_planar16
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
 
-struct Convert_p010le_yuv420p
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit, mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = conv_8to16(res.x, mask_16bit, mpeg_range);
+        DEFAULT_DST(2) = conv_8to16(res.y, mask_16bit, mpeg_range);
+    }
+};
+
+struct Convert_semiplanar8_semiplanar8
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar2 in_T_uv;
+    typedef uchar out_T;
+    typedef uchar2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = SUB_F(y, 0);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        DEFAULT_DST(1) = SUB_F(uv, 1);
+    }
+};
+
+struct Convert_semiplanar8_semiplanar10
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_10bit, mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = make_ushort2(
+            conv_8to16(res.x, mask_10bit, mpeg_range),
+            conv_8to16(res.y, mask_10bit, mpeg_range)
+        );
+    }
+};
+
+struct Convert_semiplanar8_semiplanar16
+{
+    static const int in_bit_depth = 8;
+    typedef uchar in_T;
+    typedef uchar2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort2 out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_8to16(SUB_F(y, 0), mask_16bit, mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = make_ushort2(
+            conv_8to16(res.x, mask_16bit, mpeg_range),
+            conv_8to16(res.y, mask_16bit, mpeg_range)
+        );
+    }
+};
+
+// semiplanar10->X
+
+struct Convert_semiplanar10_planar8
 {
     static const int in_bit_depth = 10;
     typedef ushort in_T;
@@ -490,7 +645,49 @@ struct Convert_p010le_yuv420p
     }
 };
 
-struct Convert_p010le_nv12
+struct Convert_semiplanar10_planar10
+{
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = SUB_F(y, 0) >> 6;
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = res.x >> 6;
+        DEFAULT_DST(2) = res.y >> 6;
+    }
+};
+
+struct Convert_semiplanar10_planar16
+{
+    static const int in_bit_depth = 10;
+    typedef ushort in_T;
+    typedef ushort2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_10to16(SUB_F(y, 0), mpeg_range);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = conv_10to16(res.x, mpeg_range);
+        DEFAULT_DST(2) = conv_10to16(res.y, mpeg_range);
+    }
+};
+
+struct Convert_semiplanar10_semiplanar8
 {
     static const int in_bit_depth = 10;
     typedef ushort in_T;
@@ -513,28 +710,7 @@ struct Convert_p010le_nv12
     }
 };
 
-struct Convert_p010le_yuv444p
-{
-    static const int in_bit_depth = 10;
-    typedef ushort in_T;
-    typedef ushort2 in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_10to8(SUB_F(y, 0));
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = conv_10to8(res.x);
-        DEFAULT_DST(2) = conv_10to8(res.y);
-    }
-};
-
-struct Convert_p010le_p010le
+struct Convert_semiplanar10_semiplanar10
 {
     static const int in_bit_depth = 10;
     typedef ushort in_T;
@@ -553,7 +729,7 @@ struct Convert_p010le_p010le
     }
 };
 
-struct Convert_p010le_p016le
+struct Convert_semiplanar10_semiplanar16
 {
     static const int in_bit_depth = 10;
     typedef ushort in_T;
@@ -563,43 +739,23 @@ struct Convert_p010le_p016le
 
     DEF_F(Convert, out_T)
     {
-        DEFAULT_DST(0) = conv_10to16(SUB_F(y, 0));
+        DEFAULT_DST(0) = conv_10to16(SUB_F(y, 0), mpeg_range);
     }
 
     DEF_F(Convert_uv, out_T_uv)
     {
         in_T_uv res = SUB_F(uv, 1);
         DEFAULT_DST(1) = make_ushort2(
-            conv_10to16(res.x),
-            conv_10to16(res.y)
+            conv_10to16(res.x, mpeg_range),
+            conv_10to16(res.y, mpeg_range)
         );
     }
 };
 
-struct Convert_p010le_yuv444p16le
-{
-    static const int in_bit_depth = 10;
-    typedef ushort in_T;
-    typedef ushort2 in_T_uv;
-    typedef ushort out_T;
-    typedef ushort out_T_uv;
 
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_10to16(SUB_F(y, 0));
-    }
+// semiplanar16->X
 
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = conv_10to16(res.x);
-        DEFAULT_DST(2) = conv_10to16(res.y);
-    }
-};
-
-// p016le->X
-
-struct Convert_p016le_yuv420p
+struct Convert_semiplanar16_planar8
 {
     static const int in_bit_depth = 16;
     typedef ushort in_T;
@@ -620,7 +776,49 @@ struct Convert_p016le_yuv420p
     }
 };
 
-struct Convert_p016le_nv12
+struct Convert_semiplanar16_planar10
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = conv_16to10pl(SUB_F(y, 0));
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = conv_16to10pl(res.x);
+        DEFAULT_DST(2) = conv_16to10pl(res.y);
+    }
+};
+
+struct Convert_semiplanar16_planar16
+{
+    static const int in_bit_depth = 16;
+    typedef ushort in_T;
+    typedef ushort2 in_T_uv;
+    typedef ushort out_T;
+    typedef ushort out_T_uv;
+
+    DEF_F(Convert, out_T)
+    {
+        DEFAULT_DST(0) = SUB_F(y, 0);
+    }
+
+    DEF_F(Convert_uv, out_T_uv)
+    {
+        in_T_uv res = SUB_F(uv, 1);
+        DEFAULT_DST(1) = res.x;
+        DEFAULT_DST(2) = res.y;
+    }
+};
+
+struct Convert_semiplanar16_semiplanar8
 {
     static const int in_bit_depth = 16;
     typedef ushort in_T;
@@ -643,28 +841,7 @@ struct Convert_p016le_nv12
     }
 };
 
-struct Convert_p016le_yuv444p
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort2 in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_16to8(SUB_F(y, 0));
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = conv_16to8(res.x);
-        DEFAULT_DST(2) = conv_16to8(res.y);
-    }
-};
-
-struct Convert_p016le_p010le
+struct Convert_semiplanar16_semiplanar10
 {
     static const int in_bit_depth = 16;
     typedef ushort in_T;
@@ -687,7 +864,7 @@ struct Convert_p016le_p010le
     }
 };
 
-struct Convert_p016le_p016le
+struct Convert_semiplanar16_semiplanar16
 {
     static const int in_bit_depth = 16;
     typedef ushort in_T;
@@ -706,155 +883,6 @@ struct Convert_p016le_p016le
     }
 };
 
-struct Convert_p016le_yuv444p16le
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort2 in_T_uv;
-    typedef ushort out_T;
-    typedef ushort out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        in_T_uv res = SUB_F(uv, 1);
-        DEFAULT_DST(1) = res.x;
-        DEFAULT_DST(2) = res.y;
-    }
-};
-
-// yuv444p16le->X
-
-struct Convert_yuv444p16le_yuv420p
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_16to8(SUB_F(y, 0));
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = conv_16to8(SUB_F(uv, 1));
-        DEFAULT_DST(2) = conv_16to8(SUB_F(uv, 2));
-    }
-};
-
-struct Convert_yuv444p16le_nv12
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort in_T_uv;
-    typedef uchar out_T;
-    typedef uchar2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_16to8(SUB_F(y, 0));
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_uchar2(
-            conv_16to8(SUB_F(uv, 1)),
-            conv_16to8(SUB_F(uv, 2))
-        );
-    }
-};
-
-struct Convert_yuv444p16le_yuv444p
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort in_T_uv;
-    typedef uchar out_T;
-    typedef uchar out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_16to8(SUB_F(y, 0));
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = conv_16to8(SUB_F(uv, 1));
-        DEFAULT_DST(2) = conv_16to8(SUB_F(uv, 2));
-    }
-};
-
-struct Convert_yuv444p16le_p010le
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = conv_16to10(SUB_F(y, 0));
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_ushort2(
-            conv_16to10(SUB_F(uv, 1)),
-            conv_16to10(SUB_F(uv, 2))
-        );
-    }
-};
-
-struct Convert_yuv444p16le_p016le
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort in_T_uv;
-    typedef ushort out_T;
-    typedef ushort2 out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = make_ushort2(
-            SUB_F(uv, 1),
-            SUB_F(uv, 2)
-        );
-    }
-};
-
-struct Convert_yuv444p16le_yuv444p16le
-{
-    static const int in_bit_depth = 16;
-    typedef ushort in_T;
-    typedef ushort in_T_uv;
-    typedef ushort out_T;
-    typedef ushort out_T_uv;
-
-    DEF_F(Convert, out_T)
-    {
-        DEFAULT_DST(0) = SUB_F(y, 0);
-    }
-
-    DEF_F(Convert_uv, out_T_uv)
-    {
-        DEFAULT_DST(1) = SUB_F(uv, 1);
-        DEFAULT_DST(2) = SUB_F(uv, 2);
-    }
-};
-
 #define DEF_CONVERT_IDENTITY(fmt1, fmt2)\
                                         \
 struct Convert_##fmt1##_##fmt2          \
@@ -930,7 +958,7 @@ struct Convert_bgr0_bgra
             res.x,
             res.y,
             res.z,
-            1
+            0xFF
         );
     }
 
@@ -954,7 +982,7 @@ struct Convert_bgr0_rgba
             res.z,
             res.y,
             res.x,
-            1
+            0xFF
         );
     }
 
@@ -978,7 +1006,7 @@ struct Convert_rgb0_bgra
             res.z,
             res.y,
             res.x,
-            1
+            0xFF
         );
     }
 
@@ -1002,7 +1030,7 @@ struct Convert_rgb0_rgba
             res.x,
             res.y,
             res.z,
-            1
+            0xFF
         );
     }
 
@@ -1147,25 +1175,26 @@ __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex,
 
 /// --- FUNCTION EXPORTS ---
 
-#define KERNEL_ARGS(T) \
-    cudaTextureObject_t src_tex_0, cudaTextureObject_t src_tex_1, \
-    cudaTextureObject_t src_tex_2, cudaTextureObject_t src_tex_3, \
-    T *dst_0, T *dst_1, T *dst_2, T *dst_3,                       \
-    int dst_width, int dst_height, int dst_pitch,                 \
-    int src_left, int src_top, int src_width, int src_height, float param
+#define KERNEL_ARGS(T) CUDAScaleKernelParams params
 
 #define SUBSAMPLE(Convert, T) \
-    cudaTextureObject_t src_tex[4] =                    \
-        { src_tex_0, src_tex_1, src_tex_2, src_tex_3 }; \
-    T *dst[4] = { dst_0, dst_1, dst_2, dst_3 };         \
+    cudaTextureObject_t src_tex[4] = {                  \
+        params.src_tex[0], params.src_tex[1],           \
+        params.src_tex[2], params.src_tex[3]            \
+    };                                                  \
+    T *dst[4] = {                                       \
+        (T*)params.dst[0], (T*)params.dst[1],           \
+        (T*)params.dst[2], (T*)params.dst[3]            \
+    };                                                  \
     int xo = blockIdx.x * blockDim.x + threadIdx.x;     \
     int yo = blockIdx.y * blockDim.y + threadIdx.y;     \
-    if (yo >= dst_height || xo >= dst_width) return;    \
+    if (yo >= params.dst_height || xo >= params.dst_width) return; \
     Convert(                                            \
         src_tex, dst, xo, yo,                           \
-        dst_width, dst_height, dst_pitch,               \
-        src_left, src_top,                              \
-        src_width, src_height, param);
+        params.dst_width, params.dst_height, params.dst_pitch, \
+        params.src_left, params.src_top,                \
+        params.src_width, params.src_height,            \
+        params.param, params.mpeg_range);
 
 extern "C" {
 
@@ -1184,12 +1213,12 @@ extern "C" {
     NEAREST_KERNEL(C,_uv)
 
 #define NEAREST_KERNELS(C) \
-    NEAREST_KERNEL_RAW(yuv420p_ ## C)     \
-    NEAREST_KERNEL_RAW(nv12_ ## C)        \
-    NEAREST_KERNEL_RAW(yuv444p_ ## C)     \
-    NEAREST_KERNEL_RAW(p010le_ ## C)      \
-    NEAREST_KERNEL_RAW(p016le_ ## C)      \
-    NEAREST_KERNEL_RAW(yuv444p16le_ ## C)
+    NEAREST_KERNEL_RAW(planar8_ ## C)      \
+    NEAREST_KERNEL_RAW(planar10_ ## C)     \
+    NEAREST_KERNEL_RAW(planar16_ ## C)     \
+    NEAREST_KERNEL_RAW(semiplanar8_ ## C)  \
+    NEAREST_KERNEL_RAW(semiplanar10_ ## C) \
+    NEAREST_KERNEL_RAW(semiplanar16_ ## C)
 
 #define NEAREST_KERNELS_RGB(C) \
     NEAREST_KERNEL_RAW(rgb0_ ## C)  \
@@ -1197,12 +1226,12 @@ extern "C" {
     NEAREST_KERNEL_RAW(rgba_ ## C)  \
     NEAREST_KERNEL_RAW(bgra_ ## C)  \
 
-NEAREST_KERNELS(yuv420p)
-NEAREST_KERNELS(nv12)
-NEAREST_KERNELS(yuv444p)
-NEAREST_KERNELS(p010le)
-NEAREST_KERNELS(p016le)
-NEAREST_KERNELS(yuv444p16le)
+NEAREST_KERNELS(planar8)
+NEAREST_KERNELS(planar10)
+NEAREST_KERNELS(planar16)
+NEAREST_KERNELS(semiplanar8)
+NEAREST_KERNELS(semiplanar10)
+NEAREST_KERNELS(semiplanar16)
 
 NEAREST_KERNELS_RGB(rgb0)
 NEAREST_KERNELS_RGB(bgr0)
@@ -1224,12 +1253,12 @@ NEAREST_KERNELS_RGB(bgra)
     BILINEAR_KERNEL(C,_uv)
 
 #define BILINEAR_KERNELS(C) \
-    BILINEAR_KERNEL_RAW(yuv420p_ ## C)     \
-    BILINEAR_KERNEL_RAW(nv12_ ## C)        \
-    BILINEAR_KERNEL_RAW(yuv444p_ ## C)     \
-    BILINEAR_KERNEL_RAW(p010le_ ## C)      \
-    BILINEAR_KERNEL_RAW(p016le_ ## C)      \
-    BILINEAR_KERNEL_RAW(yuv444p16le_ ## C)
+    BILINEAR_KERNEL_RAW(planar8_ ## C)      \
+    BILINEAR_KERNEL_RAW(planar10_ ## C)     \
+    BILINEAR_KERNEL_RAW(planar16_ ## C)     \
+    BILINEAR_KERNEL_RAW(semiplanar8_ ## C)  \
+    BILINEAR_KERNEL_RAW(semiplanar10_ ## C) \
+    BILINEAR_KERNEL_RAW(semiplanar16_ ## C)
 
 #define BILINEAR_KERNELS_RGB(C)     \
     BILINEAR_KERNEL_RAW(rgb0_ ## C) \
@@ -1237,12 +1266,12 @@ NEAREST_KERNELS_RGB(bgra)
     BILINEAR_KERNEL_RAW(rgba_ ## C) \
     BILINEAR_KERNEL_RAW(bgra_ ## C)
 
-BILINEAR_KERNELS(yuv420p)
-BILINEAR_KERNELS(nv12)
-BILINEAR_KERNELS(yuv444p)
-BILINEAR_KERNELS(p010le)
-BILINEAR_KERNELS(p016le)
-BILINEAR_KERNELS(yuv444p16le)
+BILINEAR_KERNELS(planar8)
+BILINEAR_KERNELS(planar10)
+BILINEAR_KERNELS(planar16)
+BILINEAR_KERNELS(semiplanar8)
+BILINEAR_KERNELS(semiplanar10)
+BILINEAR_KERNELS(semiplanar16)
 
 BILINEAR_KERNELS_RGB(rgb0)
 BILINEAR_KERNELS_RGB(bgr0)
@@ -1264,12 +1293,12 @@ BILINEAR_KERNELS_RGB(bgra)
     BICUBIC_KERNEL(C,_uv)
 
 #define BICUBIC_KERNELS(C) \
-    BICUBIC_KERNEL_RAW(yuv420p_ ## C)     \
-    BICUBIC_KERNEL_RAW(nv12_ ## C)        \
-    BICUBIC_KERNEL_RAW(yuv444p_ ## C)     \
-    BICUBIC_KERNEL_RAW(p010le_ ## C)      \
-    BICUBIC_KERNEL_RAW(p016le_ ## C)      \
-    BICUBIC_KERNEL_RAW(yuv444p16le_ ## C)
+    BICUBIC_KERNEL_RAW(planar8_ ## C)      \
+    BICUBIC_KERNEL_RAW(planar10_ ## C)     \
+    BICUBIC_KERNEL_RAW(planar16_ ## C)     \
+    BICUBIC_KERNEL_RAW(semiplanar8_ ## C)  \
+    BICUBIC_KERNEL_RAW(semiplanar10_ ## C) \
+    BICUBIC_KERNEL_RAW(semiplanar16_ ## C)
 
 #define BICUBIC_KERNELS_RGB(C)      \
     BICUBIC_KERNEL_RAW(rgb0_ ## C)  \
@@ -1277,12 +1306,12 @@ BILINEAR_KERNELS_RGB(bgra)
     BICUBIC_KERNEL_RAW(rgba_ ## C)  \
     BICUBIC_KERNEL_RAW(bgra_ ## C)
 
-BICUBIC_KERNELS(yuv420p)
-BICUBIC_KERNELS(nv12)
-BICUBIC_KERNELS(yuv444p)
-BICUBIC_KERNELS(p010le)
-BICUBIC_KERNELS(p016le)
-BICUBIC_KERNELS(yuv444p16le)
+BICUBIC_KERNELS(planar8)
+BICUBIC_KERNELS(planar10)
+BICUBIC_KERNELS(planar16)
+BICUBIC_KERNELS(semiplanar8)
+BICUBIC_KERNELS(semiplanar10)
+BICUBIC_KERNELS(semiplanar16)
 
 BICUBIC_KERNELS_RGB(rgb0)
 BICUBIC_KERNELS_RGB(bgr0)
@@ -1304,12 +1333,12 @@ BICUBIC_KERNELS_RGB(bgra)
     LANCZOS_KERNEL(C,_uv)
 
 #define LANCZOS_KERNELS(C) \
-    LANCZOS_KERNEL_RAW(yuv420p_ ## C)     \
-    LANCZOS_KERNEL_RAW(nv12_ ## C)        \
-    LANCZOS_KERNEL_RAW(yuv444p_ ## C)     \
-    LANCZOS_KERNEL_RAW(p010le_ ## C)      \
-    LANCZOS_KERNEL_RAW(p016le_ ## C)      \
-    LANCZOS_KERNEL_RAW(yuv444p16le_ ## C)
+    LANCZOS_KERNEL_RAW(planar8_ ## C)      \
+    LANCZOS_KERNEL_RAW(planar10_ ## C)     \
+    LANCZOS_KERNEL_RAW(planar16_ ## C)     \
+    LANCZOS_KERNEL_RAW(semiplanar8_ ## C)  \
+    LANCZOS_KERNEL_RAW(semiplanar10_ ## C) \
+    LANCZOS_KERNEL_RAW(semiplanar16_ ## C)
 
 #define LANCZOS_KERNELS_RGB(C)      \
     LANCZOS_KERNEL_RAW(rgb0_ ## C)  \
@@ -1317,12 +1346,12 @@ BICUBIC_KERNELS_RGB(bgra)
     LANCZOS_KERNEL_RAW(rgba_ ## C)  \
     LANCZOS_KERNEL_RAW(bgra_ ## C)
 
-LANCZOS_KERNELS(yuv420p)
-LANCZOS_KERNELS(nv12)
-LANCZOS_KERNELS(yuv444p)
-LANCZOS_KERNELS(p010le)
-LANCZOS_KERNELS(p016le)
-LANCZOS_KERNELS(yuv444p16le)
+LANCZOS_KERNELS(planar8)
+LANCZOS_KERNELS(planar10)
+LANCZOS_KERNELS(planar16)
+LANCZOS_KERNELS(semiplanar8)
+LANCZOS_KERNELS(semiplanar10)
+LANCZOS_KERNELS(semiplanar16)
 
 LANCZOS_KERNELS_RGB(rgb0)
 LANCZOS_KERNELS_RGB(bgr0)
diff --git a/libavfilter/vf_scale_cuda.h b/libavfilter/vf_scale_cuda.h
index 40d5b9cfac..81fd8061e3 100644
--- a/libavfilter/vf_scale_cuda.h
+++ b/libavfilter/vf_scale_cuda.h
@@ -23,6 +23,28 @@
 #ifndef AVFILTER_SCALE_CUDA_H
 #define AVFILTER_SCALE_CUDA_H
 
+#if defined(__CUDACC__) || defined(__CUDA__)
+#include <stdint.h>
+typedef cudaTextureObject_t CUtexObject;
+typedef uint8_t* CUdeviceptr;
+#else
+#include <ffnvcodec/dynlink_cuda.h>
+#endif
+
 #define SCALE_CUDA_PARAM_DEFAULT 999999.0f
 
+typedef struct {
+    CUtexObject src_tex[4];
+    CUdeviceptr dst[4];
+    int dst_width;
+    int dst_height;
+    int dst_pitch;
+    int src_left;
+    int src_top;
+    int src_width;
+    int src_height;
+    float param;
+    int mpeg_range;
+} CUDAScaleKernelParams;
+
 #endif
-- 
2.52.0


From 4efaef3c87ae97d3e5dbcafb6a73fa172b6a6efa Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Tue, 25 Nov 2025 18:55:35 -0300
Subject: [PATCH 129/304] avcodec/cavs_parser: parse sequence headers for
 stream parameters

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavcodec/cavs_parser.c  |  85 +++++++++++++++++++++++++++
 tests/ref/fate/cavs-demux | 120 +++++++++++++++++++-------------------
 2 files changed, 145 insertions(+), 60 deletions(-)
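
Reviewer note (not part of the commit): the field order that parse_seq_header() walks can be sketched outside of FFmpeg as below. This is only an illustration under assumptions. BitReader, br_read(), SeqInfo and sketch_parse_seq_header() are hypothetical names standing in for GetBitContext and the parser context, and the sketch stops after frame_rate_code rather than skipping the bit-rate fields as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; a stand-in for FFmpeg's GetBitContext. */
typedef struct {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
} BitReader;

/* Read n bits (1..32), most significant bit first. Bits past the end read as 0. */
static uint32_t br_read(BitReader *br, int n)
{
    uint32_t v = 0;
    assert(n >= 1 && n <= 32);
    while (n--) {
        uint32_t bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* Hypothetical result struct; the patch writes these into the parser context. */
typedef struct {
    int width, height;
    int coded_width, coded_height;
    int frame_rate_code;
} SeqInfo;

/* Returns 0 on success, -1 when either dimension is zero. */
static int sketch_parse_seq_header(const uint8_t *buf, size_t size, SeqInfo *out)
{
    BitReader br = { buf, size * 8, 0 };

    br_read(&br, 8);  /* profile */
    br_read(&br, 8);  /* level */
    br_read(&br, 1);  /* progressive sequence */
    out->width  = (int)br_read(&br, 14);
    out->height = (int)br_read(&br, 14);
    if (out->width == 0 || out->height == 0)
        return -1;
    br_read(&br, 2);  /* chroma format */
    br_read(&br, 3);  /* sample_precision */
    br_read(&br, 4);  /* aspect_ratio */
    out->frame_rate_code = (int)br_read(&br, 4);
    if (out->frame_rate_code == 0 || out->frame_rate_code > 13)
        out->frame_rate_code = 1;  /* same fallback value as the patch */

    /* Coded size is the display size rounded up to whole 16x16 macroblocks. */
    out->coded_width  = 16 * ((out->width  + 15) >> 4);
    out->coded_height = 16 * ((out->height + 15) >> 4);
    return 0;
}
```

Feeding it the eight bytes 20 40 8A 00 16 82 45 40 (hand-packed for this note, not taken from a real stream) gives 1280x720, coded size 1280x720 and frame_rate_code 5.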

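Reviewer note on the reference update (not part of the commit): the stream line in tests/ref/fate/cavs-demux reports time_base=1/1200000, so a packet duration in ticks is the time base denominator divided by the frame rate. The old reference mixes 48000-tick (0.04 s) and 40000-tick (0.033333 s) packets; the new one uses 40000 throughout, and only the packets at pts 0 and 1200000 keep the K flag. A minimal sketch of that arithmetic, where ticks_per_frame() is a hypothetical helper and not an FFmpeg function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ticks per frame for a 1/tb_den time base
 * at a frame rate of fps_num/fps_den. */
static int64_t ticks_per_frame(int64_t tb_den, int fps_num, int fps_den)
{
    assert(tb_den > 0 && fps_num > 0 && fps_den > 0);
    return tb_den * fps_den / fps_num;
}
```

With tb_den = 1200000 this returns 48000 for 25/1 and 40000 for 30/1, and packet index 30 lands on pts 30 * 40000 = 1200000, which is the second keyframe row in the new reference.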
diff --git a/libavcodec/cavs_parser.c b/libavcodec/cavs_parser.c
index 13fbd009c8..882840cd7f 100644
--- a/libavcodec/cavs_parser.c
+++ b/libavcodec/cavs_parser.c
@@ -27,7 +27,10 @@
 
 #include "parser.h"
 #include "cavs.h"
+#include "get_bits.h"
+#include "mpeg12data.h"
 #include "parser_internal.h"
+#include "startcode.h"
 
 
 /**
@@ -73,6 +76,85 @@ static int cavs_find_frame_end(ParseContext *pc, const uint8_t *buf,
     return END_NOT_FOUND;
 }
 
+static int parse_seq_header(AVCodecParserContext *s, AVCodecContext *avctx,
+                            GetBitContext *gb)
+{
+    int frame_rate_code;
+    int width, height;
+    int mb_width, mb_height;
+
+    skip_bits(gb, 8); // profile
+    skip_bits(gb, 8); // level
+    skip_bits1(gb);   // progressive sequence
+
+    width  = get_bits(gb, 14);
+    height = get_bits(gb, 14);
+    if (width <= 0 || height <= 0) {
+        av_log(avctx, AV_LOG_ERROR, "Dimensions invalid\n");
+        return AVERROR_INVALIDDATA;
+    }
+    mb_width  = (width  + 15) >> 4;
+    mb_height = (height + 15) >> 4;
+
+    skip_bits(gb, 2); // chroma format
+    skip_bits(gb, 3); // sample_precision
+    skip_bits(gb, 4); // aspect_ratio
+    frame_rate_code = get_bits(gb, 4);
+    if (frame_rate_code == 0 || frame_rate_code > 13) {
+        av_log(avctx, AV_LOG_WARNING,
+               "frame_rate_code %d is invalid\n", frame_rate_code);
+        frame_rate_code = 1;
+    }
+
+    skip_bits(gb, 18); // bit_rate_lower
+    skip_bits1(gb);    // marker_bit
+    skip_bits(gb, 12); // bit_rate_upper
+    skip_bits1(gb);    // low_delay
+
+    s->width  = width;
+    s->height = height;
+    s->coded_width  = 16 * mb_width;
+    s->coded_height = 16 * mb_height;
+    avctx->framerate = ff_mpeg12_frame_rate_tab[frame_rate_code];
+
+    return 0;
+}
+
+static int cavs_parse_frame(AVCodecParserContext *s, AVCodecContext *avctx,
+                            const uint8_t *buf, int buf_size)
+{
+    GetBitContext gb;
+    const uint8_t *buf_end;
+    const uint8_t *buf_ptr;
+    uint32_t stc = -1;
+
+    s->key_frame = 0;
+    s->pict_type = AV_PICTURE_TYPE_NONE;
+
+    if (buf_size == 0)
+        return 0;
+
+    buf_ptr = buf;
+    buf_end = buf + buf_size;
+    for (;;) {
+        buf_ptr = avpriv_find_start_code(buf_ptr, buf_end, &stc);
+        if ((stc & 0xFFFFFE00) || buf_ptr == buf_end)
+            return 0;
+        switch (stc) {
+        case CAVS_START_CODE:
+            init_get_bits8(&gb, buf_ptr, buf_end - buf_ptr);
+            parse_seq_header(s, avctx, &gb);
+            break;
+        case PIC_I_START_CODE:
+            s->key_frame = 1;
+            s->pict_type = AV_PICTURE_TYPE_I;
+            break;
+        default:
+            break;
+        }
+    }
+}
+
 static int cavsvideo_parse(AVCodecParserContext *s,
                            AVCodecContext *avctx,
                            const uint8_t **poutbuf, int *poutbuf_size,
@@ -92,6 +174,9 @@ static int cavsvideo_parse(AVCodecParserContext *s,
             return buf_size;
         }
     }
+
+    cavs_parse_frame(s, avctx, buf, buf_size);
+
     *poutbuf = buf;
     *poutbuf_size = buf_size;
     return next;
diff --git a/tests/ref/fate/cavs-demux b/tests/ref/fate/cavs-demux
index b1e2c2fd25..c4847293ab 100644
--- a/tests/ref/fate/cavs-demux
+++ b/tests/ref/fate/cavs-demux
@@ -1,62 +1,62 @@
-packet|codec_type=video|stream_index=0|pts=0|pts_time=0.000000|dts=0|dts_time=0.000000|duration=48000|duration_time=0.040000|size=14447|pos=0|flags=K__|data_hash=CRC32:83f257c0
-packet|codec_type=video|stream_index=0|pts=48000|pts_time=0.040000|dts=48000|dts_time=0.040000|duration=48000|duration_time=0.040000|size=483|pos=14447|flags=K__|data_hash=CRC32:5abb82f8
-packet|codec_type=video|stream_index=0|pts=96000|pts_time=0.080000|dts=96000|dts_time=0.080000|duration=48000|duration_time=0.040000|size=18|pos=14930|flags=K__|data_hash=CRC32:b8b123d8
-packet|codec_type=video|stream_index=0|pts=144000|pts_time=0.120000|dts=144000|dts_time=0.120000|duration=48000|duration_time=0.040000|size=18|pos=14948|flags=K__|data_hash=CRC32:19180fa8
-packet|codec_type=video|stream_index=0|pts=192000|pts_time=0.160000|dts=192000|dts_time=0.160000|duration=48000|duration_time=0.040000|size=18|pos=14966|flags=K__|data_hash=CRC32:cf501647
-packet|codec_type=video|stream_index=0|pts=240000|pts_time=0.200000|dts=240000|dts_time=0.200000|duration=40000|duration_time=0.033333|size=1807|pos=14984|flags=K__|data_hash=CRC32:4267e1d5
-packet|codec_type=video|stream_index=0|pts=280000|pts_time=0.233333|dts=280000|dts_time=0.233333|duration=40000|duration_time=0.033333|size=28|pos=16791|flags=K__|data_hash=CRC32:c223285a
-packet|codec_type=video|stream_index=0|pts=320000|pts_time=0.266667|dts=320000|dts_time=0.266667|duration=40000|duration_time=0.033333|size=25|pos=16819|flags=K__|data_hash=CRC32:2565cc9e
-packet|codec_type=video|stream_index=0|pts=360000|pts_time=0.300000|dts=360000|dts_time=0.300000|duration=40000|duration_time=0.033333|size=22|pos=16844|flags=K__|data_hash=CRC32:7fbf36ac
-packet|codec_type=video|stream_index=0|pts=400000|pts_time=0.333333|dts=400000|dts_time=0.333333|duration=40000|duration_time=0.033333|size=23884|pos=16866|flags=K__|data_hash=CRC32:d61430fd
-packet|codec_type=video|stream_index=0|pts=440000|pts_time=0.366667|dts=440000|dts_time=0.366667|duration=40000|duration_time=0.033333|size=265|pos=40750|flags=K__|data_hash=CRC32:d64145a0
-packet|codec_type=video|stream_index=0|pts=480000|pts_time=0.400000|dts=480000|dts_time=0.400000|duration=40000|duration_time=0.033333|size=393|pos=41015|flags=K__|data_hash=CRC32:32c020e2
-packet|codec_type=video|stream_index=0|pts=520000|pts_time=0.433333|dts=520000|dts_time=0.433333|duration=40000|duration_time=0.033333|size=656|pos=41408|flags=K__|data_hash=CRC32:965c7846
-packet|codec_type=video|stream_index=0|pts=560000|pts_time=0.466667|dts=560000|dts_time=0.466667|duration=40000|duration_time=0.033333|size=3500|pos=42064|flags=K__|data_hash=CRC32:ddf731de
-packet|codec_type=video|stream_index=0|pts=600000|pts_time=0.500000|dts=600000|dts_time=0.500000|duration=40000|duration_time=0.033333|size=68|pos=45564|flags=K__|data_hash=CRC32:f8c8ba07
-packet|codec_type=video|stream_index=0|pts=640000|pts_time=0.533333|dts=640000|dts_time=0.533333|duration=40000|duration_time=0.033333|size=58|pos=45632|flags=K__|data_hash=CRC32:22adbb83
-packet|codec_type=video|stream_index=0|pts=680000|pts_time=0.566667|dts=680000|dts_time=0.566667|duration=40000|duration_time=0.033333|size=43|pos=45690|flags=K__|data_hash=CRC32:53fb136c
-packet|codec_type=video|stream_index=0|pts=720000|pts_time=0.600000|dts=720000|dts_time=0.600000|duration=40000|duration_time=0.033333|size=11757|pos=45733|flags=K__|data_hash=CRC32:551e491b
-packet|codec_type=video|stream_index=0|pts=760000|pts_time=0.633333|dts=760000|dts_time=0.633333|duration=40000|duration_time=0.033333|size=98|pos=57490|flags=K__|data_hash=CRC32:4e718dd4
-packet|codec_type=video|stream_index=0|pts=800000|pts_time=0.666667|dts=800000|dts_time=0.666667|duration=40000|duration_time=0.033333|size=79|pos=57588|flags=K__|data_hash=CRC32:4c5a32f5
-packet|codec_type=video|stream_index=0|pts=840000|pts_time=0.700000|dts=840000|dts_time=0.700000|duration=40000|duration_time=0.033333|size=128|pos=57667|flags=K__|data_hash=CRC32:95b8cad1
-packet|codec_type=video|stream_index=0|pts=880000|pts_time=0.733333|dts=880000|dts_time=0.733333|duration=40000|duration_time=0.033333|size=10487|pos=57795|flags=K__|data_hash=CRC32:8646f8f2
-packet|codec_type=video|stream_index=0|pts=920000|pts_time=0.766667|dts=920000|dts_time=0.766667|duration=40000|duration_time=0.033333|size=65|pos=68282|flags=K__|data_hash=CRC32:73687d19
-packet|codec_type=video|stream_index=0|pts=960000|pts_time=0.800000|dts=960000|dts_time=0.800000|duration=40000|duration_time=0.033333|size=46|pos=68347|flags=K__|data_hash=CRC32:ad381ca5
-packet|codec_type=video|stream_index=0|pts=1000000|pts_time=0.833333|dts=1000000|dts_time=0.833333|duration=40000|duration_time=0.033333|size=67|pos=68393|flags=K__|data_hash=CRC32:89069152
-packet|codec_type=video|stream_index=0|pts=1040000|pts_time=0.866667|dts=1040000|dts_time=0.866667|duration=40000|duration_time=0.033333|size=8403|pos=68460|flags=K__|data_hash=CRC32:a22913dd
-packet|codec_type=video|stream_index=0|pts=1080000|pts_time=0.900000|dts=1080000|dts_time=0.900000|duration=40000|duration_time=0.033333|size=70|pos=76863|flags=K__|data_hash=CRC32:98772596
-packet|codec_type=video|stream_index=0|pts=1120000|pts_time=0.933333|dts=1120000|dts_time=0.933333|duration=40000|duration_time=0.033333|size=63|pos=76933|flags=K__|data_hash=CRC32:cfd62cc4
-packet|codec_type=video|stream_index=0|pts=1160000|pts_time=0.966667|dts=1160000|dts_time=0.966667|duration=40000|duration_time=0.033333|size=70|pos=76996|flags=K__|data_hash=CRC32:9b526357
-packet|codec_type=video|stream_index=0|pts=1200000|pts_time=1.000000|dts=1200000|dts_time=1.000000|duration=40000|duration_time=0.033333|size=7945|pos=77066|flags=K__|data_hash=CRC32:d0f46769
-packet|codec_type=video|stream_index=0|pts=1240000|pts_time=1.033333|dts=1240000|dts_time=1.033333|duration=40000|duration_time=0.033333|size=40558|pos=85011|flags=K__|data_hash=CRC32:4db0bd7d
-packet|codec_type=video|stream_index=0|pts=1280000|pts_time=1.066667|dts=1280000|dts_time=1.066667|duration=40000|duration_time=0.033333|size=1260|pos=125569|flags=K__|data_hash=CRC32:3c4397d7
-packet|codec_type=video|stream_index=0|pts=1320000|pts_time=1.100000|dts=1320000|dts_time=1.100000|duration=40000|duration_time=0.033333|size=27|pos=126829|flags=K__|data_hash=CRC32:5e233c77
-packet|codec_type=video|stream_index=0|pts=1360000|pts_time=1.133333|dts=1360000|dts_time=1.133333|duration=40000|duration_time=0.033333|size=26|pos=126856|flags=K__|data_hash=CRC32:57985e7b
-packet|codec_type=video|stream_index=0|pts=1400000|pts_time=1.166667|dts=1400000|dts_time=1.166667|duration=40000|duration_time=0.033333|size=18|pos=126882|flags=K__|data_hash=CRC32:f4eb01ba
-packet|codec_type=video|stream_index=0|pts=1440000|pts_time=1.200000|dts=1440000|dts_time=1.200000|duration=40000|duration_time=0.033333|size=2931|pos=126900|flags=K__|data_hash=CRC32:ca20964f
-packet|codec_type=video|stream_index=0|pts=1480000|pts_time=1.233333|dts=1480000|dts_time=1.233333|duration=40000|duration_time=0.033333|size=25|pos=129831|flags=K__|data_hash=CRC32:a82bd0b4
-packet|codec_type=video|stream_index=0|pts=1520000|pts_time=1.266667|dts=1520000|dts_time=1.266667|duration=40000|duration_time=0.033333|size=19|pos=129856|flags=K__|data_hash=CRC32:bc5f709d
-packet|codec_type=video|stream_index=0|pts=1560000|pts_time=1.300000|dts=1560000|dts_time=1.300000|duration=40000|duration_time=0.033333|size=30|pos=129875|flags=K__|data_hash=CRC32:c1f8a4c9
-packet|codec_type=video|stream_index=0|pts=1600000|pts_time=1.333333|dts=1600000|dts_time=1.333333|duration=40000|duration_time=0.033333|size=5088|pos=129905|flags=K__|data_hash=CRC32:41ace145
-packet|codec_type=video|stream_index=0|pts=1640000|pts_time=1.366667|dts=1640000|dts_time=1.366667|duration=40000|duration_time=0.033333|size=41|pos=134993|flags=K__|data_hash=CRC32:e169b3c7
-packet|codec_type=video|stream_index=0|pts=1680000|pts_time=1.400000|dts=1680000|dts_time=1.400000|duration=40000|duration_time=0.033333|size=53|pos=135034|flags=K__|data_hash=CRC32:973c5fe3
-packet|codec_type=video|stream_index=0|pts=1720000|pts_time=1.433333|dts=1720000|dts_time=1.433333|duration=40000|duration_time=0.033333|size=54|pos=135087|flags=K__|data_hash=CRC32:665639e6
-packet|codec_type=video|stream_index=0|pts=1760000|pts_time=1.466667|dts=1760000|dts_time=1.466667|duration=40000|duration_time=0.033333|size=7150|pos=135141|flags=K__|data_hash=CRC32:cc910027
-packet|codec_type=video|stream_index=0|pts=1800000|pts_time=1.500000|dts=1800000|dts_time=1.500000|duration=40000|duration_time=0.033333|size=48|pos=142291|flags=K__|data_hash=CRC32:45658f78
-packet|codec_type=video|stream_index=0|pts=1840000|pts_time=1.533333|dts=1840000|dts_time=1.533333|duration=40000|duration_time=0.033333|size=48|pos=142339|flags=K__|data_hash=CRC32:94e359a2
-packet|codec_type=video|stream_index=0|pts=1880000|pts_time=1.566667|dts=1880000|dts_time=1.566667|duration=40000|duration_time=0.033333|size=51|pos=142387|flags=K__|data_hash=CRC32:959ccdd9
-packet|codec_type=video|stream_index=0|pts=1920000|pts_time=1.600000|dts=1920000|dts_time=1.600000|duration=40000|duration_time=0.033333|size=9379|pos=142438|flags=K__|data_hash=CRC32:a3318410
-packet|codec_type=video|stream_index=0|pts=1960000|pts_time=1.633333|dts=1960000|dts_time=1.633333|duration=40000|duration_time=0.033333|size=58|pos=151817|flags=K__|data_hash=CRC32:44b24f03
-packet|codec_type=video|stream_index=0|pts=2000000|pts_time=1.666667|dts=2000000|dts_time=1.666667|duration=40000|duration_time=0.033333|size=43|pos=151875|flags=K__|data_hash=CRC32:f4876e05
-packet|codec_type=video|stream_index=0|pts=2040000|pts_time=1.700000|dts=2040000|dts_time=1.700000|duration=40000|duration_time=0.033333|size=62|pos=151918|flags=K__|data_hash=CRC32:34dce749
-packet|codec_type=video|stream_index=0|pts=2080000|pts_time=1.733333|dts=2080000|dts_time=1.733333|duration=40000|duration_time=0.033333|size=10733|pos=151980|flags=K__|data_hash=CRC32:9012fdfb
-packet|codec_type=video|stream_index=0|pts=2120000|pts_time=1.766667|dts=2120000|dts_time=1.766667|duration=40000|duration_time=0.033333|size=58|pos=162713|flags=K__|data_hash=CRC32:8a3c8760
-packet|codec_type=video|stream_index=0|pts=2160000|pts_time=1.800000|dts=2160000|dts_time=1.800000|duration=40000|duration_time=0.033333|size=41|pos=162771|flags=K__|data_hash=CRC32:28da6bf4
-packet|codec_type=video|stream_index=0|pts=2200000|pts_time=1.833333|dts=2200000|dts_time=1.833333|duration=40000|duration_time=0.033333|size=68|pos=162812|flags=K__|data_hash=CRC32:959dcc10
-packet|codec_type=video|stream_index=0|pts=2240000|pts_time=1.866667|dts=2240000|dts_time=1.866667|duration=40000|duration_time=0.033333|size=9247|pos=162880|flags=K__|data_hash=CRC32:cf1e2a1a
-packet|codec_type=video|stream_index=0|pts=2280000|pts_time=1.900000|dts=2280000|dts_time=1.900000|duration=40000|duration_time=0.033333|size=58|pos=172127|flags=K__|data_hash=CRC32:2efcb7ba
-packet|codec_type=video|stream_index=0|pts=2320000|pts_time=1.933333|dts=2320000|dts_time=1.933333|duration=40000|duration_time=0.033333|size=67|pos=172185|flags=K__|data_hash=CRC32:42484449
-packet|codec_type=video|stream_index=0|pts=2360000|pts_time=1.966667|dts=2360000|dts_time=1.966667|duration=40000|duration_time=0.033333|size=83|pos=172252|flags=K__|data_hash=CRC32:a941bdf0
-packet|codec_type=video|stream_index=0|pts=2400000|pts_time=2.000000|dts=2400000|dts_time=2.000000|duration=40000|duration_time=0.033333|size=5417|pos=172335|flags=K__|data_hash=CRC32:9d0d503b
+packet|codec_type=video|stream_index=0|pts=0|pts_time=0.000000|dts=0|dts_time=0.000000|duration=40000|duration_time=0.033333|size=14447|pos=0|flags=K__|data_hash=CRC32:83f257c0
+packet|codec_type=video|stream_index=0|pts=40000|pts_time=0.033333|dts=40000|dts_time=0.033333|duration=40000|duration_time=0.033333|size=483|pos=14447|flags=___|data_hash=CRC32:5abb82f8
+packet|codec_type=video|stream_index=0|pts=80000|pts_time=0.066667|dts=80000|dts_time=0.066667|duration=40000|duration_time=0.033333|size=18|pos=14930|flags=___|data_hash=CRC32:b8b123d8
+packet|codec_type=video|stream_index=0|pts=120000|pts_time=0.100000|dts=120000|dts_time=0.100000|duration=40000|duration_time=0.033333|size=18|pos=14948|flags=___|data_hash=CRC32:19180fa8
+packet|codec_type=video|stream_index=0|pts=160000|pts_time=0.133333|dts=160000|dts_time=0.133333|duration=40000|duration_time=0.033333|size=18|pos=14966|flags=___|data_hash=CRC32:cf501647
+packet|codec_type=video|stream_index=0|pts=200000|pts_time=0.166667|dts=200000|dts_time=0.166667|duration=40000|duration_time=0.033333|size=1807|pos=14984|flags=___|data_hash=CRC32:4267e1d5
+packet|codec_type=video|stream_index=0|pts=240000|pts_time=0.200000|dts=240000|dts_time=0.200000|duration=40000|duration_time=0.033333|size=28|pos=16791|flags=___|data_hash=CRC32:c223285a
+packet|codec_type=video|stream_index=0|pts=280000|pts_time=0.233333|dts=280000|dts_time=0.233333|duration=40000|duration_time=0.033333|size=25|pos=16819|flags=___|data_hash=CRC32:2565cc9e
+packet|codec_type=video|stream_index=0|pts=320000|pts_time=0.266667|dts=320000|dts_time=0.266667|duration=40000|duration_time=0.033333|size=22|pos=16844|flags=___|data_hash=CRC32:7fbf36ac
+packet|codec_type=video|stream_index=0|pts=360000|pts_time=0.300000|dts=360000|dts_time=0.300000|duration=40000|duration_time=0.033333|size=23884|pos=16866|flags=___|data_hash=CRC32:d61430fd
+packet|codec_type=video|stream_index=0|pts=400000|pts_time=0.333333|dts=400000|dts_time=0.333333|duration=40000|duration_time=0.033333|size=265|pos=40750|flags=___|data_hash=CRC32:d64145a0
+packet|codec_type=video|stream_index=0|pts=440000|pts_time=0.366667|dts=440000|dts_time=0.366667|duration=40000|duration_time=0.033333|size=393|pos=41015|flags=___|data_hash=CRC32:32c020e2
+packet|codec_type=video|stream_index=0|pts=480000|pts_time=0.400000|dts=480000|dts_time=0.400000|duration=40000|duration_time=0.033333|size=656|pos=41408|flags=___|data_hash=CRC32:965c7846
+packet|codec_type=video|stream_index=0|pts=520000|pts_time=0.433333|dts=520000|dts_time=0.433333|duration=40000|duration_time=0.033333|size=3500|pos=42064|flags=___|data_hash=CRC32:ddf731de
+packet|codec_type=video|stream_index=0|pts=560000|pts_time=0.466667|dts=560000|dts_time=0.466667|duration=40000|duration_time=0.033333|size=68|pos=45564|flags=___|data_hash=CRC32:f8c8ba07
+packet|codec_type=video|stream_index=0|pts=600000|pts_time=0.500000|dts=600000|dts_time=0.500000|duration=40000|duration_time=0.033333|size=58|pos=45632|flags=___|data_hash=CRC32:22adbb83
+packet|codec_type=video|stream_index=0|pts=640000|pts_time=0.533333|dts=640000|dts_time=0.533333|duration=40000|duration_time=0.033333|size=43|pos=45690|flags=___|data_hash=CRC32:53fb136c
+packet|codec_type=video|stream_index=0|pts=680000|pts_time=0.566667|dts=680000|dts_time=0.566667|duration=40000|duration_time=0.033333|size=11757|pos=45733|flags=___|data_hash=CRC32:551e491b
+packet|codec_type=video|stream_index=0|pts=720000|pts_time=0.600000|dts=720000|dts_time=0.600000|duration=40000|duration_time=0.033333|size=98|pos=57490|flags=___|data_hash=CRC32:4e718dd4
+packet|codec_type=video|stream_index=0|pts=760000|pts_time=0.633333|dts=760000|dts_time=0.633333|duration=40000|duration_time=0.033333|size=79|pos=57588|flags=___|data_hash=CRC32:4c5a32f5
+packet|codec_type=video|stream_index=0|pts=800000|pts_time=0.666667|dts=800000|dts_time=0.666667|duration=40000|duration_time=0.033333|size=128|pos=57667|flags=___|data_hash=CRC32:95b8cad1
+packet|codec_type=video|stream_index=0|pts=840000|pts_time=0.700000|dts=840000|dts_time=0.700000|duration=40000|duration_time=0.033333|size=10487|pos=57795|flags=___|data_hash=CRC32:8646f8f2
+packet|codec_type=video|stream_index=0|pts=880000|pts_time=0.733333|dts=880000|dts_time=0.733333|duration=40000|duration_time=0.033333|size=65|pos=68282|flags=___|data_hash=CRC32:73687d19
+packet|codec_type=video|stream_index=0|pts=920000|pts_time=0.766667|dts=920000|dts_time=0.766667|duration=40000|duration_time=0.033333|size=46|pos=68347|flags=___|data_hash=CRC32:ad381ca5
+packet|codec_type=video|stream_index=0|pts=960000|pts_time=0.800000|dts=960000|dts_time=0.800000|duration=40000|duration_time=0.033333|size=67|pos=68393|flags=___|data_hash=CRC32:89069152
+packet|codec_type=video|stream_index=0|pts=1000000|pts_time=0.833333|dts=1000000|dts_time=0.833333|duration=40000|duration_time=0.033333|size=8403|pos=68460|flags=___|data_hash=CRC32:a22913dd
+packet|codec_type=video|stream_index=0|pts=1040000|pts_time=0.866667|dts=1040000|dts_time=0.866667|duration=40000|duration_time=0.033333|size=70|pos=76863|flags=___|data_hash=CRC32:98772596
+packet|codec_type=video|stream_index=0|pts=1080000|pts_time=0.900000|dts=1080000|dts_time=0.900000|duration=40000|duration_time=0.033333|size=63|pos=76933|flags=___|data_hash=CRC32:cfd62cc4
+packet|codec_type=video|stream_index=0|pts=1120000|pts_time=0.933333|dts=1120000|dts_time=0.933333|duration=40000|duration_time=0.033333|size=70|pos=76996|flags=___|data_hash=CRC32:9b526357
+packet|codec_type=video|stream_index=0|pts=1160000|pts_time=0.966667|dts=1160000|dts_time=0.966667|duration=40000|duration_time=0.033333|size=7945|pos=77066|flags=___|data_hash=CRC32:d0f46769
+packet|codec_type=video|stream_index=0|pts=1200000|pts_time=1.000000|dts=1200000|dts_time=1.000000|duration=40000|duration_time=0.033333|size=40558|pos=85011|flags=K__|data_hash=CRC32:4db0bd7d
+packet|codec_type=video|stream_index=0|pts=1240000|pts_time=1.033333|dts=1240000|dts_time=1.033333|duration=40000|duration_time=0.033333|size=1260|pos=125569|flags=___|data_hash=CRC32:3c4397d7
+packet|codec_type=video|stream_index=0|pts=1280000|pts_time=1.066667|dts=1280000|dts_time=1.066667|duration=40000|duration_time=0.033333|size=27|pos=126829|flags=___|data_hash=CRC32:5e233c77
+packet|codec_type=video|stream_index=0|pts=1320000|pts_time=1.100000|dts=1320000|dts_time=1.100000|duration=40000|duration_time=0.033333|size=26|pos=126856|flags=___|data_hash=CRC32:57985e7b
+packet|codec_type=video|stream_index=0|pts=1360000|pts_time=1.133333|dts=1360000|dts_time=1.133333|duration=40000|duration_time=0.033333|size=18|pos=126882|flags=___|data_hash=CRC32:f4eb01ba
+packet|codec_type=video|stream_index=0|pts=1400000|pts_time=1.166667|dts=1400000|dts_time=1.166667|duration=40000|duration_time=0.033333|size=2931|pos=126900|flags=___|data_hash=CRC32:ca20964f
+packet|codec_type=video|stream_index=0|pts=1440000|pts_time=1.200000|dts=1440000|dts_time=1.200000|duration=40000|duration_time=0.033333|size=25|pos=129831|flags=___|data_hash=CRC32:a82bd0b4
+packet|codec_type=video|stream_index=0|pts=1480000|pts_time=1.233333|dts=1480000|dts_time=1.233333|duration=40000|duration_time=0.033333|size=19|pos=129856|flags=___|data_hash=CRC32:bc5f709d
+packet|codec_type=video|stream_index=0|pts=1520000|pts_time=1.266667|dts=1520000|dts_time=1.266667|duration=40000|duration_time=0.033333|size=30|pos=129875|flags=___|data_hash=CRC32:c1f8a4c9
+packet|codec_type=video|stream_index=0|pts=1560000|pts_time=1.300000|dts=1560000|dts_time=1.300000|duration=40000|duration_time=0.033333|size=5088|pos=129905|flags=___|data_hash=CRC32:41ace145
+packet|codec_type=video|stream_index=0|pts=1600000|pts_time=1.333333|dts=1600000|dts_time=1.333333|duration=40000|duration_time=0.033333|size=41|pos=134993|flags=___|data_hash=CRC32:e169b3c7
+packet|codec_type=video|stream_index=0|pts=1640000|pts_time=1.366667|dts=1640000|dts_time=1.366667|duration=40000|duration_time=0.033333|size=53|pos=135034|flags=___|data_hash=CRC32:973c5fe3
+packet|codec_type=video|stream_index=0|pts=1680000|pts_time=1.400000|dts=1680000|dts_time=1.400000|duration=40000|duration_time=0.033333|size=54|pos=135087|flags=___|data_hash=CRC32:665639e6
+packet|codec_type=video|stream_index=0|pts=1720000|pts_time=1.433333|dts=1720000|dts_time=1.433333|duration=40000|duration_time=0.033333|size=7150|pos=135141|flags=___|data_hash=CRC32:cc910027
+packet|codec_type=video|stream_index=0|pts=1760000|pts_time=1.466667|dts=1760000|dts_time=1.466667|duration=40000|duration_time=0.033333|size=48|pos=142291|flags=___|data_hash=CRC32:45658f78
+packet|codec_type=video|stream_index=0|pts=1800000|pts_time=1.500000|dts=1800000|dts_time=1.500000|duration=40000|duration_time=0.033333|size=48|pos=142339|flags=___|data_hash=CRC32:94e359a2
+packet|codec_type=video|stream_index=0|pts=1840000|pts_time=1.533333|dts=1840000|dts_time=1.533333|duration=40000|duration_time=0.033333|size=51|pos=142387|flags=___|data_hash=CRC32:959ccdd9
+packet|codec_type=video|stream_index=0|pts=1880000|pts_time=1.566667|dts=1880000|dts_time=1.566667|duration=40000|duration_time=0.033333|size=9379|pos=142438|flags=___|data_hash=CRC32:a3318410
+packet|codec_type=video|stream_index=0|pts=1920000|pts_time=1.600000|dts=1920000|dts_time=1.600000|duration=40000|duration_time=0.033333|size=58|pos=151817|flags=___|data_hash=CRC32:44b24f03
+packet|codec_type=video|stream_index=0|pts=1960000|pts_time=1.633333|dts=1960000|dts_time=1.633333|duration=40000|duration_time=0.033333|size=43|pos=151875|flags=___|data_hash=CRC32:f4876e05
+packet|codec_type=video|stream_index=0|pts=2000000|pts_time=1.666667|dts=2000000|dts_time=1.666667|duration=40000|duration_time=0.033333|size=62|pos=151918|flags=___|data_hash=CRC32:34dce749
+packet|codec_type=video|stream_index=0|pts=2040000|pts_time=1.700000|dts=2040000|dts_time=1.700000|duration=40000|duration_time=0.033333|size=10733|pos=151980|flags=___|data_hash=CRC32:9012fdfb
+packet|codec_type=video|stream_index=0|pts=2080000|pts_time=1.733333|dts=2080000|dts_time=1.733333|duration=40000|duration_time=0.033333|size=58|pos=162713|flags=___|data_hash=CRC32:8a3c8760
+packet|codec_type=video|stream_index=0|pts=2120000|pts_time=1.766667|dts=2120000|dts_time=1.766667|duration=40000|duration_time=0.033333|size=41|pos=162771|flags=___|data_hash=CRC32:28da6bf4
+packet|codec_type=video|stream_index=0|pts=2160000|pts_time=1.800000|dts=2160000|dts_time=1.800000|duration=40000|duration_time=0.033333|size=68|pos=162812|flags=___|data_hash=CRC32:959dcc10
+packet|codec_type=video|stream_index=0|pts=2200000|pts_time=1.833333|dts=2200000|dts_time=1.833333|duration=40000|duration_time=0.033333|size=9247|pos=162880|flags=___|data_hash=CRC32:cf1e2a1a
+packet|codec_type=video|stream_index=0|pts=2240000|pts_time=1.866667|dts=2240000|dts_time=1.866667|duration=40000|duration_time=0.033333|size=58|pos=172127|flags=___|data_hash=CRC32:2efcb7ba
+packet|codec_type=video|stream_index=0|pts=2280000|pts_time=1.900000|dts=2280000|dts_time=1.900000|duration=40000|duration_time=0.033333|size=67|pos=172185|flags=___|data_hash=CRC32:42484449
+packet|codec_type=video|stream_index=0|pts=2320000|pts_time=1.933333|dts=2320000|dts_time=1.933333|duration=40000|duration_time=0.033333|size=83|pos=172252|flags=___|data_hash=CRC32:a941bdf0
+packet|codec_type=video|stream_index=0|pts=2360000|pts_time=1.966667|dts=2360000|dts_time=1.966667|duration=40000|duration_time=0.033333|size=5417|pos=172335|flags=___|data_hash=CRC32:9d0d503b
 stream|index=0|codec_name=cavs|profile=unknown|codec_type=video|codec_tag_string=[0][0][0][0]|codec_tag=0x0000|width=1280|height=720|coded_width=1280|coded_height=720|has_b_frames=0|sample_aspect_ratio=N/A|display_aspect_ratio=N/A|pix_fmt=yuv420p|level=-99|color_range=unknown|color_space=unknown|color_transfer=unknown|color_primaries=unknown|chroma_location=unspecified|field_order=unknown|refs=1|id=N/A|r_frame_rate=30/1|avg_frame_rate=25/1|time_base=1/1200000|start_pts=N/A|start_time=N/A|duration_ts=N/A|duration=N/A|bit_rate=N/A|max_bit_rate=N/A|bits_per_raw_sample=N/A|nb_frames=N/A|nb_read_frames=N/A|nb_read_packets=60|extradata_size=18|extradata_hash=CRC32:1255d52e|disposition:default=0|disposition:dub=0|disposition:original=0|disposition:comment=0|disposition:lyrics=0|disposition:karaoke=0|disposition:forced=0|disposition:hearing_impaired=0|disposition:visual_impaired=0|disposition:clean_effects=0|disposition:attached_pic=0|disposition:timed_thumbnails=0|disposition:non_diegetic=0|disposition:captions=0|disposition:descriptions=0|disposition:metadata=0|disposition:dependent=0|disposition:still_image=0|disposition:multilayer=0
 format|filename=bunny.mp4|nb_streams=1|nb_programs=0|nb_stream_groups=0|format_name=cavsvideo|start_time=N/A|duration=N/A|size=177752|bit_rate=N/A|probe_score=51
-- 
2.52.0


From 3af24c499853fef2b666d79057d2accca0d12cd8 Mon Sep 17 00:00:00 2001
From: Gavin Li <gfl3162@gmail.com>
Date: Sun, 26 Jan 2025 22:54:06 -0500
Subject: [PATCH 130/304] avformat/rawdec: set framerate in codec parameters

Commit ba4b73c9779c32580f8a3ba08602a5d94e0bcd7c caused a regression in
the usage of avg_frame_rate to detect the frame rate of raw h264/hevc
bitstreams: after the commit, avg_frame_rate is always the value of the
-framerate option (which is set to 25 by default) instead of the actual
frame rate derived from the bitstream SPS/VPS NALUs.

This commit fixes the regression by setting the framerate codec
parameter to the value of the framerate option instead. After this
change, bitstreams without timing information will derive avg_frame_rate
from the -framerate option, while bitstreams with timing information
will derive avg_frame_rate from the bitstream itself.

The h264-bsf-dts2pts test now returns the correct frame durations for a
bitstream with a mix of single-field and double-field frames.

Signed-off-by: Gavin Li <git@thegavinli.com>
Signed-off-by: James Almer <jamrial@gmail.com>
---
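
Reader's note on the reference changes that follow: the affected streams use a 1/1200000 time base, so the packet durations can be checked by hand. A duration of 48000 ticks is 1/25 s, 24000 ticks is 1/50 s (which appears to be the single-field case the commit message mentions), and the 40000 ticks in cavs-demux is 1/30 s, matching the new avg_frame_rate=30/1. The C sketch below only illustrates that arithmetic; `ticks_per_frame` is a hypothetical helper, not an FFmpeg function, and it assumes the division is exact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg API): number of time-base ticks
 * covered by one frame at fps_num/fps_den frames per second, for a
 * time base of 1/tb_den seconds. Assumes the result is an integer. */
int64_t ticks_per_frame(int64_t tb_den, int64_t fps_num, int64_t fps_den)
{
    assert(tb_den > 0 && fps_num > 0 && fps_den > 0);
    return tb_den * fps_den / fps_num;
}
```

For example, `ticks_per_frame(1200000, 25, 1)` gives 48000 and `ticks_per_frame(1200000, 30, 1)` gives 40000, the values seen in the reference files.
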
 libavformat/rawdec.c                          |   2 +-
 tests/ref/fate/cavs-demux                     |   2 +-
 tests/ref/fate/enhanced-flv-hevc-hdr10        |   4 +-
 tests/ref/fate/h264-bsf-dts2pts               | 102 +++++++++---------
 .../ref/fate/h264-conformance-cvpcmnl2_sva_c  |   6 +-
 tests/ref/fate/hevc-bsf-dts2pts-cra           |  68 ++++++------
 tests/ref/fate/hevc-bsf-dts2pts-idr           |  38 +++----
 tests/ref/fate/hevc-bsf-dts2pts-idr-cra       |  38 +++----
 tests/ref/lavf-fate/hevc.mp4                  |   2 +-
 9 files changed, 131 insertions(+), 131 deletions(-)

diff --git a/libavformat/rawdec.c b/libavformat/rawdec.c
index d0c829dc42..5cf2764a0d 100644
--- a/libavformat/rawdec.c
+++ b/libavformat/rawdec.c
@@ -83,9 +83,9 @@ int ff_raw_video_read_header(AVFormatContext *s)
 
     st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
     st->codecpar->codec_id = ffifmt(s->iformat)->raw_codec_id;
+    st->codecpar->framerate = s1->framerate;
     sti->need_parsing = AVSTREAM_PARSE_FULL_RAW;
 
-    st->avg_frame_rate = s1->framerate;
     avpriv_set_pts_info(st, 64, 1, 1200000);
 
 fail:
diff --git a/tests/ref/fate/cavs-demux b/tests/ref/fate/cavs-demux
index c4847293ab..eb16eb1f9d 100644
--- a/tests/ref/fate/cavs-demux
+++ b/tests/ref/fate/cavs-demux
@@ -58,5 +58,5 @@ packet|codec_type=video|stream_index=0|pts=2240000|pts_time=1.866667|dts=2240000
 packet|codec_type=video|stream_index=0|pts=2280000|pts_time=1.900000|dts=2280000|dts_time=1.900000|duration=40000|duration_time=0.033333|size=67|pos=172185|flags=___|data_hash=CRC32:42484449
 packet|codec_type=video|stream_index=0|pts=2320000|pts_time=1.933333|dts=2320000|dts_time=1.933333|duration=40000|duration_time=0.033333|size=83|pos=172252|flags=___|data_hash=CRC32:a941bdf0
 packet|codec_type=video|stream_index=0|pts=2360000|pts_time=1.966667|dts=2360000|dts_time=1.966667|duration=40000|duration_time=0.033333|size=5417|pos=172335|flags=___|data_hash=CRC32:9d0d503b
-stream|index=0|codec_name=cavs|profile=unknown|codec_type=video|codec_tag_string=[0][0][0][0]|codec_tag=0x0000|width=1280|height=720|coded_width=1280|coded_height=720|has_b_frames=0|sample_aspect_ratio=N/A|display_aspect_ratio=N/A|pix_fmt=yuv420p|level=-99|color_range=unknown|color_space=unknown|color_transfer=unknown|color_primaries=unknown|chroma_location=unspecified|field_order=unknown|refs=1|id=N/A|r_frame_rate=30/1|avg_frame_rate=25/1|time_base=1/1200000|start_pts=N/A|start_time=N/A|duration_ts=N/A|duration=N/A|bit_rate=N/A|max_bit_rate=N/A|bits_per_raw_sample=N/A|nb_frames=N/A|nb_read_frames=N/A|nb_read_packets=60|extradata_size=18|extradata_hash=CRC32:1255d52e|disposition:default=0|disposition:dub=0|disposition:original=0|disposition:comment=0|disposition:lyrics=0|disposition:karaoke=0|disposition:forced=0|disposition:hearing_impaired=0|disposition:visual_impaired=0|disposition:clean_effects=0|disposition:attached_pic=0|disposition:timed_thumbnails=0|disposition:non_diegetic=0|disposition:captions=0|disposition:descriptions=0|disposition:metadata=0|disposition:dependent=0|disposition:still_image=0|disposition:multilayer=0
+stream|index=0|codec_name=cavs|profile=unknown|codec_type=video|codec_tag_string=[0][0][0][0]|codec_tag=0x0000|width=1280|height=720|coded_width=1280|coded_height=720|has_b_frames=0|sample_aspect_ratio=N/A|display_aspect_ratio=N/A|pix_fmt=yuv420p|level=-99|color_range=unknown|color_space=unknown|color_transfer=unknown|color_primaries=unknown|chroma_location=unspecified|field_order=unknown|refs=1|id=N/A|r_frame_rate=30/1|avg_frame_rate=30/1|time_base=1/1200000|start_pts=N/A|start_time=N/A|duration_ts=N/A|duration=N/A|bit_rate=N/A|max_bit_rate=N/A|bits_per_raw_sample=N/A|nb_frames=N/A|nb_read_frames=N/A|nb_read_packets=60|extradata_size=18|extradata_hash=CRC32:1255d52e|disposition:default=0|disposition:dub=0|disposition:original=0|disposition:comment=0|disposition:lyrics=0|disposition:karaoke=0|disposition:forced=0|disposition:hearing_impaired=0|disposition:visual_impaired=0|disposition:clean_effects=0|disposition:attached_pic=0|disposition:timed_thumbnails=0|disposition:non_diegetic=0|disposition:captions=0|disposition:descriptions=0|disposition:metadata=0|disposition:dependent=0|disposition:still_image=0|disposition:multilayer=0
 format|filename=bunny.mp4|nb_streams=1|nb_programs=0|nb_stream_groups=0|format_name=cavsvideo|start_time=N/A|duration=N/A|size=177752|bit_rate=N/A|probe_score=51
diff --git a/tests/ref/fate/enhanced-flv-hevc-hdr10 b/tests/ref/fate/enhanced-flv-hevc-hdr10
index 525f056d66..bebcf84fab 100644
--- a/tests/ref/fate/enhanced-flv-hevc-hdr10
+++ b/tests/ref/fate/enhanced-flv-hevc-hdr10
@@ -4,7 +4,7 @@
 #codec_id 0: hevc
 #dimensions 0: 1280x720
 #sar 0: 0/1
-0,          0,          0,       40,    77718, 0xb59c83a5
+0,          0,          0,        0,    77718, 0xb59c83a5
 [FRAME]
 media_type=video
 stream_index=0
@@ -17,7 +17,7 @@ best_effort_timestamp=0
 best_effort_timestamp_time=0.000000
 duration=N/A
 duration_time=N/A
-pkt_pos=459
+pkt_pos=439
 pkt_size=77718
 width=1280
 height=720
diff --git a/tests/ref/fate/h264-bsf-dts2pts b/tests/ref/fate/h264-bsf-dts2pts
index f908bb44f5..46c640eab8 100644
--- a/tests/ref/fate/h264-bsf-dts2pts
+++ b/tests/ref/fate/h264-bsf-dts2pts
@@ -1,5 +1,5 @@
-219edd347ce3151f5b5579d300cd7179 *tests/data/fate/h264-bsf-dts2pts.mov
-243937 tests/data/fate/h264-bsf-dts2pts.mov
+451601038b9091014e45660bc98e09dc *tests/data/fate/h264-bsf-dts2pts.mov
+244033 tests/data/fate/h264-bsf-dts2pts.mov
 #extradata 0:       26, 0x75e2093d
 #tb 0: 1/1200000
 #media_type 0: video
@@ -7,52 +7,52 @@
 #dimensions 0: 352x288
 #sar 0: 0/1
 0,     -48000,          0,    48000,    13686, 0x5ee9bd4c
-0,          0,     240000,    48000,     9320, 0x17224db1, F=0x0
-0,      48000,     288000,    48000,     8903, 0xe394918b, F=0x0
-0,      96000,      96000,    48000,    10108, 0x98418e7e, F=0x0
-0,     144000,     144000,    48000,     2937, 0x49dccb76, F=0x0
-0,     192000,     192000,    48000,     2604, 0xfc8013cd, F=0x0
-0,     240000,     480000,    48000,     7420, 0xcb4155cd, F=0x0
-0,     288000,     528000,    48000,     5664, 0x060bc948, F=0x0
-0,     336000,     336000,    48000,     4859, 0x0a5a8368, F=0x0
-0,     384000,     384000,    48000,     2883, 0xb9639a19, F=0x0
-0,     432000,     432000,    48000,     2547, 0xba95e99d, F=0x0
-0,     480000,     672000,    48000,     4659, 0x19203a0d, F=0x0
-0,     528000,     696000,    48000,     9719, 0xb500c328, F=0x0
-0,     576000,     576000,    48000,     5078, 0x5359c6b8, F=0x0
-0,     624000,     624000,    48000,     5041, 0x88dfcdf1, F=0x0
-0,     672000,     864000,    48000,     9494, 0x29297319, F=0x0
-0,     720000,     720000,    48000,     4772, 0x80273a60, F=0x0
-0,     768000,     768000,    48000,     3237, 0xd99e742c, F=0x0
-0,     816000,     816000,    48000,     2650, 0xc7cc378a, F=0x0
-0,     864000,    1152000,    48000,     6519, 0x142aa357, F=0x0
-0,     912000,    1176000,    48000,     5878, 0xe70d7e21, F=0x0
-0,     960000,     960000,    48000,     2648, 0xe58b1c4b, F=0x0
-0,    1008000,    1008000,    48000,     4522, 0x33ad0882, F=0x0
-0,    1056000,    1056000,    48000,     3246, 0xdbfa539f, F=0x0
-0,    1104000,    1104000,    48000,     3027, 0xdb5bf675, F=0x0
-0,    1152000,    1392000,    48000,     9282, 0x07973603, F=0x0
-0,    1200000,    1200000,    48000,     2786, 0x14824d92, F=0x0
-0,    1248000,    1248000,    48000,     2719, 0x00614eef, F=0x0
-0,    1296000,    1296000,    48000,     2627, 0xe8e91216, F=0x0
-0,    1344000,    1344000,    48000,     2720, 0xbe974fcc, F=0x0
-0,    1392000,    1584000,    48000,     7687, 0x0de01895, F=0x0
-0,    1440000,    1440000,    48000,     5464, 0x113f954d, F=0x0
-0,    1488000,    1488000,    48000,     3482, 0x5c90cdae, F=0x0
-0,    1536000,    1536000,    48000,     2791, 0x4acb702a, F=0x0
-0,    1584000,    1872000,    48000,    11362, 0x13363bdb, F=0x0
-0,    1632000,    1920000,    48000,     2975, 0x99b1e813, F=0x0
-0,    1680000,    1680000,    48000,     2342, 0xe9587867, F=0x0
-0,    1728000,    1728000,    48000,     2634, 0x8d9814fc, F=0x0
-0,    1776000,    1776000,    48000,     2419, 0x033cbb5f, F=0x0
-0,    1824000,    1824000,    48000,     2498, 0x7dd9e476, F=0x0
-0,    1872000,    2112000,    48000,     2668, 0x358e2bd8, F=0x0
-0,    1920000,    2136000,    48000,     9068, 0x3a639927, F=0x0
-0,    1968000,    1968000,    48000,     4939, 0xa5309a8c, F=0x0
-0,    2016000,    2016000,    48000,     2650, 0x2ab82b97, F=0x0
-0,    2064000,    2064000,    48000,     2503, 0xfd97cd4c, F=0x0
-0,    2112000,    2352000,    48000,     5121, 0xaf88e5b8, F=0x0
-0,    2160000,    2160000,    48000,     2643, 0xa1791db0, F=0x0
-0,    2208000,    2208000,    48000,     2637, 0xe1a42510, F=0x0
-0,    2256000,    2256000,    48000,     2633, 0x08430f15, F=0x0
-0,    2304000,    2304000,    48000,     2721, 0xe6756990, F=0x0
+0,          0,     144000,    24000,     9320, 0x17224db1, F=0x0
+0,      24000,     168000,    24000,     8903, 0xe394918b, F=0x0
+0,      48000,      48000,    48000,    10108, 0x98418e7e, F=0x0
+0,      96000,      96000,    24000,     2937, 0x49dccb76, F=0x0
+0,     120000,     120000,    24000,     2604, 0xfc8013cd, F=0x0
+0,     144000,     288000,    24000,     7420, 0xcb4155cd, F=0x0
+0,     168000,     312000,    24000,     5664, 0x060bc948, F=0x0
+0,     192000,     192000,    48000,     4859, 0x0a5a8368, F=0x0
+0,     240000,     240000,    24000,     2883, 0xb9639a19, F=0x0
+0,     264000,     264000,    24000,     2547, 0xba95e99d, F=0x0
+0,     288000,     432000,    24000,     4659, 0x19203a0d, F=0x0
+0,     312000,     456000,    24000,     9719, 0xb500c328, F=0x0
+0,     336000,     336000,    48000,     5078, 0x5359c6b8, F=0x0
+0,     384000,     384000,    48000,     5041, 0x88dfcdf1, F=0x0
+0,     432000,     576000,    48000,     9494, 0x29297319, F=0x0
+0,     480000,     480000,    48000,     4772, 0x80273a60, F=0x0
+0,     528000,     528000,    24000,     3237, 0xd99e742c, F=0x0
+0,     552000,     552000,    24000,     2650, 0xc7cc378a, F=0x0
+0,     576000,     720000,    24000,     6519, 0x142aa357, F=0x0
+0,     600000,     744000,    24000,     5878, 0xe70d7e21, F=0x0
+0,     624000,     624000,    24000,     2648, 0xe58b1c4b, F=0x0
+0,     648000,     648000,    24000,     4522, 0x33ad0882, F=0x0
+0,     672000,     672000,    24000,     3246, 0xdbfa539f, F=0x0
+0,     696000,     696000,    24000,     3027, 0xdb5bf675, F=0x0
+0,     720000,     864000,    48000,     9282, 0x07973603, F=0x0
+0,     768000,     768000,    24000,     2786, 0x14824d92, F=0x0
+0,     792000,     792000,    24000,     2719, 0x00614eef, F=0x0
+0,     816000,     816000,    24000,     2627, 0xe8e91216, F=0x0
+0,     840000,     840000,    24000,     2720, 0xbe974fcc, F=0x0
+0,     864000,    1008000,    48000,     7687, 0x0de01895, F=0x0
+0,     912000,     912000,    48000,     5464, 0x113f954d, F=0x0
+0,     960000,     960000,    24000,     3482, 0x5c90cdae, F=0x0
+0,     984000,     984000,    24000,     2791, 0x4acb702a, F=0x0
+0,    1008000,    1152000,    24000,    11362, 0x13363bdb, F=0x0
+0,    1032000,    1176000,    24000,     2975, 0x99b1e813, F=0x0
+0,    1056000,    1056000,    24000,     2342, 0xe9587867, F=0x0
+0,    1080000,    1080000,    24000,     2634, 0x8d9814fc, F=0x0
+0,    1104000,    1104000,    24000,     2419, 0x033cbb5f, F=0x0
+0,    1128000,    1128000,    24000,     2498, 0x7dd9e476, F=0x0
+0,    1152000,    1296000,    24000,     2668, 0x358e2bd8, F=0x0
+0,    1176000,    1320000,    24000,     9068, 0x3a639927, F=0x0
+0,    1200000,    1200000,    48000,     4939, 0xa5309a8c, F=0x0
+0,    1248000,    1248000,    24000,     2650, 0x2ab82b97, F=0x0
+0,    1272000,    1272000,    24000,     2503, 0xfd97cd4c, F=0x0
+0,    1296000,    1440000,    48000,     5121, 0xaf88e5b8, F=0x0
+0,    1344000,    1344000,    24000,     2643, 0xa1791db0, F=0x0
+0,    1368000,    1368000,    24000,     2637, 0xe1a42510, F=0x0
+0,    1392000,    1392000,    24000,     2633, 0x08430f15, F=0x0
+0,    1416000,    1416000,    24000,     2721, 0xe6756990, F=0x0
diff --git a/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c b/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c
index 0303bc24e6..0a0795df41 100644
--- a/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c
+++ b/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c
@@ -1,7 +1,7 @@
-#tb 0: 1/25
+#tb 0: 1/50
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 1280x720
 #sar 0: 0/1
-0,          0,          0,        1,  1382400, 0xccbe6bf8
-0,          1,          1,        1,  1382400, 0x49c0cfd7
+0,          0,          0,        2,  1382400, 0xccbe6bf8
+0,          2,          2,        2,  1382400, 0x49c0cfd7
diff --git a/tests/ref/fate/hevc-bsf-dts2pts-cra b/tests/ref/fate/hevc-bsf-dts2pts-cra
index 4e9e2c5114..e50aad6f70 100644
--- a/tests/ref/fate/hevc-bsf-dts2pts-cra
+++ b/tests/ref/fate/hevc-bsf-dts2pts-cra
@@ -1,5 +1,5 @@
-c3c00fdc637a19fa3d23d37d9974d28d *tests/data/fate/hevc-bsf-dts2pts-cra.mov
-103067 tests/data/fate/hevc-bsf-dts2pts-cra.mov
+f6102684fbf13aa624e6edece38c83f2 *tests/data/fate/hevc-bsf-dts2pts-cra.mov
+102971 tests/data/fate/hevc-bsf-dts2pts-cra.mov
 #extradata 0:      118, 0x25f51994
 #tb 0: 1/1200000
 #media_type 0: video
@@ -54,35 +54,35 @@ c3c00fdc637a19fa3d23d37d9974d28d *tests/data/fate/hevc-bsf-dts2pts-cra.mov
 0,    2016000,    2160000,    48000,      892, 0x0d2ab3bc, F=0x0
 0,    2064000,    2112000,    48000,      238, 0x45827561, F=0x0
 0,    2112000,    2208000,    48000,      281, 0x2a3a8e61, F=0x0
-0,    2160000,    2256010,    48000,     4629, 0xf2e0fb0f, F=0x0
-0,    2208000,    2256005,    48000,     1453, 0x6ae5dc98, F=0x0
-0,    2256000,    2256002,        1,      869, 0x3982ae69, F=0x0
-0,    2256001,    2256001,        1,      282, 0xd9e28960, F=0x0
-0,    2256002,    2256004,        2,      259, 0x253a809d, F=0x0
-0,    2256004,    2256007,        1,      835, 0x83499f30, F=0x0
-0,    2256005,    2256006,        1,      255, 0xa77b7690, F=0x0
-0,    2256006,    2256008,        1,      242, 0x83977ccf, F=0x0
-0,    2256007,    2256019,        1,     5082, 0xba55ee51, F=0x0
-0,    2256008,    2256014,        2,     1393, 0xc998b442, F=0x0
-0,    2256010,    2256012,        1,      742, 0x91ab75d2, F=0x0
-0,    2256011,    2256011,        1,      229, 0xfa326d98, F=0x0
-0,    2256012,    2256013,        1,      275, 0x49c38226, F=0x0
-0,    2256013,    2256017,        1,      869, 0xdd05acc4, F=0x0
-0,    2256014,    2256016,        2,      293, 0xcc9e904f, F=0x0
-0,    2256016,    2256018,        1,      334, 0x212aa4b1, F=0x0
-0,    2256017,    2256029,        1,     8539, 0xcccc9eb1
-0,    2256018,    2256024,        1,     1593, 0x5a351a68, F=0x0
-0,    2256019,    2256022,        1,     1042, 0xb77d00cc, F=0x0
-0,    2256020,    2256020,        2,      302, 0xbcdb9750, F=0x0
-0,    2256022,    2256023,        1,      336, 0xc7b0a55d, F=0x0
-0,    2256023,    2256026,        1,      875, 0x7e31b046, F=0x0
-0,    2256024,    2256025,        1,      401, 0xb473bca8, F=0x0
-0,    2256025,    2256028,        1,      246, 0x43357263, F=0x0
-0,    2256026,    2256038,        2,     3254, 0x8be44a2d, F=0x0
-0,    2256028,    2256034,        1,     1151, 0x29d52d14, F=0x0
-0,    2256029,    2256031,        1,      733, 0x33606982, F=0x0
-0,    2256030,    2256030,        1,      234, 0xb70a79ff, F=0x0
-0,    2256031,    2256032,        1,      228, 0x86916848, F=0x0
-0,    2256032,    2256036,        2,      689, 0xcca34b40, F=0x0
-0,    2256034,    2256035,        1,      223, 0xa96f6e31, F=0x0
-0,    2256035,    2256037,        1,      241, 0x7ac17531, F=0x0
+0,    2160000,    2640000,    48000,     4629, 0xf2e0fb0f, F=0x0
+0,    2208000,    2448000,    48000,     1453, 0x6ae5dc98, F=0x0
+0,    2256000,    2352000,    48000,      869, 0x3982ae69, F=0x0
+0,    2304000,    2304000,    48000,      282, 0xd9e28960, F=0x0
+0,    2352000,    2400000,    48000,      259, 0x253a809d, F=0x0
+0,    2400000,    2544000,    48000,      835, 0x83499f30, F=0x0
+0,    2448000,    2496000,    48000,      255, 0xa77b7690, F=0x0
+0,    2496000,    2592000,    48000,      242, 0x83977ccf, F=0x0
+0,    2544000,    3024000,    48000,     5082, 0xba55ee51, F=0x0
+0,    2592000,    2832000,    48000,     1393, 0xc998b442, F=0x0
+0,    2640000,    2736000,    48000,      742, 0x91ab75d2, F=0x0
+0,    2688000,    2688000,    48000,      229, 0xfa326d98, F=0x0
+0,    2736000,    2784000,    48000,      275, 0x49c38226, F=0x0
+0,    2784000,    2928000,    48000,      869, 0xdd05acc4, F=0x0
+0,    2832000,    2880000,    48000,      293, 0xcc9e904f, F=0x0
+0,    2880000,    2976000,    48000,      334, 0x212aa4b1, F=0x0
+0,    2928000,    3408000,    48000,     8539, 0xcccc9eb1
+0,    2976000,    3216000,    48000,     1593, 0x5a351a68, F=0x0
+0,    3024000,    3120000,    48000,     1042, 0xb77d00cc, F=0x0
+0,    3072000,    3072000,    48000,      302, 0xbcdb9750, F=0x0
+0,    3120000,    3168000,    48000,      336, 0xc7b0a55d, F=0x0
+0,    3168000,    3312000,    48000,      875, 0x7e31b046, F=0x0
+0,    3216000,    3264000,    48000,      401, 0xb473bca8, F=0x0
+0,    3264000,    3360000,    48000,      246, 0x43357263, F=0x0
+0,    3312000,    3792000,    48000,     3254, 0x8be44a2d, F=0x0
+0,    3360000,    3600000,    48000,     1151, 0x29d52d14, F=0x0
+0,    3408000,    3504000,    48000,      733, 0x33606982, F=0x0
+0,    3456000,    3456000,    48000,      234, 0xb70a79ff, F=0x0
+0,    3504000,    3552000,    48000,      228, 0x86916848, F=0x0
+0,    3552000,    3696000,    48000,      689, 0xcca34b40, F=0x0
+0,    3600000,    3648000,    48000,      223, 0xa96f6e31, F=0x0
+0,    3648000,    3744000,    48000,      241, 0x7ac17531, F=0x0
diff --git a/tests/ref/fate/hevc-bsf-dts2pts-idr b/tests/ref/fate/hevc-bsf-dts2pts-idr
index 9568a5932c..adfbe75987 100644
--- a/tests/ref/fate/hevc-bsf-dts2pts-idr
+++ b/tests/ref/fate/hevc-bsf-dts2pts-idr
@@ -1,5 +1,5 @@
-368d177821450241820bf3507d74b35a *tests/data/fate/hevc-bsf-dts2pts-idr.mov
-346603 tests/data/fate/hevc-bsf-dts2pts-idr.mov
+8f02d31fece6a067a8488d8ca6ee3329 *tests/data/fate/hevc-bsf-dts2pts-idr.mov
+346579 tests/data/fate/hevc-bsf-dts2pts-idr.mov
 #extradata 0:      699, 0x9c810c10
 #tb 0: 1/1200000
 #media_type 0: video
@@ -47,7 +47,7 @@
 0,    1680000,    1824000,    48000,     2208, 0x8e2146e8, F=0x0
 0,    1728000,    1776000,    48000,      900, 0x669fb97b, F=0x0
 0,    1776000,    1872000,    48000,      817, 0x3eb689e1, F=0x0
-0,    1824000,    2256001,    48000,    17919, 0x4e83b301, F=0x0
+0,    1824000,    2304000,    48000,    17919, 0x4e83b301, F=0x0
 0,    1872000,    2112000,    48000,     4911, 0x2fc88e1f, F=0x0
 0,    1920000,    2016000,    48000,     2717, 0xcc1f551e, F=0x0
 0,    1968000,    1968000,    48000,      908, 0x43e1b421, F=0x0
@@ -55,19 +55,19 @@
 0,    2064000,    2208000,    48000,     2857, 0xdeee9614, F=0x0
 0,    2112000,    2160000,    48000,     1039, 0x9cf8f494, F=0x0
 0,    2160000,    2256000,    48000,      975, 0x0f4ec85b, F=0x0
-0,    2208000,    2256011,    48000,    18345, 0xf9dabd3b, F=0x0
-0,    2256000,    2256006,        1,     5280, 0xc5c33809, F=0x0
-0,    2256001,    2256004,        1,     2755, 0xda955c75, F=0x0
-0,    2256002,    2256002,        2,     1027, 0xf652eff3, F=0x0
-0,    2256004,    2256005,        1,      901, 0xe23cb716, F=0x0
-0,    2256005,    2256008,        1,     3010, 0x783de5cb, F=0x0
-0,    2256006,    2256007,        1,     1005, 0x91fcf92f, F=0x0
-0,    2256007,    2256010,        1,     1131, 0xd88b2920, F=0x0
-0,    2256008,    2256008,        2,    29018, 0xc9b25608
-0,    2256010,    2256016,        1,     7436, 0x18344e77, F=0x0
-0,    2256011,    2256013,        1,     4050, 0x54fcbee0, F=0x0
-0,    2256012,    2256012,        1,     1412, 0xe249bd86, F=0x0
-0,    2256013,    2256014,        1,      977, 0xa749dc21, F=0x0
-0,    2256014,    2256014,        2,     2953, 0xba90c008, F=0x0
-0,    2256016,    2256017,        1,     1004, 0xddebea9a, F=0x0
-0,    2256017,    2256017,        1,     1109, 0x7e081570, F=0x0
+0,    2208000,    2688000,    48000,    18345, 0xf9dabd3b, F=0x0
+0,    2256000,    2496000,    48000,     5280, 0xc5c33809, F=0x0
+0,    2304000,    2400000,    48000,     2755, 0xda955c75, F=0x0
+0,    2352000,    2352000,    48000,     1027, 0xf652eff3, F=0x0
+0,    2400000,    2448000,    48000,      901, 0xe23cb716, F=0x0
+0,    2448000,    2592000,    48000,     3010, 0x783de5cb, F=0x0
+0,    2496000,    2544000,    48000,     1005, 0x91fcf92f, F=0x0
+0,    2544000,    2640000,    48000,     1131, 0xd88b2920, F=0x0
+0,    2592000,    2592000,    48000,    29018, 0xc9b25608
+0,    2640000,    2880000,    48000,     7436, 0x18344e77, F=0x0
+0,    2688000,    2784000,    48000,     4050, 0x54fcbee0, F=0x0
+0,    2736000,    2736000,    48000,     1412, 0xe249bd86, F=0x0
+0,    2784000,    2832000,        1,      977, 0xa749dc21, F=0x0
+0,    2784001,    2784001,    95999,     2953, 0xba90c008, F=0x0
+0,    2880000,    2928000,        1,     1004, 0xddebea9a, F=0x0
+0,    2880001,    2880001,    48000,     1109, 0x7e081570, F=0x0
diff --git a/tests/ref/fate/hevc-bsf-dts2pts-idr-cra b/tests/ref/fate/hevc-bsf-dts2pts-idr-cra
index 02e9765a26..b12f02581f 100644
--- a/tests/ref/fate/hevc-bsf-dts2pts-idr-cra
+++ b/tests/ref/fate/hevc-bsf-dts2pts-idr-cra
@@ -1,5 +1,5 @@
-07a216d6537502705348fea392d5d73d *tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
-375266 tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
+ed8d735b9a2780ce8ada61fd31c68886 *tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
+375210 tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
 #extradata 0:      648, 0x30a7fa5c
 #tb 0: 1/1200000
 #media_type 0: video
@@ -47,7 +47,7 @@
 0,    1680000,    1824000,    48000,     2674, 0x296fd411, F=0x0
 0,    1728000,    1776000,    48000,     1420, 0x304d6168, F=0x0
 0,    1776000,    1872000,    48000,     1364, 0x48734084, F=0x0
-0,    1824000,    2256001,    48000,    19349, 0xda0944f0, F=0x0
+0,    1824000,    2304000,    48000,    19349, 0xda0944f0, F=0x0
 0,    1872000,    2112000,    48000,     5550, 0x9c346b7b, F=0x0
 0,    1920000,    2016000,    48000,     3180, 0x5368f72b, F=0x0
 0,    1968000,    1968000,    48000,     1475, 0x369b7aae, F=0x0
@@ -55,19 +55,19 @@
 0,    2064000,    2208000,    48000,     3365, 0x81d511a6, F=0x0
 0,    2112000,    2160000,    48000,     1575, 0xba81b0a5, F=0x0
 0,    2160000,    2256000,    48000,     1538, 0xf1199439, F=0x0
-0,    2208000,    2256011,    48000,    20138, 0x101fb220, F=0x0
-0,    2256000,    2256006,        1,     5806, 0xcc500158, F=0x0
-0,    2256001,    2256004,        1,     3267, 0xf8f0d6cd, F=0x0
-0,    2256002,    2256002,        2,     1557, 0x894faf3e, F=0x0
-0,    2256004,    2256005,        1,     1482, 0x6b1884b7, F=0x0
-0,    2256005,    2256008,        1,     3513, 0x81227a59, F=0x0
-0,    2256006,    2256007,        1,     1576, 0x855eabd8, F=0x0
-0,    2256007,    2256010,        1,     1668, 0x030ade2e, F=0x0
-0,    2256008,    2256008,        2,    32088, 0xfadbf5f6
-0,    2256010,    2256016,        1,     5921, 0x21fb4976, F=0x0
-0,    2256011,    2256013,        1,     3436, 0x92085cd6, F=0x0
-0,    2256012,    2256012,        1,     1613, 0x1e0cb7c6, F=0x0
-0,    2256013,    2256014,        1,     1483, 0x88d18a4e, F=0x0
-0,    2256014,    2256018,        2,     3540, 0xc57c8e7b, F=0x0
-0,    2256016,    2256017,        1,     1561, 0x9740a363, F=0x0
-0,    2256017,    2256019,        1,     1651, 0x18e3d52d, F=0x0
+0,    2208000,    2688000,    48000,    20138, 0x101fb220, F=0x0
+0,    2256000,    2496000,    48000,     5806, 0xcc500158, F=0x0
+0,    2304000,    2400000,    48000,     3267, 0xf8f0d6cd, F=0x0
+0,    2352000,    2352000,    48000,     1557, 0x894faf3e, F=0x0
+0,    2400000,    2448000,    48000,     1482, 0x6b1884b7, F=0x0
+0,    2448000,    2592000,    48000,     3513, 0x81227a59, F=0x0
+0,    2496000,    2544000,    48000,     1576, 0x855eabd8, F=0x0
+0,    2544000,    2640000,    48000,     1668, 0x030ade2e, F=0x0
+0,    2592000,    2592000,    48000,    32088, 0xfadbf5f6
+0,    2640000,    2880000,    48000,     5921, 0x21fb4976, F=0x0
+0,    2688000,    2784000,    48000,     3436, 0x92085cd6, F=0x0
+0,    2736000,    2736000,    48000,     1613, 0x1e0cb7c6, F=0x0
+0,    2784000,    2832000,    48000,     1483, 0x88d18a4e, F=0x0
+0,    2832000,    2976000,    48000,     3540, 0xc57c8e7b, F=0x0
+0,    2880000,    2928000,    48000,     1561, 0x9740a363, F=0x0
+0,    2928000,    3024000,    48000,     1651, 0x18e3d52d, F=0x0
diff --git a/tests/ref/lavf-fate/hevc.mp4 b/tests/ref/lavf-fate/hevc.mp4
index aea5ae8979..39dddd49cb 100644
--- a/tests/ref/lavf-fate/hevc.mp4
+++ b/tests/ref/lavf-fate/hevc.mp4
@@ -1,3 +1,3 @@
-37b3a3e84df2350380b05b2af4dc97f5 *tests/data/lavf-fate/lavf.hevc.mp4
+539b0f6daece6656609de0898a1f3d42 *tests/data/lavf-fate/lavf.hevc.mp4
 151340 tests/data/lavf-fate/lavf.hevc.mp4
 tests/data/lavf-fate/lavf.hevc.mp4 CRC=0xc0a771de
-- 
2.52.0


From 58fc24016c5c71045267753e91603dde28c99e5f Mon Sep 17 00:00:00 2001
From: Ayose <ayosec@gmail.com>
Date: Wed, 26 Nov 2025 22:00:39 +0000
Subject: [PATCH 131/304] tests/fate-filter-drawvg-video: copy drawvg.lines
 file to tests/data.

If the SRC_PATH variable contains certain characters (like a `:`, which may
happen when FATE is executed on Windows), the value for the `file` option is
broken, so `make fate-filter-drawvg-video` always fails.

The solution in this commit is to copy `drawvg.lines` to the `tests/data`
directory (which already has temp files), so the value for `file` is a fixed
string with no problematic characters.

Signed-off-by: Ayose <ayosec@gmail.com>
---
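
Reader's note: filter option strings use `:` to separate options, which is why a path containing `:` breaks the `file` value. The C sketch below is a deliberately simplified stand-in for that splitting, not the real libavfilter parser (which also supports escaping and quoting). `count_option_fields` and the Windows-style path used in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for option-string splitting: counts the fields
 * obtained by cutting args at every ':'. Escaping and quoting, which
 * the real parser supports, are deliberately ignored here. */
size_t count_option_fields(const char *args)
{
    size_t fields = 1;

    assert(args != NULL);
    for (const char *p = args; *p; p++)
        if (*p == ':')
            fields++;
    return fields;
}
```

With this splitter, `"file=tests/data/fate/drawvg.lines"` stays a single field, while a hypothetical `"file=C:/ffmpeg/tests/ref/lavf/drawvg.lines"` is cut into two, so the path no longer reaches the filter intact.
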
 tests/fate/filter-video.mak | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index 3fe7f10476..087ba8d9cd 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -609,7 +609,10 @@ fate-filter-tiltandshift-410: CMD = framecrc -c:v pgmyuv -i $(SRC) -flags +bitex
 fate-filter-tiltandshift-422: CMD = framecrc -c:v pgmyuv -i $(SRC) -flags +bitexact -vf scale=sws_flags=+accurate_rnd+bitexact,format=yuv422p,tiltandshift
 fate-filter-tiltandshift-444: CMD = framecrc -c:v pgmyuv -i $(SRC) -flags +bitexact -vf scale=sws_flags=+accurate_rnd+bitexact,format=yuv444p,tiltandshift
 
-DRAWVG_SCRIPT_LINES = $(SRC_PATH)/tests/ref/lavf/drawvg.lines
+DRAWVG_SCRIPT_LINES = tests/data/fate/drawvg.lines
+$(DRAWVG_SCRIPT_LINES): $(SRC_PATH)/tests/ref/lavf/drawvg.lines
+	$(M)cp $< $@
+
 FATE_FILTER_VSYNTH_VIDEO_FILTER-$(CONFIG_DRAWVG_FILTER) += fate-filter-drawvg-video
 fate-filter-drawvg-video: $(DRAWVG_SCRIPT_LINES)
 fate-filter-drawvg-video: CMD = video_filter scale,format=bgr0,drawvg=file=$(DRAWVG_SCRIPT_LINES)
-- 
2.52.0


From 077bc459c02734d9931da511006626a329d0f15a Mon Sep 17 00:00:00 2001
From: Hao Chen <chenhao@loongson.cn>
Date: Tue, 11 Nov 2025 19:05:48 +0800
Subject: [PATCH 132/304] swscale: Fix out-of-bounds write errors in
 yuv2rgb_lasx.c file.

The patch adds handling for dst_w values whose remainder modulo 16 is 2, 4,
6, 8, 10, 12, or 14, which fixes the out-of-bounds write problem.
---
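
Reader's note: the removed expression `(c->opts.dst_w + 7) & ~7` rounds the width up to the next multiple of 8, so a row could be processed past `dst_w`. The C sketch below reproduces only that rounding to show how many extra pixels per row it implies. `padded_width` and `excess_pixels` are hypothetical names, and whether the extra pixels actually land outside the destination buffer depends on its padding, which the sketch does not model.

```c
#include <assert.h>

/* Width computation used before the patch: round dst_w up to a
 * multiple of 8. */
int padded_width(int dst_w)
{
    assert(dst_w >= 0);
    return (dst_w + 7) & ~7;
}

/* Pixels per row that the old computation covered beyond dst_w. */
int excess_pixels(int dst_w)
{
    return padded_width(dst_w) - dst_w;
}
```

For `dst_w = 10` the old code worked on 16 pixels, 6 more than the row holds, while widths that are already multiples of 8 have no excess.
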
 libswscale/loongarch/yuv2rgb_lasx.c | 108 ++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 12 deletions(-)

diff --git a/libswscale/loongarch/yuv2rgb_lasx.c b/libswscale/loongarch/yuv2rgb_lasx.c
index d83e5d70fe..9032887ff8 100644
--- a/libswscale/loongarch/yuv2rgb_lasx.c
+++ b/libswscale/loongarch/yuv2rgb_lasx.c
@@ -173,7 +173,7 @@
     __m256i shuf3 = {0x1E0F0E1C0D0C1A0B, 0x0101010101010101,                        \
                      0x1E0F0E1C0D0C1A0B, 0x0101010101010101};                       \
     YUV2RGB_LOAD_COE                                                                \
-    y      = (c->opts.dst_w + 7) & ~7;                                              \
+    y      = c->opts.dst_w;                                                         \
     h_size = y >> 4;                                                                \
     res    = y & 15;                                                                \
                                                                                     \
@@ -199,7 +199,7 @@
     __m256i a = __lasx_xvldi(0xFF);                                                 \
                                                                                     \
     YUV2RGB_LOAD_COE                                                                \
-    y      = (c->opts.dst_w + 7) & ~7;                                              \
+    y      = c->opts.dst_w;                                                         \
     h_size = y >> 4;                                                                \
     res    = y & 15;                                                                \
                                                                                     \
@@ -215,7 +215,7 @@
         const uint8_t *pv   = src[2] +   (y >> vshift) * srcStride[2];              \
         for(x = 0; x < h_size; x++) {                                               \
 
-#define DEALYUV2RGBREMAIN                                                           \
+#define DEALYUV2RGBLINE                                                             \
             py_1 += 16;                                                             \
             py_2 += 16;                                                             \
             pu += 8;                                                                \
@@ -223,9 +223,40 @@
             image1 += 48;                                                           \
             image2 += 48;                                                           \
         }                                                                           \
-        if (res) {                                                                  \
+        if (res & 8) {                                                              \
 
-#define DEALYUV2RGBREMAIN32                                                         \
+#define DEALYUV2RGBLINERES                                                          \
+            py_1 += 8;                                                              \
+            py_2 += 8;                                                              \
+            pu += 4;                                                                \
+            pv += 4;                                                                \
+            image1 += 24;                                                           \
+            image2 += 24;                                                           \
+            res -= 8 ;                                                              \
+        }                                                                           \
+        if (res) {
+
+#define ENDYUV2RGBLINE(rgb_l, rgb_h, image_1, image_2)                              \
+        if (res == 6) {                                                             \
+            __lasx_xvstelm_d(rgb_l, image_1, 0, 0);                                 \
+            __lasx_xvstelm_d(rgb_l, image_1, 8, 1);                                 \
+            __lasx_xvstelm_h(rgb_h, image_1, 16, 0);                                \
+            __lasx_xvstelm_d(rgb_l, image_2, 0, 2);                                 \
+            __lasx_xvstelm_d(rgb_l, image_2, 8, 3);                                 \
+            __lasx_xvstelm_h(rgb_h, image_2, 16, 8);                                \
+        } else if (res == 4) {                                                      \
+            __lasx_xvstelm_d(rgb_l, image_1, 0, 0);                                 \
+            __lasx_xvstelm_w(rgb_l, image_1, 8, 2);                                 \
+            __lasx_xvstelm_d(rgb_l, image_2, 0, 2);                                 \
+            __lasx_xvstelm_w(rgb_l, image_2, 8, 6);                                 \
+        } else if (res == 2) {                                                      \
+            __lasx_xvstelm_w(rgb_l, image_1, 0, 0);                                 \
+            __lasx_xvstelm_h(rgb_l, image_1, 4, 2);                                 \
+            __lasx_xvstelm_w(rgb_l, image_2, 0, 4);                                 \
+            __lasx_xvstelm_h(rgb_l, image_2, 4, 10);                                \
+        }
+
+#define DEALYUV2RGBLINE32                                                           \
             py_1 += 16;                                                             \
             py_2 += 16;                                                             \
             pu += 8;                                                                \
@@ -233,7 +264,36 @@
             image1 += 16;                                                           \
             image2 += 16;                                                           \
         }                                                                           \
-        if (res) {                                                                  \
+        if (res & 8) {                                                              \
+
+#define DEALYUV2RGBLINERES32                                                        \
+            py_1 += 8;                                                              \
+            py_2 += 8;                                                              \
+            pu += 4;                                                                \
+            pv += 4;                                                                \
+            image1 += 8;                                                            \
+            image2 += 8;                                                            \
+            res -= 8;                                                               \
+        }                                                                           \
+        if (res) {
+
+#define ENDYUV2RGBLINE32(rgb_l, rgb_h, image_1, image_2)                            \
+        if (res == 6) {                                                             \
+            __lasx_xvstelm_d(rgb_l, image_1, 0, 0);                                 \
+            __lasx_xvstelm_d(rgb_l, image_1, 8, 1);                                 \
+            __lasx_xvstelm_d(rgb_l, image_1, 16, 2);                                \
+            __lasx_xvstelm_d(rgb_h, image_2, 0, 0);                                 \
+            __lasx_xvstelm_d(rgb_h, image_2, 8, 1);                                 \
+            __lasx_xvstelm_d(rgb_h, image_2, 16, 2);                                \
+        } else if (res == 4) {                                                      \
+            __lasx_xvstelm_d(rgb_l, image_1, 0, 0);                                 \
+            __lasx_xvstelm_d(rgb_l, image_1, 8, 1);                                 \
+            __lasx_xvstelm_d(rgb_h, image_2, 0, 0);                                 \
+            __lasx_xvstelm_d(rgb_h, image_2, 8, 1);                                 \
+        } else if (res == 2) {                                                      \
+            __lasx_xvstelm_d(rgb_l, image_1, 0, 0);                                 \
+            __lasx_xvstelm_d(rgb_h, image_2, 0, 0);                                 \
+        }
 
 
 #define END_FUNC()                          \
@@ -249,10 +309,14 @@ YUV2RGBFUNC(yuv420_rgb24_lasx, uint8_t, 0)
     RGB_PACK(r2, g2, b2, rgb2_l, rgb2_h);
     RGB_STORE(rgb1_l, rgb1_h, image1);
     RGB_STORE(rgb2_l, rgb2_h, image2);
-    DEALYUV2RGBREMAIN
+    DEALYUV2RGBLINE
     YUV2RGB_RES
     RGB_PACK(r1, g1, b1, rgb1_l, rgb1_h);
     RGB_STORE_RES(rgb1_l, rgb1_h, image1, image2);
+    DEALYUV2RGBLINERES
+    YUV2RGB_RES
+    RGB_PACK(r1, g1, b1, rgb1_l, rgb1_h);
+    ENDYUV2RGBLINE(rgb1_l, rgb1_h, image1, image2);
     END_FUNC()
 
 YUV2RGBFUNC(yuv420_bgr24_lasx, uint8_t, 0)
@@ -262,10 +326,14 @@ YUV2RGBFUNC(yuv420_bgr24_lasx, uint8_t, 0)
     RGB_PACK(b2, g2, r2, rgb2_l, rgb2_h);
     RGB_STORE(rgb1_l, rgb1_h, image1);
     RGB_STORE(rgb2_l, rgb2_h, image2);
-    DEALYUV2RGBREMAIN
+    DEALYUV2RGBLINE
     YUV2RGB_RES
     RGB_PACK(b1, g1, r1, rgb1_l, rgb1_h);
     RGB_STORE_RES(rgb1_l, rgb1_h, image1, image2);
+    DEALYUV2RGBLINERES
+    YUV2RGB_RES
+    RGB_PACK(b1, g1, r1, rgb1_l, rgb1_h);
+    ENDYUV2RGBLINE(rgb1_l, rgb1_h, image1, image2);
     END_FUNC()
 
 YUV2RGBFUNC32(yuv420_rgba32_lasx, uint32_t, 0)
@@ -275,10 +343,14 @@ YUV2RGBFUNC32(yuv420_rgba32_lasx, uint32_t, 0)
     RGB32_PACK(r2, g2, b2, a, rgb2_l, rgb2_h);
     RGB32_STORE(rgb1_l, rgb1_h, image1);
     RGB32_STORE(rgb2_l, rgb2_h, image2);
-    DEALYUV2RGBREMAIN32
+    DEALYUV2RGBLINE32
     YUV2RGB_RES
     RGB32_PACK(r1, g1, b1, a, rgb1_l, rgb1_h);
     RGB32_STORE_RES(rgb1_l, rgb1_h, image1, image2);
+    DEALYUV2RGBLINERES32
+    YUV2RGB_RES
+    RGB32_PACK(r1, g1, b1, a, rgb1_l, rgb1_h);
+    ENDYUV2RGBLINE32(rgb1_l, rgb1_h, image1, image2);
     END_FUNC()
 
 YUV2RGBFUNC32(yuv420_bgra32_lasx, uint32_t, 0)
@@ -288,10 +360,14 @@ YUV2RGBFUNC32(yuv420_bgra32_lasx, uint32_t, 0)
     RGB32_PACK(b2, g2, r2, a, rgb2_l, rgb2_h);
     RGB32_STORE(rgb1_l, rgb1_h, image1);
     RGB32_STORE(rgb2_l, rgb2_h, image2);
-    DEALYUV2RGBREMAIN32
+    DEALYUV2RGBLINE32
     YUV2RGB_RES
     RGB32_PACK(b1, g1, r1, a, rgb1_l, rgb1_h);
     RGB32_STORE_RES(rgb1_l, rgb1_h, image1, image2);
+    DEALYUV2RGBLINERES32
+    YUV2RGB_RES
+    RGB32_PACK(b1, g1, r1, a, rgb1_l, rgb1_h);
+    ENDYUV2RGBLINE32(rgb1_l, rgb1_h, image1, image2);
     END_FUNC()
 
 YUV2RGBFUNC32(yuv420_argb32_lasx, uint32_t, 0)
@@ -301,10 +377,14 @@ YUV2RGBFUNC32(yuv420_argb32_lasx, uint32_t, 0)
     RGB32_PACK(a, r2, g2, b2, rgb2_l, rgb2_h);
     RGB32_STORE(rgb1_l, rgb1_h, image1);
     RGB32_STORE(rgb2_l, rgb2_h, image2);
-    DEALYUV2RGBREMAIN32
+    DEALYUV2RGBLINE32
     YUV2RGB_RES
     RGB32_PACK(a, r1, g1, b1, rgb1_l, rgb1_h);
     RGB32_STORE_RES(rgb1_l, rgb1_h, image1, image2);
+    DEALYUV2RGBLINERES32
+    YUV2RGB_RES
+    RGB32_PACK(a, r1, g1, b1, rgb1_l, rgb1_h);
+    ENDYUV2RGBLINE32(rgb1_l, rgb1_h, image1, image2);
     END_FUNC()
 
 YUV2RGBFUNC32(yuv420_abgr32_lasx, uint32_t, 0)
@@ -314,8 +394,12 @@ YUV2RGBFUNC32(yuv420_abgr32_lasx, uint32_t, 0)
     RGB32_PACK(a, b2, g2, r2, rgb2_l, rgb2_h);
     RGB32_STORE(rgb1_l, rgb1_h, image1);
     RGB32_STORE(rgb2_l, rgb2_h, image2);
-    DEALYUV2RGBREMAIN32
+    DEALYUV2RGBLINE32
     YUV2RGB_RES
     RGB32_PACK(a, b1, g1, r1, rgb1_l, rgb1_h);
     RGB32_STORE_RES(rgb1_l, rgb1_h, image1, image2);
+    DEALYUV2RGBLINERES32
+    YUV2RGB_RES
+    RGB32_PACK(a, b1, g1, r1, rgb1_l, rgb1_h);
+    ENDYUV2RGBLINE32(rgb1_l, rgb1_h, image1, image2);
     END_FUNC()
-- 
2.52.0


From 75722aefa741b5edb0ab97e6978895a18cfa0921 Mon Sep 17 00:00:00 2001
From: Piotr Pawlowski <p@perkele.cc>
Date: Thu, 30 Oct 2025 14:41:40 +0100
Subject: [PATCH 133/304] All: Removed reliance on compiler performing dead
 code elimination, changed various macro constant checks from if() to #if

---
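Note on the pattern changed throughout this patch: with `if (MACRO && ...)` the guarded symbol is still referenced in the translation unit, so the build links only if the compiler removes the dead branch; with `#if MACRO` the reference is never compiled at all. The sketch below is a minimal illustration under assumed names (`CONFIG_FAST_PATH`, `sum_fast`, `sum_c`, `select_sum` are hypothetical and not FFmpeg symbols).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch standing in for CONFIG_*, ARCH_* or
 * HAVE_* macros, which are always defined to 0 or 1. */
#define CONFIG_FAST_PATH 0

typedef int (*sum_fn)(const int *v, size_t n);

/* Declared but deliberately never defined, like an optimized routine
 * that is only built when the feature is enabled. */
int sum_fast(const int *v, size_t n);

/* Portable fallback that is always available. */
static int sum_c(const int *v, size_t n)
{
    int s = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

static sum_fn select_sum(void)
{
    sum_fn fn = sum_c;

    /* The old style would read:
     *     if (CONFIG_FAST_PATH) fn = sum_fast;
     * That still names sum_fast, so linking depends on the compiler
     * discarding the branch. The preprocessor form below removes the
     * reference before compilation. */
#if CONFIG_FAST_PATH
    fn = sum_fast;
#endif
    return fn;
}
```

With `CONFIG_FAST_PATH` set to 0, `select_sum()` returns `sum_c` and the program links even though `sum_fast` has no definition. The trade-off is that code inside a disabled `#if` block is no longer syntax-checked by the compiler.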
 libavcodec/avcodec.c                 |  5 +-
 libavcodec/encode.c                  | 17 +++---
 libavcodec/x86/flacdsp_init.c        | 16 ++++--
 libavcodec/x86/hevc/dsp_init.c       | 84 +++++++++++++++-------------
 libavcodec/x86/idctdsp_init.c        |  9 ++-
 libavcodec/x86/mlpdsp_init.c         |  6 +-
 libavcodec/x86/v210-init.c           | 24 +++++---
 libavcodec/x86/v210enc_init.c        |  3 +-
 libavfilter/x86/colorspacedsp_init.c |  4 +-
 libavfilter/x86/f_ebur128_init.c     |  4 +-
 libavfilter/x86/vf_atadenoise_init.c |  6 +-
 libavfilter/x86/vf_bwdif_init.c      | 12 ++--
 libavfilter/x86/vf_nlmeans_init.c    |  4 +-
 libavfilter/x86/vf_ssim_init.c       |  4 +-
 libavfilter/x86/vf_w3fdif_init.c     |  4 +-
 15 files changed, 125 insertions(+), 77 deletions(-)

diff --git a/libavcodec/avcodec.c b/libavcodec/avcodec.c
index 14aca0f164..dacdfba8da 100644
--- a/libavcodec/avcodec.c
+++ b/libavcodec/avcodec.c
@@ -437,10 +437,11 @@ av_cold void ff_codec_close(AVCodecContext *avctx)
     if (avcodec_is_open(avctx)) {
         AVCodecInternal *avci = avctx->internal;
 
-        if (CONFIG_FRAME_THREAD_ENCODER &&
-            avci->frame_thread_encoder && avctx->thread_count > 1) {
+#if CONFIG_FRAME_THREAD_ENCODER
+        if (avci->frame_thread_encoder && avctx->thread_count > 1) {
             ff_frame_thread_encoder_free(avctx);
         }
+#endif
         if (HAVE_THREADS && avci->thread_ctx)
             ff_thread_free(avctx);
         if (avci->needs_close && ffcodec(avctx->codec)->close)
diff --git a/libavcodec/encode.c b/libavcodec/encode.c
index 6e0620a36e..407bd8920f 100644
--- a/libavcodec/encode.c
+++ b/libavcodec/encode.c
@@ -317,12 +317,13 @@ static int encode_simple_internal(AVCodecContext *avctx, AVPacket *avpkt)
 
     av_assert0(codec->cb_type == FF_CODEC_CB_TYPE_ENCODE);
 
-    if (CONFIG_FRAME_THREAD_ENCODER && avci->frame_thread_encoder)
+#if CONFIG_FRAME_THREAD_ENCODER
+    if (avci->frame_thread_encoder)
         /* This will unref frame. */
         ret = ff_thread_video_encode_frame(avctx, avpkt, frame, &got_packet);
-    else {
+    else
+#endif
         ret = ff_encode_encode_cb(avctx, avpkt, frame, &got_packet);
-    }
 
     if (avci->draining && !got_packet)
         avci->draining_done = 1;
@@ -825,11 +826,11 @@ int ff_encode_preinit(AVCodecContext *avctx)
         memcpy(sd_packet->data, sd_frame->data, sd_frame->size);
     }
 
-    if (CONFIG_FRAME_THREAD_ENCODER) {
-        ret = ff_frame_thread_encoder_init(avctx);
-        if (ret < 0)
-            return ret;
-    }
+#if CONFIG_FRAME_THREAD_ENCODER
+    ret = ff_frame_thread_encoder_init(avctx);
+    if (ret < 0)
+        return ret;
+#endif
 
     return 0;
 }
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index fa993d3466..386955ba67 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -85,8 +85,10 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
                 c->decorrelate[0] = ff_flac_decorrelate_indep4_16_ssse3;
             else if (channels == 6)
                 c->decorrelate[0] = ff_flac_decorrelate_indep6_16_ssse3;
-            else if (ARCH_X86_64 && channels == 8)
+#if ARCH_X86_64
+            else if (channels == 8)
                 c->decorrelate[0] = ff_flac_decorrelate_indep8_16_ssse3;
+#endif
         } else if (fmt == AV_SAMPLE_FMT_S32) {
             if (channels == 2)
                 c->decorrelate[0] = ff_flac_decorrelate_indep2_32_ssse3;
@@ -94,8 +96,10 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
                 c->decorrelate[0] = ff_flac_decorrelate_indep4_32_ssse3;
             else if (channels == 6)
                 c->decorrelate[0] = ff_flac_decorrelate_indep6_32_ssse3;
-            else if (ARCH_X86_64 && channels == 8)
+#if ARCH_X86_64
+            else if (channels == 8)
                 c->decorrelate[0] = ff_flac_decorrelate_indep8_32_ssse3;
+#endif
         }
     }
     if (EXTERNAL_SSE4(cpu_flags)) {
@@ -105,15 +109,19 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
     }
     if (EXTERNAL_AVX(cpu_flags)) {
         if (fmt == AV_SAMPLE_FMT_S16) {
-            if (ARCH_X86_64 && channels == 8)
+#if ARCH_X86_64
+            if (channels == 8)
                 c->decorrelate[0] = ff_flac_decorrelate_indep8_16_avx;
+#endif
         } else if (fmt == AV_SAMPLE_FMT_S32) {
             if (channels == 4)
                 c->decorrelate[0] = ff_flac_decorrelate_indep4_32_avx;
             else if (channels == 6)
                 c->decorrelate[0] = ff_flac_decorrelate_indep6_32_avx;
-            else if (ARCH_X86_64 && channels == 8)
+#if ARCH_X86_64
+            else if (channels == 8)
                 c->decorrelate[0] = ff_flac_decorrelate_indep8_32_avx;
+#endif
         }
     }
     if (EXTERNAL_XOP(cpu_flags)) {
diff --git a/libavcodec/x86/hevc/dsp_init.c b/libavcodec/x86/hevc/dsp_init.c
index f1558b7e3e..5b2b10f33a 100644
--- a/libavcodec/x86/hevc/dsp_init.c
+++ b/libavcodec/x86/hevc/dsp_init.c
@@ -821,13 +821,13 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_8_sse2;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_8_sse2;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_sse2;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_sse2;
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_sse2;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_sse2;
 
-                c->idct[2] = ff_hevc_idct_16x16_8_sse2;
-                c->idct[3] = ff_hevc_idct_32x32_8_sse2;
-            }
+            c->idct[2] = ff_hevc_idct_16x16_8_sse2;
+            c->idct[3] = ff_hevc_idct_32x32_8_sse2;
+#endif
             SAO_BAND_INIT(8, sse2);
 
             c->idct_dc[0] = ff_hevc_idct_4x4_dc_8_sse2;
@@ -843,10 +843,10 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->add_residual[3] = ff_hevc_add_residual_32_8_sse2;
         }
         if (EXTERNAL_SSSE3(cpu_flags)) {
-            if(ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_ssse3;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_ssse3;
-            }
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_ssse3;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_ssse3;
+#endif
             SAO_EDGE_INIT(8, ssse3);
         }
 #if HAVE_SSE4_EXTERNAL && ARCH_X86_64
@@ -866,13 +866,13 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_AVX(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_8_avx;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_8_avx;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
 
-                c->idct[2] = ff_hevc_idct_16x16_8_avx;
-                c->idct[3] = ff_hevc_idct_32x32_8_avx;
-            }
+            c->idct[2] = ff_hevc_idct_16x16_8_avx;
+            c->idct[3] = ff_hevc_idct_32x32_8_avx;
+#endif
             SAO_BAND_INIT(8, avx);
 
             c->idct[0] = ff_hevc_idct_4x4_8_avx;
@@ -982,7 +982,8 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->add_residual[3] = ff_hevc_add_residual_32_8_avx2;
         }
 #endif /* HAVE_AVX2_EXTERNAL */
-        if (EXTERNAL_AVX512ICL(cpu_flags) && ARCH_X86_64) {
+#if ARCH_X86_64
+        if (EXTERNAL_AVX512ICL(cpu_flags)) {
             c->put_hevc_qpel[1][0][1] = ff_hevc_put_qpel_h4_8_avx512icl;
             c->put_hevc_qpel[3][0][1] = ff_hevc_put_qpel_h8_8_avx512icl;
             c->put_hevc_qpel[5][0][1] = ff_hevc_put_qpel_h16_8_avx512icl;
@@ -990,6 +991,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->put_hevc_qpel[9][0][1] = ff_hevc_put_qpel_h64_8_avx512icl;
             c->put_hevc_qpel[3][1][1] = ff_hevc_put_qpel_hv8_8_avx512icl;
         }
+#endif
     } else if (bit_depth == 10) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
             c->add_residual[0] = ff_hevc_add_residual_4_10_mmxext;
@@ -997,13 +999,13 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_10_sse2;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_10_sse2;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_10_sse2;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_10_sse2;
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_10_sse2;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_10_sse2;
 
-                c->idct[2] = ff_hevc_idct_16x16_10_sse2;
-                c->idct[3] = ff_hevc_idct_32x32_10_sse2;
-            }
+            c->idct[2] = ff_hevc_idct_16x16_10_sse2;
+            c->idct[3] = ff_hevc_idct_32x32_10_sse2;
+#endif
             SAO_BAND_INIT(10, sse2);
             SAO_EDGE_INIT(10, sse2);
 
@@ -1019,10 +1021,12 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->add_residual[2] = ff_hevc_add_residual_16_10_sse2;
             c->add_residual[3] = ff_hevc_add_residual_32_10_sse2;
         }
-        if (EXTERNAL_SSSE3(cpu_flags) && ARCH_X86_64) {
+#if ARCH_X86_64
+        if (EXTERNAL_SSSE3(cpu_flags)) {
             c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_10_ssse3;
             c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_10_ssse3;
         }
+#endif
 #if HAVE_SSE4_EXTERNAL && ARCH_X86_64
         if (EXTERNAL_SSE4(cpu_flags)) {
             EPEL_LINKS(c->put_hevc_epel, 0, 0, pel_pixels, 10, sse4);
@@ -1039,13 +1043,13 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_AVX(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_10_avx;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_10_avx;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_10_avx;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_10_avx;
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_10_avx;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_10_avx;
 
-                c->idct[2] = ff_hevc_idct_16x16_10_avx;
-                c->idct[3] = ff_hevc_idct_32x32_10_avx;
-            }
+            c->idct[2] = ff_hevc_idct_16x16_10_avx;
+            c->idct[3] = ff_hevc_idct_32x32_10_avx;
+#endif
 
             c->idct[0] = ff_hevc_idct_4x4_10_avx;
             c->idct[1] = ff_hevc_idct_8x8_10_avx;
@@ -1216,10 +1220,10 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_12_sse2;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_12_sse2;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_sse2;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_sse2;
-            }
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_sse2;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_sse2;
+#endif
             SAO_BAND_INIT(12, sse2);
             SAO_EDGE_INIT(12, sse2);
 
@@ -1228,10 +1232,12 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->idct_dc[2] = ff_hevc_idct_16x16_dc_12_sse2;
             c->idct_dc[3] = ff_hevc_idct_32x32_dc_12_sse2;
         }
-        if (EXTERNAL_SSSE3(cpu_flags) && ARCH_X86_64) {
+#if ARCH_X86_64
+        if (EXTERNAL_SSSE3(cpu_flags)) {
             c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_ssse3;
             c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_ssse3;
         }
+#endif
 #if HAVE_SSE4_EXTERNAL && ARCH_X86_64
         if (EXTERNAL_SSE4(cpu_flags)) {
             EPEL_LINKS(c->put_hevc_epel, 0, 0, pel_pixels, 12, sse4);
@@ -1248,10 +1254,10 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_AVX(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_12_avx;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_12_avx;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_avx;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_avx;
-            }
+#if ARCH_X86_64
+            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_avx;
+            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_avx;
+#endif
             SAO_BAND_INIT(12, avx);
         }
         if (EXTERNAL_AVX2(cpu_flags)) {
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 9c7f235b3f..fce62e5590 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -83,8 +83,8 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
         }
 #endif
 
-        if (ARCH_X86_64 &&
-            !high_bit_depth &&
+#if ARCH_X86_64
+        if (!high_bit_depth &&
             avctx->lowres == 0 &&
             (avctx->idct_algo == FF_IDCT_AUTO ||
                 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
@@ -95,9 +95,11 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
                 c->idct_add  = ff_simple_idct8_add_sse2;
                 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
         }
+#endif
     }
 
-    if (ARCH_X86_64 && avctx->lowres == 0) {
+#if ARCH_X86_64
+    if (avctx->lowres == 0) {
         if (EXTERNAL_AVX(cpu_flags) &&
             !high_bit_depth &&
             (avctx->idct_algo == FF_IDCT_AUTO ||
@@ -147,4 +149,5 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
             }
         }
     }
+#endif
 }
diff --git a/libavcodec/x86/mlpdsp_init.c b/libavcodec/x86/mlpdsp_init.c
index 950f996832..21a0e38143 100644
--- a/libavcodec/x86/mlpdsp_init.c
+++ b/libavcodec/x86/mlpdsp_init.c
@@ -200,8 +200,10 @@ av_cold void ff_mlpdsp_init_x86(MLPDSPContext *c)
     if (INLINE_MMX(cpu_flags))
         c->mlp_filter_channel = mlp_filter_channel_x86;
 #endif
-    if (ARCH_X86_64 && EXTERNAL_SSE4(cpu_flags))
+#if ARCH_X86_64
+    if (EXTERNAL_SSE4(cpu_flags))
         c->mlp_rematrix_channel = ff_mlp_rematrix_channel_sse4;
-    if (ARCH_X86_64 && EXTERNAL_AVX2_FAST(cpu_flags) && cpu_flags & AV_CPU_FLAG_BMI2)
+    if (EXTERNAL_AVX2_FAST(cpu_flags) && cpu_flags & AV_CPU_FLAG_BMI2)
         c->mlp_rematrix_channel = ff_mlp_rematrix_channel_avx2_bmi2;
+#endif // ARCH_X86_64
 }
diff --git a/libavcodec/x86/v210-init.c b/libavcodec/x86/v210-init.c
index 8b3677b8aa..895cc8f329 100644
--- a/libavcodec/x86/v210-init.c
+++ b/libavcodec/x86/v210-init.c
@@ -38,28 +38,38 @@ av_cold void ff_v210_x86_init(V210DecContext *s)
     if (s->aligned_input) {
         if (cpu_flags & AV_CPU_FLAG_SSSE3)
             s->unpack_frame = ff_v210_planar_unpack_aligned_ssse3;
-
-        if (HAVE_AVX_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX)
+#if HAVE_AVX_EXTERNAL
+        if (cpu_flags & AV_CPU_FLAG_AVX)
             s->unpack_frame = ff_v210_planar_unpack_aligned_avx;
+#endif
 
-        if (HAVE_AVX2_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX2)
+#if HAVE_AVX2_EXTERNAL
+        if (cpu_flags & AV_CPU_FLAG_AVX2)
             s->unpack_frame = ff_v210_planar_unpack_aligned_avx2;
+#endif
 
+#if HAVE_AVX512ICL_EXTERNAL
         if (EXTERNAL_AVX512ICL(cpu_flags))
             s->unpack_frame = ff_v210_planar_unpack_avx512icl;
+#endif
     }
     else {
         if (cpu_flags & AV_CPU_FLAG_SSSE3)
             s->unpack_frame = ff_v210_planar_unpack_unaligned_ssse3;
-
-        if (HAVE_AVX_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX)
+#if HAVE_AVX_EXTERNAL
+        if (cpu_flags & AV_CPU_FLAG_AVX)
             s->unpack_frame = ff_v210_planar_unpack_unaligned_avx;
+#endif
 
-        if (HAVE_AVX2_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX2)
+#if HAVE_AVX2_EXTERNAL
+        if (cpu_flags & AV_CPU_FLAG_AVX2)
             s->unpack_frame = ff_v210_planar_unpack_unaligned_avx2;
+#endif
 
+#if HAVE_AVX512ICL_EXTERNAL
         if (EXTERNAL_AVX512ICL(cpu_flags))
             s->unpack_frame = ff_v210_planar_unpack_avx512icl;
-    }
 #endif
+    }
+#endif // HAVE_X86ASM
 }
diff --git a/libavcodec/x86/v210enc_init.c b/libavcodec/x86/v210enc_init.c
index 44f22ca7fe..8396ea7a0b 100644
--- a/libavcodec/x86/v210enc_init.c
+++ b/libavcodec/x86/v210enc_init.c
@@ -71,11 +71,12 @@ av_cold void ff_v210enc_init_x86(V210EncContext *s)
         s->pack_line_10      = ff_v210_planar_pack_10_avx512;
 #endif
     }
-
+#if HAVE_AVX512ICL_EXTERNAL
     if (EXTERNAL_AVX512ICL(cpu_flags)) {
         s->sample_factor_8  = 4;
         s->pack_line_8      = ff_v210_planar_pack_8_avx512icl;
         s->sample_factor_10 = 4;
         s->pack_line_10     = ff_v210_planar_pack_10_avx512icl;
     }
+#endif
 }
diff --git a/libavfilter/x86/colorspacedsp_init.c b/libavfilter/x86/colorspacedsp_init.c
index b5006ac295..429cc8bd26 100644
--- a/libavfilter/x86/colorspacedsp_init.c
+++ b/libavfilter/x86/colorspacedsp_init.c
@@ -78,9 +78,10 @@ void ff_multiply3x3_sse2(int16_t *data[3], ptrdiff_t stride, int w, int h,
 
 void ff_colorspacedsp_x86_init(ColorSpaceDSPContext *dsp)
 {
+#if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (ARCH_X86_64 && EXTERNAL_SSE2(cpu_flags)) {
+    if (EXTERNAL_SSE2(cpu_flags)) {
 #define assign_yuv2yuv_fns(ss) \
         dsp->yuv2yuv[BPP_8 ][BPP_8 ][SS_##ss] = ff_yuv2yuv_##ss##p8to8_sse2; \
         dsp->yuv2yuv[BPP_8 ][BPP_10][SS_##ss] = ff_yuv2yuv_##ss##p8to10_sse2; \
@@ -116,4 +117,5 @@ void ff_colorspacedsp_x86_init(ColorSpaceDSPContext *dsp)
 
         dsp->multiply3x3 = ff_multiply3x3_sse2;
     }
+#endif
 }
diff --git a/libavfilter/x86/f_ebur128_init.c b/libavfilter/x86/f_ebur128_init.c
index 6d43b4887f..8169e75ccc 100644
--- a/libavfilter/x86/f_ebur128_init.c
+++ b/libavfilter/x86/f_ebur128_init.c
@@ -30,12 +30,14 @@ double ff_ebur128_find_peak_2ch_avx(double *, int, const double *, int);
 
 av_cold void ff_ebur128_init_x86(EBUR128DSPContext *dsp, int nb_channels)
 {
+#if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (ARCH_X86_64 && EXTERNAL_AVX(cpu_flags)) {
+    if (EXTERNAL_AVX(cpu_flags)) {
         if (nb_channels >= 2)
             dsp->filter_channels = ff_ebur128_filter_channels_avx;
         if (nb_channels == 2)
             dsp->find_peak = ff_ebur128_find_peak_2ch_avx;
     }
+#endif
 }
diff --git a/libavfilter/x86/vf_atadenoise_init.c b/libavfilter/x86/vf_atadenoise_init.c
index e7a653f191..3847e53250 100644
--- a/libavfilter/x86/vf_atadenoise_init.c
+++ b/libavfilter/x86/vf_atadenoise_init.c
@@ -36,15 +36,17 @@ void ff_atadenoise_filter_row8_serial_sse4(const uint8_t *src, uint8_t *dst,
 
 av_cold void ff_atadenoise_init_x86(ATADenoiseDSPContext *dsp, int depth, int algorithm, const float *sigma)
 {
+#if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
     for (int p = 0; p < 4; p++) {
-        if (ARCH_X86_64 && EXTERNAL_SSE4(cpu_flags) && depth <= 8 && algorithm == PARALLEL && sigma[p] == INT16_MAX) {
+        if (EXTERNAL_SSE4(cpu_flags) && depth <= 8 && algorithm == PARALLEL && sigma[p] == INT16_MAX) {
             dsp->filter_row[p] = ff_atadenoise_filter_row8_sse4;
         }
 
-        if (ARCH_X86_64 && EXTERNAL_SSE4(cpu_flags) && depth <= 8 && algorithm == SERIAL && sigma[p] == INT16_MAX) {
+        if (EXTERNAL_SSE4(cpu_flags) && depth <= 8 && algorithm == SERIAL && sigma[p] == INT16_MAX) {
             dsp->filter_row[p] = ff_atadenoise_filter_row8_serial_sse4;
         }
     }
+#endif
 }
diff --git a/libavfilter/x86/vf_bwdif_init.c b/libavfilter/x86/vf_bwdif_init.c
index 76b574b2a9..51508ee771 100644
--- a/libavfilter/x86/vf_bwdif_init.c
+++ b/libavfilter/x86/vf_bwdif_init.c
@@ -67,18 +67,22 @@ av_cold void ff_bwdif_init_x86(BWDIFDSPContext *bwdif, int bit_depth)
             bwdif->filter_line = ff_bwdif_filter_line_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_ssse3;
-        if (ARCH_X86_64 && EXTERNAL_AVX2_FAST(cpu_flags))
+#if ARCH_X86_64
+        if (EXTERNAL_AVX2_FAST(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_avx2;
-        if (ARCH_X86_64 && EXTERNAL_AVX512ICL(cpu_flags))
+        if (EXTERNAL_AVX512ICL(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_avx512icl;
+#endif
     } else if (bit_depth <= 12) {
         if (EXTERNAL_SSE2(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_ssse3;
-        if (ARCH_X86_64 && EXTERNAL_AVX2_FAST(cpu_flags))
+#if ARCH_X86_64
+        if (EXTERNAL_AVX2_FAST(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_avx2;
-        if (ARCH_X86_64 && EXTERNAL_AVX512ICL(cpu_flags))
+        if (EXTERNAL_AVX512ICL(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_avx512icl;
+#endif
     }
 }
diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
index 37764d30ab..5d67090a98 100644
--- a/libavfilter/x86/vf_nlmeans_init.c
+++ b/libavfilter/x86/vf_nlmeans_init.c
@@ -33,8 +33,10 @@ void ff_compute_weights_line_avx2(const uint32_t *const iia,
 
 av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
 {
+#if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (ARCH_X86_64 && EXTERNAL_AVX2_FAST(cpu_flags))
+    if (EXTERNAL_AVX2_FAST(cpu_flags))
         dsp->compute_weights_line = ff_compute_weights_line_avx2;
+#endif
 }
diff --git a/libavfilter/x86/vf_ssim_init.c b/libavfilter/x86/vf_ssim_init.c
index cbaa20ef16..6f2305430c 100644
--- a/libavfilter/x86/vf_ssim_init.c
+++ b/libavfilter/x86/vf_ssim_init.c
@@ -34,8 +34,10 @@ void ff_ssim_init_x86(SSIMDSPContext *dsp)
 {
     int cpu_flags = av_get_cpu_flags();
 
-    if (ARCH_X86_64 && EXTERNAL_SSSE3(cpu_flags))
+#if ARCH_X86_64
+    if (EXTERNAL_SSSE3(cpu_flags))
         dsp->ssim_4x4_line = ff_ssim_4x4_line_ssse3;
+#endif
     if (EXTERNAL_SSE4(cpu_flags))
         dsp->ssim_end_line = ff_ssim_end_line_sse4;
     if (EXTERNAL_XOP(cpu_flags))
diff --git a/libavfilter/x86/vf_w3fdif_init.c b/libavfilter/x86/vf_w3fdif_init.c
index 16202fba76..6d677d651d 100644
--- a/libavfilter/x86/vf_w3fdif_init.c
+++ b/libavfilter/x86/vf_w3fdif_init.c
@@ -56,7 +56,9 @@ av_cold void ff_w3fdif_init_x86(W3FDIFDSPContext *dsp, int depth)
         dsp->filter_scale        = ff_w3fdif_scale_sse2;
     }
 
-    if (ARCH_X86_64 && EXTERNAL_SSE2(cpu_flags) && depth <= 8) {
+#if ARCH_X86_64
+    if (EXTERNAL_SSE2(cpu_flags) && depth <= 8) {
         dsp->filter_complex_high = ff_w3fdif_complex_high_sse2;
     }
+#endif
 }
-- 
2.52.0


From df0d199839b6dbe283f2dba94a750048d1a19b13 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 10 Nov 2025 12:32:32 +0100
Subject: [PATCH 134/304] avfilter/vf_libplacebo: un-rotate image crop after
 fitting

When combining rotation with a FIT_ mode other than FIT_FILL, the fitting
logic was operating on the un-rotated rects, when it should have been
operating on the rotated (output) rects.
---
 libavfilter/vf_libplacebo.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/libavfilter/vf_libplacebo.c b/libavfilter/vf_libplacebo.c
index 42501c51f2..5a615aceed 100644
--- a/libavfilter/vf_libplacebo.c
+++ b/libavfilter/vf_libplacebo.c
@@ -913,14 +913,6 @@ static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
         image->crop.x1 = image->crop.x0 + s->var_values[VAR_CROP_W];
         image->crop.y1 = image->crop.y0 + s->var_values[VAR_CROP_H];
 
-        const pl_rotation rot_total = image->rotation - target->rotation;
-        if ((rot_total + PL_ROTATION_360) % PL_ROTATION_180 == PL_ROTATION_90) {
-            /* Libplacebo expects the input crop relative to the actual frame
-             * dimensions, so un-transpose them here */
-            FFSWAP(float, image->crop.x0, image->crop.y0);
-            FFSWAP(float, image->crop.x1, image->crop.y1);
-        }
-
         if (src == ref) {
             /* Only update the target crop once, for the 'reference' frame */
             target->crop.x0 = av_expr_eval(s->pos_x_pexpr, s->var_values, NULL);
@@ -955,6 +947,14 @@ static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
             case FIT_SCALE_DOWN:
                 pl_rect2df_aspect_fit(&target->crop, &fixed, 0.0);
             }
+
+            const pl_rotation rot_total = image->rotation - target->rotation;
+            if ((rot_total + PL_ROTATION_360) % PL_ROTATION_180 == PL_ROTATION_90) {
+                /* Libplacebo expects the input crop relative to the actual frame
+                 * dimensions, so un-transpose them here */
+                FFSWAP(float, image->crop.x0, image->crop.y0);
+                FFSWAP(float, image->crop.x1, image->crop.y1);
+            }
         }
     }
 }
-- 
2.52.0


From d8347f99f021b59b4a8ecde77fbec5d1d5f615ac Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 10 Nov 2025 12:36:33 +0100
Subject: [PATCH 135/304] avfilter/vf_libplacebo: fix math when AVRationals are
 undefined

---
 libavfilter/vf_libplacebo.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
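
Note on the helper: with an unset sample aspect ratio (0/1), the old `w_adj` expression evaluated to 0.0 / 0.0, i.e. NaN. The following is a minimal standalone sketch of the fallback logic, not the filter code itself; `Rational`, `q2d` and `width_adjust` are hypothetical stand-ins for `AVRational`, `av_q2d()` and the inline computation in `update_crops()`.

```c
/* Hypothetical stand-in for AVRational so the sketch compiles on its own. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Stand-in for av_q2d(): plain floating-point division. */
static double q2d(Rational q)
{
    return q.num / (double) q.den;
}

/* Same shape as the helper added by this patch: fall back to def
 * unless both numerator and denominator are nonzero. */
static double q2d_fallback(Rational q, double def)
{
    return (q.num && q.den) ? q2d(q) : def;
}

/* Width adjustment as computed after the patch (assumed name). */
static double width_adjust(Rational sar_in, Rational sar_out)
{
    return q2d_fallback(sar_in, 1.0) / q2d_fallback(sar_out, 1.0);
}
```

With both ratios unset, `width_adjust()` returns 1.0, so the crop is left unstretched instead of being poisoned by NaN.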

diff --git a/libavfilter/vf_libplacebo.c b/libavfilter/vf_libplacebo.c
index 5a615aceed..8b5f789280 100644
--- a/libavfilter/vf_libplacebo.c
+++ b/libavfilter/vf_libplacebo.c
@@ -861,6 +861,11 @@ static const AVFrame *ref_frame(const struct pl_frame_mix *mix)
     return NULL;
 }
 
+static inline double q2d_fallback(AVRational q, const double def)
+{
+    return (q.num && q.den) ? av_q2d(q) : def;
+}
+
 static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
                          struct pl_frame *target, double target_pts)
 {
@@ -882,8 +887,7 @@ static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
         s->var_values[VAR_IN_W]   = s->var_values[VAR_IW] = inlink->w;
         s->var_values[VAR_IN_H]   = s->var_values[VAR_IH] = inlink->h;
         s->var_values[VAR_A]      = (double) inlink->w / inlink->h;
-        s->var_values[VAR_SAR]    = inlink->sample_aspect_ratio.num ?
-            av_q2d(inlink->sample_aspect_ratio) : 1.0;
+        s->var_values[VAR_SAR]    = q2d_fallback(inlink->sample_aspect_ratio, 1.0);
         s->var_values[VAR_IN_T]   = s->var_values[VAR_T]  = image_pts;
         s->var_values[VAR_OUT_T]  = s->var_values[VAR_OT] = target_pts;
         s->var_values[VAR_N]      = outl->frame_count_out;
@@ -920,9 +924,11 @@ static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
             target->crop.x1 = target->crop.x0 + s->var_values[VAR_POS_W];
             target->crop.y1 = target->crop.y0 + s->var_values[VAR_POS_H];
 
+
             /* Effective visual crop */
-            const float w_adj = av_q2d(inlink->sample_aspect_ratio) /
-                                av_q2d(outlink->sample_aspect_ratio);
+            double sar_in = q2d_fallback(inlink->sample_aspect_ratio, 1.0);
+            double sar_out = q2d_fallback(outlink->sample_aspect_ratio, 1.0);
+            const float w_adj = sar_in / sar_out;
 
             pl_rect2df fixed = image->crop;
             pl_rect2df_stretch(&fixed, w_adj, 1.0);
@@ -1278,7 +1284,7 @@ static int libplacebo_activate(AVFilterContext *ctx)
             in->qstatus = pl_queue_update(in->queue, &in->mix, pl_queue_params(
                 .pts            = TS2T(out_pts, outlink->time_base),
                 .radius         = pl_frame_mix_radius(&s->opts->params),
-                .vsync_duration = l->frame_rate.num ? av_q2d(av_inv_q(l->frame_rate)) : 0,
+                .vsync_duration = q2d_fallback(av_inv_q(l->frame_rate), 0.0),
             ));
 
             switch (in->qstatus) {
@@ -1461,8 +1467,7 @@ static int libplacebo_config_output(AVFilterLink *outlink)
                                  &outlink->w, &outlink->h));
 
     s->reset_sar |= s->normalize_sar || s->nb_inputs > 1;
-    double sar_in = inlink->sample_aspect_ratio.num ?
-                    av_q2d(inlink->sample_aspect_ratio) : 1.0;
+    double sar_in = q2d_fallback(inlink->sample_aspect_ratio, 1.0);
 
     int force_oar = s->force_original_aspect_ratio;
     if (!force_oar && s->fit_sense == FIT_CONSTRAINT) {
@@ -1549,8 +1554,7 @@ static int libplacebo_config_output(AVFilterLink *outlink)
     /* Static variables */
     s->var_values[VAR_OUT_W]    = s->var_values[VAR_OW] = outlink->w;
     s->var_values[VAR_OUT_H]    = s->var_values[VAR_OH] = outlink->h;
-    s->var_values[VAR_DAR]      = outlink->sample_aspect_ratio.num ?
-        av_q2d(outlink->sample_aspect_ratio) : 1.0;
+    s->var_values[VAR_DAR]      = q2d_fallback(outlink->sample_aspect_ratio, 1.0);
     s->var_values[VAR_HSUB]     = 1 << desc->log2_chroma_w;
     s->var_values[VAR_VSUB]     = 1 << desc->log2_chroma_h;
     s->var_values[VAR_OHSUB]    = 1 << out_desc->log2_chroma_w;
-- 
2.52.0


From a995be3805aa7db7e58f6b731b3ee7223dc0546d Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 10 Nov 2025 12:38:11 +0100
Subject: [PATCH 136/304] avfilter/vf_libplacebo: also rotate SAR when fitting

---
 libavfilter/vf_libplacebo.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
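
Note on the rotation test: the sketch below illustrates why `PL_ROTATION_360` is added before taking the remainder (the difference of two rotations can be negative) and how the input SAR is inverted for a net quarter turn. The `ROT_*` constants are hypothetical stand-ins that assume libplacebo numbers its rotations 0, 1, 2, 3, 4 in 90 degree steps; `is_transposed` and `stretch_factor` are illustrative names, not functions from the filter.

```c
/* Assumed numbering of the rotation enum, in 90 degree steps. */
enum { ROT_0, ROT_90, ROT_180, ROT_270, ROT_360 };

/* True when image and target differ by an odd number of quarter turns.
 * Adding ROT_360 first keeps the left operand of % non-negative. */
static int is_transposed(int image_rot, int target_rot)
{
    int rot_total = ROT_360 + image_rot - target_rot;
    return rot_total % ROT_180 == ROT_90;
}

/* Horizontal stretch applied to the crop before fitting: a transposed
 * image has its sample aspect ratio inverted as seen from the output. */
static double stretch_factor(double sar_in, double sar_out,
                             int image_rot, int target_rot)
{
    if (is_transposed(image_rot, target_rot))
        sar_in = 1.0 / sar_in;
    return sar_in / sar_out;
}
```

For example, a 2:1 input SAR rotated by 90 degrees onto a square-pixel output yields a stretch of 0.5 rather than 2.0.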

diff --git a/libavfilter/vf_libplacebo.c b/libavfilter/vf_libplacebo.c
index 8b5f789280..b1788565bf 100644
--- a/libavfilter/vf_libplacebo.c
+++ b/libavfilter/vf_libplacebo.c
@@ -928,10 +928,13 @@ static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
             /* Effective visual crop */
             double sar_in = q2d_fallback(inlink->sample_aspect_ratio, 1.0);
             double sar_out = q2d_fallback(outlink->sample_aspect_ratio, 1.0);
-            const float w_adj = sar_in / sar_out;
+
+            pl_rotation rot_total = PL_ROTATION_360 + image->rotation - target->rotation;
+            if (rot_total % PL_ROTATION_180 == PL_ROTATION_90)
+                sar_in = 1.0 / sar_in;
 
             pl_rect2df fixed = image->crop;
-            pl_rect2df_stretch(&fixed, w_adj, 1.0);
+            pl_rect2df_stretch(&fixed, sar_in / sar_out, 1.0);
 
             switch (s->fit_mode) {
             case FIT_FILL:
@@ -954,8 +957,7 @@ static void update_crops(AVFilterContext *ctx, LibplaceboInput *in,
                 pl_rect2df_aspect_fit(&target->crop, &fixed, 0.0);
             }
 
-            const pl_rotation rot_total = image->rotation - target->rotation;
-            if ((rot_total + PL_ROTATION_360) % PL_ROTATION_180 == PL_ROTATION_90) {
+            if (rot_total % PL_ROTATION_180 == PL_ROTATION_90) {
                 /* Libplacebo expects the input crop relative to the actual frame
                  * dimensions, so un-transpose them here */
                 FFSWAP(float, image->crop.x0, image->crop.y0);
-- 
2.52.0


From 6a7550bd919f53b796c8041159a7878facbf7bbc Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sat, 29 Nov 2025 17:26:51 +0100
Subject: [PATCH 137/304] vulkan/prores: fix dequantization for 4:2:2
 subsampling

Bug introduced in d00f41f due to an oversight.
---
 libavcodec/vulkan/prores_idct.comp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
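
Note on the index math: the sketch below reproduces the old and new index expressions in plain C. It assumes, as the bound `gid.x < mb_width << (4 - chroma_shift)` in the same shader suggests, that one macroblock spans `16 >> chroma_shift` invocations horizontally; the function names are illustrative only.

```c
/* Index expression before the fix: always assumes 16 columns per macroblock. */
static unsigned quant_index_old(unsigned gid_x, unsigned gid_y,
                                unsigned mb_width)
{
    return (gid_y >> 1) * mb_width + (gid_x >> 4);
}

/* Index expression after the fix: a horizontally subsampled plane has
 * only (16 >> chroma_shift) columns per macroblock. */
static unsigned quant_index_new(unsigned gid_x, unsigned gid_y,
                                unsigned mb_width, unsigned chroma_shift)
{
    return (gid_y >> 1) * mb_width + (gid_x >> (4 - chroma_shift));
}
```

With `mb_width = 4` and `chroma_shift = 1`, the last chroma column (`gid_x = 31`) belongs to macroblock 3; the old expression selected macroblock 1.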

diff --git a/libavcodec/vulkan/prores_idct.comp b/libavcodec/vulkan/prores_idct.comp
index 05ba8e4967..5d0d41cfa5 100644
--- a/libavcodec/vulkan/prores_idct.comp
+++ b/libavcodec/vulkan/prores_idct.comp
@@ -127,7 +127,7 @@ void main(void)
         uint8_t[64] qmat = comp == 0 ? qmat_luma : qmat_chroma;
 
         /* Table 15 */
-        uint8_t qidx = quant_idx[(gid.y >> 1) * mb_width + (gid.x >> 4)];
+        uint8_t qidx = quant_idx[(gid.y >> 1) * mb_width + (gid.x >> (4 - chroma_shift))];
         int qscale = qidx > 128 ? (qidx - 96) << 2 : qidx;
 
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-- 
2.52.0


From 216758a1a451f615c282a6b346a2f981b323c34a Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sat, 29 Nov 2025 17:25:17 +0100
Subject: [PATCH 138/304] vulkan/prores: normalize coefficients during IDCT

This allows increased internal precision.
In addition, we can introduce an offset to the DC coefficient
during the second IDCT step, to remove a per-element addition
in the output codepath.
Finally, by processing columns first we can remove the barrier
after loading coefficients.

Signed-off-by: averne <averne381@gmail.com>
---
 libavcodec/vulkan/prores_idct.comp | 57 +++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 20 deletions(-)
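
Note on the constants: the new 64-entry `idct_scale` table is the outer product of the removed 8-entry table with itself, and the new output scaling is algebraically the same as the old `fact`/`off` pair. The sketch below checks both in plain C. It only covers the scalar identity; it assumes that the 1.0 added to the first coefficient of each row reaches every sample of that row unchanged, as the comment in the patch implies. The function names are illustrative.

```c
/* The per-axis scales removed by this patch: cos(k * pi/16) / 2. */
static const double idct_8x8_scales[8] = {
    0.353553390593274, 0.490392640201615, 0.461939766255643,
    0.415734806151273, 0.353553390593274, 0.277785116509801,
    0.191341716182545, 0.097545161008064,
};

/* Entry (i, j) of the combined table: row scale times column scale. */
static double idct_scale_entry(unsigned i, unsigned j)
{
    return idct_8x8_scales[i] * idct_8x8_scales[j];
}

/* Old output path: scale the unnormalized sample, then add the offset. */
static double rescale_old(double x, int depth)
{
    return x * (1.0 / (1 << (12 - depth))) + (1 << (depth - 1));
}

/* New output path: normalize by 2^11, shift by 1.0, then scale once. */
static double rescale_new(double x, int depth)
{
    return (x * (1.0 / (1 << 11)) + 1.0) * (1 << (depth - 1));
}
```

Both rescale variants reduce to `x / 2^(12 - depth) + 2^(depth - 1)`, so the per-element addition can be dropped from the output loop without changing the result.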

diff --git a/libavcodec/vulkan/prores_idct.comp b/libavcodec/vulkan/prores_idct.comp
index 5d0d41cfa5..5eef61e57a 100644
--- a/libavcodec/vulkan/prores_idct.comp
+++ b/libavcodec/vulkan/prores_idct.comp
@@ -37,19 +37,27 @@ void put_px(uint tex_idx, ivec2 pos, uint v)
 #endif
 }
 
-const float idct_8x8_scales[] = {
-    0.353553390593274f, // cos(4 * pi/16) / 2
-    0.490392640201615f, // cos(1 * pi/16) / 2
-    0.461939766255643f, // cos(2 * pi/16) / 2
-    0.415734806151273f, // cos(3 * pi/16) / 2
-    0.353553390593274f, // cos(4 * pi/16) / 2
-    0.277785116509801f, // cos(5 * pi/16) / 2
-    0.191341716182545f, // cos(6 * pi/16) / 2
-    0.097545161008064f, // cos(7 * pi/16) / 2
+const float idct_scale[64] = {
+    0.1250000000000000, 0.1733799806652684, 0.1633203706095471, 0.1469844503024199,
+    0.1250000000000000, 0.0982118697983878, 0.0676495125182746, 0.0344874224103679,
+    0.1733799806652684, 0.2404849415639108, 0.2265318615882219, 0.2038732892122293,
+    0.1733799806652684, 0.1362237766939547, 0.0938325693794663, 0.0478354290456362,
+    0.1633203706095471, 0.2265318615882219, 0.2133883476483184, 0.1920444391778541,
+    0.1633203706095471, 0.1283199917898342, 0.0883883476483185, 0.0450599888754343,
+    0.1469844503024199, 0.2038732892122293, 0.1920444391778541, 0.1728354290456362,
+    0.1469844503024199, 0.1154849415639109, 0.0795474112858021, 0.0405529186026822,
+    0.1250000000000000, 0.1733799806652684, 0.1633203706095471, 0.1469844503024199,
+    0.1250000000000000, 0.0982118697983878, 0.0676495125182746, 0.0344874224103679,
+    0.0982118697983878, 0.1362237766939547, 0.1283199917898342, 0.1154849415639109,
+    0.0982118697983878, 0.0771645709543638, 0.0531518809229535, 0.0270965939155924,
+    0.0676495125182746, 0.0938325693794663, 0.0883883476483185, 0.0795474112858021,
+    0.0676495125182746, 0.0531518809229535, 0.0366116523516816, 0.0186644585125857,
+    0.0344874224103679, 0.0478354290456362, 0.0450599888754343, 0.0405529186026822,
+    0.0344874224103679, 0.0270965939155924, 0.0186644585125857, 0.0095150584360892,
 };
 
 /* 7.4 Inverse Transform */
-void idct(uint block, uint offset, uint stride)
+void idct8(uint block, uint offset, uint stride)
 {
     float t0, t1, t2, t3, t4, t5, t6, t7, u8;
     float u0, u1, u2, u3, u4, u5, u6, u7;
@@ -117,6 +125,12 @@ void main(void)
     uint chroma_shift = comp != 0 ? log2_chroma_w : 0;
     bool act = gid.x < mb_width << (4 - chroma_shift);
 
+    /**
+     * Normalize coefficients to [-1, 1] for increased precision during the iDCT.
+     * DCT coeffs have the range of a 12-bit signed integer (7.4 Inverse Transform).
+     */
+    const float norm = 1.0f / (1 << 11);
+
     /* Coalesced load of DCT coeffs in shared memory, inverse quantization */
     if (act) {
         /**
@@ -131,28 +145,31 @@ void main(void)
         int qscale = qidx > 128 ? (qidx - 96) << 2 : qidx;
 
         [[unroll]] for (uint i = 0; i < 8; ++i) {
+            uint cidx = (i << 3) + idx;
             int   c = sign_extend(int(get_px(comp, ivec2(gid.x, (gid.y << 3) + i))), 16);
-            float v = float(c * qscale * int(qmat[(i << 3) + idx]));
-            blocks[block][i * 9 + idx] = v * idct_8x8_scales[idx] * idct_8x8_scales[i];
+            float v = float(c * qscale * int(qmat[cidx])) * norm;
+            blocks[block][i * 9 + idx] = v * idct_scale[cidx];
         }
     }
 
-    /* Row-wise iDCT */
-    barrier();
-    idct(block, idx * 9, 1);
-
     /* Column-wise iDCT */
+    idct8(block, idx, 9);
     barrier();
-    idct(block, idx, 9);
 
-    float fact = 1.0f / (1 << (12 - depth)), off = 1 << (depth - 1);
+    /* Remap [-1, 1] to [0, 2] to remove a per-element addition in the output loop */
+    blocks[block][idx * 9] += 1.0f;
+
+    /* Row-wise iDCT */
+    idct8(block, idx * 9, 1);
+    barrier();
+
+    float fact = 1 << (depth - 1);
     int maxv = (1 << depth) - 1;
 
     /* 7.5.1 Color Component Samples. Rescale, clamp and write back to global memory */
-    barrier();
     if (act) {
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-            float v = round(blocks[block][i * 9 + idx] * fact + off);
+            float v = round(blocks[block][i * 9 + idx] * fact);
             put_px(comp, ivec2(gid.x, (gid.y << 3) + i), clamp(int(v), 0, maxv));
         }
     }
-- 
2.52.0


From 34473250411d5b96babb3187cdab92ec8342aeee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Sat, 29 Nov 2025 11:33:08 +0100
Subject: [PATCH 139/304] tools/sofa2wavs: fix build on Windows
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 tools/sofa2wavs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/sofa2wavs.c b/tools/sofa2wavs.c
index 1f1075b22f..d551913dce 100644
--- a/tools/sofa2wavs.c
+++ b/tools/sofa2wavs.c
@@ -24,6 +24,12 @@
 #include <stdio.h>
 #include <mysofa.h>
 
+#ifdef _WIN32
+#include <direct.h>
+#undef mkdir
+#define mkdir(a, b) _mkdir(a)
+#endif
+
 int main(int argc, char **argv)
 {
     struct MYSOFA_HRTF *hrtf;
-- 
2.52.0


From 78bc7065485873d644188aeac424b0a547681add Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 12:26:37 +0100
Subject: [PATCH 140/304] avcodec/x86/h264_idct: Remove dead MMX macros

Forgotten in 4618f36a2424a3a4d5760afabc2e9dd18d73f0a4.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 105 +----------------------------------
 1 file changed, 3 insertions(+), 102 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index d9c3c9c862..985955d96a 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -145,61 +145,6 @@ SECTION .text
     IDCT8_1D   [%1], [%1+ 64]
 %endmacro
 
-; %1=int16_t *block, %2=int16_t *dstblock
-%macro IDCT8_ADD_MMX_START 2
-    IDCT8_1D_FULL %1
-    mova       [%1], m7
-    TRANSPOSE4x4W 0, 1, 2, 3, 7
-    mova         m7, [%1]
-    mova    [%2   ], m0
-    mova    [%2+16], m1
-    mova    [%2+32], m2
-    mova    [%2+48], m3
-    TRANSPOSE4x4W 4, 5, 6, 7, 3
-    mova    [%2+ 8], m4
-    mova    [%2+24], m5
-    mova    [%2+40], m6
-    mova    [%2+56], m7
-%endmacro
-
-; %1=uint8_t *dst, %2=int16_t *block, %3=int stride
-%macro IDCT8_ADD_MMX_END 3-4
-    IDCT8_1D_FULL %2
-    mova    [%2   ], m5
-    mova    [%2+16], m6
-    mova    [%2+32], m7
-
-    pxor         m7, m7
-%if %0 == 4
-    movq   [%4+  0], m7
-    movq   [%4+  8], m7
-    movq   [%4+ 16], m7
-    movq   [%4+ 24], m7
-    movq   [%4+ 32], m7
-    movq   [%4+ 40], m7
-    movq   [%4+ 48], m7
-    movq   [%4+ 56], m7
-    movq   [%4+ 64], m7
-    movq   [%4+ 72], m7
-    movq   [%4+ 80], m7
-    movq   [%4+ 88], m7
-    movq   [%4+ 96], m7
-    movq   [%4+104], m7
-    movq   [%4+112], m7
-    movq   [%4+120], m7
-%endif
-    STORE_DIFFx2 m0, m1, m5, m6, m7, 6, %1, %3
-    lea          %1, [%1+%3*2]
-    STORE_DIFFx2 m2, m3, m5, m6, m7, 6, %1, %3
-    mova         m0, [%2   ]
-    mova         m1, [%2+16]
-    mova         m2, [%2+32]
-    lea          %1, [%1+%3*2]
-    STORE_DIFFx2 m4, m0, m5, m6, m7, 6, %1, %3
-    lea          %1, [%1+%3*2]
-    STORE_DIFFx2 m1, m2, m5, m6, m7, 6, %1, %3
-%endmacro
-
 ; %1=uint8_t *dst, %2=int16_t *block, %3=int stride
 %macro IDCT8_ADD_SSE 4
     IDCT8_1D_FULL %2
@@ -612,7 +557,7 @@ cglobal h264_idct_add8_8, 5, 7 + ARCH_X86_64, 8
     add8_sse2_cycle 3, 0x64
 RET
 
-;void ff_h264_luma_dc_dequant_idct_mmx(int16_t *output, int16_t *input, int qmul)
+;void ff_h264_luma_dc_dequant_idct_sse2(int16_t *output, int16_t *input, int qmul)
 
 %macro WALSH4_1D 5
     SUMSUB_BADC w, %4, %3, %2, %1, %5
@@ -620,8 +565,7 @@ RET
     SWAP %1, %4, %3
 %endmacro
 
-%macro DEQUANT 1-3
-%if cpuflag(sse2)
+%macro DEQUANT 1
     movd      xmm4, t3d
     movq      xmm5, [pw_1]
     pshufd    xmm4, xmm4, 0
@@ -643,31 +587,9 @@ RET
     psrad     xmm3, %1
     packssdw  xmm0, xmm1
     packssdw  xmm2, xmm3
-%else
-    mova        m7, [pw_1]
-    mova        m4, %1
-    punpcklwd   %1, m7
-    punpckhwd   m4, m7
-    mova        m5, %2
-    punpcklwd   %2, m7
-    punpckhwd   m5, m7
-    movd        m7, t3d
-    punpckldq   m7, m7
-    pmaddwd     %1, m7
-    pmaddwd     %2, m7
-    pmaddwd     m4, m7
-    pmaddwd     m5, m7
-    psrad       %1, %3
-    psrad       %2, %3
-    psrad       m4, %3
-    psrad       m5, %3
-    packssdw    %1, m4
-    packssdw    %2, m5
-%endif
 %endmacro
 
-%macro STORE_WORDS 5-9
-%if cpuflag(sse)
+%macro STORE_WORDS 9
     movd  t0d, %1
     psrldq  %1, 4
     movd  t1d, %1
@@ -687,33 +609,12 @@ RET
     shr   t1d, 16
     mov [t2+%7*32], t0w
     mov [t2+%9*32], t1w
-%else
-    movd  t0d, %1
-    psrlq  %1, 32
-    movd  t1d, %1
-    mov [t2+%2*32], t0w
-    mov [t2+%4*32], t1w
-    shr   t0d, 16
-    shr   t1d, 16
-    mov [t2+%3*32], t0w
-    mov [t2+%5*32], t1w
-%endif
 %endmacro
 
 %macro DEQUANT_STORE 1
-%if cpuflag(sse2)
     DEQUANT     %1
     STORE_WORDS xmm0,  0,  1,  4,  5,  2,  3,  6,  7
     STORE_WORDS xmm2,  8,  9, 12, 13, 10, 11, 14, 15
-%else
-    DEQUANT     m0, m1, %1
-    STORE_WORDS m0,  0,  1,  4,  5
-    STORE_WORDS m1,  2,  3,  6,  7
-
-    DEQUANT     m2, m3, %1
-    STORE_WORDS m2,  8,  9, 12, 13
-    STORE_WORDS m3, 10, 11, 14, 15
-%endif
 %endmacro
 
 INIT_XMM sse2
-- 
2.52.0


From 4ee0be272a003688717631353b7be2ac161e48f9 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 12:38:58 +0100
Subject: [PATCH 141/304] avcodec/x86/h264_idct: Remove redundant movsxdifnidn

Only exported (i.e. cglobal) functions need it; stride is already
sign-extended when it reaches any of the internal functions used here,
so don't sign-extend again.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 985955d96a..6863dbcb4d 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -55,7 +55,7 @@ cextern pw_1
 
 SECTION .text
 
-; %1=uint8_t *dst, %2=int16_t *block, %3=int stride
+; %1=uint8_t *dst, %2=int16_t *block, %3=ptrdiff_t stride
 %macro IDCT4_ADD 3
     ; Load dct coeffs
     movq         m0, [%2]
@@ -145,7 +145,7 @@ SECTION .text
     IDCT8_1D   [%1], [%1+ 64]
 %endmacro
 
-; %1=uint8_t *dst, %2=int16_t *block, %3=int stride
+; %1=uint8_t *dst, %2=int16_t *block, %3=ptrdiff_t stride
 %macro IDCT8_ADD_SSE 4
     IDCT8_1D_FULL %2
 %if ARCH_X86_64
@@ -317,7 +317,6 @@ INIT_XMM cpuname
 
 INIT_MMX mmx
 h264_idct_add8_mmx_plane:
-    movsxdifnidn r3, r3d
 .nextblock:
     movzx        r6, byte [scan8+r5]
     movzx        r6, byte [r4+r6]
@@ -372,9 +371,8 @@ cglobal h264_idct_add8_422_8, 5, 8 + npicregs, 0, dst1, block_offset, block, str
 
     RET ; TODO: check rep ret after a function call
 
-; r0 = uint8_t *dst, r2 = int16_t *block, r3 = int stride, r6=clobbered
+; r0 = uint8_t *dst, r2 = int16_t *block, r3 = ptrdiff_t stride, r6=clobbered
 h264_idct_dc_add8_mmxext:
-    movsxdifnidn r3, r3d
     movd         m0, [r2   ]          ;  0 0 X D
     mov word [r2+ 0], 0
     punpcklwd    m0, [r2+32]          ;  x X d D
@@ -393,9 +391,8 @@ h264_idct_dc_add8_mmxext:
 
 ALIGN 16
 INIT_XMM sse2
-; r0 = uint8_t *dst (clobbered), r2 = int16_t *block, r3 = int stride
+; r0 = uint8_t *dst (clobbered), r2 = int16_t *block, r3 = ptrdiff_t stride
 h264_add8x4_idct_sse2:
-    movsxdifnidn r3, r3d
     movq   m0, [r2+ 0]
     movq   m1, [r2+ 8]
     movq   m2, [r2+16]
-- 
2.52.0


From d230f539ac1eb4258d022e52057453b7186db3db Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 14:09:57 +0100
Subject: [PATCH 142/304] avcodec/x86/h264_idct: Avoid call where possible

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 49 ++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 6863dbcb4d..9405aa848a 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -316,29 +316,6 @@ INIT_XMM cpuname
     RET
 
 INIT_MMX mmx
-h264_idct_add8_mmx_plane:
-.nextblock:
-    movzx        r6, byte [scan8+r5]
-    movzx        r6, byte [r4+r6]
-    or          r6w, word [r2]
-    test         r6, r6
-    jz .skipblock
-%if ARCH_X86_64
-    mov         r0d, dword [r1+r5*4]
-    add          r0, [dst2q]
-%else
-    mov          r0, r1m ; XXX r1m here is actually r0m of the calling func
-    mov          r0, [r0]
-    add          r0, dword [r1+r5*4]
-%endif
-    IDCT4_ADD    r0, r2, r3
-.skipblock:
-    inc          r5
-    add          r2, 32
-    test         r5, 3
-    jnz .nextblock
-    rep ret
-
 cglobal h264_idct_add8_422_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride, nnzc, cntr, coeff, dst2, picreg
 ; dst1, block_offset, block, stride, nnzc, cntr, coeff, dst2, picreg
     movsxdifnidn r3, r3d
@@ -367,9 +344,31 @@ cglobal h264_idct_add8_422_8, 5, 8 + npicregs, 0, dst1, block_offset, block, str
 
     call         h264_idct_add8_mmx_plane
     add r5, 4
-    call         h264_idct_add8_mmx_plane
+    TAIL_CALL    h264_idct_add8_mmx_plane, 0
+
+h264_idct_add8_mmx_plane:
+.nextblock:
+    movzx       r6d, byte [scan8+r5]
+    movzx       r6d, byte [r4+r6]
+    or          r6w, word [r2]
+    test        r6d, r6d
+    jz .skipblock
+%if ARCH_X86_64
+    mov         r0d, dword [r1+r5*4]
+    add          r0, [dst2q]
+%else
+    mov          r0, r1m ; XXX r1m here is actually r0m of the calling func
+    mov          r0, [r0]
+    add          r0, dword [r1+r5*4]
+%endif
+    IDCT4_ADD    r0, r2, r3
+.skipblock:
+    inc         r5d
+    add          r2, 32
+    test        r5d, 3
+    jnz .nextblock
+    rep ret
 
-    RET ; TODO: check rep ret after a function call
 
 ; r0 = uint8_t *dst, r2 = int16_t *block, r3 = ptrdiff_t stride, r6=clobbered
 h264_idct_dc_add8_mmxext:
-- 
2.52.0


From fde45f49ae0d7fbab41987fb6b7f27617b854c62 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 14:57:45 +0100
Subject: [PATCH 143/304] avutil/x86/x86inc: Use parentheses in has_epilogue

Prevents surprises when the macro is used inside a larger expression,
e.g. when it is negated as in "!has_epilogue".

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavutil/x86/x86inc.asm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
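
Note on the precedence issue: the sketch below is a C analogue, not NASM; it assumes that unary `!` binds tighter than `>` and `||` in the assembler's `%if` expressions as it does in C. The macro and function names are hypothetical. Compilers may warn about the unparenthesized variant, which is the point of the example.

```c
/* Unparenthesized body: negating the macro only negates the first operand. */
#define HAS_EPILOGUE_BARE(regs, stack)   regs > 9 || stack > 0
/* Parenthesized body: the macro behaves like a single value. */
#define HAS_EPILOGUE_PAREN(regs, stack) (regs > 9 || stack > 0)

/* Expands to (!regs_used > 9) || (stack_size > 0), i.e. stack_size > 0. */
static int no_epilogue_bare(int regs_used, int stack_size)
{
    return !HAS_EPILOGUE_BARE(regs_used, stack_size);
}

/* Expands to !(regs_used > 9 || stack_size > 0), as intended. */
static int no_epilogue_paren(int regs_used, int stack_size)
{
    return !HAS_EPILOGUE_PAREN(regs_used, stack_size);
}
```

With 5 registers and no stack, only the parenthesized form reports that there is no epilogue; with 12 registers and 16 bytes of stack, the bare form wrongly reports that there is none.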

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index e61d924bc1..0e80ebed43 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -609,7 +609,7 @@ DECLARE_REG 14, R13, 120
     RESET_STACK_STATE
 %endmacro
 
-%define has_epilogue regs_used > 7 || stack_size > 0 || vzeroupper_required || xmm_regs_used > 6+high_mm_regs
+%define has_epilogue (regs_used > 7 || stack_size > 0 || vzeroupper_required || xmm_regs_used > 6+high_mm_regs)
 
 %macro RET 0
     WIN64_RESTORE_XMM_INTERNAL
@@ -658,7 +658,7 @@ DECLARE_REG 14, R13, 72
     %endif
 %endmacro
 
-%define has_epilogue regs_used > 9 || stack_size > 0 || vzeroupper_required
+%define has_epilogue (regs_used > 9 || stack_size > 0 || vzeroupper_required)
 
 %macro RET 0
     %if stack_size_padded > 0
@@ -722,7 +722,7 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14
     %endif
 %endmacro
 
-%define has_epilogue regs_used > 3 || stack_size > 0 || vzeroupper_required
+%define has_epilogue (regs_used > 3 || stack_size > 0 || vzeroupper_required)
 
 %macro RET 0
     %if stack_size_padded > 0
-- 
2.52.0


From 5c03e0f0cbf4d90ef176d14c192d47f736dce726 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 15:23:31 +0100
Subject: [PATCH 144/304] avcodec/x86/h264_idct: Use tail call where
 advantageous

The last call in these functions can become a tail call whenever the
function has no epilogue; this is possible on UNIX64.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 9405aa848a..4b9efd6d6d 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -55,6 +55,19 @@ cextern pw_1
 
 SECTION .text
 
+; %1=callee, %2=dst to jump to if tail call is impossible (can be empty,
+; then no jmp is performed), %3=current iteration, %4=last iteration
+%macro TAIL_CALL_IF_LAST 4
+%if (%3 == %4) && !has_epilogue
+    jmp         %1
+%else
+    call        %1
+    %ifnempty %2
+        jmp      %2
+    %endif
+%endif
+%endmacro
+
 ; %1=uint8_t *dst, %2=int16_t *block, %3=ptrdiff_t stride
 %macro IDCT4_ADD 3
     ; Load dct coeffs
@@ -424,7 +437,7 @@ h264_add8x4_idct_sse2:
 %else
     add         r0, r0m
 %endif
-    call        h264_add8x4_idct_sse2
+    TAIL_CALL_IF_LAST h264_add8x4_idct_sse2, , %1, 7
 .cycle%1end:
 %if %1 < 7
     add         r2, 64
@@ -461,8 +474,7 @@ RET
 %else
     add         r0, r0m
 %endif
-    call        h264_add8x4_idct_sse2
-    jmp .cycle%1end
+    TAIL_CALL_IF_LAST h264_add8x4_idct_sse2, .cycle%1end, %1, 7
 .try%1dc:
     movsx       r0, word [r2   ]
     or         r0w, word [r2+32]
@@ -473,7 +485,7 @@ RET
 %else
     add         r0, r0m
 %endif
-    call        h264_idct_dc_add8_mmxext
+    TAIL_CALL_IF_LAST h264_idct_dc_add8_mmxext, , %1, 7
 .cycle%1end:
 %if %1 < 7
     add         r2, 64
@@ -510,8 +522,7 @@ RET
     mov         r0, [r0]
     add         r0, dword [r1+(%1&1)*8+64*(1+(%1>>1))]
 %endif
-    call        h264_add8x4_idct_sse2
-    jmp .cycle%1end
+    TAIL_CALL_IF_LAST h264_add8x4_idct_sse2, .cycle%1end, %1, 3
 .try%1dc:
     movsx       r0, word [r2   ]
     or         r0w, word [r2+32]
@@ -524,7 +535,7 @@ RET
     mov         r0, [r0]
     add         r0, dword [r1+(%1&1)*8+64*(1+(%1>>1))]
 %endif
-    call        h264_idct_dc_add8_mmxext
+    TAIL_CALL_IF_LAST h264_idct_dc_add8_mmxext, , %1, 3
 .cycle%1end:
 %if %1 == 1
     add         r2, 384+64
-- 
2.52.0


From 302ee09371a1e35391f5ee6e196a756e794c53e2 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 15:59:03 +0100
Subject: [PATCH 145/304] avcodec/x86/h264_idct: Zero with full-width stores

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 4b9efd6d6d..50647f2454 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -90,10 +90,15 @@ SECTION .text
     paddw        m0, m6
     IDCT4_1D      w, 0, 1, 2, 3, 4, 5
     pxor         m7, m7
-    movq    [%2+ 0], m7
-    movq    [%2+ 8], m7
-    movq    [%2+16], m7
-    movq    [%2+24], m7
+    %if mmsize == 16
+        mova    [%2+ 0], m7
+        mova    [%2+16], m7
+    %else
+        movq    [%2+ 0], m7
+        movq    [%2+ 8], m7
+        movq    [%2+16], m7
+        movq    [%2+24], m7
+    %endif
 
     STORE_DIFFx2 m0, m1, m4, m5, m7, 6, %1, %3
     lea          %1, [%1+%3*2]
-- 
2.52.0


From b388d43c0a90f4e8be5897f17e4c9f1d7f00ba66 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 17:26:47 +0100
Subject: [PATCH 146/304] avcodec/x86/h264_idct: Don't use MMX registers in
 ff_h264_luma_dc_dequant_idct_sse2

It is ABI compliant and gives a tiny speedup here (and is 16B smaller).

Old benchmarks:
h264_luma_dc_dequant_idct_8_c:                          33.2 ( 1.00x)
h264_luma_dc_dequant_idct_8_sse2:                       16.0 ( 2.07x)

New benchmarks:
h264_luma_dc_dequant_idct_8_c:                          33.0 ( 1.00x)
h264_luma_dc_dequant_idct_8_sse2:                       15.0 ( 2.20x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 56 +++++++++++++++++++-----------------
 tests/checkasm/h264dsp.c     |  2 +-
 2 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 50647f2454..fe46107867 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -578,27 +578,23 @@ RET
 %endmacro
 
 %macro DEQUANT 1
-    movd      xmm4, t3d
-    movq      xmm5, [pw_1]
-    pshufd    xmm4, xmm4, 0
-    movq2dq   xmm0, m0
-    movq2dq   xmm1, m1
-    movq2dq   xmm2, m2
-    movq2dq   xmm3, m3
-    punpcklwd xmm0, xmm5
-    punpcklwd xmm1, xmm5
-    punpcklwd xmm2, xmm5
-    punpcklwd xmm3, xmm5
-    pmaddwd   xmm0, xmm4
-    pmaddwd   xmm1, xmm4
-    pmaddwd   xmm2, xmm4
-    pmaddwd   xmm3, xmm4
-    psrad     xmm0, %1
-    psrad     xmm1, %1
-    psrad     xmm2, %1
-    psrad     xmm3, %1
-    packssdw  xmm0, xmm1
-    packssdw  xmm2, xmm3
+    movd        m4, t3d
+    movq        m5, [pw_1]
+    pshufd      m4, m4, 0
+    punpcklwd   m0, m5
+    punpcklwd   m1, m5
+    punpcklwd   m2, m5
+    punpcklwd   m3, m5
+    pmaddwd     m0, m4
+    pmaddwd     m1, m4
+    pmaddwd     m2, m4
+    pmaddwd     m3, m4
+    psrad       m0, %1
+    psrad       m1, %1
+    psrad       m2, %1
+    psrad       m3, %1
+    packssdw    m0, m1
+    packssdw    m2, m3
 %endmacro
 
 %macro STORE_WORDS 9
@@ -625,19 +621,25 @@ RET
 
 %macro DEQUANT_STORE 1
     DEQUANT     %1
-    STORE_WORDS xmm0,  0,  1,  4,  5,  2,  3,  6,  7
-    STORE_WORDS xmm2,  8,  9, 12, 13, 10, 11, 14, 15
+    STORE_WORDS m0,  0,  1,  4,  5,  2,  3,  6,  7
+    STORE_WORDS m2,  8,  9, 12, 13, 10, 11, 14, 15
 %endmacro
 
 INIT_XMM sse2
 cglobal h264_luma_dc_dequant_idct, 3, 4, 7
-INIT_MMX cpuname
     movq        m3, [r1+24]
     movq        m2, [r1+16]
     movq        m1, [r1+ 8]
     movq        m0, [r1+ 0]
     WALSH4_1D    0,1,2,3,4
-    TRANSPOSE4x4W 0,1,2,3,4
+    punpcklwd   m0, m1
+    punpcklwd   m2, m3
+    mova        m4, m0
+    punpckldq   m0, m2
+    punpckhdq   m4, m2
+    movhlps     m1, m0
+    movhlps     m3, m4
+    SWAP 2, 4
     WALSH4_1D    0,1,2,3,4
 
 ; shift, tmp, output, qmul
@@ -665,8 +667,8 @@ INIT_MMX cpuname
     inc        t1d
     shr        t3d, t0b
     sub        t1d, t0d
-    movd      xmm6, t1d
-    DEQUANT_STORE xmm6
+    movd        m6, t1d
+    DEQUANT_STORE m6
     RET
 
 %ifdef __NASM_VER__
diff --git a/tests/checkasm/h264dsp.c b/tests/checkasm/h264dsp.c
index f05ae419fc..acf4f61ebb 100644
--- a/tests/checkasm/h264dsp.c
+++ b/tests/checkasm/h264dsp.c
@@ -336,7 +336,7 @@ static void check_idct_dequant(void)
     LOCAL_ALIGNED_16(int32_t, dst1_32, [16 * 16]);
     H264DSPContext h;
     int bit_depth, i, qmul;
-    declare_func_emms(AV_CPU_FLAG_MMX | AV_CPU_FLAG_SSE2, void, int16_t *output, int16_t *input, int qmul);
+    declare_func(void, int16_t *output, int16_t *input, int qmul);
 
     qmul = rnd() % 4096;
 
-- 
2.52.0


From f6a9aed5ac27b9e961689cc61e9d48b27ce155e1 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 17:48:01 +0100
Subject: [PATCH 147/304] avcodec/x86/h264_idct: Deduplicate generating
 constant

pw_1 is currently loaded in both codepaths. Generate it earlier instead.
Gives tiny speedups (15 vs 14.5 cycles) and reduces codesize.
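
As a rough scalar model of the two instructions this patch adds
(pcmpeqw m5, m5 followed by psrlw m5, 15), the following C sketch shows
why the pair yields a vector of 16-bit ones without loading pw_1 from
memory. The helper names and the 8-lane array are illustrative
assumptions, not part of the patch.

```c
#include <stdint.h>

#define LANES 8 /* an XMM register holds eight 16-bit words */

/* Models pcmpeqw x, x: each word equals itself, so every lane becomes 0xFFFF. */
static void all_ones_words(uint16_t lanes[LANES])
{
    for (int i = 0; i < LANES; i++)
        lanes[i] = 0xFFFF;
}

/* Models psrlw x, count: logical right shift applied to each 16-bit lane. */
static void shift_right_words(uint16_t lanes[LANES], int count)
{
    for (int i = 0; i < LANES; i++)
        lanes[i] = (uint16_t)(lanes[i] >> count);
}
```

Calling all_ones_words() and then shift_right_words(v, 15) leaves 1 in
every lane, which is the value DEQUANT expects in m5.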

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index fe46107867..d35d583ce7 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -51,7 +51,6 @@ scan8_mem: db  4+ 1*8, 5+ 1*8, 4+ 2*8, 5+ 2*8
 %endif
 
 cextern pw_32
-cextern pw_1
 
 SECTION .text
 
@@ -577,9 +576,9 @@ RET
     SWAP %1, %4, %3
 %endmacro
 
+; requires m5 to contain pw_1
 %macro DEQUANT 1
     movd        m4, t3d
-    movq        m5, [pw_1]
     pshufd      m4, m4, 0
     punpcklwd   m0, m5
     punpcklwd   m1, m5
@@ -635,6 +634,7 @@ cglobal h264_luma_dc_dequant_idct, 3, 4, 7
     punpcklwd   m0, m1
     punpcklwd   m2, m3
     mova        m4, m0
+    pcmpeqw     m5, m5
     punpckldq   m0, m2
     punpckhdq   m4, m2
     movhlps     m1, m0
@@ -652,6 +652,7 @@ cglobal h264_luma_dc_dequant_idct, 3, 4, 7
 %else
     DECLARE_REG_TMP 1,3,0,2
 %endif
+    psrlw       m5, 15
 
     cmp        t3d, 32767
     jg .big_qmul
-- 
2.52.0


From 219fe5b9aef484ece4525f2900eeb730ac189363 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 26 Nov 2025 20:15:55 +0100
Subject: [PATCH 148/304] avcodec/x86/h264_idct: Fix
 ff_h264_luma_dc_dequant_idct_sse2 checkasm failures

ff_h264_luma_dc_dequant_idct_sse2() does not pass checkasm for certain
seeds, because the input to packssdw no longer fits into an int16_t,
leading to saturation, where the C code just truncates. I don't know
whether the spec contains provisions that ensure that valid input
must not exceed 16 bits or whether such inputs (even if invalid)
can be triggered by the actual code and not only the test.

This commit adapts the behavior of the function to the C reference code
to fix the test. packssdw is avoided, instead the lower words are
directly transferred to GPRs to be written out. This has unfortunately
led to a slight performance regression here (14.5 vs 15.1 cycles).
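
The mismatch described above can be sketched in scalar C. This is only
a model under stated assumptions: pack_saturate() imitates what packssdw
does to one 32-bit lane, and store_truncate() imitates a store of the
low word, which is how the message describes the C reference. Both
helper names are hypothetical.

```c
#include <stdint.h>

/* Models packssdw on one lane: clamp the 32-bit value to the int16_t range. */
static int16_t pack_saturate(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Models writing only the low word: keep bits 0..15 and reinterpret as signed. */
static int16_t store_truncate(int32_t v)
{
    uint16_t low = (uint16_t)((uint32_t)v & 0xFFFFu);
    return low >= 0x8000u ? (int16_t)((int32_t)low - 0x10000) : (int16_t)low;
}
```

The two helpers agree for every value that fits into an int16_t and
differ as soon as the value leaves that range, which is the case the
failing checkasm seeds hit.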

Fixes issue #20835.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/h264_idct.asm | 64 ++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index d35d583ce7..47e4116f42 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -592,36 +592,58 @@ RET
     psrad       m1, %1
     psrad       m2, %1
     psrad       m3, %1
-    packssdw    m0, m1
-    packssdw    m2, m3
 %endmacro
 
-%macro STORE_WORDS 9
-    movd  t0d, %1
-    psrldq  %1, 4
-    movd  t1d, %1
-    psrldq  %1, 4
-    mov [t2+%2*32], t0w
-    mov [t2+%4*32], t1w
-    shr   t0d, 16
-    shr   t1d, 16
+%macro STORE_WORDS 10
+%if ARCH_X86_64
+    movq        t0, %1
+    movq        t1, %2
+    psrldq      %1, 8
+    psrldq      %2, 8
     mov [t2+%3*32], t0w
-    mov [t2+%5*32], t1w
-    movd  t0d, %1
-    psrldq  %1, 4
-    movd  t1d, %1
-    mov [t2+%6*32], t0w
+    mov [t2+%7*32], t1w
+    shr         t0, 32
+    shr         t1, 32
+    mov [t2+%4*32], t0w
     mov [t2+%8*32], t1w
-    shr   t0d, 16
-    shr   t1d, 16
-    mov [t2+%7*32], t0w
+    movq        t0, %1
+    movq        t1, %2
+    mov [t2+%5*32], t0w
     mov [t2+%9*32], t1w
+    shr         t0, 32
+    shr         t1, 32
+    mov [t2+%6*32], t0w
+    mov [t2+%10*32], t1w
+%else
+    movd       t0d, %1
+    movd       t1d, %2
+    psrldq      %1, 4
+    psrldq      %2, 4
+    mov [t2+%3*32], t0w
+    mov [t2+%7*32], t1w
+    movd       t0d, %1
+    movd       t1d, %2
+    psrldq      %1, 4
+    psrldq      %2, 4
+    mov [t2+%4*32], t0w
+    mov [t2+%8*32], t1w
+    movd       t0d, %1
+    movd       t1d, %2
+    psrldq      %1, 4
+    psrldq      %2, 4
+    mov [t2+%5*32], t0w
+    mov [t2+%9*32], t1w
+    movd       t0d, %1
+    movd       t1d, %2
+    mov [t2+%6*32], t0w
+    mov [t2+%10*32], t1w
+%endif
 %endmacro
 
 %macro DEQUANT_STORE 1
     DEQUANT     %1
-    STORE_WORDS m0,  0,  1,  4,  5,  2,  3,  6,  7
-    STORE_WORDS m2,  8,  9, 12, 13, 10, 11, 14, 15
+    STORE_WORDS m0, m1,  0,  1,  4,  5,  2,  3,  6,  7
+    STORE_WORDS m2, m3,  8,  9, 12, 13, 10, 11, 14, 15
 %endmacro
 
 INIT_XMM sse2
-- 
2.52.0


From c2dbf5d6fd494549431fb483628a208966ccb7eb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Sun, 30 Nov 2025 01:56:07 +0100
Subject: [PATCH 149/304] avfilter/avfiltergraph: fix constant string
 comparison
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It's not guaranteed that the conversion filter name string will be
deduplicated to the same memory location. While this is a common
optimization, we cannot rely on it always happening.

Fixes regression since 8b375b2ffd4377909180241cdc65d63d372a35a3.
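
A minimal C sketch of the difference, assuming a hypothetical lookup
helper (find_filter_name() is not an FFmpeg function): the search
compares string contents with strcmp(), so it still matches when the
two names live at different addresses.

```c
#include <string.h>

/* Hypothetical helper: return the index of `wanted` in `names`, or -1.
 * Names are compared by content, never by pointer value. */
static int find_filter_name(const char *const *names, int nb_names,
                            const char *wanted)
{
    for (int i = 0; i < nb_names; i++)
        if (!strcmp(names[i], wanted))
            return i;
    return -1;
}
```

With a pointer comparison, the same lookup would only succeed if the
toolchain happened to merge both string literals into one object.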

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavfilter/avfiltergraph.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavfilter/avfiltergraph.c b/libavfilter/avfiltergraph.c
index a46a7bd408..d5c2ef54e6 100644
--- a/libavfilter/avfiltergraph.c
+++ b/libavfilter/avfiltergraph.c
@@ -570,7 +570,7 @@ retry:
                 void *b = FF_FIELD_AT(void *, m->offset, link->outcfg);
                 if (a && b && a != b && !m->can_merge(a, b)) {
                     for (k = 0; k < num_conv; k++) {
-                        if (conv_filters[k] == m->conversion_filter)
+                        if (!strcmp(conv_filters[k], m->conversion_filter))
                             break;
                     }
                     if (k == num_conv) {
@@ -683,7 +683,7 @@ retry:
 
                 for (neg_step = 0; neg_step < neg->nb_mergers; neg_step++) {
                     const AVFilterFormatsMerger *m = &neg->mergers[neg_step];
-                    if (m->conversion_filter != conv_filters[k])
+                    if (strcmp(m->conversion_filter, conv_filters[k]))
                         continue;
                     if ((ret = MERGE(m,  inlink)) <= 0 ||
                         (ret = MERGE(m, outlink)) <= 0) {
-- 
2.52.0


From 83d8982c5c715f0e3d2842416f78365d927bdd5a Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sat, 29 Nov 2025 22:33:26 +0100
Subject: [PATCH 150/304] libavcodec/vulkan: introduce cached bitstream reader

This stores a small buffer in shared memory per decode thread (16 bytes),
which helps reduce the number of memory accesses.
The bitstream buffer is first aligned to a 4 byte boundary, so that the
buffer can be filled with a single memory request.
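
The scheme can be modelled on the CPU roughly as follows. This is a
sketch only: CachedReader and its functions are hypothetical stand-ins
for the GLSL macros, the 16-byte array plays the role of the per-thread
shared-memory slot, byte offsets stand in for device addresses, and the
input is assumed to be padded so that a 16-byte fill never reads past
its end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct CachedReader {
    const uint8_t *data; /* padded input buffer (assumption, see above) */
    size_t pos;          /* offset of the next byte not yet consumed */
    uint64_t bits;       /* bit reservoir, filled from the MSB downwards */
    int bits_valid;      /* number of valid bits in the reservoir */
    uint8_t cache[16];   /* models the 16-byte shared-memory slot */
    int cache_pos;       /* next 32-bit word to take from the cache, 0..4 */
} CachedReader;

/* One 16-byte read refills the whole cache (FILL_SMEM in the patch). */
static void fill_cache(CachedReader *r)
{
    memcpy(r->cache, r->data + r->pos, sizeof(r->cache));
    r->cache_pos = 0;
}

/* Consume 0..3 leading bytes until pos is a multiple of 4, then fill. */
static void reader_init(CachedReader *r, const uint8_t *data, size_t start)
{
    r->data = data;
    r->pos = start;
    r->bits = 0;
    r->bits_valid = 0;
    while (r->pos & 3) {
        r->bits |= (uint64_t)data[r->pos] << (56 - r->bits_valid);
        r->bits_valid += 8;
        r->pos++;
    }
    fill_cache(r);
}

/* Append 32 bits from the cache; the caller guarantees bits_valid <= 32. */
static void reload32(CachedReader *r)
{
    if (r->cache_pos >= 4)
        fill_cache(r);
    const uint8_t *w = r->cache + 4 * r->cache_pos;
    uint32_t be = (uint32_t)w[0] << 24 | (uint32_t)w[1] << 16 |
                  (uint32_t)w[2] << 8  | (uint32_t)w[3];
    r->pos += 4;
    r->bits |= (uint64_t)be << (32 - r->bits_valid);
    r->bits_valid += 32;
    r->cache_pos++;
}
```

In this model only every fourth reload32() touches the input buffer;
the other three are served from the cache.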
---
 libavcodec/vulkan/common.comp | 42 +++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/libavcodec/vulkan/common.comp b/libavcodec/vulkan/common.comp
index eda92ce28d..4b71dfd2f4 100644
--- a/libavcodec/vulkan/common.comp
+++ b/libavcodec/vulkan/common.comp
@@ -42,6 +42,10 @@ layout(buffer_reference, buffer_reference_align = 4) buffer u32vec2buf {
     u32vec2 v;
 };
 
+layout(buffer_reference, buffer_reference_align = 4) buffer u32vec4buf {
+    u32vec4 v;
+};
+
 layout(buffer_reference, buffer_reference_align = 8) buffer u64buf {
     uint64_t v;
 };
@@ -198,8 +202,12 @@ struct GetBitContext {
     uint64_t bits;
     int bits_valid;
     int size_in_bits;
+#ifdef GET_BITS_SMEM
+    int cur_smem_pos;
+#endif
 };
 
+#ifndef GET_BITS_SMEM
 #define LOAD64()                                       \
     {                                                  \
         u8vec4buf ptr = u8vec4buf(gb.buf);             \
@@ -218,6 +226,40 @@ struct GetBitContext {
         gb.bits = uint64_t(rf) << (32 - gb.bits_valid) | gb.bits; \
         gb.bits_valid += 32;                                      \
     }
+#else /* GET_BITS_SMEM */
+shared u32vec4 gb_storage[gl_WorkGroupSize.x*gl_WorkGroupSize.y*gl_WorkGroupSize.z];
+
+#define FILL_SMEM()                                      \
+    {                                                    \
+        u32vec4buf ptr = u32vec4buf(gb.buf);             \
+        gb_storage[gl_LocalInvocationIndex] = ptr[0].v;  \
+        gb.cur_smem_pos = 0;                             \
+    }
+
+#define LOAD64()                                                    \
+    {                                                               \
+        gb.bits = 0;                                                \
+        gb.bits_valid = 0;                                          \
+        u8buf ptr = u8buf(gb.buf);                                  \
+        for (uint i = 0; i < ((4 - uint(gb.buf_start)) & 3); ++i) { \
+            gb.bits |= uint64_t(ptr[i].v) << (56 - i * 8);          \
+            gb.bits_valid += 8;                                     \
+            gb.buf += 1;                                            \
+        }                                                           \
+        FILL_SMEM();                                                \
+    }
+
+#define RELOAD32()                                                         \
+    {                                                                      \
+        if (gb.cur_smem_pos >= 4)                                          \
+            FILL_SMEM();                                                   \
+        uint v = gb_storage[gl_LocalInvocationIndex][gb.cur_smem_pos];     \
+        gb.buf += 4;                                                       \
+        gb.bits = uint64_t(reverse4(v)) << (32 - gb.bits_valid) | gb.bits; \
+        gb.bits_valid += 32;                                               \
+        gb.cur_smem_pos += 1;                                              \
+    }
+#endif /* GET_BITS_SMEM */
 
 void init_get_bits(inout GetBitContext gb, u8buf data, int len)
 {
-- 
2.52.0


From 4d25cfe225b36d3d19b3f47bb693686381ffc22b Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sun, 30 Nov 2025 13:25:37 +0100
Subject: [PATCH 151/304] libavcodec/vulkan: remove unnecessary member in
 GetBitContext

The number of remaining bits can be calculated using existing state.
This simplifies calculations and frees up one register.
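
The equivalence can be checked with a small C model. BitPos is a
simplified, hypothetical stand-in for the shader's GetBitContext that
keeps only the fields the formula uses; it assumes, as init_get_bits()
sets up, that buf_end equals buf_start + len and that size_in_bits was
len * 8.

```c
#include <stdint.h>

/* Simplified stand-in for GetBitContext; field names follow the diff. */
typedef struct BitPos {
    uint64_t buf_start; /* address of the first byte */
    uint64_t buf;       /* address of the next byte to load */
    uint64_t buf_end;   /* buf_start + len */
    int bits_valid;     /* bits already loaded but not yet consumed */
} BitPos;

/* Old formula: needs the separately stored total size in bits. */
static int left_bits_old(const BitPos *gb, int size_in_bits)
{
    return size_in_bits - (int)(gb->buf - gb->buf_start) * 8 + gb->bits_valid;
}

/* New formula: derives the same value from buf_end alone. */
static int left_bits_new(const BitPos *gb)
{
    return (int)(gb->buf_end - gb->buf) * 8 + gb->bits_valid;
}
```

Substituting buf_end = buf_start + len into the new expression gives
len * 8 - (buf - buf_start) * 8 + bits_valid, which is the old one.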
---
 libavcodec/vulkan/common.comp | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavcodec/vulkan/common.comp b/libavcodec/vulkan/common.comp
index 4b71dfd2f4..f5f466ce31 100644
--- a/libavcodec/vulkan/common.comp
+++ b/libavcodec/vulkan/common.comp
@@ -201,7 +201,6 @@ struct GetBitContext {
 
     uint64_t bits;
     int bits_valid;
-    int size_in_bits;
 #ifdef GET_BITS_SMEM
     int cur_smem_pos;
 #endif
@@ -265,7 +264,6 @@ void init_get_bits(inout GetBitContext gb, u8buf data, int len)
 {
     gb.buf = gb.buf_start = uint64_t(data);
     gb.buf_end = uint64_t(data) + len;
-    gb.size_in_bits = len * 8;
 
     /* Preload */
     LOAD64()
@@ -320,5 +318,5 @@ int tell_bits(in GetBitContext gb)
 
 int left_bits(in GetBitContext gb)
 {
-    return gb.size_in_bits - int(gb.buf - gb.buf_start) * 8 + gb.bits_valid;
+    return int(gb.buf_end - gb.buf) * 8 + gb.bits_valid;
 }
-- 
2.52.0


From e60eb8c2d4680efc60e7869b27c86605c4789341 Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sat, 29 Nov 2025 22:33:45 +0100
Subject: [PATCH 152/304] vulkan/prores: use cached bitstream reader

Speedup is around 75% on NVIDIA 3050, 20% on AMD 6700XT, 5% on Intel TigerLake.
---
 libavcodec/vulkan_prores.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vulkan_prores.c b/libavcodec/vulkan_prores.c
index 0c704c3d1c..90b8610817 100644
--- a/libavcodec/vulkan_prores.c
+++ b/libavcodec/vulkan_prores.c
@@ -21,7 +21,6 @@
 #include "hwaccel_internal.h"
 #include "libavutil/mem.h"
 #include "libavutil/vulkan.h"
-#include "libavutil/vulkan_loader.h"
 #include "libavutil/vulkan_spirv.h"
 
 extern const char *ff_source_common_comp;
@@ -207,14 +206,12 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     RET(ff_vk_exec_mirror_sem_value(&ctx->s, exec, &vp->sem, &vp->sem_value,
                                     pr->frame));
 
+    /* Transfer ownership to the exec context */
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &vp->slices_buf, 1, 0));
     vp->slices_buf = NULL;
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &pp->metadata_buf, 1, 0));
     pp->metadata_buf = NULL;
 
-    /* Transfer ownership to the exec context */
-    vp->slices_buf = pp->metadata_buf = NULL;
-
     /* Input barrier */
     ff_vk_frame_barrier(&ctx->s, exec, pr->frame, img_bar, &nb_img_bar,
                         VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
@@ -404,6 +401,11 @@ static int init_shader(AVCodecContext *avctx, FFVulkanContext *s,
                           local_size >> 16 & 0xff, local_size >> 8 & 0xff, local_size >> 0 & 0xff,
                           0));
 
+    av_bprintf(&shd->src, "#define GET_BITS_SMEM\n");
+
+    if (interlaced)
+        av_bprintf(&shd->src, "#define INTERLACED\n");
+
     /* Common code */
     GLSLD(ff_source_common_comp);
 
@@ -412,9 +414,6 @@ static int init_shader(AVCodecContext *avctx, FFVulkanContext *s,
 
     RET(ff_vk_shader_add_descriptor_set(s, shd, descs, num_descs, 0, 0));
 
-    if (interlaced)
-        av_bprintf(&shd->src, "#define INTERLACED\n");
-
     /* Main code */
     GLSLD(source);
 
@@ -494,6 +493,7 @@ static int vk_decode_prores_init(AVCodecContext *avctx)
     RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &pv->reset,
                     "prores_dec_reset", "main", desc_set, 1,
                     ff_source_prores_reset_comp, 0x080801, pr->frame_type != 0));
+
     desc_set = (FFVulkanDescriptorSetBinding []) {
         {
             .name        = "slice_offsets_buf",
-- 
2.52.0


From 0bee9995bb8b31ed27bae0243a4704d5420e2f4b Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 27 Nov 2025 17:14:05 +0100
Subject: [PATCH 153/304] avcodec/x86/Makefile: Remove redundant WebP
 decoder->vp8dsp dependencies
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Redundant since 35b02732b98c1e2e862dc78476d8bc527af1c93c.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 1c1e14b4b8..84ba395c82 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -78,7 +78,6 @@ OBJS-$(CONFIG_VP9_DECODER)             += x86/vp9dsp_init.o            \
                                           x86/vp9dsp_init_10bpp.o      \
                                           x86/vp9dsp_init_12bpp.o      \
                                           x86/vp9dsp_init_16bpp.o
-OBJS-$(CONFIG_WEBP_DECODER)            += x86/vp8dsp_init.o
 
 
 # GCC inline assembly optimizations
@@ -193,4 +192,3 @@ X86ASM-OBJS-$(CONFIG_VP9_DECODER)      += x86/vp9intrapred.o            \
                                           x86/vp9lpf_16bpp.o            \
                                           x86/vp9mc.o                   \
                                           x86/vp9mc_16bpp.o
-X86ASM-OBJS-$(CONFIG_WEBP_DECODER)     += x86/vp8dsp.o
-- 
2.52.0


From 0bb3aef4ce65e5f4dcf40e1c2f4bd9a282e66158 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 27 Nov 2025 19:07:26 +0100
Subject: [PATCH 154/304] avcodec/x86/Makefile: Only compile ASM init files
 when X86ASM is enabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.

This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows removing the HAVE_X86ASM checks
in these init files.
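
The caller-side pattern used throughout the diff can be sketched as
follows. Everything named Demo* or demo_* is hypothetical, and the two
macros are given fallback values here only so the sketch compiles on
its own; in FFmpeg they come from the generated configuration.

```c
/* Fallback values for a standalone build (assumption: x86 without x86asm). */
#ifndef ARCH_X86
#define ARCH_X86 1
#endif
#ifndef HAVE_X86ASM
#define HAVE_X86ASM 0
#endif

typedef struct DemoDSPContext {
    int (*sum)(const int *v, int n);
} DemoDSPContext;

/* Plain C implementation, always available. */
static int sum_c(const int *v, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

#if ARCH_X86 && HAVE_X86ASM
/* Would be defined in an init file that is only built with x86asm. */
void demo_dsp_init_x86(DemoDSPContext *c);
#endif

static void demo_dsp_init(DemoDSPContext *c)
{
    c->sum = sum_c;
#if ARCH_X86 && HAVE_X86ASM
    /* Compiled out when the init file is not part of the build. */
    demo_dsp_init_x86(c);
#endif
}
```

Because the init file is no longer compiled without x86asm, the call
has to disappear at preprocessing time; guarding it with ARCH_X86 alone
would leave an unresolved reference.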

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/aacencdsp.h                      |   2 +-
 libavcodec/aacpsdsp_template.c              |   2 +-
 libavcodec/ac3dsp.c                         |   4 +-
 libavcodec/alacdsp.c                        |   2 +-
 libavcodec/apv_dsp.c                        |   2 +-
 libavcodec/audiodsp.c                       |   2 +-
 libavcodec/blockdsp.c                       |   2 +-
 libavcodec/bswapdsp.c                       |   2 +-
 libavcodec/cavsdsp.c                        |   2 +-
 libavcodec/cfhddsp.c                        |   2 +-
 libavcodec/cfhdencdsp.c                     |   2 +-
 libavcodec/dcadsp.c                         |   2 +-
 libavcodec/dirac_dwt.c                      |   2 +-
 libavcodec/diracdsp.c                       |   2 +-
 libavcodec/dnxhdenc.c                       |   2 +-
 libavcodec/exrdsp.c                         |   2 +-
 libavcodec/flacdsp.c                        |   2 +-
 libavcodec/flacencdsp.c                     |   2 +-
 libavcodec/fmtconvert.c                     |   2 +-
 libavcodec/g722dsp.c                        |   2 +-
 libavcodec/h263dsp.c                        |   2 +-
 libavcodec/h264chroma.c                     |   2 +-
 libavcodec/h264dsp.c                        |   2 +-
 libavcodec/h264pred.c                       |   2 +-
 libavcodec/h264qpel.c                       |   2 +-
 libavcodec/hevc/dsp.c                       |   2 +-
 libavcodec/hpeldsp.c                        |   2 +-
 libavcodec/huffyuvdsp.c                     |   2 +-
 libavcodec/huffyuvencdsp.c                  |   2 +-
 libavcodec/idctdsp.c                        |   4 +-
 libavcodec/jpeg2000dsp.c                    |   2 +-
 libavcodec/lossless_audiodsp.c              |   2 +-
 libavcodec/lossless_videodsp.c              |   2 +-
 libavcodec/me_cmp.c                         |   2 +-
 libavcodec/opus/dsp.c                       |   2 +-
 libavcodec/opus/pvq.c                       |   2 +-
 libavcodec/pixblockdsp.c                    |   2 +-
 libavcodec/pngdsp.c                         |   2 +-
 libavcodec/proresdsp.c                      |   2 +-
 libavcodec/qpeldsp.c                        |   2 +-
 libavcodec/rv34dsp.c                        |   2 +-
 libavcodec/rv40dsp.c                        |   2 +-
 libavcodec/sbcdsp.c                         |   2 +-
 libavcodec/sbrdsp_template.c                |   2 +-
 libavcodec/svq1encdsp.h                     |   2 +-
 libavcodec/synth_filter.c                   |   2 +-
 libavcodec/takdsp.c                         |   2 +-
 libavcodec/ttadsp.c                         |   2 +-
 libavcodec/ttaencdsp.c                      |   2 +-
 libavcodec/utvideodsp.c                     |   2 +-
 libavcodec/v210dec_init.h                   |   2 +-
 libavcodec/v210enc_init.h                   |   2 +-
 libavcodec/vc1dsp.c                         |   2 +-
 libavcodec/videodsp.c                       |   2 +-
 libavcodec/vorbisdsp.c                      |   2 +-
 libavcodec/vp3dsp.c                         |   2 +-
 libavcodec/vp6dsp.c                         |   2 +-
 libavcodec/vp8dsp.c                         |   4 +-
 libavcodec/vp9dsp.c                         |   2 +-
 libavcodec/vvc/dsp.c                        |   2 +-
 libavcodec/x86/Makefile                     | 124 ++++++++++----------
 libavcodec/x86/alacdsp_init.c               |   3 -
 libavcodec/x86/blockdsp_init.c              |   2 -
 libavcodec/x86/dirac_dwt_init.c             |   6 -
 libavcodec/x86/diracdsp_init.c              |   6 -
 libavcodec/x86/flacdsp_init.c               |   2 -
 libavcodec/x86/flacencdsp_init.c            |   4 +-
 libavcodec/x86/fmtconvert_init.c            |   6 -
 libavcodec/x86/h264_qpel.c                  |   5 -
 libavcodec/x86/h264dsp_init.c               |   2 -
 libavcodec/x86/hevc/Makefile                |   6 +-
 libavcodec/x86/lossless_audiodsp_init.c     |   2 -
 libavcodec/x86/me_cmp_init.c                |   5 -
 libavcodec/x86/qpeldsp_init.c               |   4 -
 libavcodec/x86/rv40dsp_init.c               |   5 -
 libavcodec/x86/synth_filter_init.c          |   4 -
 libavcodec/x86/takdsp_init.c                |   3 -
 libavcodec/x86/ttadsp_init.c                |   3 -
 libavcodec/x86/ttaencdsp_init.c             |   3 -
 libavcodec/x86/v210-init.c                  |   2 -
 libavcodec/x86/vc1dsp_init.c                |   5 -
 libavcodec/x86/videodsp_init.c              |   4 -
 libavcodec/x86/vp8dsp_init.c                |   8 --
 libavcodec/x86/vp9dsp_init.c                |   7 --
 libavcodec/x86/vp9dsp_init_16bpp.c          |   6 -
 libavcodec/x86/vp9dsp_init_16bpp_template.c |   5 -
 libavcodec/x86/vvc/Makefile                 |   6 +-
 libavcodec/x86/xvididct_init.c              |   3 -
 libavcodec/xvididct.c                       |   2 +-
 89 files changed, 133 insertions(+), 236 deletions(-)

diff --git a/libavcodec/aacencdsp.h b/libavcodec/aacencdsp.h
index 684bbc254f..77aa133691 100644
--- a/libavcodec/aacencdsp.h
+++ b/libavcodec/aacencdsp.h
@@ -65,7 +65,7 @@ static inline void ff_aacenc_dsp_init(AACEncDSPContext *s)
 
 #if ARCH_RISCV
     ff_aacenc_dsp_init_riscv(s);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_aacenc_dsp_init_x86(s);
 #elif ARCH_AARCH64
     ff_aacenc_dsp_init_aarch64(s);
diff --git a/libavcodec/aacpsdsp_template.c b/libavcodec/aacpsdsp_template.c
index c28ba2c9a5..341bf77023 100644
--- a/libavcodec/aacpsdsp_template.c
+++ b/libavcodec/aacpsdsp_template.c
@@ -228,7 +228,7 @@ av_cold void AAC_RENAME(ff_psdsp_init)(PSDSPContext *s)
     ff_psdsp_init_aarch64(s);
 #elif ARCH_RISCV
     ff_psdsp_init_riscv(s);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_psdsp_init_x86(s);
 #endif
 #endif /* !USE_FIXED */
diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 730fa70fff..a4a28c8672 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -363,7 +363,7 @@ void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
             c->downmix = ac3_downmix_5_to_1_symmetric_c;
         }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
         ff_ac3dsp_set_downmix_x86(c);
 #endif
     }
@@ -393,7 +393,7 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c)
     ff_ac3dsp_init_aarch64(c);
 #elif ARCH_ARM
     ff_ac3dsp_init_arm(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_ac3dsp_init_x86(c);
 #elif ARCH_MIPS
     ff_ac3dsp_init_mips(c);
diff --git a/libavcodec/alacdsp.c b/libavcodec/alacdsp.c
index a604566afb..c06cc9da97 100644
--- a/libavcodec/alacdsp.c
+++ b/libavcodec/alacdsp.c
@@ -60,7 +60,7 @@ av_cold void ff_alacdsp_init(ALACDSPContext *c)
 
 #if ARCH_RISCV
     ff_alacdsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_alacdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/apv_dsp.c b/libavcodec/apv_dsp.c
index 8fbabcf63d..982ec36910 100644
--- a/libavcodec/apv_dsp.c
+++ b/libavcodec/apv_dsp.c
@@ -134,7 +134,7 @@ av_cold void ff_apv_dsp_init(APVDSPContext *dsp)
 {
     dsp->decode_transquant = apv_decode_transquant_c;
 
-#if ARCH_X86_64
+#if ARCH_X86_64 && HAVE_X86ASM
     ff_apv_dsp_init_x86_64(dsp);
 #endif
 }
diff --git a/libavcodec/audiodsp.c b/libavcodec/audiodsp.c
index fd6a00345f..a4758bb4c3 100644
--- a/libavcodec/audiodsp.c
+++ b/libavcodec/audiodsp.c
@@ -74,7 +74,7 @@ av_cold void ff_audiodsp_init(AudioDSPContext *c)
     ff_audiodsp_init_ppc(c);
 #elif ARCH_RISCV
     ff_audiodsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_audiodsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/blockdsp.c b/libavcodec/blockdsp.c
index 57ca41bd96..793e7664ec 100644
--- a/libavcodec/blockdsp.c
+++ b/libavcodec/blockdsp.c
@@ -69,7 +69,7 @@ av_cold void ff_blockdsp_init(BlockDSPContext *c)
     ff_blockdsp_init_ppc(c);
 #elif ARCH_RISCV
     ff_blockdsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_blockdsp_init_x86(c);
 #elif ARCH_MIPS
     ff_blockdsp_init_mips(c);
diff --git a/libavcodec/bswapdsp.c b/libavcodec/bswapdsp.c
index f0ea2b55c5..f375ab79ac 100644
--- a/libavcodec/bswapdsp.c
+++ b/libavcodec/bswapdsp.c
@@ -53,7 +53,7 @@ av_cold void ff_bswapdsp_init(BswapDSPContext *c)
 
 #if ARCH_RISCV
     ff_bswapdsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_bswapdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/cavsdsp.c b/libavcodec/cavsdsp.c
index 69420242d6..7444f17bb5 100644
--- a/libavcodec/cavsdsp.c
+++ b/libavcodec/cavsdsp.c
@@ -577,7 +577,7 @@ av_cold void ff_cavsdsp_init(CAVSDSPContext* c)
     c->cavs_idct8_add = cavs_idct8_add_c;
     c->idct_perm = FF_IDCT_PERM_NONE;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_cavsdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/cfhddsp.c b/libavcodec/cfhddsp.c
index a141db5246..05757d6515 100644
--- a/libavcodec/cfhddsp.c
+++ b/libavcodec/cfhddsp.c
@@ -112,7 +112,7 @@ av_cold void ff_cfhddsp_init(CFHDDSPContext *c, int depth, int bayer)
     else
         c->horiz_filter_clip = horiz_filter_clip;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_cfhddsp_init_x86(c, depth, bayer);
 #endif
 }
diff --git a/libavcodec/cfhdencdsp.c b/libavcodec/cfhdencdsp.c
index a122bcaf19..06801aef6d 100644
--- a/libavcodec/cfhdencdsp.c
+++ b/libavcodec/cfhdencdsp.c
@@ -73,7 +73,7 @@ av_cold void ff_cfhdencdsp_init(CFHDEncDSPContext *c)
     c->horiz_filter = horiz_filter;
     c->vert_filter = vert_filter;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_cfhdencdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/dcadsp.c b/libavcodec/dcadsp.c
index d5fc5c4eb2..a0be676699 100644
--- a/libavcodec/dcadsp.c
+++ b/libavcodec/dcadsp.c
@@ -487,7 +487,7 @@ av_cold void ff_dcadsp_init(DCADSPContext *s)
     s->lbr_bank = lbr_bank_c;
     s->lfe_iir = lfe_iir_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_dcadsp_init_x86(s);
 #endif
 }
diff --git a/libavcodec/dirac_dwt.c b/libavcodec/dirac_dwt.c
index d473f64daa..0d92ad06da 100644
--- a/libavcodec/dirac_dwt.c
+++ b/libavcodec/dirac_dwt.c
@@ -59,7 +59,7 @@ int ff_spatial_idwt_init(DWTContext *d, DWTPlane *p, enum dwt_type type,
         return AVERROR_INVALIDDATA;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     if (bit_depth == 8)
         ff_spatial_idwt_init_x86(d, type);
 #endif
diff --git a/libavcodec/diracdsp.c b/libavcodec/diracdsp.c
index 284f914f9d..a02a23974b 100644
--- a/libavcodec/diracdsp.c
+++ b/libavcodec/diracdsp.c
@@ -247,7 +247,7 @@ av_cold void ff_diracdsp_init(DiracDSPContext *c)
     PIXFUNC(avg, 16);
     PIXFUNC(avg, 32);
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_diracdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/dnxhdenc.c b/libavcodec/dnxhdenc.c
index 7994b1d497..844731f6c4 100644
--- a/libavcodec/dnxhdenc.c
+++ b/libavcodec/dnxhdenc.c
@@ -1363,7 +1363,7 @@ const FFCodec ff_dnxhd_encoder = {
 
 void ff_dnxhdenc_init(DNXHDEncContext *ctx)
 {
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_dnxhdenc_init_x86(ctx);
 #endif
 }
diff --git a/libavcodec/exrdsp.c b/libavcodec/exrdsp.c
index 248cb93c5a..70914b7e5b 100644
--- a/libavcodec/exrdsp.c
+++ b/libavcodec/exrdsp.c
@@ -63,7 +63,7 @@ av_cold void ff_exrdsp_init(ExrDSPContext *c)
 
 #if ARCH_RISCV
     ff_exrdsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_exrdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/flacdsp.c b/libavcodec/flacdsp.c
index b5b0609716..f8b48770f4 100644
--- a/libavcodec/flacdsp.c
+++ b/libavcodec/flacdsp.c
@@ -154,7 +154,7 @@ av_cold void ff_flacdsp_init(FLACDSPContext *c, enum AVSampleFormat fmt, int cha
     ff_flacdsp_init_arm(c, fmt, channels);
 #elif ARCH_RISCV
     ff_flacdsp_init_riscv(c, fmt, channels);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_flacdsp_init_x86(c, fmt, channels);
 #endif
 }
diff --git a/libavcodec/flacencdsp.c b/libavcodec/flacencdsp.c
index 46e5a0352b..251b3c2d47 100644
--- a/libavcodec/flacencdsp.c
+++ b/libavcodec/flacencdsp.c
@@ -34,7 +34,7 @@ av_cold void ff_flacencdsp_init(FLACEncDSPContext *c)
     c->lpc16_encode = flac_lpc_encode_c_16;
     c->lpc32_encode = flac_lpc_encode_c_32;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_flacencdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/fmtconvert.c b/libavcodec/fmtconvert.c
index d889e61aca..77d69f8211 100644
--- a/libavcodec/fmtconvert.c
+++ b/libavcodec/fmtconvert.c
@@ -54,7 +54,7 @@ av_cold void ff_fmt_convert_init(FmtConvertContext *c)
     ff_fmt_convert_init_ppc(c);
 #elif ARCH_RISCV
     ff_fmt_convert_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_fmt_convert_init_x86(c);
 #endif
 #if HAVE_MIPSFPU
diff --git a/libavcodec/g722dsp.c b/libavcodec/g722dsp.c
index 302283688f..5807635ce5 100644
--- a/libavcodec/g722dsp.c
+++ b/libavcodec/g722dsp.c
@@ -73,7 +73,7 @@ av_cold void ff_g722dsp_init(G722DSPContext *c)
     ff_g722dsp_init_arm(c);
 #elif ARCH_RISCV
     ff_g722dsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_g722dsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/h263dsp.c b/libavcodec/h263dsp.c
index 6a13353499..165174a499 100644
--- a/libavcodec/h263dsp.c
+++ b/libavcodec/h263dsp.c
@@ -121,7 +121,7 @@ av_cold void ff_h263dsp_init(H263DSPContext *ctx)
 
 #if ARCH_RISCV
     ff_h263dsp_init_riscv(ctx);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_h263dsp_init_x86(ctx);
 #elif ARCH_MIPS
     ff_h263dsp_init_mips(ctx);
diff --git a/libavcodec/h264chroma.c b/libavcodec/h264chroma.c
index 5000c89aa7..0d152de59d 100644
--- a/libavcodec/h264chroma.c
+++ b/libavcodec/h264chroma.c
@@ -50,7 +50,7 @@ av_cold void ff_h264chroma_init(H264ChromaContext *c, int bit_depth)
     ff_h264chroma_init_arm(c, bit_depth);
 #elif ARCH_PPC
     ff_h264chroma_init_ppc(c, bit_depth);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_h264chroma_init_x86(c, bit_depth);
 #elif ARCH_MIPS
     ff_h264chroma_init_mips(c, bit_depth);
diff --git a/libavcodec/h264dsp.c b/libavcodec/h264dsp.c
index 8a6a3f5325..f4c5238372 100644
--- a/libavcodec/h264dsp.c
+++ b/libavcodec/h264dsp.c
@@ -160,7 +160,7 @@ av_cold void ff_h264dsp_init(H264DSPContext *c, const int bit_depth,
     ff_h264dsp_init_ppc(c, bit_depth, chroma_format_idc);
 #elif ARCH_RISCV
     ff_h264dsp_init_riscv(c, bit_depth, chroma_format_idc);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_h264dsp_init_x86(c, bit_depth, chroma_format_idc);
 #elif ARCH_MIPS
     ff_h264dsp_init_mips(c, bit_depth, chroma_format_idc);
diff --git a/libavcodec/h264pred.c b/libavcodec/h264pred.c
index 25f9995a0b..fbd8d2b91d 100644
--- a/libavcodec/h264pred.c
+++ b/libavcodec/h264pred.c
@@ -592,7 +592,7 @@ av_cold void ff_h264_pred_init(H264PredContext *h, int codec_id,
     ff_h264_pred_init_aarch64(h, codec_id, bit_depth, chroma_format_idc);
 #elif ARCH_ARM
     ff_h264_pred_init_arm(h, codec_id, bit_depth, chroma_format_idc);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_h264_pred_init_x86(h, codec_id, bit_depth, chroma_format_idc);
 #elif ARCH_MIPS
     ff_h264_pred_init_mips(h, codec_id, bit_depth, chroma_format_idc);
diff --git a/libavcodec/h264qpel.c b/libavcodec/h264qpel.c
index 0bc715c638..c64d35b73d 100644
--- a/libavcodec/h264qpel.c
+++ b/libavcodec/h264qpel.c
@@ -104,7 +104,7 @@ av_cold void ff_h264qpel_init(H264QpelContext *c, int bit_depth)
     ff_h264qpel_init_ppc(c, bit_depth);
 #elif ARCH_RISCV
     ff_h264qpel_init_riscv(c, bit_depth);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_h264qpel_init_x86(c, bit_depth);
 #elif ARCH_MIPS
     ff_h264qpel_init_mips(c, bit_depth);
diff --git a/libavcodec/hevc/dsp.c b/libavcodec/hevc/dsp.c
index a154fab2bf..5ae779f9f7 100644
--- a/libavcodec/hevc/dsp.c
+++ b/libavcodec/hevc/dsp.c
@@ -269,7 +269,7 @@ int i = 0;
     ff_hevc_dsp_init_riscv(hevcdsp, bit_depth);
 #elif ARCH_WASM
     ff_hevc_dsp_init_wasm(hevcdsp, bit_depth);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_hevc_dsp_init_x86(hevcdsp, bit_depth);
 #elif ARCH_MIPS
     ff_hevc_dsp_init_mips(hevcdsp, bit_depth);
diff --git a/libavcodec/hpeldsp.c b/libavcodec/hpeldsp.c
index 688939ad3f..e753d6216c 100644
--- a/libavcodec/hpeldsp.c
+++ b/libavcodec/hpeldsp.c
@@ -360,7 +360,7 @@ av_cold void ff_hpeldsp_init(HpelDSPContext *c, int flags)
     ff_hpeldsp_init_arm(c, flags);
 #elif ARCH_PPC
     ff_hpeldsp_init_ppc(c, flags);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_hpeldsp_init_x86(c, flags);
 #elif ARCH_MIPS
     ff_hpeldsp_init_mips(c, flags);
diff --git a/libavcodec/huffyuvdsp.c b/libavcodec/huffyuvdsp.c
index 80587dac85..1ae2f820d0 100644
--- a/libavcodec/huffyuvdsp.c
+++ b/libavcodec/huffyuvdsp.c
@@ -89,7 +89,7 @@ av_cold void ff_huffyuvdsp_init(HuffYUVDSPContext *c, enum AVPixelFormat pix_fmt
 
 #if ARCH_RISCV
     ff_huffyuvdsp_init_riscv(c, pix_fmt);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_huffyuvdsp_init_x86(c, pix_fmt);
 #endif
 }
diff --git a/libavcodec/huffyuvencdsp.c b/libavcodec/huffyuvencdsp.c
index 27428635af..e332f678d4 100644
--- a/libavcodec/huffyuvencdsp.c
+++ b/libavcodec/huffyuvencdsp.c
@@ -89,7 +89,7 @@ av_cold void ff_huffyuvencdsp_init(HuffYUVEncDSPContext *c, enum AVPixelFormat p
     c->diff_int16           = diff_int16_c;
     c->sub_hfyu_median_pred_int16 = sub_hfyu_median_pred_int16_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_huffyuvencdsp_init_x86(c, pix_fmt);
 #endif
 }
diff --git a/libavcodec/idctdsp.c b/libavcodec/idctdsp.c
index 8a71c7ef77..e039f900eb 100644
--- a/libavcodec/idctdsp.c
+++ b/libavcodec/idctdsp.c
@@ -41,7 +41,7 @@ av_cold void ff_init_scantable_permutation(uint8_t *idct_permutation,
 {
     int i;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     if (ff_init_scantable_permutation_x86(idct_permutation,
                                           perm_type))
         return;
@@ -301,7 +301,7 @@ av_cold void ff_idctdsp_init(IDCTDSPContext *c, AVCodecContext *avctx)
     ff_idctdsp_init_ppc(c, avctx, high_bit_depth);
 #elif ARCH_RISCV
     ff_idctdsp_init_riscv(c, avctx, high_bit_depth);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_idctdsp_init_x86(c, avctx, high_bit_depth);
 #elif ARCH_MIPS
     ff_idctdsp_init_mips(c, avctx, high_bit_depth);
diff --git a/libavcodec/jpeg2000dsp.c b/libavcodec/jpeg2000dsp.c
index 7840fdc357..2931a38ef1 100644
--- a/libavcodec/jpeg2000dsp.c
+++ b/libavcodec/jpeg2000dsp.c
@@ -98,7 +98,7 @@ av_cold void ff_jpeg2000dsp_init(Jpeg2000DSPContext *c)
 
 #if ARCH_RISCV
     ff_jpeg2000dsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_jpeg2000dsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/lossless_audiodsp.c b/libavcodec/lossless_audiodsp.c
index 94e6ce0989..2d57857dad 100644
--- a/libavcodec/lossless_audiodsp.c
+++ b/libavcodec/lossless_audiodsp.c
@@ -63,7 +63,7 @@ av_cold void ff_llauddsp_init(LLAudDSPContext *c)
     ff_llauddsp_init_arm(c);
 #elif ARCH_RISCV
     ff_llauddsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_llauddsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/lossless_videodsp.c b/libavcodec/lossless_videodsp.c
index 229494bb50..bf3a3da90d 100644
--- a/libavcodec/lossless_videodsp.c
+++ b/libavcodec/lossless_videodsp.c
@@ -124,7 +124,7 @@ av_cold void ff_llviddsp_init(LLVidDSPContext *c)
     ff_llviddsp_init_ppc(c);
 #elif ARCH_RISCV
     ff_llviddsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_llviddsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/me_cmp.c b/libavcodec/me_cmp.c
index 8e53f6d573..09861e2074 100644
--- a/libavcodec/me_cmp.c
+++ b/libavcodec/me_cmp.c
@@ -1019,7 +1019,7 @@ av_cold void ff_me_cmp_init(MECmpContext *c, AVCodecContext *avctx)
     ff_me_cmp_init_ppc(c, avctx);
 #elif ARCH_RISCV
     ff_me_cmp_init_riscv(c, avctx);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_me_cmp_init_x86(c, avctx);
 #elif ARCH_MIPS
     ff_me_cmp_init_mips(c, avctx);
diff --git a/libavcodec/opus/dsp.c b/libavcodec/opus/dsp.c
index 6cd76ceceb..f2278c5bde 100644
--- a/libavcodec/opus/dsp.c
+++ b/libavcodec/opus/dsp.c
@@ -62,7 +62,7 @@ av_cold void ff_opus_dsp_init(OpusDSP *ctx)
     ff_opus_dsp_init_aarch64(ctx);
 #elif ARCH_RISCV
     ff_opus_dsp_init_riscv(ctx);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_opus_dsp_init_x86(ctx);
 #endif
 }
diff --git a/libavcodec/opus/pvq.c b/libavcodec/opus/pvq.c
index fe57ab02ce..3dea7c19f2 100644
--- a/libavcodec/opus/pvq.c
+++ b/libavcodec/opus/pvq.c
@@ -914,7 +914,7 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq, int encode)
 
 #if CONFIG_OPUS_ENCODER
     s->pvq_search = ppp_pvq_search_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_celt_pvq_init_x86(s);
 #endif
 #endif
diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
index 110a374260..5ae840f100 100644
--- a/libavcodec/pixblockdsp.c
+++ b/libavcodec/pixblockdsp.c
@@ -108,7 +108,7 @@ av_cold void ff_pixblockdsp_init(PixblockDSPContext *c, int bits_per_raw_sample)
     ff_pixblockdsp_init_ppc(c, high_bit_depth);
 #elif ARCH_RISCV
     ff_pixblockdsp_init_riscv(c, high_bit_depth);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_pixblockdsp_init_x86(c, high_bit_depth);
 #elif ARCH_MIPS
     ff_pixblockdsp_init_mips(c, high_bit_depth);
diff --git a/libavcodec/pngdsp.c b/libavcodec/pngdsp.c
index 50ee96a684..ae40113a51 100644
--- a/libavcodec/pngdsp.c
+++ b/libavcodec/pngdsp.c
@@ -58,7 +58,7 @@ av_cold void ff_pngdsp_init(PNGDSPContext *dsp)
     dsp->add_bytes_l2         = add_bytes_l2_c;
     dsp->add_paeth_prediction = ff_add_png_paeth_prediction;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_pngdsp_init_x86(dsp);
 #endif
 }
diff --git a/libavcodec/proresdsp.c b/libavcodec/proresdsp.c
index a4921128f7..eb5dbf4799 100644
--- a/libavcodec/proresdsp.c
+++ b/libavcodec/proresdsp.c
@@ -149,7 +149,7 @@ av_cold void ff_proresdsp_init(ProresDSPContext *dsp, int bits_per_raw_sample)
         dsp->idct_permutation_type = FF_IDCT_PERM_NONE;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_proresdsp_init_x86(dsp, bits_per_raw_sample);
 #endif
 
diff --git a/libavcodec/qpeldsp.c b/libavcodec/qpeldsp.c
index 5f937f9d9e..33a5eccd0b 100644
--- a/libavcodec/qpeldsp.c
+++ b/libavcodec/qpeldsp.c
@@ -810,7 +810,7 @@ av_cold void ff_qpeldsp_init(QpelDSPContext *c)
     dspfunc(avg_qpel, 0, 16);
     dspfunc(avg_qpel, 1, 8);
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_qpeldsp_init_x86(c);
 #elif ARCH_MIPS
     ff_qpeldsp_init_mips(c);
diff --git a/libavcodec/rv34dsp.c b/libavcodec/rv34dsp.c
index 44486f8edd..2e27137be6 100644
--- a/libavcodec/rv34dsp.c
+++ b/libavcodec/rv34dsp.c
@@ -140,7 +140,7 @@ av_cold void ff_rv34dsp_init(RV34DSPContext *c)
     ff_rv34dsp_init_arm(c);
 #elif ARCH_RISCV
     ff_rv34dsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_rv34dsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/rv40dsp.c b/libavcodec/rv40dsp.c
index dd73737bd6..7370b89e1b 100644
--- a/libavcodec/rv40dsp.c
+++ b/libavcodec/rv40dsp.c
@@ -712,7 +712,7 @@ av_cold void ff_rv40dsp_init(RV34DSPContext *c)
     ff_rv40dsp_init_arm(c);
 #elif ARCH_RISCV
     ff_rv40dsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_rv40dsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/sbcdsp.c b/libavcodec/sbcdsp.c
index 00f9c4c68d..5674bdc4a7 100644
--- a/libavcodec/sbcdsp.c
+++ b/libavcodec/sbcdsp.c
@@ -382,7 +382,7 @@ av_cold void ff_sbcdsp_init(SBCDSPContext *s)
 
 #if ARCH_ARM
     ff_sbcdsp_init_arm(s);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_sbcdsp_init_x86(s);
 #endif
 }
diff --git a/libavcodec/sbrdsp_template.c b/libavcodec/sbrdsp_template.c
index 9a94af8670..b5766c6980 100644
--- a/libavcodec/sbrdsp_template.c
+++ b/libavcodec/sbrdsp_template.c
@@ -102,7 +102,7 @@ av_cold void AAC_RENAME(ff_sbrdsp_init)(SBRDSPContext *s)
     ff_sbrdsp_init_aarch64(s);
 #elif ARCH_RISCV
     ff_sbrdsp_init_riscv(s);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_sbrdsp_init_x86(s);
 #endif
 #endif /* !USE_FIXED */
diff --git a/libavcodec/svq1encdsp.h b/libavcodec/svq1encdsp.h
index 751b5eed86..dcc8e825a3 100644
--- a/libavcodec/svq1encdsp.h
+++ b/libavcodec/svq1encdsp.h
@@ -52,7 +52,7 @@ static inline void ff_svq1enc_init(SVQ1EncDSPContext *c)
     ff_svq1enc_init_ppc(c);
 #elif ARCH_RISCV
     ff_svq1enc_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_svq1enc_init_x86(c);
 #endif
 }
diff --git a/libavcodec/synth_filter.c b/libavcodec/synth_filter.c
index f90c6be7a7..82a2f812b8 100644
--- a/libavcodec/synth_filter.c
+++ b/libavcodec/synth_filter.c
@@ -180,7 +180,7 @@ av_cold void ff_synth_filter_init(SynthFilterContext *c)
     ff_synth_filter_init_aarch64(c);
 #elif ARCH_ARM
     ff_synth_filter_init_arm(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_synth_filter_init_x86(c);
 #endif
 }
diff --git a/libavcodec/takdsp.c b/libavcodec/takdsp.c
index 51b6658de4..a7e281b6e2 100644
--- a/libavcodec/takdsp.c
+++ b/libavcodec/takdsp.c
@@ -79,7 +79,7 @@ av_cold void ff_takdsp_init(TAKDSPContext *c)
 
 #if ARCH_RISCV
     ff_takdsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_takdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/ttadsp.c b/libavcodec/ttadsp.c
index 5dda19587c..af82850869 100644
--- a/libavcodec/ttadsp.c
+++ b/libavcodec/ttadsp.c
@@ -57,7 +57,7 @@ av_cold void ff_ttadsp_init(TTADSPContext *c)
 {
     c->filter_process = tta_filter_process_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_ttadsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/ttaencdsp.c b/libavcodec/ttaencdsp.c
index 0efdc109bb..0a717313bf 100644
--- a/libavcodec/ttaencdsp.c
+++ b/libavcodec/ttaencdsp.c
@@ -54,7 +54,7 @@ av_cold void ff_ttaencdsp_init(TTAEncDSPContext *c)
 {
     c->filter_process = ttaenc_filter_process_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_ttaencdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/utvideodsp.c b/libavcodec/utvideodsp.c
index b63dafbe17..209f8561ab 100644
--- a/libavcodec/utvideodsp.c
+++ b/libavcodec/utvideodsp.c
@@ -79,7 +79,7 @@ av_cold void ff_utvideodsp_init(UTVideoDSPContext *c)
 
 #if ARCH_RISCV
     ff_utvideodsp_init_riscv(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_utvideodsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/v210dec_init.h b/libavcodec/v210dec_init.h
index a0c97bf426..7523cb02be 100644
--- a/libavcodec/v210dec_init.h
+++ b/libavcodec/v210dec_init.h
@@ -54,7 +54,7 @@ static void v210_planar_unpack_c(const uint32_t *src, uint16_t *y, uint16_t *u,
 av_unused static av_cold void ff_v210dec_init(V210DecContext *s)
 {
     s->unpack_frame = v210_planar_unpack_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_v210_x86_init(s);
 #endif
 }
diff --git a/libavcodec/v210enc_init.h b/libavcodec/v210enc_init.h
index 01b6981b50..75ce624854 100644
--- a/libavcodec/v210enc_init.h
+++ b/libavcodec/v210enc_init.h
@@ -83,7 +83,7 @@ av_unused av_cold static void ff_v210enc_init(V210EncContext *s)
     s->sample_factor_8  = 2;
     s->sample_factor_10 = 1;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_v210enc_init_x86(s);
 #endif
 }
diff --git a/libavcodec/vc1dsp.c b/libavcodec/vc1dsp.c
index 2caa3c6863..864a6e5e7b 100644
--- a/libavcodec/vc1dsp.c
+++ b/libavcodec/vc1dsp.c
@@ -1041,7 +1041,7 @@ av_cold void ff_vc1dsp_init(VC1DSPContext *dsp)
     ff_vc1dsp_init_ppc(dsp);
 #elif ARCH_RISCV
     ff_vc1dsp_init_riscv(dsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vc1dsp_init_x86(dsp);
 #elif ARCH_MIPS
     ff_vc1dsp_init_mips(dsp);
diff --git a/libavcodec/videodsp.c b/libavcodec/videodsp.c
index a19e87a819..c66757ce83 100644
--- a/libavcodec/videodsp.c
+++ b/libavcodec/videodsp.c
@@ -53,7 +53,7 @@ av_cold void ff_videodsp_init(VideoDSPContext *ctx, int bpc)
     ff_videodsp_init_ppc(ctx, bpc);
 #elif ARCH_RISCV
     ff_videodsp_init_riscv(ctx, bpc);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_videodsp_init_x86(ctx, bpc);
 #elif ARCH_MIPS
     ff_videodsp_init_mips(ctx, bpc);
diff --git a/libavcodec/vorbisdsp.c b/libavcodec/vorbisdsp.c
index 70022bd262..54a55d109d 100644
--- a/libavcodec/vorbisdsp.c
+++ b/libavcodec/vorbisdsp.c
@@ -55,7 +55,7 @@ av_cold void ff_vorbisdsp_init(VorbisDSPContext *dsp)
     ff_vorbisdsp_init_ppc(dsp);
 #elif ARCH_RISCV
     ff_vorbisdsp_init_riscv(dsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vorbisdsp_init_x86(dsp);
 #endif
 }
diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
index b96b4dea68..025ad04231 100644
--- a/libavcodec/vp3dsp.c
+++ b/libavcodec/vp3dsp.c
@@ -459,7 +459,7 @@ av_cold void ff_vp3dsp_init(VP3DSPContext *c)
     ff_vp3dsp_init_arm(c);
 #elif ARCH_PPC
     ff_vp3dsp_init_ppc(c);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vp3dsp_init_x86(c);
 #elif ARCH_MIPS
     ff_vp3dsp_init_mips(c);
diff --git a/libavcodec/vp6dsp.c b/libavcodec/vp6dsp.c
index bdaa054307..e12a7ed2ad 100644
--- a/libavcodec/vp6dsp.c
+++ b/libavcodec/vp6dsp.c
@@ -64,7 +64,7 @@ av_cold void ff_vp6dsp_init(VP6DSPContext *s)
 {
     s->vp6_filter_diag4 = vp6_filter_diag4_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_vp6dsp_init_x86(s);
 #endif
 }
diff --git a/libavcodec/vp8dsp.c b/libavcodec/vp8dsp.c
index 146cb0a7c7..5543303adb 100644
--- a/libavcodec/vp8dsp.c
+++ b/libavcodec/vp8dsp.c
@@ -683,7 +683,7 @@ av_cold void ff_vp78dsp_init(VP8DSPContext *dsp)
     ff_vp78dsp_init_ppc(dsp);
 #elif ARCH_RISCV
     ff_vp78dsp_init_riscv(dsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vp78dsp_init_x86(dsp);
 #endif
 }
@@ -750,7 +750,7 @@ av_cold void ff_vp8dsp_init(VP8DSPContext *dsp)
     ff_vp8dsp_init_arm(dsp);
 #elif ARCH_RISCV
     ff_vp8dsp_init_riscv(dsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vp8dsp_init_x86(dsp);
 #elif ARCH_MIPS
     ff_vp8dsp_init_mips(dsp);
diff --git a/libavcodec/vp9dsp.c b/libavcodec/vp9dsp.c
index 967e6e1e1a..147486e10b 100644
--- a/libavcodec/vp9dsp.c
+++ b/libavcodec/vp9dsp.c
@@ -102,7 +102,7 @@ av_cold void ff_vp9dsp_init(VP9DSPContext *dsp, int bpp, int bitexact)
     ff_vp9dsp_init_arm(dsp, bpp);
 #elif ARCH_RISCV
     ff_vp9dsp_init_riscv(dsp, bpp, bitexact);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vp9dsp_init_x86(dsp, bpp, bitexact);
 #elif ARCH_MIPS
     ff_vp9dsp_init_mips(dsp, bpp);
diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index af392f2754..60372cca45 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -113,7 +113,7 @@ void ff_vvc_dsp_init(VVCDSPContext *vvcdsp, int bit_depth)
     ff_vvc_dsp_init_aarch64(vvcdsp, bit_depth);
 #elif ARCH_RISCV
     ff_vvc_dsp_init_riscv(vvcdsp, bit_depth);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_vvc_dsp_init_x86(vvcdsp, bit_depth);
 #endif
 }
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 84ba395c82..63b16f24c8 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -1,89 +1,87 @@
 OBJS                                   += x86/constants.o               \
 
 # subsystems
-OBJS-$(CONFIG_AC3DSP)                  += x86/ac3dsp_init.o
-OBJS-$(CONFIG_AUDIODSP)                += x86/audiodsp_init.o
-OBJS-$(CONFIG_BLOCKDSP)                += x86/blockdsp_init.o
-OBJS-$(CONFIG_BSWAPDSP)                += x86/bswapdsp_init.o
-OBJS-$(CONFIG_DIRAC_DECODER)           += x86/diracdsp_init.o           \
+X86ASM-OBJS-$(CONFIG_AC3DSP)           += x86/ac3dsp_init.o
+X86ASM-OBJS-$(CONFIG_AUDIODSP)         += x86/audiodsp_init.o
+X86ASM-OBJS-$(CONFIG_BLOCKDSP)         += x86/blockdsp_init.o
+X86ASM-OBJS-$(CONFIG_BSWAPDSP)         += x86/bswapdsp_init.o
+X86ASM-OBJS-$(CONFIG_DIRAC_DECODER)    += x86/diracdsp_init.o           \
                                           x86/dirac_dwt_init.o
 OBJS-$(CONFIG_FDCTDSP)                 += x86/fdctdsp_init.o x86/fdct.o
-OBJS-$(CONFIG_FMTCONVERT)              += x86/fmtconvert_init.o
-OBJS-$(CONFIG_H263DSP)                 += x86/h263dsp_init.o
-OBJS-$(CONFIG_H264CHROMA)              += x86/h264chroma_init.o
-OBJS-$(CONFIG_H264DSP)                 += x86/h264dsp_init.o
-OBJS-$(CONFIG_H264PRED)                += x86/h264_intrapred_init.o
-OBJS-$(CONFIG_H264QPEL)                += x86/h264_qpel.o
-OBJS-$(CONFIG_HPELDSP)                 += x86/hpeldsp_init.o
-OBJS-$(CONFIG_LLAUDDSP)                += x86/lossless_audiodsp_init.o
-OBJS-$(CONFIG_LLVIDDSP)                += x86/lossless_videodsp_init.o
+X86ASM-OBJS-$(CONFIG_FMTCONVERT)       += x86/fmtconvert_init.o
+X86ASM-OBJS-$(CONFIG_H263DSP)          += x86/h263dsp_init.o
+X86ASM-OBJS-$(CONFIG_H264CHROMA)       += x86/h264chroma_init.o
+X86ASM-OBJS-$(CONFIG_H264DSP)          += x86/h264dsp_init.o
+X86ASM-OBJS-$(CONFIG_H264PRED)         += x86/h264_intrapred_init.o
+X86ASM-OBJS-$(CONFIG_H264QPEL)         += x86/h264_qpel.o
+X86ASM-OBJS-$(CONFIG_HPELDSP)          += x86/hpeldsp_init.o
+X86ASM-OBJS-$(CONFIG_HUFFYUVDSP)       += x86/huffyuvdsp_init.o
+X86ASM-OBJS-$(CONFIG_HUFFYUVENCDSP)    += x86/huffyuvencdsp_init.o
+X86ASM-OBJS-$(CONFIG_IDCTDSP)          += x86/idctdsp_init.o
+X86ASM-OBJS-$(CONFIG_LLAUDDSP)         += x86/lossless_audiodsp_init.o
+X86ASM-OBJS-$(CONFIG_LLVIDDSP)         += x86/lossless_videodsp_init.o
 OBJS-$(CONFIG_LLVIDENCDSP)             += x86/lossless_videoencdsp_init.o
-OBJS-$(CONFIG_HUFFYUVDSP)              += x86/huffyuvdsp_init.o
-OBJS-$(CONFIG_HUFFYUVENCDSP)           += x86/huffyuvencdsp_init.o
-OBJS-$(CONFIG_IDCTDSP)                 += x86/idctdsp_init.o
 OBJS-$(CONFIG_LPC)                     += x86/lpc_init.o
-OBJS-$(CONFIG_ME_CMP)                  += x86/me_cmp_init.o
+X86ASM-OBJS-$(CONFIG_ME_CMP)           += x86/me_cmp_init.o
 OBJS-$(CONFIG_MPEGAUDIODSP)            += x86/mpegaudiodsp.o
 OBJS-$(CONFIG_MPEGVIDEO)               += x86/mpegvideo.o
 OBJS-$(CONFIG_MPEGVIDEOENC)            += x86/mpegvideoenc.o
 OBJS-$(CONFIG_MPEGVIDEOENCDSP)         += x86/mpegvideoencdsp_init.o
-OBJS-$(CONFIG_PIXBLOCKDSP)             += x86/pixblockdsp_init.o
-OBJS-$(CONFIG_QPELDSP)                 += x86/qpeldsp_init.o
-OBJS-$(CONFIG_RV34DSP)                 += x86/rv34dsp_init.o
-OBJS-$(CONFIG_VC1DSP)                  += x86/vc1dsp_init.o
-OBJS-$(CONFIG_VIDEODSP)                += x86/videodsp_init.o
-OBJS-$(CONFIG_VP3DSP)                  += x86/vp3dsp_init.o
-OBJS-$(CONFIG_VP8DSP)                  += x86/vp8dsp_init.o
+X86ASM-OBJS-$(CONFIG_PIXBLOCKDSP)      += x86/pixblockdsp_init.o
+X86ASM-OBJS-$(CONFIG_QPELDSP)          += x86/qpeldsp_init.o
+X86ASM-OBJS-$(CONFIG_RV34DSP)          += x86/rv34dsp_init.o
+X86ASM-OBJS-$(CONFIG_VC1DSP)           += x86/vc1dsp_init.o x86/vc1dsp_mmx.o
+X86ASM-OBJS-$(CONFIG_VIDEODSP)         += x86/videodsp_init.o
+X86ASM-OBJS-$(CONFIG_VP3DSP)           += x86/vp3dsp_init.o
+X86ASM-OBJS-$(CONFIG_VP8DSP)           += x86/vp8dsp_init.o
 OBJS-$(CONFIG_XMM_CLOBBER_TEST)        += x86/w64xmmtest.o
 
 # decoders/encoders
-OBJS-$(CONFIG_AAC_DECODER)             += x86/aacpsdsp_init.o          \
+X86ASM-OBJS-$(CONFIG_AAC_DECODER)      += x86/aacpsdsp_init.o          \
                                           x86/sbrdsp_init.o
-OBJS-$(CONFIG_AAC_ENCODER)             += x86/aacencdsp_init.o
-OBJS-$(CONFIG_ADPCM_G722_DECODER)      += x86/g722dsp_init.o
-OBJS-$(CONFIG_ADPCM_G722_ENCODER)      += x86/g722dsp_init.o
-OBJS-$(CONFIG_ALAC_DECODER)            += x86/alacdsp_init.o
-OBJS-$(CONFIG_APNG_DECODER)            += x86/pngdsp_init.o
-OBJS-$(CONFIG_APV_DECODER)             += x86/apv_dsp_init.o
-OBJS-$(CONFIG_CAVS_DECODER)            += x86/cavsdsp.o
-OBJS-$(CONFIG_CFHD_DECODER)            += x86/cfhddsp_init.o
-OBJS-$(CONFIG_CFHD_ENCODER)            += x86/cfhdencdsp_init.o
-OBJS-$(CONFIG_DCA_DECODER)             += x86/dcadsp_init.o x86/synth_filter_init.o
-OBJS-$(CONFIG_DNXHD_ENCODER)           += x86/dnxhdenc_init.o
-OBJS-$(CONFIG_EXR_DECODER)             += x86/exrdsp_init.o
-OBJS-$(CONFIG_FLAC_DECODER)            += x86/flacdsp_init.o
-OBJS-$(CONFIG_FLAC_ENCODER)            += x86/flacencdsp_init.o
-OBJS-$(CONFIG_OPUS_DECODER)            += x86/opusdsp_init.o
-OBJS-$(CONFIG_OPUS_ENCODER)            += x86/celt_pvq_init.o
-OBJS-$(CONFIG_JPEG2000_DECODER)        += x86/jpeg2000dsp_init.o
-OBJS-$(CONFIG_LSCR_DECODER)            += x86/pngdsp_init.o
+X86ASM-OBJS-$(CONFIG_AAC_ENCODER)      += x86/aacencdsp_init.o
+X86ASM-OBJS-$(CONFIG_ADPCM_G722_DECODER) += x86/g722dsp_init.o
+X86ASM-OBJS-$(CONFIG_ADPCM_G722_ENCODER) += x86/g722dsp_init.o
+X86ASM-OBJS-$(CONFIG_ALAC_DECODER)     += x86/alacdsp_init.o
+X86ASM-OBJS-$(CONFIG_APNG_DECODER)     += x86/pngdsp_init.o
+X86ASM-OBJS-$(CONFIG_APV_DECODER)      += x86/apv_dsp_init.o
+X86ASM-OBJS-$(CONFIG_CAVS_DECODER)     += x86/cavsdsp.o
+X86ASM-OBJS-$(CONFIG_CFHD_DECODER)     += x86/cfhddsp_init.o
+X86ASM-OBJS-$(CONFIG_CFHD_ENCODER)     += x86/cfhdencdsp_init.o
+X86ASM-OBJS-$(CONFIG_DCA_DECODER)      += x86/dcadsp_init.o x86/synth_filter_init.o
+X86ASM-OBJS-$(CONFIG_DNXHD_ENCODER)    += x86/dnxhdenc_init.o
+X86ASM-OBJS-$(CONFIG_EXR_DECODER)      += x86/exrdsp_init.o
+X86ASM-OBJS-$(CONFIG_FLAC_DECODER)     += x86/flacdsp_init.o
+X86ASM-OBJS-$(CONFIG_FLAC_ENCODER)     += x86/flacencdsp_init.o
+X86ASM-OBJS-$(CONFIG_OPUS_DECODER)     += x86/opusdsp_init.o
+X86ASM-OBJS-$(CONFIG_OPUS_ENCODER)     += x86/celt_pvq_init.o
+X86ASM-OBJS-$(CONFIG_JPEG2000_DECODER) += x86/jpeg2000dsp_init.o
+X86ASM-OBJS-$(CONFIG_LSCR_DECODER)     += x86/pngdsp_init.o
 OBJS-$(CONFIG_MLP_DECODER)             += x86/mlpdsp_init.o
-OBJS-$(CONFIG_MPEG4_DECODER)           += x86/mpeg4videodsp.o x86/xvididct_init.o
-OBJS-$(CONFIG_PNG_DECODER)             += x86/pngdsp_init.o
-OBJS-$(CONFIG_PRORES_DECODER)          += x86/proresdsp_init.o
-OBJS-$(CONFIG_PRORES_RAW_DECODER)      += x86/proresdsp_init.o
-OBJS-$(CONFIG_RV40_DECODER)            += x86/rv40dsp_init.o
-OBJS-$(CONFIG_SBC_ENCODER)             += x86/sbcdsp_init.o
-OBJS-$(CONFIG_SVQ1_ENCODER)            += x86/svq1enc_init.o
-OBJS-$(CONFIG_TAK_DECODER)             += x86/takdsp_init.o
+OBJS-$(CONFIG_MPEG4_DECODER)           += x86/mpeg4videodsp.o
+X86ASM-OBJS-$(CONFIG_MPEG4_DECODER)    += x86/xvididct_init.o
+X86ASM-OBJS-$(CONFIG_PNG_DECODER)      += x86/pngdsp_init.o
+X86ASM-OBJS-$(CONFIG_PRORES_DECODER)   += x86/proresdsp_init.o
+X86ASM-OBJS-$(CONFIG_PRORES_RAW_DECODER) += x86/proresdsp_init.o
+X86ASM-OBJS-$(CONFIG_RV40_DECODER)     += x86/rv40dsp_init.o
+X86ASM-OBJS-$(CONFIG_SBC_ENCODER)      += x86/sbcdsp_init.o
+X86ASM-OBJS-$(CONFIG_SVQ1_ENCODER)     += x86/svq1enc_init.o
+X86ASM-OBJS-$(CONFIG_TAK_DECODER)      += x86/takdsp_init.o
 OBJS-$(CONFIG_TRUEHD_DECODER)          += x86/mlpdsp_init.o
-OBJS-$(CONFIG_TTA_DECODER)             += x86/ttadsp_init.o
-OBJS-$(CONFIG_TTA_ENCODER)             += x86/ttaencdsp_init.o
-OBJS-$(CONFIG_UTVIDEO_DECODER)         += x86/utvideodsp_init.o
-OBJS-$(CONFIG_V210_DECODER)            += x86/v210-init.o
-OBJS-$(CONFIG_V210_ENCODER)            += x86/v210enc_init.o
-OBJS-$(CONFIG_VORBIS_DECODER)          += x86/vorbisdsp_init.o
-OBJS-$(CONFIG_VP6_DECODER)             += x86/vp6dsp_init.o
-OBJS-$(CONFIG_VP9_DECODER)             += x86/vp9dsp_init.o            \
+X86ASM-OBJS-$(CONFIG_TTA_DECODER)      += x86/ttadsp_init.o
+X86ASM-OBJS-$(CONFIG_TTA_ENCODER)      += x86/ttaencdsp_init.o
+X86ASM-OBJS-$(CONFIG_UTVIDEO_DECODER)  += x86/utvideodsp_init.o
+X86ASM-OBJS-$(CONFIG_V210_DECODER)     += x86/v210-init.o
+X86ASM-OBJS-$(CONFIG_V210_ENCODER)     += x86/v210enc_init.o
+X86ASM-OBJS-$(CONFIG_VORBIS_DECODER)   += x86/vorbisdsp_init.o
+X86ASM-OBJS-$(CONFIG_VP6_DECODER)      += x86/vp6dsp_init.o
+X86ASM-OBJS-$(CONFIG_VP9_DECODER)      += x86/vp9dsp_init.o            \
                                           x86/vp9dsp_init_10bpp.o      \
                                           x86/vp9dsp_init_12bpp.o      \
                                           x86/vp9dsp_init_16bpp.o
 
 
 # GCC inline assembly optimizations
-# subsystems
-MMX-OBJS-$(CONFIG_VC1DSP)              += x86/vc1dsp_mmx.o
-
 # decoders/encoders
 MMX-OBJS-$(CONFIG_SNOW_DECODER)        += x86/snowdsp.o
 MMX-OBJS-$(CONFIG_SNOW_ENCODER)        += x86/snowdsp.o
diff --git a/libavcodec/x86/alacdsp_init.c b/libavcodec/x86/alacdsp_init.c
index 18f7308a12..1b2ff9525b 100644
--- a/libavcodec/x86/alacdsp_init.c
+++ b/libavcodec/x86/alacdsp_init.c
@@ -19,7 +19,6 @@
 #include "libavutil/attributes.h"
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/alacdsp.h"
-#include "config.h"
 
 void ff_alac_decorrelate_stereo_sse4(int32_t *buffer[2], int nb_samples,
                                      int decorr_shift, int decorr_left_weight);
@@ -30,7 +29,6 @@ void ff_alac_append_extra_bits_mono_sse2(int32_t *buffer[2], int32_t *extra_bits
 
 av_cold void ff_alacdsp_init_x86(ALACDSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -40,5 +38,4 @@ av_cold void ff_alacdsp_init_x86(ALACDSPContext *c)
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->decorrelate_stereo   = ff_alac_decorrelate_stereo_sse4;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c
index 37f3bb6a84..a2b362e655 100644
--- a/libavcodec/x86/blockdsp_init.c
+++ b/libavcodec/x86/blockdsp_init.c
@@ -36,7 +36,6 @@ void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size
 
 av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE(cpu_flags)) {
@@ -55,5 +54,4 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
         c->fill_block_tab[0] = ff_fill_block_tab_16_avx2;
         c->fill_block_tab[1] = ff_fill_block_tab_8_avx2;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/dirac_dwt_init.c b/libavcodec/x86/dirac_dwt_init.c
index 13b42b60cb..ecf89342b0 100644
--- a/libavcodec/x86/dirac_dwt_init.c
+++ b/libavcodec/x86/dirac_dwt_init.c
@@ -20,7 +20,6 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-#include "libavutil/x86/asm.h"
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/dirac_dwt.h"
 
@@ -133,10 +132,8 @@ static void horizontal_compose_haar1i##ext(uint8_t *_b, uint8_t *_tmp, int w)\
 }\
 \
 
-#if HAVE_X86ASM
 COMPOSE_VERTICAL(_sse2, 8)
 
-
 void ff_horizontal_compose_dd97i_ssse3(int16_t *_b, int16_t *_tmp, int w);
 
 static void horizontal_compose_dd97i_ssse3(uint8_t *_b, uint8_t *_tmp, int w)
@@ -153,11 +150,9 @@ static void horizontal_compose_dd97i_ssse3(uint8_t *_b, uint8_t *_tmp, int w)
         b[2*x+1] = (COMPOSE_DD97iH0(tmp[x-1], tmp[x], b[x+w2], tmp[x+1], tmp[x+2]) + 1)>>1;
     }
 }
-#endif
 
 void ff_spatial_idwt_init_x86(DWTContext *d, enum dwt_type type)
 {
-#if HAVE_X86ASM
   int mm_flags = av_get_cpu_flags();
 
     if (!(mm_flags & AV_CPU_FLAG_SSE2))
@@ -194,5 +189,4 @@ void ff_spatial_idwt_init_x86(DWTContext *d, enum dwt_type type)
         d->horizontal_compose = horizontal_compose_dd97i_ssse3;
         break;
     }
-#endif // HAVE_X86ASM
 }
diff --git a/libavcodec/x86/diracdsp_init.c b/libavcodec/x86/diracdsp_init.c
index ef01ebdf2e..4f27e1fc2b 100644
--- a/libavcodec/x86/diracdsp_init.c
+++ b/libavcodec/x86/diracdsp_init.c
@@ -34,8 +34,6 @@ void ff_put_signed_rect_clamped_10_sse4(uint8_t *dst, int dst_stride, const uint
 
 void ff_dequant_subband_32_sse4(uint8_t *src, uint8_t *dst, ptrdiff_t stride, const int qf, const int qs, int tot_v, int tot_h);
 
-#if HAVE_X86ASM
-
 #define HPEL_FILTER(MMSIZE, EXT)                                                             \
     void ff_dirac_hpel_filter_v_ ## EXT(uint8_t *, const uint8_t *, int, int);               \
     void ff_dirac_hpel_filter_h_ ## EXT(uint8_t *, const uint8_t *, int);                    \
@@ -81,11 +79,8 @@ DIRAC_PIXOP(avg, sse2)
 
 HPEL_FILTER(16, sse2)
 
-#endif // HAVE_X86ASM
-
 void ff_diracdsp_init_x86(DiracDSPContext* c)
 {
-#if HAVE_X86ASM
     int mm_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(mm_flags)) {
@@ -107,5 +102,4 @@ void ff_diracdsp_init_x86(DiracDSPContext* c)
         c->dequant_subband[1]         = ff_dequant_subband_32_sse4;
         c->put_signed_rect_clamped[1] = ff_put_signed_rect_clamped_10_sse4;
     }
-#endif // HAVE_X86ASM
 }
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index 386955ba67..a2c3829d66 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -62,7 +62,6 @@ DECORRELATE_IFUNCS(32,  avx);
 
 av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int channels)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -127,5 +126,4 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
     if (EXTERNAL_XOP(cpu_flags)) {
         c->lpc32 = ff_flac_lpc_32_xop;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/flacencdsp_init.c b/libavcodec/x86/flacencdsp_init.c
index 5ab37e0a8f..5b45f009d8 100644
--- a/libavcodec/x86/flacencdsp_init.c
+++ b/libavcodec/x86/flacencdsp_init.c
@@ -27,12 +27,12 @@ void ff_flac_enc_lpc_16_sse4(int32_t *, const int32_t *, int, int, const int32_t
 
 av_cold void ff_flacencdsp_init_x86(FLACEncDSPContext *c)
 {
-#if HAVE_X86ASM && CONFIG_GPL
+#if CONFIG_GPL
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE4(cpu_flags)) {
         if (CONFIG_GPL)
             c->lpc16_encode = ff_flac_enc_lpc_16_sse4;
     }
-#endif /* HAVE_X86ASM */
+#endif /* CONFIG_GPL */
 }
diff --git a/libavcodec/x86/fmtconvert_init.c b/libavcodec/x86/fmtconvert_init.c
index acbc334565..6cf3a807a6 100644
--- a/libavcodec/x86/fmtconvert_init.c
+++ b/libavcodec/x86/fmtconvert_init.c
@@ -27,22 +27,16 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/fmtconvert.h"
 
-#if HAVE_X86ASM
-
 void ff_int32_to_float_fmul_scalar_sse2(float *dst, const int32_t *src, float mul, int len);
 void ff_int32_to_float_fmul_array8_sse2(FmtConvertContext *c, float *dst, const int32_t *src,
                                         const float *mul, int len);
 
-#endif /* HAVE_X86ASM */
-
 av_cold void ff_fmt_convert_init_x86(FmtConvertContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->int32_to_float_fmul_scalar = ff_int32_to_float_fmul_scalar_sse2;
         c->int32_to_float_fmul_array8 = ff_int32_to_float_fmul_array8_sse2;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/h264_qpel.c b/libavcodec/x86/h264_qpel.c
index f2bcca1e11..5d618651a4 100644
--- a/libavcodec/x86/h264_qpel.c
+++ b/libavcodec/x86/h264_qpel.c
@@ -30,7 +30,6 @@
 #include "fpel.h"
 #include "qpel.h"
 
-#if HAVE_X86ASM
 void ff_avg_pixels4_mmxext(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
 void ff_put_pixels4x4_l2_mmxext(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                                 ptrdiff_t stride);
@@ -344,8 +343,6 @@ LUMA_MC_816(10, mc13, sse2)
 LUMA_MC_816(10, mc23, sse2)
 LUMA_MC_816(10, mc33, sse2)
 
-#endif /* HAVE_X86ASM */
-
 #define SET_QPEL_FUNCS_1PP(PFX, IDX, SIZE, CPU, PREFIX)                      \
     do {                                                                     \
     c->PFX ## _pixels_tab[IDX][ 1] = PREFIX ## PFX ## SIZE ## _mc10_ ## CPU; \
@@ -388,7 +385,6 @@ LUMA_MC_816(10, mc33, sse2)
 
 av_cold void ff_h264qpel_init_x86(H264QpelContext *c, int bit_depth)
 {
-#if HAVE_X86ASM
     int high_bit_depth = bit_depth > 8;
     int cpu_flags = av_get_cpu_flags();
 
@@ -455,5 +451,4 @@ av_cold void ff_h264qpel_init_x86(H264QpelContext *c, int bit_depth)
             H264_QPEL_FUNCS_10(3, 0, ssse3_cache64);
         }
     }
-#endif
 }
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index dc8fc4f720..66c2f36908 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -190,7 +190,6 @@ H264_BIWEIGHT_10_SSE(4,  10)
 av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
                                  const int chroma_format_idc)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMXEXT(cpu_flags) && chroma_format_idc <= 1)
@@ -363,5 +362,4 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
 #endif /* HAVE_ALIGNED_STACK */
         }
     }
-#endif
 }
diff --git a/libavcodec/x86/hevc/Makefile b/libavcodec/x86/hevc/Makefile
index 8f1c88c569..74418a322c 100644
--- a/libavcodec/x86/hevc/Makefile
+++ b/libavcodec/x86/hevc/Makefile
@@ -1,12 +1,12 @@
 clean::
 	$(RM) $(CLEANSUFFIXES:%=libavcodec/x86/hevc/%) $(CLEANSUFFIXES:%=libavcodec/x86/h26x/%)
 
-OBJS-$(CONFIG_HEVC_DECODER)             += x86/hevc/dsp_init.o      \
-                                           x86/h26x/h2656dsp.o
-X86ASM-OBJS-$(CONFIG_HEVC_DECODER)      += x86/hevc/add_res.o       \
+X86ASM-OBJS-$(CONFIG_HEVC_DECODER)      += x86/hevc/dsp_init.o      \
+                                           x86/hevc/add_res.o       \
                                            x86/hevc/deblock.o       \
                                            x86/hevc/idct.o          \
                                            x86/hevc/mc.o            \
                                            x86/hevc/sao.o           \
                                            x86/hevc/sao_10bit.o     \
+                                           x86/h26x/h2656dsp.o      \
                                            x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/lossless_audiodsp_init.c b/libavcodec/x86/lossless_audiodsp_init.c
index 462329db32..a8bae6ccad 100644
--- a/libavcodec/x86/lossless_audiodsp_init.c
+++ b/libavcodec/x86/lossless_audiodsp_init.c
@@ -34,7 +34,6 @@ int32_t ff_scalarproduct_and_madd_int32_sse4(int16_t *v1, const int32_t *v2,
 
 av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags))
@@ -46,5 +45,4 @@ av_cold void ff_llauddsp_init_x86(LLAudDSPContext *c)
 
     if (EXTERNAL_SSE4(cpu_flags))
         c->scalarproduct_and_madd_int32 = ff_scalarproduct_and_madd_int32_sse4;
-#endif
 }
diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
index 3a8b46f4e1..dbb4ef96bb 100644
--- a/libavcodec/x86/me_cmp_init.c
+++ b/libavcodec/x86/me_cmp_init.c
@@ -80,7 +80,6 @@ int ff_vsad16u_approx_sse2(MPVEncContext *v, const uint8_t *pix1, const uint8_t
 hadamard_func(sse2)
 hadamard_func(ssse3)
 
-#if HAVE_X86ASM
 static int nsse16_ssse3(MPVEncContext *c, const uint8_t *pix1, const uint8_t *pix2,
                         ptrdiff_t stride, int h)
 {
@@ -107,11 +106,8 @@ static int nsse8_ssse3(MPVEncContext *c, const uint8_t *pix1, const uint8_t *pix
         return score1 + FFABS(score2) * 8;
 }
 
-#endif /* HAVE_X86ASM */
-
 av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
@@ -174,5 +170,4 @@ av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
         c->hadamard8_diff[0] = ff_hadamard8_diff16_ssse3;
         c->hadamard8_diff[1] = ff_hadamard8_diff_ssse3;
     }
-#endif
 }
diff --git a/libavcodec/x86/qpeldsp_init.c b/libavcodec/x86/qpeldsp_init.c
index cab2ac433a..a569f564e6 100644
--- a/libavcodec/x86/qpeldsp_init.c
+++ b/libavcodec/x86/qpeldsp_init.c
@@ -80,8 +80,6 @@ void ff_put_no_rnd_mpeg4_qpel8_v_lowpass_mmxext(uint8_t *dst,
                                                 const uint8_t *src,
                                                 ptrdiff_t dstStride, ptrdiff_t srcStride);
 
-#if HAVE_X86ASM
-
 #define QPEL_OP(OPNAME, RND, MMX)                                       \
 static void OPNAME ## qpel8_mc10_ ## MMX(uint8_t *dst,                  \
                                          const uint8_t *src,            \
@@ -485,8 +483,6 @@ QPEL_OP(put_,        _,        mmxext)
 QPEL_OP(avg_,        _,        mmxext)
 QPEL_OP(put_no_rnd_, _no_rnd_, mmxext)
 
-#endif /* HAVE_X86ASM */
-
 #define SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX)                          \
 do {                                                                         \
     c->PFX ## _pixels_tab[IDX][ 1] = PREFIX ## PFX ## SIZE ## _mc10_ ## CPU; \
diff --git a/libavcodec/x86/rv40dsp_init.c b/libavcodec/x86/rv40dsp_init.c
index a07acae6bc..97abee3219 100644
--- a/libavcodec/x86/rv40dsp_init.c
+++ b/libavcodec/x86/rv40dsp_init.c
@@ -39,7 +39,6 @@ static void op##_rv40_qpel##size##_mc33_##insn(uint8_t *dst, const uint8_t *src,
     ff_##op##_pixels##size##_xy2_##insn(dst, src, stride, size); \
 }
 
-#if HAVE_X86ASM
 #define DECLARE_WEIGHT(opt) \
 void ff_rv40_weight_func_rnd_16_##opt(uint8_t *dst, uint8_t *src1, uint8_t *src2, \
                                       int w1, int w2, ptrdiff_t stride); \
@@ -174,13 +173,10 @@ void ff_rv40_ ## OP ## _chroma_mc ## SIZE ## _ ## XMM(uint8_t *dst, const uint8_
                                                       ptrdiff_t stride, int h, int x, int y);\
     c->OP ## _chroma_pixels_tab[SIZE == 4] = ff_rv40_ ## OP ## _chroma_mc ## SIZE ## _ ## XMM
 
-#endif /* HAVE_X86ASM */
-
 av_cold void ff_rv40dsp_init_x86(RV34DSPContext *c)
 {
     av_unused int cpu_flags = av_get_cpu_flags();
 
-#if HAVE_X86ASM
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->put_pixels_tab[0][15]        = put_rv40_qpel16_mc33_sse2;
         c->avg_pixels_tab[0][15]        = avg_rv40_qpel16_mc33_sse2;
@@ -207,5 +203,4 @@ av_cold void ff_rv40dsp_init_x86(RV34DSPContext *c)
         QPEL_MC_SET(put_, _ssse3)
         QPEL_MC_SET(avg_, _ssse3)
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/synth_filter_init.c b/libavcodec/x86/synth_filter_init.c
index e09870b23d..93407c92da 100644
--- a/libavcodec/x86/synth_filter_init.c
+++ b/libavcodec/x86/synth_filter_init.c
@@ -43,15 +43,12 @@ static void synth_filter_##opt(AVTXContext *imdct,                             \
     *synth_buf_offset = (*synth_buf_offset - 32) & 511;                        \
 }                                                                              \
 
-#if HAVE_X86ASM
 SYNTH_FILTER_FUNC(sse2)
 SYNTH_FILTER_FUNC(avx)
 SYNTH_FILTER_FUNC(fma3)
-#endif /* HAVE_X86ASM */
 
 av_cold void ff_synth_filter_init_x86(SynthFilterContext *s)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -63,5 +60,4 @@ av_cold void ff_synth_filter_init_x86(SynthFilterContext *s)
     if (EXTERNAL_FMA3_FAST(cpu_flags)) {
         s->synth_filter_float = synth_filter_fma3;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/takdsp_init.c b/libavcodec/x86/takdsp_init.c
index 9553f8442c..68eb1b8c55 100644
--- a/libavcodec/x86/takdsp_init.c
+++ b/libavcodec/x86/takdsp_init.c
@@ -21,7 +21,6 @@
 #include "libavutil/attributes.h"
 #include "libavcodec/takdsp.h"
 #include "libavutil/x86/cpu.h"
-#include "config.h"
 
 void ff_tak_decorrelate_ls_sse2(const int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_ls_avx2(const int32_t *p1, int32_t *p2, int length);
@@ -34,7 +33,6 @@ void ff_tak_decorrelate_sf_avx2(int32_t *p1, const int32_t *p2, int length, int
 
 av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -53,5 +51,4 @@ av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
         c->decorrelate_sm = ff_tak_decorrelate_sm_avx2;
         c->decorrelate_sf = ff_tak_decorrelate_sf_avx2;
     }
-#endif
 }
diff --git a/libavcodec/x86/ttadsp_init.c b/libavcodec/x86/ttadsp_init.c
index f2954e5687..b4d5184260 100644
--- a/libavcodec/x86/ttadsp_init.c
+++ b/libavcodec/x86/ttadsp_init.c
@@ -21,7 +21,6 @@
 #include "libavutil/attributes.h"
 #include "libavcodec/ttadsp.h"
 #include "libavutil/x86/cpu.h"
-#include "config.h"
 
 void ff_tta_filter_process_ssse3(int32_t *qm, int32_t *dx, int32_t *dl,
                                  int32_t *error, int32_t *in, int32_t shift,
@@ -32,12 +31,10 @@ void ff_tta_filter_process_sse4(int32_t *qm, int32_t *dx, int32_t *dl,
 
 av_cold void ff_ttadsp_init_x86(TTADSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSSE3(cpu_flags))
         c->filter_process = ff_tta_filter_process_ssse3;
     if (EXTERNAL_SSE4(cpu_flags))
         c->filter_process = ff_tta_filter_process_sse4;
-#endif
 }
diff --git a/libavcodec/x86/ttaencdsp_init.c b/libavcodec/x86/ttaencdsp_init.c
index b470142c50..cfe11f9678 100644
--- a/libavcodec/x86/ttaencdsp_init.c
+++ b/libavcodec/x86/ttaencdsp_init.c
@@ -21,7 +21,6 @@
 #include "libavutil/attributes.h"
 #include "libavcodec/ttaencdsp.h"
 #include "libavutil/x86/cpu.h"
-#include "config.h"
 
 void ff_ttaenc_filter_process_ssse3(int32_t *qm, int32_t *dx, int32_t *dl,
                                     int32_t *error, int32_t *in, int32_t shift,
@@ -32,12 +31,10 @@ void ff_ttaenc_filter_process_sse4(int32_t *qm, int32_t *dx, int32_t *dl,
 
 av_cold void ff_ttaencdsp_init_x86(TTAEncDSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSSE3(cpu_flags))
         c->filter_process = ff_ttaenc_filter_process_ssse3;
     if (EXTERNAL_SSE4(cpu_flags))
         c->filter_process = ff_ttaenc_filter_process_sse4;
-#endif
 }
diff --git a/libavcodec/x86/v210-init.c b/libavcodec/x86/v210-init.c
index 895cc8f329..7a879f1dc8 100644
--- a/libavcodec/x86/v210-init.c
+++ b/libavcodec/x86/v210-init.c
@@ -32,7 +32,6 @@ extern void ff_v210_planar_unpack_avx512icl(const uint32_t *src, uint16_t *y, ui
 
 av_cold void ff_v210_x86_init(V210DecContext *s)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (s->aligned_input) {
@@ -71,5 +70,4 @@ av_cold void ff_v210_x86_init(V210DecContext *s)
             s->unpack_frame = ff_v210_planar_unpack_avx512icl;
 #endif
     }
-#endif // HAVE_X86ASM
 }
diff --git a/libavcodec/x86/vc1dsp_init.c b/libavcodec/x86/vc1dsp_init.c
index 5cebc1f6f2..e344f233de 100644
--- a/libavcodec/x86/vc1dsp_init.c
+++ b/libavcodec/x86/vc1dsp_init.c
@@ -52,7 +52,6 @@ static void vc1_h_loop_filter16_ ## EXT(uint8_t *src, ptrdiff_t stride, int pq)
     ff_vc1_h_loop_filter8_ ## EXT(src+8*stride, stride, pq); \
 }
 
-#if HAVE_X86ASM
 LOOP_FILTER4(mmxext)
 LOOP_FILTER816(sse2)
 LOOP_FILTER4(ssse3)
@@ -78,8 +77,6 @@ DECLARE_FUNCTION(avg_,  8, _mmxext)
 DECLARE_FUNCTION(put_, 16, _sse2)
 DECLARE_FUNCTION(avg_, 16, _sse2)
 
-#endif /* HAVE_X86ASM */
-
 void ff_put_vc1_chroma_mc8_nornd_ssse3(uint8_t *dst, const uint8_t *src,
                                        ptrdiff_t stride, int h, int x, int y);
 void ff_avg_vc1_chroma_mc8_nornd_ssse3(uint8_t *dst, const uint8_t *src,
@@ -117,7 +114,6 @@ av_cold void ff_vc1dsp_init_x86(VC1DSPContext *dsp)
         dsp->vc1_v_loop_filter16 = vc1_v_loop_filter16_ ## EXT; \
         dsp->vc1_h_loop_filter16 = vc1_h_loop_filter16_ ## EXT
 
-#if HAVE_X86ASM
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         ASSIGN_LF4(mmxext);
 
@@ -145,5 +141,4 @@ av_cold void ff_vc1dsp_init_x86(VC1DSPContext *dsp)
         dsp->vc1_h_loop_filter8  = ff_vc1_h_loop_filter8_sse4;
         dsp->vc1_h_loop_filter16 = vc1_h_loop_filter16_sse4;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/videodsp_init.c b/libavcodec/x86/videodsp_init.c
index 602856de1e..7f3c837227 100644
--- a/libavcodec/x86/videodsp_init.c
+++ b/libavcodec/x86/videodsp_init.c
@@ -28,7 +28,6 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/videodsp.h"
 
-#if HAVE_X86ASM
 typedef void emu_edge_vfix_func(uint8_t *dst, x86_reg dst_stride,
                                 const uint8_t *src, x86_reg src_stride,
                                 x86_reg start_y, x86_reg end_y, x86_reg bh);
@@ -213,13 +212,11 @@ static av_noinline void emulated_edge_mc_avx2(uint8_t *buf, const uint8_t *src,
                      hfixtbl_avx2, &ff_emu_edge_hvar_avx2);
 }
 #endif /* HAVE_AVX2_EXTERNAL */
-#endif /* HAVE_X86ASM */
 
 void ff_prefetch_mmxext(const uint8_t *buf, ptrdiff_t stride, int h);
 
 av_cold void ff_videodsp_init_x86(VideoDSPContext *ctx, int bpc)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
@@ -233,5 +230,4 @@ av_cold void ff_videodsp_init_x86(VideoDSPContext *ctx, int bpc)
         ctx->emulated_edge_mc = emulated_edge_mc_avx2;
     }
 #endif
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/vp8dsp_init.c b/libavcodec/x86/vp8dsp_init.c
index bd20da1fc9..e37afab775 100644
--- a/libavcodec/x86/vp8dsp_init.c
+++ b/libavcodec/x86/vp8dsp_init.c
@@ -26,8 +26,6 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/vp8dsp.h"
 
-#if HAVE_X86ASM
-
 /*
  * MC functions
  */
@@ -254,8 +252,6 @@ DECLARE_LOOP_FILTER(sse2)
 DECLARE_LOOP_FILTER(ssse3)
 DECLARE_LOOP_FILTER(sse4)
 
-#endif /* HAVE_X86ASM */
-
 #define VP8_LUMA_MC_FUNC(IDX, SIZE, OPT) \
     c->put_vp8_epel_pixels_tab[IDX][0][2] = ff_put_vp8_epel ## SIZE ## _h6_ ## OPT; \
     c->put_vp8_epel_pixels_tab[IDX][2][0] = ff_put_vp8_epel ## SIZE ## _v6_ ## OPT; \
@@ -282,7 +278,6 @@ DECLARE_LOOP_FILTER(sse4)
 
 av_cold void ff_vp78dsp_init_x86(VP8DSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMX(cpu_flags)) {
@@ -317,12 +312,10 @@ av_cold void ff_vp78dsp_init_x86(VP8DSPContext *c)
         VP8_BILINEAR_MC_FUNC(1, 8, ssse3);
         VP8_BILINEAR_MC_FUNC(2, 4, ssse3);
     }
-#endif /* HAVE_X86ASM */
 }
 
 av_cold void ff_vp8dsp_init_x86(VP8DSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMX(cpu_flags)) {
@@ -379,5 +372,4 @@ av_cold void ff_vp8dsp_init_x86(VP8DSPContext *c)
         c->vp8_h_loop_filter16y       = ff_vp8_h_loop_filter16y_mbedge_sse4;
         c->vp8_h_loop_filter8uv       = ff_vp8_h_loop_filter8uv_mbedge_sse4;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/vp9dsp_init.c b/libavcodec/x86/vp9dsp_init.c
index 72edf6bb45..c103751351 100644
--- a/libavcodec/x86/vp9dsp_init.c
+++ b/libavcodec/x86/vp9dsp_init.c
@@ -26,8 +26,6 @@
 #include "libavcodec/vp9dsp.h"
 #include "libavcodec/x86/vp9dsp_init.h"
 
-#if HAVE_X86ASM
-
 decl_fpel_func(put,  4,   , mmx);
 decl_fpel_func(put,  8,   , mmx);
 decl_fpel_func(put, 16,   , sse);
@@ -215,11 +213,8 @@ ipred_func(32, v, avx2);
 #undef ipred_dir_tm_funcs
 #undef ipred_dc_funcs
 
-#endif /* HAVE_X86ASM */
-
 av_cold void ff_vp9dsp_init_x86(VP9DSPContext *dsp, int bpp, int bitexact)
 {
-#if HAVE_X86ASM
     int cpu_flags;
 
     if (bpp == 10) {
@@ -430,6 +425,4 @@ av_cold void ff_vp9dsp_init_x86(VP9DSPContext *dsp, int bpp, int bitexact)
 #undef init_subpel1
 #undef init_subpel2
 #undef init_subpel3
-
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index e5afea1512..2d2f01ba5f 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -26,8 +26,6 @@
 #include "libavcodec/vp9dsp.h"
 #include "libavcodec/x86/vp9dsp_init.h"
 
-#if HAVE_X86ASM
-
 decl_fpel_func(put,   8,    , mmx);
 decl_fpel_func(avg,   8, _16, mmxext);
 decl_fpel_func(put,  16,    , sse);
@@ -68,11 +66,9 @@ decl_ipred_dir_funcs(vl);
 decl_ipred_dir_funcs(vr);
 decl_ipred_dir_funcs(hu);
 decl_ipred_dir_funcs(hd);
-#endif /* HAVE_X86ASM */
 
 av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMX(cpu_flags)) {
@@ -147,6 +143,4 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
         init_ipred_func(dr, DIAG_DOWN_RIGHT, 32, 16, avx2);
 #endif
     }
-
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/x86/vp9dsp_init_16bpp_template.c b/libavcodec/x86/vp9dsp_init_16bpp_template.c
index db775f7c1a..a6aa03bdc8 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp_template.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp_template.c
@@ -26,8 +26,6 @@
 #include "libavcodec/vp9dsp.h"
 #include "libavcodec/x86/vp9dsp_init.h"
 
-#if HAVE_X86ASM
-
 extern const int16_t ff_filters_16bpp[3][15][4][16];
 
 decl_mc_funcs(4, sse2, int16_t, 16, BPC);
@@ -138,11 +136,9 @@ decl_itxfm_func(iadst, iadst, 4, BPC, sse2);
 decl_itxfm_funcs(8, BPC, sse2);
 decl_itxfm_funcs(16, BPC, sse2);
 decl_itxfm_func(idct,  idct, 32, BPC, sse2);
-#endif /* HAVE_X86ASM */
 
 av_cold void INIT_FUNC(VP9DSPContext *dsp, int bitexact)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
 #define init_lpf_8_func(idx1, idx2, dir, wd, bpp, opt) \
@@ -241,7 +237,6 @@ av_cold void INIT_FUNC(VP9DSPContext *dsp, int bitexact)
         init_itx_func_one(TX_32X32, idct, idct, 32, BPC, avx512icl);
     }
 #endif
-#endif /* HAVE_X86ASM */
 
     ff_vp9dsp_init_16bpp_x86(dsp);
 }
diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index c426b156c1..0cebfb4e9e 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -1,13 +1,13 @@
 clean::
 	$(RM) $(CLEANSUFFIXES:%=libavcodec/x86/vvc/%) $(CLEANSUFFIXES:%=libavcodec/x86/h26x/%)
 
-OBJS-$(CONFIG_VVC_DECODER)             += x86/vvc/dsp_init.o        \
-                                          x86/h26x/h2656dsp.o
-X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/alf.o             \
+X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/dsp_init.o        \
+                                          x86/vvc/alf.o             \
                                           x86/vvc/dmvr.o            \
                                           x86/vvc/mc.o              \
                                           x86/vvc/of.o              \
                                           x86/vvc/sad.o             \
                                           x86/vvc/sao.o             \
                                           x86/vvc/sao_10bit.o       \
+                                          x86/h26x/h2656dsp.o       \
                                           x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/xvididct_init.c b/libavcodec/x86/xvididct_init.c
index c6c87b0c90..81575e93eb 100644
--- a/libavcodec/x86/xvididct_init.c
+++ b/libavcodec/x86/xvididct_init.c
@@ -16,7 +16,6 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-#include "config.h"
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
 #include "libavutil/x86/cpu.h"
@@ -27,7 +26,6 @@
 
 av_cold void ff_xvid_idct_init_x86(IDCTDSPContext *c)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -36,5 +34,4 @@ av_cold void ff_xvid_idct_init_x86(IDCTDSPContext *c)
         c->idct      = ff_xvid_idct_sse2;
         c->perm_type = FF_IDCT_PERM_SSE2;
     }
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavcodec/xvididct.c b/libavcodec/xvididct.c
index 317e4e82cd..9d971a21d9 100644
--- a/libavcodec/xvididct.c
+++ b/libavcodec/xvididct.c
@@ -336,7 +336,7 @@ av_cold void ff_xvid_idct_init(IDCTDSPContext *c)
     c->idct      = ff_xvid_idct;
     c->perm_type = FF_IDCT_PERM_NONE;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_xvid_idct_init_x86(c);
 #elif ARCH_MIPS
     ff_xvid_idct_init_mips(c);
-- 
2.52.0


From 51f12befaf0622dbf79eb118403e4af35b8e7528 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 27 Nov 2025 19:50:15 +0100
Subject: [PATCH 155/304] {lib{avcodec,swscale}/x86/,}Makefile: Kill MMX-OBJS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 Makefile                                   | 2 +-
 ffbuild/arch.mak                           | 1 -
 libavcodec/x86/Makefile                    | 7 ++-----
 libswscale/x86/Makefile                    | 2 +-
 libswscale/x86/hscale_fast_bilinear_simd.c | 2 --
 5 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/Makefile b/Makefile
index 2f78db02a5..4fc0aebd33 100644
--- a/Makefile
+++ b/Makefile
@@ -107,7 +107,7 @@ ffbuild/.config: $(CONFIGURABLE_COMPONENTS)
 SUBDIR_VARS := CLEANFILES FFLIBS HOSTPROGS TESTPROGS TOOLS               \
                HEADERS ARCH_HEADERS BUILT_HEADERS SKIPHEADERS            \
                ARMV5TE-OBJS ARMV6-OBJS ARMV8-OBJS VFP-OBJS NEON-OBJS     \
-               ALTIVEC-OBJS VSX-OBJS MMX-OBJS X86ASM-OBJS                \
+               ALTIVEC-OBJS VSX-OBJS X86ASM-OBJS                         \
                MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSP-OBJS MSA-OBJS         \
                MMI-OBJS LSX-OBJS LASX-OBJS RV-OBJS RVV-OBJS RVVB-OBJS    \
                OBJS SHLIBOBJS STLIBOBJS HOSTOBJS TESTOBJS SIMD128-OBJS
diff --git a/ffbuild/arch.mak b/ffbuild/arch.mak
index 197e30bb89..ec79ae7866 100644
--- a/ffbuild/arch.mak
+++ b/ffbuild/arch.mak
@@ -23,5 +23,4 @@ OBJS-$(HAVE_RV_ZVBB) += $(RVVB-OBJS)    $(RVVB-OBJS-yes)
 
 OBJS-$(HAVE_SIMD128) += $(SIMD128-OBJS) $(SIMD128-OBJS-yes)
 
-OBJS-$(HAVE_MMX)     += $(MMX-OBJS)     $(MMX-OBJS-yes)
 OBJS-$(HAVE_X86ASM)  += $(X86ASM-OBJS)  $(X86ASM-OBJS-yes)
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 63b16f24c8..bf723ed1a6 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -65,6 +65,8 @@ X86ASM-OBJS-$(CONFIG_PRORES_DECODER)   += x86/proresdsp_init.o
 X86ASM-OBJS-$(CONFIG_PRORES_RAW_DECODER) += x86/proresdsp_init.o
 X86ASM-OBJS-$(CONFIG_RV40_DECODER)     += x86/rv40dsp_init.o
 X86ASM-OBJS-$(CONFIG_SBC_ENCODER)      += x86/sbcdsp_init.o
+OBJS-$(CONFIG_SNOW_DECODER)            += x86/snowdsp.o
+OBJS-$(CONFIG_SNOW_ENCODER)            += x86/snowdsp.o
 X86ASM-OBJS-$(CONFIG_SVQ1_ENCODER)     += x86/svq1enc_init.o
 X86ASM-OBJS-$(CONFIG_TAK_DECODER)      += x86/takdsp_init.o
 OBJS-$(CONFIG_TRUEHD_DECODER)          += x86/mlpdsp_init.o
@@ -81,11 +83,6 @@ X86ASM-OBJS-$(CONFIG_VP9_DECODER)      += x86/vp9dsp_init.o            \
                                           x86/vp9dsp_init_16bpp.o
 
 
-# GCC inline assembly optimizations
-# decoders/encoders
-MMX-OBJS-$(CONFIG_SNOW_DECODER)        += x86/snowdsp.o
-MMX-OBJS-$(CONFIG_SNOW_ENCODER)        += x86/snowdsp.o
-
 # subsystems
 X86ASM-OBJS-$(CONFIG_AC3DSP)           += x86/ac3dsp.o                  \
                                           x86/ac3dsp_downmix.o
diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile
index f82b411fb1..9c9d286600 100644
--- a/libswscale/x86/Makefile
+++ b/libswscale/x86/Makefile
@@ -4,7 +4,7 @@ OBJS                            += x86/rgb2rgb.o                        \
                                    x86/swscale.o                        \
                                    x86/yuv2rgb.o                        \
 
-MMX-OBJS                        += x86/hscale_fast_bilinear_simd.o      \
+OBJS-$(HAVE_MMXEXT_INLINE)      += x86/hscale_fast_bilinear_simd.o      \
 
 OBJS-$(CONFIG_XMM_CLOBBER_TEST) += x86/w64xmmtest.o
 
diff --git a/libswscale/x86/hscale_fast_bilinear_simd.c b/libswscale/x86/hscale_fast_bilinear_simd.c
index 47ca020004..d8a4e444b4 100644
--- a/libswscale/x86/hscale_fast_bilinear_simd.c
+++ b/libswscale/x86/hscale_fast_bilinear_simd.c
@@ -27,7 +27,6 @@
 #define RET 0xC3 // near return opcode for x86
 #define PREFETCH "prefetchnta"
 
-#if HAVE_INLINE_ASM
 av_cold int ff_init_hscaler_mmxext(int dstW, int xInc, uint8_t *filterCode,
                                        int16_t *filter, int32_t *filterPos,
                                        int numSplits)
@@ -358,4 +357,3 @@ void ff_hcscale_fast_mmxext(SwsInternal *c, int16_t *dst1, int16_t *dst2,
         dst2[i] = src2[srcW-1]*128;
     }
 }
-#endif //HAVE_INLINE_ASM
-- 
2.52.0


From c2e654ee98d0bdb8b8c6b2ac509f96c1fc4b8517 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 27 Nov 2025 20:51:13 +0100
Subject: [PATCH 156/304] avfilter/x86/Makefile: Only compile ASM init files
 when X86ASM is enabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using it for the C init files just works, because the build
system uses the file extension to determine whether a file is C or NASM.

This avoids compiling unused function stubs and also reduces our
reliance on dead code elimination (DCE): We don't add %if checks to the
asm files except for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the
MMX-SSE4 functions will be available. It also allows removing the
HAVE_X86ASM checks in these init files.
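
As a rough illustration of the resulting pattern (not code from this
patch): once an init file is only built when X86ASM is enabled, the
guard moves from inside the x86 init function to its generic caller.
All Demo*/demo_* names below are hypothetical, and the two macros are
local stand-ins for the values configure would normally generate.

```c
/* Hypothetical stand-ins for the configure-generated macros. */
#ifndef ARCH_X86
#define ARCH_X86 1
#endif
#ifndef HAVE_X86ASM
#define HAVE_X86ASM 0
#endif

/* Hypothetical DSP context: records which implementation was chosen. */
typedef struct DemoDSPContext {
    const char *impl;
} DemoDSPContext;

#if ARCH_X86 && HAVE_X86ASM
/* Stands in for an x86 init file listed in X86ASM-OBJS: it is compiled
 * only when X86ASM is enabled, so it needs no HAVE_X86ASM check of its
 * own. */
static void demo_init_x86(DemoDSPContext *c)
{
    c->impl = "x86asm";
}
#endif

/* Generic init: the caller now carries the HAVE_X86ASM check, so no
 * reference to the x86 init function remains when it is not built. */
static void demo_init(DemoDSPContext *c)
{
    c->impl = "c";
#if ARCH_X86 && HAVE_X86ASM
    demo_init_x86(c);
#endif
}
```

With the stand-in defaults above (HAVE_X86ASM set to 0), demo_init()
leaves the C implementation selected and demo_init_x86() is never
compiled or referenced.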

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavfilter/af_afirdsp.h              |   2 +-
 libavfilter/af_anlmdn.c               |   2 +-
 libavfilter/af_volume.c               |   2 +-
 libavfilter/avf_showcqt.c             |   2 +-
 libavfilter/bwdifdsp.c                |   2 +-
 libavfilter/colorspacedsp.c           |   2 +-
 libavfilter/convolution.h             |   2 +-
 libavfilter/f_ebur128.c               |   2 +-
 libavfilter/psnr.c                    |   2 +-
 libavfilter/scene_sad.c               |   2 +-
 libavfilter/vf_atadenoise.c           |   2 +-
 libavfilter/vf_blackdetect.h          |   2 +-
 libavfilter/vf_blend_init.h           |   2 +-
 libavfilter/vf_colordetectdsp.h       |   2 +-
 libavfilter/vf_convolution.c          |   2 +-
 libavfilter/vf_eq.h                   |   2 +-
 libavfilter/vf_framerate.c            |   2 +-
 libavfilter/vf_fsppdsp.h              |   2 +-
 libavfilter/vf_gblur_init.h           |   2 +-
 libavfilter/vf_gradfun.c              |   2 +-
 libavfilter/vf_hflip_init.h           |   2 +-
 libavfilter/vf_hqdn3d.c               |   2 +-
 libavfilter/vf_idetdsp.c              |   2 +-
 libavfilter/vf_limiter.c              |   2 +-
 libavfilter/vf_lut3d.c                |   2 +-
 libavfilter/vf_maskedclamp.c          |   2 +-
 libavfilter/vf_maskedmerge.c          |   2 +-
 libavfilter/vf_nlmeans_init.h         |   2 +-
 libavfilter/vf_overlay.c              |   2 +-
 libavfilter/vf_pp7.c                  |   2 +-
 libavfilter/vf_pullup.c               |   2 +-
 libavfilter/vf_removegrain.c          |   2 +-
 libavfilter/vf_ssim.c                 |   2 +-
 libavfilter/vf_stereo3d.c             |   2 +-
 libavfilter/vf_threshold_init.h       |   2 +-
 libavfilter/vf_tinterlace.c           |   4 +-
 libavfilter/vf_transpose.c            |   2 +-
 libavfilter/vf_v360.c                 |   2 +-
 libavfilter/vf_w3fdif.c               |   2 +-
 libavfilter/vf_yadif.c                |   2 +-
 libavfilter/x86/Makefile              | 155 +++++++++++---------------
 libavfilter/x86/scene_sad_init.c      |   4 -
 libavfilter/x86/vf_colordetect_init.c |   4 -
 libavfilter/x86/vf_eq_init.c          |   4 -
 libavfilter/x86/vf_gradfun_init.c     |   5 -
 libavfilter/x86/vf_hqdn3d_init.c      |   3 -
 libavfilter/x86/vf_idetdsp_init.c     |   6 -
 libavfilter/x86/vf_pullup_init.c      |   3 -
 48 files changed, 107 insertions(+), 159 deletions(-)

diff --git a/libavfilter/af_afirdsp.h b/libavfilter/af_afirdsp.h
index ac68447323..4ca859dfd2 100644
--- a/libavfilter/af_afirdsp.h
+++ b/libavfilter/af_afirdsp.h
@@ -77,7 +77,7 @@ av_unused static void ff_afir_init(AudioFIRDSPContext *dsp)
 
 #if ARCH_RISCV
     ff_afir_init_riscv(dsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_afir_init_x86(dsp);
 #endif
 }
diff --git a/libavfilter/af_anlmdn.c b/libavfilter/af_anlmdn.c
index b1944fb2d7..ce91eb1cfd 100644
--- a/libavfilter/af_anlmdn.c
+++ b/libavfilter/af_anlmdn.c
@@ -116,7 +116,7 @@ void ff_anlmdn_init(AudioNLMDNDSPContext *dsp)
     dsp->compute_distance_ssd = compute_distance_ssd_c;
     dsp->compute_cache        = compute_cache_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_anlmdn_init_x86(dsp);
 #endif
 }
diff --git a/libavfilter/af_volume.c b/libavfilter/af_volume.c
index 471bffeceb..603b92f4af 100644
--- a/libavfilter/af_volume.c
+++ b/libavfilter/af_volume.c
@@ -236,7 +236,7 @@ static av_cold void volume_init(VolumeContext *vol)
         break;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_volume_init_x86(vol);
 #endif
 }
diff --git a/libavfilter/avf_showcqt.c b/libavfilter/avf_showcqt.c
index abfae1f8fb..428e8bff65 100644
--- a/libavfilter/avf_showcqt.c
+++ b/libavfilter/avf_showcqt.c
@@ -1415,7 +1415,7 @@ static int config_output(AVFilterLink *outlink)
         s->update_sono = update_sono_yuv;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_showcqt_init_x86(s);
 #endif
 
diff --git a/libavfilter/bwdifdsp.c b/libavfilter/bwdifdsp.c
index e87fe414e0..58e18f0a92 100644
--- a/libavfilter/bwdifdsp.c
+++ b/libavfilter/bwdifdsp.c
@@ -218,7 +218,7 @@ av_cold void ff_bwdif_init_filter_line(BWDIFDSPContext *s, int bit_depth)
         s->filter_edge  = ff_bwdif_filter_edge_c;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_bwdif_init_x86(s, bit_depth);
 #elif ARCH_AARCH64
     ff_bwdif_init_aarch64(s, bit_depth);
diff --git a/libavfilter/colorspacedsp.c b/libavfilter/colorspacedsp.c
index 72207ffaf3..9c51385ded 100644
--- a/libavfilter/colorspacedsp.c
+++ b/libavfilter/colorspacedsp.c
@@ -143,7 +143,7 @@ void ff_colorspacedsp_init(ColorSpaceDSPContext *dsp)
 
     dsp->multiply3x3 = multiply3x3_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_colorspacedsp_x86_init(dsp);
 #endif
 }
diff --git a/libavfilter/convolution.h b/libavfilter/convolution.h
index 1196c1fcdf..f88b708fab 100644
--- a/libavfilter/convolution.h
+++ b/libavfilter/convolution.h
@@ -132,7 +132,7 @@ static inline void ff_sobel_init(ConvolutionContext *s, int depth, int nb_planes
     if (s->depth > 8)
         for (int i = 0; i < 4; i++)
             s->filter[i] = filter16_sobel;
-#if ARCH_X86_64
+#if ARCH_X86_64 && HAVE_X86ASM
     ff_sobel_init_x86(s, depth, nb_planes);
 #endif
 }
diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
index 84d8e44035..fa6f1375a3 100644
--- a/libavfilter/f_ebur128.c
+++ b/libavfilter/f_ebur128.c
@@ -502,7 +502,7 @@ static int config_audio_output(AVFilterLink *outlink)
             return AVERROR(ENOMEM);
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_ebur128_init_x86(&ebur128->dsp, nb_channels);
 #endif
     return 0;
diff --git a/libavfilter/psnr.c b/libavfilter/psnr.c
index a6b7f5969c..89a205f5cf 100644
--- a/libavfilter/psnr.c
+++ b/libavfilter/psnr.c
@@ -58,7 +58,7 @@ static uint64_t sse_line_16bit(const uint8_t *_main_line, const uint8_t *_ref_li
 void ff_psnr_init(PSNRDSPContext *dsp, int bpp)
 {
     dsp->sse_line = bpp > 8 ? sse_line_16bit : sse_line_8bit;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_psnr_init_x86(dsp, bpp);
 #endif
 }
diff --git a/libavfilter/scene_sad.c b/libavfilter/scene_sad.c
index 05dd97e055..56177ced76 100644
--- a/libavfilter/scene_sad.c
+++ b/libavfilter/scene_sad.c
@@ -59,7 +59,7 @@ void ff_scene_sad_c(SCENE_SAD_PARAMS)
 ff_scene_sad_fn ff_scene_sad_get_fn(int depth)
 {
     ff_scene_sad_fn sad = NULL;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     sad = ff_scene_sad_get_fn_x86(depth);
 #endif
     if (!sad) {
diff --git a/libavfilter/vf_atadenoise.c b/libavfilter/vf_atadenoise.c
index cdebdb7f14..f443d34de8 100644
--- a/libavfilter/vf_atadenoise.c
+++ b/libavfilter/vf_atadenoise.c
@@ -426,7 +426,7 @@ static int config_input(AVFilterLink *inlink)
         }
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_atadenoise_init_x86(&s->dsp, depth, s->algorithm, s->sigma);
 #endif
 
diff --git a/libavfilter/vf_blackdetect.h b/libavfilter/vf_blackdetect.h
index 48350568a9..e51beda3a4 100644
--- a/libavfilter/vf_blackdetect.h
+++ b/libavfilter/vf_blackdetect.h
@@ -61,7 +61,7 @@ static unsigned count_pixels16_c(const uint8_t *src, ptrdiff_t stride,
 static inline ff_blackdetect_fn ff_blackdetect_get_fn(int depth)
 {
     ff_blackdetect_fn fn = NULL;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     fn = ff_blackdetect_get_fn_x86(depth);
 #endif
 
diff --git a/libavfilter/vf_blend_init.h b/libavfilter/vf_blend_init.h
index 297ca0514f..4ee0b94ba5 100644
--- a/libavfilter/vf_blend_init.h
+++ b/libavfilter/vf_blend_init.h
@@ -190,7 +190,7 @@ av_unused static void ff_blend_init(FilterParams *param, int depth)
             param->blend = depth > 8 ? depth > 16 ? blend_copybottom_32 : blend_copybottom_16 : blend_copybottom_8;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_blend_init_x86(param, depth);
 #endif
 }
diff --git a/libavfilter/vf_colordetectdsp.h b/libavfilter/vf_colordetectdsp.h
index 7a57e7aa73..ca4727b589 100644
--- a/libavfilter/vf_colordetectdsp.h
+++ b/libavfilter/vf_colordetectdsp.h
@@ -208,7 +208,7 @@ ff_color_detect_dsp_init(FFColorDetectDSPContext *dsp, int depth,
 
 #if ARCH_AARCH64
     ff_color_detect_dsp_init_aarch64(dsp, depth, color_range);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_color_detect_dsp_init_x86(dsp, depth, color_range);
 #endif
 }
diff --git a/libavfilter/vf_convolution.c b/libavfilter/vf_convolution.c
index 1f4efea9aa..ce42df2cde 100644
--- a/libavfilter/vf_convolution.c
+++ b/libavfilter/vf_convolution.c
@@ -800,7 +800,7 @@ static int param_init(AVFilterContext *ctx)
                     s->filter[p] = filter16_7x7;
             }
         }
-#if CONFIG_CONVOLUTION_FILTER && ARCH_X86_64
+#if CONFIG_CONVOLUTION_FILTER && ARCH_X86_64 && HAVE_X86ASM
         ff_convolution_init_x86(s);
 #endif
     } else if (!strcmp(ctx->filter->name, "prewitt")) {
diff --git a/libavfilter/vf_eq.h b/libavfilter/vf_eq.h
index 156f6c61fe..ab5e98e83f 100644
--- a/libavfilter/vf_eq.h
+++ b/libavfilter/vf_eq.h
@@ -121,7 +121,7 @@ void ff_eq_init_x86(EQContext *eq);
 av_unused static void ff_eq_init(EQContext *eq)
 {
     eq->process = process_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_eq_init_x86(eq);
 #endif
 }
diff --git a/libavfilter/vf_framerate.c b/libavfilter/vf_framerate.c
index a6598f97bb..468042e7db 100644
--- a/libavfilter/vf_framerate.c
+++ b/libavfilter/vf_framerate.c
@@ -262,7 +262,7 @@ void ff_framerate_init(FrameRateContext *s)
         s->blend_factor_max = 1 << BLEND_FACTOR_DEPTH(16);
         s->blend = blend_frames16_c;
     }
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_framerate_init_x86(s);
 #endif
 }
diff --git a/libavfilter/vf_fsppdsp.h b/libavfilter/vf_fsppdsp.h
index 5a2f1af030..7ba8485b4b 100644
--- a/libavfilter/vf_fsppdsp.h
+++ b/libavfilter/vf_fsppdsp.h
@@ -81,7 +81,7 @@ static inline void ff_fsppdsp_init(FSPPDSPContext *fspp)
     fspp->row_idct     = ff_row_idct_c;
     fspp->row_fdct     = ff_row_fdct_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_fsppdsp_init_x86(fspp);
 #endif
 }
diff --git a/libavfilter/vf_gblur_init.h b/libavfilter/vf_gblur_init.h
index 67ca46f95e..1289239778 100644
--- a/libavfilter/vf_gblur_init.h
+++ b/libavfilter/vf_gblur_init.h
@@ -115,7 +115,7 @@ av_unused static void ff_gblur_init(GBlurContext *s)
     s->horiz_slice = horiz_slice_c;
     s->verti_slice = verti_slice_c;
     s->postscale_slice = postscale_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_gblur_init_x86(s);
 #endif
 }
diff --git a/libavfilter/vf_gradfun.c b/libavfilter/vf_gradfun.c
index 4f211c3ddf..c33650576a 100644
--- a/libavfilter/vf_gradfun.c
+++ b/libavfilter/vf_gradfun.c
@@ -130,7 +130,7 @@ static av_cold int init(AVFilterContext *ctx)
     s->blur_line   = ff_gradfun_blur_line_c;
     s->filter_line = ff_gradfun_filter_line_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_gradfun_init_x86(s);
 #endif
 
diff --git a/libavfilter/vf_hflip_init.h b/libavfilter/vf_hflip_init.h
index 0f5b0607c2..97cde15b2f 100644
--- a/libavfilter/vf_hflip_init.h
+++ b/libavfilter/vf_hflip_init.h
@@ -102,7 +102,7 @@ av_unused static int ff_hflip_init(FlipContext *s, int step[4], int nb_planes)
             return AVERROR_BUG;
         }
     }
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_hflip_init_x86(s, step, nb_planes);
 #endif
 
diff --git a/libavfilter/vf_hqdn3d.c b/libavfilter/vf_hqdn3d.c
index 1136931b9b..f79ef04cd3 100644
--- a/libavfilter/vf_hqdn3d.c
+++ b/libavfilter/vf_hqdn3d.c
@@ -279,7 +279,7 @@ static int config_input(AVFilterLink *inlink)
 
     calc_coefs(ctx);
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_hqdn3d_init_x86(s);
 #endif
 
diff --git a/libavfilter/vf_idetdsp.c b/libavfilter/vf_idetdsp.c
index 68a2a267d4..d7c271df20 100644
--- a/libavfilter/vf_idetdsp.c
+++ b/libavfilter/vf_idetdsp.c
@@ -56,7 +56,7 @@ int ff_idet_filter_line_c_16bit(const uint8_t *a, const uint8_t *b, const uint8_
 void av_cold ff_idet_dsp_init(IDETDSPContext *dsp, int depth)
 {
     dsp->filter_line = depth > 8 ? ff_idet_filter_line_c_16bit : ff_idet_filter_line_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_idet_dsp_init_x86(dsp, depth);
 #endif
 }
diff --git a/libavfilter/vf_limiter.c b/libavfilter/vf_limiter.c
index 61f6c9e1bf..4f50ace386 100644
--- a/libavfilter/vf_limiter.c
+++ b/libavfilter/vf_limiter.c
@@ -140,7 +140,7 @@ static int config_input(AVFilterLink *inlink)
         s->dsp.limiter = limiter16;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_limiter_init_x86(&s->dsp, desc->comp[0].depth);
 #endif
 
diff --git a/libavfilter/vf_lut3d.c b/libavfilter/vf_lut3d.c
index 46afe36f6c..4ed609d810 100644
--- a/libavfilter/vf_lut3d.c
+++ b/libavfilter/vf_lut3d.c
@@ -1152,7 +1152,7 @@ static int config_input(AVFilterLink *inlink)
         av_assert0(0);
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_lut3d_init_x86(lut3d, desc);
 #endif
 
diff --git a/libavfilter/vf_maskedclamp.c b/libavfilter/vf_maskedclamp.c
index 39bd596827..606afd3913 100644
--- a/libavfilter/vf_maskedclamp.c
+++ b/libavfilter/vf_maskedclamp.c
@@ -208,7 +208,7 @@ static int config_input(AVFilterLink *inlink)
     else
         s->dsp.maskedclamp = maskedclamp16;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_maskedclamp_init_x86(&s->dsp, s->depth);
 #endif
 
diff --git a/libavfilter/vf_maskedmerge.c b/libavfilter/vf_maskedmerge.c
index 93da0a0edf..18b960034e 100644
--- a/libavfilter/vf_maskedmerge.c
+++ b/libavfilter/vf_maskedmerge.c
@@ -201,7 +201,7 @@ static int config_input(AVFilterLink *inlink)
     else
         s->maskedmerge = maskedmerge32;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_maskedmerge_init_x86(s);
 #endif
 
diff --git a/libavfilter/vf_nlmeans_init.h b/libavfilter/vf_nlmeans_init.h
index 3a533a078a..cf31e74bd7 100644
--- a/libavfilter/vf_nlmeans_init.h
+++ b/libavfilter/vf_nlmeans_init.h
@@ -131,7 +131,7 @@ av_unused static void ff_nlmeans_init(NLMeansDSPContext *dsp)
 
 #if ARCH_AARCH64
     ff_nlmeans_init_aarch64(dsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_nlmeans_init_x86(dsp);
 #endif
 }
diff --git a/libavfilter/vf_overlay.c b/libavfilter/vf_overlay.c
index 673c92d6af..9149d061b9 100644
--- a/libavfilter/vf_overlay.c
+++ b/libavfilter/vf_overlay.c
@@ -835,7 +835,7 @@ static int init_slice_fn(AVFilterContext *ctx)
         break;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_overlay_init_x86(ctx);
 #endif
 
diff --git a/libavfilter/vf_pp7.c b/libavfilter/vf_pp7.c
index e7aae87df0..7b653b977f 100644
--- a/libavfilter/vf_pp7.c
+++ b/libavfilter/vf_pp7.c
@@ -305,7 +305,7 @@ static int config_input(AVFilterLink *inlink)
 
     pp7->dctB = dctB_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_pp7_init_x86(pp7);
 #endif
 
diff --git a/libavfilter/vf_pullup.c b/libavfilter/vf_pullup.c
index d963840fe9..04acbadc4e 100644
--- a/libavfilter/vf_pullup.c
+++ b/libavfilter/vf_pullup.c
@@ -207,7 +207,7 @@ static int config_input(AVFilterLink *inlink)
     s->comb = comb_c;
     s->var  = var_c;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_pullup_init_x86(s);
 #endif
     return 0;
diff --git a/libavfilter/vf_removegrain.c b/libavfilter/vf_removegrain.c
index 3209c7db86..0a0c60fb20 100644
--- a/libavfilter/vf_removegrain.c
+++ b/libavfilter/vf_removegrain.c
@@ -500,7 +500,7 @@ static int config_input(AVFilterLink *inlink)
         }
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM && CONFIG_GPL
     ff_removegrain_init_x86(s);
 #endif
 
diff --git a/libavfilter/vf_ssim.c b/libavfilter/vf_ssim.c
index 15c71cf6b9..dfb91e8da5 100644
--- a/libavfilter/vf_ssim.c
+++ b/libavfilter/vf_ssim.c
@@ -481,7 +481,7 @@ static int config_input_ref(AVFilterLink *inlink)
     s->ssim_plane = desc->comp[0].depth > 8 ? ssim_plane_16bit : ssim_plane;
     s->dsp.ssim_4x4_line = ssim_4x4xn_8bit;
     s->dsp.ssim_end_line = ssim_endn_8bit;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_ssim_init_x86(&s->dsp);
 #endif
 
diff --git a/libavfilter/vf_stereo3d.c b/libavfilter/vf_stereo3d.c
index 8e1fdc5eb3..d5d4cc1702 100644
--- a/libavfilter/vf_stereo3d.c
+++ b/libavfilter/vf_stereo3d.c
@@ -595,7 +595,7 @@ static int config_output(AVFilterLink *outlink)
     s->vsub = desc->log2_chroma_h;
 
     s->dsp.anaglyph = anaglyph;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_stereo3d_init_x86(&s->dsp);
 #endif
 
diff --git a/libavfilter/vf_threshold_init.h b/libavfilter/vf_threshold_init.h
index 3add7b495e..fb319c6cf8 100644
--- a/libavfilter/vf_threshold_init.h
+++ b/libavfilter/vf_threshold_init.h
@@ -84,7 +84,7 @@ av_unused static void ff_threshold_init(ThresholdContext *s)
         s->bpc = 2;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_threshold_init_x86(s);
 #endif
 }
diff --git a/libavfilter/vf_tinterlace.c b/libavfilter/vf_tinterlace.c
index 7d5217c364..59fe82abe3 100644
--- a/libavfilter/vf_tinterlace.c
+++ b/libavfilter/vf_tinterlace.c
@@ -286,7 +286,7 @@ static int config_out_props(AVFilterLink *outlink)
             tinterlace->lowpass_line = lowpass_line_complex_c_16;
         else
             tinterlace->lowpass_line = lowpass_line_complex_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
         ff_tinterlace_init_x86(tinterlace);
 #endif
     } else if (tinterlace->flags & TINTERLACE_FLAG_VLPF) {
@@ -294,7 +294,7 @@ static int config_out_props(AVFilterLink *outlink)
             tinterlace->lowpass_line = lowpass_line_c_16;
         else
             tinterlace->lowpass_line = lowpass_line_c;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
         ff_tinterlace_init_x86(tinterlace);
 #endif
     }
diff --git a/libavfilter/vf_transpose.c b/libavfilter/vf_transpose.c
index 88cc008f69..c457b01a4e 100644
--- a/libavfilter/vf_transpose.c
+++ b/libavfilter/vf_transpose.c
@@ -237,7 +237,7 @@ static int config_props_output(AVFilterLink *outlink)
         }
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     for (int i = 0; i < 4; i++) {
         TransVtable *v = &s->vtables[i];
 
diff --git a/libavfilter/vf_v360.c b/libavfilter/vf_v360.c
index 63412aef87..1cfe408887 100644
--- a/libavfilter/vf_v360.c
+++ b/libavfilter/vf_v360.c
@@ -394,7 +394,7 @@ void ff_v360_init(V360Context *s, int depth)
         break;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_v360_init_x86(s, depth);
 #endif
 }
diff --git a/libavfilter/vf_w3fdif.c b/libavfilter/vf_w3fdif.c
index 5b4c83bbd9..afde426f1a 100644
--- a/libavfilter/vf_w3fdif.c
+++ b/libavfilter/vf_w3fdif.c
@@ -317,7 +317,7 @@ static int config_input(AVFilterLink *inlink)
         s->dsp.filter_scale        = filter16_scale;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_w3fdif_init_x86(&s->dsp, depth);
 #endif
 
diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 6e0e500886..9e4f8d5b43 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -289,7 +289,7 @@ static int config_output(AVFilterLink *outlink)
         s->filter_edges = filter_edges;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_yadif_init_x86(s);
 #endif
 
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index b485c10fbe..ade0efc9ae 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -1,95 +1,72 @@
-OBJS-$(CONFIG_SCENE_SAD)                     += x86/scene_sad_init.o
-
-OBJS-$(CONFIG_AFIR_FILTER)                   += x86/af_afir_init.o
-OBJS-$(CONFIG_ANLMDN_FILTER)                 += x86/af_anlmdn_init.o
-OBJS-$(CONFIG_ATADENOISE_FILTER)             += x86/vf_atadenoise_init.o
-OBJS-$(CONFIG_BLACKDETECT_FILTER)            += x86/vf_blackdetect_init.o
-OBJS-$(CONFIG_BLEND_FILTER)                  += x86/vf_blend_init.o
-OBJS-$(CONFIG_BWDIF_FILTER)                  += x86/vf_bwdif_init.o
-OBJS-$(CONFIG_COLORDETECT_FILTER)            += x86/vf_colordetect_init.o
-OBJS-$(CONFIG_COLORSPACE_FILTER)             += x86/colorspacedsp_init.o
-OBJS-$(CONFIG_CONVOLUTION_FILTER)            += x86/vf_convolution_init.o
-OBJS-$(CONFIG_EBUR128_FILTER)                += x86/f_ebur128_init.o
-OBJS-$(CONFIG_EQ_FILTER)                     += x86/vf_eq_init.o
-OBJS-$(CONFIG_FSPP_FILTER)                   += x86/vf_fspp_init.o
-OBJS-$(CONFIG_GBLUR_FILTER)                  += x86/vf_gblur_init.o
-OBJS-$(CONFIG_GRADFUN_FILTER)                += x86/vf_gradfun_init.o
-OBJS-$(CONFIG_FRAMERATE_FILTER)              += x86/vf_framerate_init.o
-OBJS-$(CONFIG_HALDCLUT_FILTER)               += x86/vf_lut3d_init.o
-OBJS-$(CONFIG_HFLIP_FILTER)                  += x86/vf_hflip_init.o
-OBJS-$(CONFIG_HQDN3D_FILTER)                 += x86/vf_hqdn3d_init.o
-OBJS-$(CONFIG_IDET_FILTER)                   += x86/vf_idetdsp_init.o
-OBJS-$(CONFIG_INTERLACE_FILTER)              += x86/vf_tinterlace_init.o
-OBJS-$(CONFIG_LIMITER_FILTER)                += x86/vf_limiter_init.o
-OBJS-$(CONFIG_LUT3D_FILTER)                  += x86/vf_lut3d_init.o
-OBJS-$(CONFIG_MASKEDCLAMP_FILTER)            += x86/vf_maskedclamp_init.o
-OBJS-$(CONFIG_MASKEDMERGE_FILTER)            += x86/vf_maskedmerge_init.o
-OBJS-$(CONFIG_NLMEANS_FILTER)                += x86/vf_nlmeans_init.o
 OBJS-$(CONFIG_NOISE_FILTER)                  += x86/vf_noise.o
-OBJS-$(CONFIG_OVERLAY_FILTER)                += x86/vf_overlay_init.o
-OBJS-$(CONFIG_PP7_FILTER)                    += x86/vf_pp7_init.o
-OBJS-$(CONFIG_PSNR_FILTER)                   += x86/vf_psnr_init.o
-OBJS-$(CONFIG_PULLUP_FILTER)                 += x86/vf_pullup_init.o
-OBJS-$(CONFIG_REMOVEGRAIN_FILTER)            += x86/vf_removegrain_init.o
-OBJS-$(CONFIG_SHOWCQT_FILTER)                += x86/avf_showcqt_init.o
-OBJS-$(CONFIG_SOBEL_FILTER)                  += x86/vf_convolution_init.o
 OBJS-$(CONFIG_SPP_FILTER)                    += x86/vf_spp.o
-OBJS-$(CONFIG_SSIM_FILTER)                   += x86/vf_ssim_init.o
-OBJS-$(CONFIG_STEREO3D_FILTER)               += x86/vf_stereo3d_init.o
-OBJS-$(CONFIG_TBLEND_FILTER)                 += x86/vf_blend_init.o
-OBJS-$(CONFIG_THRESHOLD_FILTER)              += x86/vf_threshold_init.o
-OBJS-$(CONFIG_TINTERLACE_FILTER)             += x86/vf_tinterlace_init.o
-OBJS-$(CONFIG_TRANSPOSE_FILTER)              += x86/vf_transpose_init.o
-OBJS-$(CONFIG_VOLUME_FILTER)                 += x86/af_volume_init.o
-OBJS-$(CONFIG_V360_FILTER)                   += x86/vf_v360_init.o
-OBJS-$(CONFIG_W3FDIF_FILTER)                 += x86/vf_w3fdif_init.o
-OBJS-$(CONFIG_XPSNR_FILTER)                  += x86/vf_psnr_init.o
-OBJS-$(CONFIG_YADIF_FILTER)                  += x86/vf_yadif_init.o
 
-X86ASM-OBJS-$(CONFIG_SCENE_SAD)              += x86/scene_sad.o
+X86ASM-OBJS-$(CONFIG_SCENE_SAD)              += x86/scene_sad.o x86/scene_sad_init.o
 
-X86ASM-OBJS-$(CONFIG_AFIR_FILTER)            += x86/af_afir.o
-X86ASM-OBJS-$(CONFIG_ANLMDN_FILTER)          += x86/af_anlmdn.o
-X86ASM-OBJS-$(CONFIG_ATADENOISE_FILTER)      += x86/vf_atadenoise.o
-X86ASM-OBJS-$(CONFIG_BLACKDETECT_FILTER)     += x86/vf_blackdetect.o
-X86ASM-OBJS-$(CONFIG_BLEND_FILTER)           += x86/vf_blend.o
-X86ASM-OBJS-$(CONFIG_BWDIF_FILTER)           += x86/vf_bwdif.o
-X86ASM-OBJS-$(CONFIG_COLORDETECT_FILTER)     += x86/vf_colordetect.o
-X86ASM-OBJS-$(CONFIG_COLORSPACE_FILTER)      += x86/colorspacedsp.o
-X86ASM-OBJS-$(CONFIG_CONVOLUTION_FILTER)     += x86/vf_convolution.o
-X86ASM-OBJS-$(CONFIG_EBUR128_FILTER)         += x86/f_ebur128.o
-X86ASM-OBJS-$(CONFIG_EQ_FILTER)              += x86/vf_eq.o
-X86ASM-OBJS-$(CONFIG_FRAMERATE_FILTER)       += x86/vf_framerate.o
-X86ASM-OBJS-$(CONFIG_FSPP_FILTER)            += x86/vf_fspp.o
-X86ASM-OBJS-$(CONFIG_GBLUR_FILTER)           += x86/vf_gblur.o
-X86ASM-OBJS-$(CONFIG_GRADFUN_FILTER)         += x86/vf_gradfun.o
-X86ASM-OBJS-$(CONFIG_HALDCLUT_FILTER)        += x86/vf_lut3d.o
-X86ASM-OBJS-$(CONFIG_HFLIP_FILTER)           += x86/vf_hflip.o
-X86ASM-OBJS-$(CONFIG_HQDN3D_FILTER)          += x86/vf_hqdn3d.o
-X86ASM-OBJS-$(CONFIG_IDET_FILTER)            += x86/vf_idetdsp.o
-X86ASM-OBJS-$(CONFIG_INTERLACE_FILTER)       += x86/vf_interlace.o
-X86ASM-OBJS-$(CONFIG_LIMITER_FILTER)         += x86/vf_limiter.o
-X86ASM-OBJS-$(CONFIG_LUT3D_FILTER)           += x86/vf_lut3d.o
-X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER)     += x86/vf_maskedclamp.o
-X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER)     += x86/vf_maskedmerge.o
-X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER)         += x86/vf_nlmeans.o
-X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER)         += x86/vf_overlay.o
-X86ASM-OBJS-$(CONFIG_PP7_FILTER)             += x86/vf_pp7.o
-X86ASM-OBJS-$(CONFIG_PSNR_FILTER)            += x86/vf_psnr.o
-X86ASM-OBJS-$(CONFIG_PULLUP_FILTER)          += x86/vf_pullup.o
+X86ASM-OBJS-$(CONFIG_AFIR_FILTER)            += x86/af_afir.o x86/af_afir_init.o
+X86ASM-OBJS-$(CONFIG_ANLMDN_FILTER)          += x86/af_anlmdn.o x86/af_anlmdn_init.o
+X86ASM-OBJS-$(CONFIG_ATADENOISE_FILTER)      += x86/vf_atadenoise.o           \
+                                                x86/vf_atadenoise_init.o
+X86ASM-OBJS-$(CONFIG_BLACKDETECT_FILTER)     += x86/vf_blackdetect.o          \
+                                                x86/vf_blackdetect_init.o
+X86ASM-OBJS-$(CONFIG_BLEND_FILTER)           += x86/vf_blend.o x86/vf_blend_init.o
+X86ASM-OBJS-$(CONFIG_BWDIF_FILTER)           += x86/vf_bwdif.o x86/vf_bwdif_init.o
+X86ASM-OBJS-$(CONFIG_COLORDETECT_FILTER)     += x86/vf_colordetect.o          \
+                                                x86/vf_colordetect_init.o
+X86ASM-OBJS-$(CONFIG_COLORSPACE_FILTER)      += x86/colorspacedsp.o           \
+                                                x86/colorspacedsp_init.o
+X86ASM-OBJS-$(CONFIG_CONVOLUTION_FILTER)     += x86/vf_convolution.o          \
+                                                x86/vf_convolution_init.o
+X86ASM-OBJS-$(CONFIG_EBUR128_FILTER)         += x86/f_ebur128.o x86/f_ebur128_init.o
+X86ASM-OBJS-$(CONFIG_EQ_FILTER)              += x86/vf_eq.o x86/vf_eq_init.o
+X86ASM-OBJS-$(CONFIG_FRAMERATE_FILTER)       += x86/vf_framerate.o            \
+                                                x86/vf_framerate_init.o
+X86ASM-OBJS-$(CONFIG_FSPP_FILTER)            += x86/vf_fspp.o x86/vf_fspp_init.o
+X86ASM-OBJS-$(CONFIG_GBLUR_FILTER)           += x86/vf_gblur.o x86/vf_gblur_init.o
+X86ASM-OBJS-$(CONFIG_GRADFUN_FILTER)         += x86/vf_gradfun.o              \
+                                                x86/vf_gradfun_init.o
+X86ASM-OBJS-$(CONFIG_HALDCLUT_FILTER)        += x86/vf_lut3d.o x86/vf_lut3d_init.o
+X86ASM-OBJS-$(CONFIG_HFLIP_FILTER)           += x86/vf_hflip.o x86/vf_hflip_init.o
+X86ASM-OBJS-$(CONFIG_HQDN3D_FILTER)          += x86/vf_hqdn3d.o x86/vf_hqdn3d_init.o
+X86ASM-OBJS-$(CONFIG_IDET_FILTER)            += x86/vf_idetdsp.o              \
+                                                x86/vf_idetdsp_init.o
+X86ASM-OBJS-$(CONFIG_INTERLACE_FILTER)       += x86/vf_interlace.o            \
+                                                x86/vf_tinterlace_init.o
+X86ASM-OBJS-$(CONFIG_LIMITER_FILTER)         += x86/vf_limiter.o              \
+                                                x86/vf_limiter_init.o
+X86ASM-OBJS-$(CONFIG_LUT3D_FILTER)           += x86/vf_lut3d.o x86/vf_lut3d_init.o
+X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER)     += x86/vf_maskedclamp.o          \
+                                                x86/vf_maskedclamp_init.o
+X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER)     += x86/vf_maskedmerge.o          \
+                                                x86/vf_maskedmerge_init.o
+X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER)         += x86/vf_nlmeans.o              \
+                                                x86/vf_nlmeans_init.o
+X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER)         += x86/vf_overlay.o              \
+                                                x86/vf_overlay_init.o
+X86ASM-OBJS-$(CONFIG_PP7_FILTER)             += x86/vf_pp7.o x86/vf_pp7_init.o
+X86ASM-OBJS-$(CONFIG_PSNR_FILTER)            += x86/vf_psnr.o x86/vf_psnr_init.o
+X86ASM-OBJS-$(CONFIG_PULLUP_FILTER)          += x86/vf_pullup.o x86/vf_pullup_init.o
 ifdef CONFIG_GPL
-X86ASM-OBJS-$(CONFIG_REMOVEGRAIN_FILTER)     += x86/vf_removegrain.o
+X86ASM-OBJS-$(CONFIG_REMOVEGRAIN_FILTER)     += x86/vf_removegrain.o          \
+                                                x86/vf_removegrain_init.o
 endif
-X86ASM-OBJS-$(CONFIG_SHOWCQT_FILTER)         += x86/avf_showcqt.o
-X86ASM-OBJS-$(CONFIG_SOBEL_FILTER)           += x86/vf_convolution.o
-X86ASM-OBJS-$(CONFIG_SSIM_FILTER)            += x86/vf_ssim.o
-X86ASM-OBJS-$(CONFIG_STEREO3D_FILTER)        += x86/vf_stereo3d.o
-X86ASM-OBJS-$(CONFIG_TBLEND_FILTER)          += x86/vf_blend.o
-X86ASM-OBJS-$(CONFIG_THRESHOLD_FILTER)       += x86/vf_threshold.o
-X86ASM-OBJS-$(CONFIG_TINTERLACE_FILTER)      += x86/vf_interlace.o
-X86ASM-OBJS-$(CONFIG_TRANSPOSE_FILTER)       += x86/vf_transpose.o
-X86ASM-OBJS-$(CONFIG_VOLUME_FILTER)          += x86/af_volume.o
-X86ASM-OBJS-$(CONFIG_V360_FILTER)            += x86/vf_v360.o
-X86ASM-OBJS-$(CONFIG_W3FDIF_FILTER)          += x86/vf_w3fdif.o
-X86ASM-OBJS-$(CONFIG_XPSNR_FILTER)           += x86/vf_psnr.o
-X86ASM-OBJS-$(CONFIG_YADIF_FILTER)           += x86/vf_yadif.o x86/yadif-16.o x86/yadif-10.o
+X86ASM-OBJS-$(CONFIG_SHOWCQT_FILTER)         += x86/avf_showcqt.o             \
+                                                x86/avf_showcqt_init.o
+X86ASM-OBJS-$(CONFIG_SOBEL_FILTER)           += x86/vf_convolution.o          \
+                                                x86/vf_convolution_init.o
+X86ASM-OBJS-$(CONFIG_SSIM_FILTER)            += x86/vf_ssim.o x86/vf_ssim_init.o
+X86ASM-OBJS-$(CONFIG_STEREO3D_FILTER)        += x86/vf_stereo3d.o             \
+                                                x86/vf_stereo3d_init.o
+X86ASM-OBJS-$(CONFIG_TBLEND_FILTER)          += x86/vf_blend.o x86/vf_blend_init.o
+X86ASM-OBJS-$(CONFIG_THRESHOLD_FILTER)       += x86/vf_threshold.o            \
+                                                x86/vf_threshold_init.o
+X86ASM-OBJS-$(CONFIG_TINTERLACE_FILTER)      += x86/vf_interlace.o            \
+                                                x86/vf_tinterlace_init.o
+X86ASM-OBJS-$(CONFIG_TRANSPOSE_FILTER)       += x86/vf_transpose.o            \
+                                                x86/vf_transpose_init.o
+X86ASM-OBJS-$(CONFIG_VOLUME_FILTER)          += x86/af_volume.o x86/af_volume_init.o
+X86ASM-OBJS-$(CONFIG_V360_FILTER)            += x86/vf_v360.o x86/vf_v360_init.o
+X86ASM-OBJS-$(CONFIG_W3FDIF_FILTER)          += x86/vf_w3fdif.o x86/vf_w3fdif_init.o
+X86ASM-OBJS-$(CONFIG_XPSNR_FILTER)           += x86/vf_psnr.o x86/vf_psnr_init.o
+X86ASM-OBJS-$(CONFIG_YADIF_FILTER)           += x86/vf_yadif.o x86/yadif-16.o \
+                                                x86/yadif-10.o x86/vf_yadif_init.o
diff --git a/libavfilter/x86/scene_sad_init.c b/libavfilter/x86/scene_sad_init.c
index 9863839b4e..a35f02c560 100644
--- a/libavfilter/x86/scene_sad_init.c
+++ b/libavfilter/x86/scene_sad_init.c
@@ -52,7 +52,6 @@ static void FUNC_NAME(SCENE_SAD_PARAMS) {                                     \
     *sum += sad[0];                                                           \
 }
 
-#if HAVE_X86ASM
 SCENE_SAD_FUNC(scene_sad_sse2, ff_scene_sad8_sse2, 16)
 #if HAVE_AVX2_EXTERNAL
 SCENE_SAD_FUNC(scene_sad_avx2,     ff_scene_sad8_avx2,  32)
@@ -62,11 +61,9 @@ SCENE_SAD16_FUNC(scene_sad16_avx2, ff_scene_sad16_avx2, 32)
 SCENE_SAD_FUNC(scene_sad_avx512,     ff_scene_sad8_avx512,  64)
 SCENE_SAD16_FUNC(scene_sad16_avx512, ff_scene_sad16_avx512, 64)
 #endif
-#endif
 
 ff_scene_sad_fn ff_scene_sad_get_fn_x86(int depth)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
     if (depth <= 8) {
 #if HAVE_AVX512_EXTERNAL
@@ -89,6 +86,5 @@ ff_scene_sad_fn ff_scene_sad_get_fn_x86(int depth)
             return scene_sad16_avx2;
 #endif
     }
-#endif
     return NULL;
 }
diff --git a/libavfilter/x86/vf_colordetect_init.c b/libavfilter/x86/vf_colordetect_init.c
index 7257b5c4f5..4bff4471a1 100644
--- a/libavfilter/x86/vf_colordetect_init.c
+++ b/libavfilter/x86/vf_colordetect_init.c
@@ -58,7 +58,6 @@ static int FUNC_NAME(const uint8_t *color, ptrdiff_t color_stride,
                              p, q, k);                                          \
 }
 
-#if HAVE_X86ASM
 #if HAVE_AVX512ICL_EXTERNAL
 DETECT_RANGE_FUNC(detect_range_avx512icl,   ff_detect_rangeb_avx512icl, ff_detect_range_c,   0, 64)
 DETECT_RANGE_FUNC(detect_range16_avx512icl, ff_detect_rangew_avx512icl, ff_detect_range16_c, 1, 64)
@@ -75,12 +74,10 @@ DETECT_ALPHA_FUNC(detect_alpha16_full_avx2, ff_detect_alphaw_full_avx2, ff_detec
 DETECT_ALPHA_FUNC(detect_alpha_limited_avx2,   ff_detect_alphab_limited_avx2, ff_detect_alpha_limited_c,   0, 32)
 DETECT_ALPHA_FUNC(detect_alpha16_limited_avx2, ff_detect_alphaw_limited_avx2, ff_detect_alpha16_limited_c, 1, 32)
 #endif
-#endif
 
 av_cold void ff_color_detect_dsp_init_x86(FFColorDetectDSPContext *dsp, int depth,
                                           enum AVColorRange color_range)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 #if HAVE_AVX2_EXTERNAL
     if (EXTERNAL_AVX2_FAST(cpu_flags)) {
@@ -102,5 +99,4 @@ av_cold void ff_color_detect_dsp_init_x86(FFColorDetectDSPContext *dsp, int dept
         }
     }
 #endif
-#endif
 }
diff --git a/libavfilter/x86/vf_eq_init.c b/libavfilter/x86/vf_eq_init.c
index a1719672df..a66dd1b993 100644
--- a/libavfilter/x86/vf_eq_init.c
+++ b/libavfilter/x86/vf_eq_init.c
@@ -28,7 +28,6 @@
 extern void ff_process_one_line_sse2(const uint8_t *src, uint8_t *dst, short contrast,
                                      short brightness, int w);
 
-#if HAVE_X86ASM
 static void process_sse2(EQParameters *param, uint8_t *dst, int dst_stride,
                          const uint8_t *src, int src_stride, int w, int h)
 {
@@ -42,14 +41,11 @@ static void process_sse2(EQParameters *param, uint8_t *dst, int dst_stride,
         dst += dst_stride;
     }
 }
-#endif
 
 av_cold void ff_eq_init_x86(EQContext *eq)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
     if (EXTERNAL_SSE2(cpu_flags)) {
         eq->process = process_sse2;
     }
-#endif
 }
diff --git a/libavfilter/x86/vf_gradfun_init.c b/libavfilter/x86/vf_gradfun_init.c
index f262f0a1bb..6b27169b82 100644
--- a/libavfilter/x86/vf_gradfun_init.c
+++ b/libavfilter/x86/vf_gradfun_init.c
@@ -18,7 +18,6 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-#include "config.h"
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
 #include "libavutil/x86/cpu.h"
@@ -35,7 +34,6 @@ void ff_gradfun_blur_line_movdqu_sse2(intptr_t x, uint16_t *buf,
                                       const uint16_t *buf1, uint16_t *dc,
                                       const uint8_t *src1, const uint8_t *src2);
 
-#if HAVE_X86ASM
 static void gradfun_filter_line_ssse3(uint8_t *dst, const uint8_t *src, const uint16_t *dc,
                                       int width, int thresh,
                                       const uint16_t *dithers)
@@ -66,11 +64,9 @@ static void gradfun_blur_line_sse2(uint16_t *dc, uint16_t *buf, const uint16_t *
                                          dc + width, src + width * 2,
                                          src + width * 2 + src_linesize);
 }
-#endif /* HAVE_X86ASM */
 
 av_cold void ff_gradfun_init_x86(GradFunContext *gf)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSSE3(cpu_flags))
@@ -78,5 +74,4 @@ av_cold void ff_gradfun_init_x86(GradFunContext *gf)
 
     if (EXTERNAL_SSE2(cpu_flags))
         gf->blur_line = gradfun_blur_line_sse2;
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavfilter/x86/vf_hqdn3d_init.c b/libavfilter/x86/vf_hqdn3d_init.c
index 43aca908b0..15e1ab9539 100644
--- a/libavfilter/x86/vf_hqdn3d_init.c
+++ b/libavfilter/x86/vf_hqdn3d_init.c
@@ -23,7 +23,6 @@
 
 #include "libavutil/attributes.h"
 #include "libavfilter/vf_hqdn3d.h"
-#include "config.h"
 
 void ff_hqdn3d_row_8_x86(uint8_t *src, uint8_t *dst, uint16_t *line_ant,
                          uint16_t *frame_ant, ptrdiff_t w, int16_t *spatial,
@@ -40,10 +39,8 @@ void ff_hqdn3d_row_16_x86(uint8_t *src, uint8_t *dst, uint16_t *line_ant,
 
 av_cold void ff_hqdn3d_init_x86(HQDN3DContext *hqdn3d)
 {
-#if HAVE_X86ASM
     hqdn3d->denoise_row[8]  = ff_hqdn3d_row_8_x86;
     hqdn3d->denoise_row[9]  = ff_hqdn3d_row_9_x86;
     hqdn3d->denoise_row[10] = ff_hqdn3d_row_10_x86;
     hqdn3d->denoise_row[16] = ff_hqdn3d_row_16_x86;
-#endif /* HAVE_X86ASM */
 }
diff --git a/libavfilter/x86/vf_idetdsp_init.c b/libavfilter/x86/vf_idetdsp_init.c
index 1f4c2e2bc3..5cfcb4eae5 100644
--- a/libavfilter/x86/vf_idetdsp_init.c
+++ b/libavfilter/x86/vf_idetdsp_init.c
@@ -18,12 +18,9 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
-#include "libavutil/x86/asm.h"
 #include "libavutil/x86/cpu.h"
 #include "libavfilter/vf_idetdsp.h"
 
-#if HAVE_X86ASM
-
 /* declares main callable idet_filter_line_sse2() */
 #define FUNC_MAIN_DECL(KIND, SPAN)                                        \
 int ff_idet_filter_line_##KIND(const uint8_t *a, const uint8_t *b,        \
@@ -68,10 +65,8 @@ FUNC_MAIN_DECL_16bit(avx2, 16)
 FUNC_MAIN_DECL(avx512icl, 64)
 FUNC_MAIN_DECL_16bit(avx512icl, 32)
 
-#endif
 av_cold void ff_idet_dsp_init_x86(IDETDSPContext *dsp, int depth)
 {
-#if HAVE_X86ASM
     const int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -83,5 +78,4 @@ av_cold void ff_idet_dsp_init_x86(IDETDSPContext *dsp, int depth)
     if (EXTERNAL_AVX512ICL(cpu_flags)) {
         dsp->filter_line = depth > 8 ? idet_filter_line_16bit_avx512icl : idet_filter_line_avx512icl;
     }
-#endif // HAVE_X86ASM
 }
diff --git a/libavfilter/x86/vf_pullup_init.c b/libavfilter/x86/vf_pullup_init.c
index 943c1de9d7..fd5e59c6ce 100644
--- a/libavfilter/x86/vf_pullup_init.c
+++ b/libavfilter/x86/vf_pullup_init.c
@@ -18,7 +18,6 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
-#include "libavutil/x86/asm.h"
 #include "libavutil/x86/cpu.h"
 #include "libavfilter/vf_pullup.h"
 
@@ -28,7 +27,6 @@ int ff_pullup_filter_var_sse2  (const uint8_t *a, const uint8_t *b, ptrdiff_t s)
 
 av_cold void ff_pullup_init_x86(PullupContext *s)
 {
-#if HAVE_X86ASM
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_SSE2(cpu_flags)) {
@@ -38,5 +36,4 @@ av_cold void ff_pullup_init_x86(PullupContext *s)
     if (EXTERNAL_SSSE3(cpu_flags)) {
         s->comb = ff_pullup_filter_comb_ssse3;
     }
-#endif
 }
-- 
2.52.0


From 62ee1d4735e323976bcf3d087ce72a2ab43965ba Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 27 Nov 2025 21:04:33 +0100
Subject: [PATCH 157/304] swresample/x86/Makefile: Only compile ASM init files
 when X86ASM is enabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.

This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows removing HAVE_X86ASM checks
in these init files.

(x86/ops.c has already been put in X86ASM-OBJS.)

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libswresample/resample_dsp.c      |  2 +-
 libswresample/x86/Makefile        | 13 ++++++-------
 libswresample/x86/rematrix_init.c |  2 --
 3 files changed, 7 insertions(+), 10 deletions(-)
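
The caller-side guard described above can be sketched in isolation. This is a minimal illustration, not code from the tree: `DemoDSPContext`, `demo_sum_c`, `demo_dsp_init` and `demo_dsp_init_x86` are hypothetical names, and the two macros are stand-ins for the values configure would write to config.h.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the config.h values (assumption for this sketch). */
#ifndef ARCH_X86
#define ARCH_X86 1
#endif
#ifndef HAVE_X86ASM
#define HAVE_X86ASM 0
#endif

/* Hypothetical DSP context with one function pointer. */
typedef struct DemoDSPContext {
    int (*sum)(const int *v, size_t n);
} DemoDSPContext;

/* Portable C fallback, always compiled. */
int demo_sum_c(const int *v, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

#if ARCH_X86 && HAVE_X86ASM
/* Would be defined in an x86 init file listed under X86ASM-OBJS,
 * so it only exists when the assembler is enabled. */
void demo_dsp_init_x86(DemoDSPContext *c);
#endif

void demo_dsp_init(DemoDSPContext *c)
{
    c->sum = demo_sum_c;
#if ARCH_X86 && HAVE_X86ASM
    /* The guard must match the Makefile condition; otherwise the call
     * would reference a symbol that was never built. */
    demo_dsp_init_x86(c);
#endif
}
```

With the stand-in values above (x86 target, assembler disabled), only the C fallback is installed and no x86 symbol is referenced, so the init file itself needs no preprocessor checks.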

diff --git a/libswresample/resample_dsp.c b/libswresample/resample_dsp.c
index 611f44373f..96f567fe32 100644
--- a/libswresample/resample_dsp.c
+++ b/libswresample/resample_dsp.c
@@ -68,7 +68,7 @@ void swri_resample_dsp_init(ResampleContext *c)
         break;
     }
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     swri_resample_dsp_x86_init(c);
 #elif ARCH_ARM
     swri_resample_dsp_arm_init(c);
diff --git a/libswresample/x86/Makefile b/libswresample/x86/Makefile
index fa0641f03f..c0655bb8a8 100644
--- a/libswresample/x86/Makefile
+++ b/libswresample/x86/Makefile
@@ -1,9 +1,8 @@
-X86ASM-OBJS                     += x86/audio_convert.o\
-                                   x86/rematrix.o\
-                                   x86/resample.o\
-
-OBJS                            += x86/audio_convert_init.o\
-                                   x86/rematrix_init.o\
-                                   x86/resample_init.o\
+X86ASM-OBJS                     += x86/audio_convert.o      \
+                                   x86/audio_convert_init.o \
+                                   x86/rematrix.o           \
+                                   x86/rematrix_init.o      \
+                                   x86/resample.o           \
+                                   x86/resample_init.o      \
 
 OBJS-$(CONFIG_XMM_CLOBBER_TEST) += x86/w64xmmtest.o
diff --git a/libswresample/x86/rematrix_init.c b/libswresample/x86/rematrix_init.c
index 89ec362d62..88f27f5e93 100644
--- a/libswresample/x86/rematrix_init.c
+++ b/libswresample/x86/rematrix_init.c
@@ -32,7 +32,6 @@ D(float, avx)
 D(int16, sse2)
 
 av_cold int swri_rematrix_init_x86(struct SwrContext *s){
-#if HAVE_X86ASM
     int mm_flags = av_get_cpu_flags();
     int nb_in  = s->used_ch_layout.nb_channels;
     int nb_out = s->out.ch_count;
@@ -79,7 +78,6 @@ av_cold int swri_rematrix_init_x86(struct SwrContext *s){
         memcpy(s->native_simd_matrix, s->native_matrix, num * sizeof(float));
         s->native_simd_one.f = s->native_one.f;
     }
-#endif
 
     return 0;
 }
-- 
2.52.0


From d96b56a3b3ec4f43b3af57555d397f24303e78a1 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 27 Nov 2025 21:26:31 +0100
Subject: [PATCH 158/304] avutil/x86/Makefile: Only compile ASM init files when
 X86ASM is enabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.

This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows removing HAVE_X86ASM checks
in these init files.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavutil/aes.c        |  2 +-
 libavutil/fixed_dsp.c  |  2 +-
 libavutil/float_dsp.c  |  2 +-
 libavutil/imgutils.c   |  2 +-
 libavutil/lls.c        |  2 +-
 libavutil/pixelutils.c |  2 +-
 libavutil/x86/Makefile | 29 ++++++++++-------------------
 7 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/libavutil/aes.c b/libavutil/aes.c
index 80f8a2253d..de7144fab8 100644
--- a/libavutil/aes.c
+++ b/libavutil/aes.c
@@ -237,7 +237,7 @@ int av_aes_init(AVAES *a, const uint8_t *key, int key_bits, int decrypt)
 
     a->rounds = rounds;
     a->crypt = decrypt ? aes_decrypt : aes_encrypt;
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_init_aes_x86(a, decrypt);
 #endif
 
diff --git a/libavutil/fixed_dsp.c b/libavutil/fixed_dsp.c
index 95f0eb2595..74cefdb145 100644
--- a/libavutil/fixed_dsp.c
+++ b/libavutil/fixed_dsp.c
@@ -165,7 +165,7 @@ AVFixedDSPContext * avpriv_alloc_fixed_dsp(int bit_exact)
 
 #if ARCH_RISCV
     ff_fixed_dsp_init_riscv(fdsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_fixed_dsp_init_x86(fdsp);
 #endif
 
diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index f38ad565cf..5bfa1008d1 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -159,7 +159,7 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
     ff_float_dsp_init_ppc(fdsp, bit_exact);
 #elif ARCH_RISCV
     ff_float_dsp_init_riscv(fdsp);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_float_dsp_init_x86(fdsp);
 #elif ARCH_MIPS
     ff_float_dsp_init_mips(fdsp);
diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 7b88738e2d..681482dc08 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -362,7 +362,7 @@ void av_image_copy_plane_uc_from(uint8_t *dst, ptrdiff_t dst_linesize,
 {
     int ret = -1;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ret = ff_image_copy_plane_uc_from_x86(dst, dst_linesize, src, src_linesize,
                                           bytewidth, height);
 #endif
diff --git a/libavutil/lls.c b/libavutil/lls.c
index fe8f55976d..27168009c5 100644
--- a/libavutil/lls.c
+++ b/libavutil/lls.c
@@ -114,7 +114,7 @@ av_cold void avpriv_init_lls(LLSModel *m, int indep_count)
     m->evaluate_lls = evaluate_lls;
 #if ARCH_RISCV
     ff_init_lls_riscv(m);
-#elif ARCH_X86
+#elif ARCH_X86 && HAVE_X86ASM
     ff_init_lls_x86(m);
 #endif
 }
diff --git a/libavutil/pixelutils.c b/libavutil/pixelutils.c
index 820889a143..8e91f0a2cc 100644
--- a/libavutil/pixelutils.c
+++ b/libavutil/pixelutils.c
@@ -86,7 +86,7 @@ av_pixelutils_sad_fn av_pixelutils_get_sad_fn(int w_bits, int h_bits, int aligne
     if (w_bits != h_bits) // only squared sad for now
         return NULL;
 
-#if ARCH_X86
+#if ARCH_X86 && HAVE_X86ASM
     ff_pixelutils_sad_init_x86(sad, aligned);
 #endif
 
diff --git a/libavutil/x86/Makefile b/libavutil/x86/Makefile
index 8cfd646108..4e1b4b1176 100644
--- a/libavutil/x86/Makefile
+++ b/libavutil/x86/Makefile
@@ -1,23 +1,14 @@
-OBJS += x86/aes_init.o                                                  \
-        x86/cpu.o                                                       \
-        x86/fixed_dsp_init.o                                            \
-        x86/float_dsp_init.o                                            \
-        x86/imgutils_init.o                                             \
-        x86/lls_init.o                                                  \
-
-OBJS-$(HAVE_X86ASM) += x86/tx_float_init.o                              \
-
-OBJS-$(CONFIG_PIXELUTILS) += x86/pixelutils_init.o                      \
+OBJS += x86/cpu.o                                                       \
 
 EMMS_OBJS_$(HAVE_MMX_INLINE)_$(HAVE_MMX_EXTERNAL)_$(HAVE_MM_EMPTY) = x86/emms.o
 
-X86ASM-OBJS += x86/aes.o                                                \
-             x86/cpuid.o                                                \
-             $(EMMS_OBJS__yes_)                                      \
-             x86/fixed_dsp.o                                            \
-             x86/float_dsp.o                                            \
-             x86/imgutils.o                                             \
-             x86/lls.o                                                  \
-             x86/tx_float.o                                             \
+X86ASM-OBJS += x86/aes.o x86/aes_init.o                                 \
+               x86/cpuid.o                                              \
+               $(EMMS_OBJS__yes_)                                       \
+               x86/fixed_dsp.o x86/fixed_dsp_init.o                     \
+               x86/float_dsp.o x86/float_dsp_init.o                     \
+               x86/imgutils.o x86/imgutils_init.o                       \
+               x86/lls.o x86/lls_init.o                                 \
+               x86/tx_float.o x86/tx_float_init.o                       \
 
-X86ASM-OBJS-$(CONFIG_PIXELUTILS) += x86/pixelutils.o                    \
+X86ASM-OBJS-$(CONFIG_PIXELUTILS) += x86/pixelutils.o x86/pixelutils_init.o
-- 
2.52.0


From d3f8f809a9a14579cffeb670a9d574be89e20dcc Mon Sep 17 00:00:00 2001
From: Russell Greene <russellgreene8@gmail.com>
Date: Sat, 29 Nov 2025 12:42:13 -0700
Subject: [PATCH 159/304] hwcontext_vulkan: remove VK_HOST_IMAGE_COPY_MEMCPY
 flag

According to the spec, this flag copies the data verbatim, including any swizzling/tiling. This has two issues:

1. the format may not be what ffmpeg expects elsewhere, as it is expecting normal pitch-linear host memory in `swf`
2. the size of the copied data may not match the size of the buffer provided, causing a heap buffer overflow

Adding this flag seems to have been an oversight: it appears to be intended for caching/backups of image data that are only ever copied back to the GPU with the MEMCPY flag, which is *not* how it's used in ffmpeg.

Additionally, set memoryRowLength: if it isn't set, the copy assumes pitch = width_in_bytes, which I don't think is necessarily the case.
---
 libavutil/hwcontext_vulkan.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index a2caaa0959..aac7768033 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -4440,7 +4440,6 @@ static int vulkan_transfer_host(AVHWFramesContext *hwfc, AVFrame *hwf,
         };
         VkCopyMemoryToImageInfoEXT copy_info = {
             .sType = VK_STRUCTURE_TYPE_COPY_MEMORY_TO_IMAGE_INFO_EXT,
-            .flags = VK_HOST_IMAGE_COPY_MEMCPY_EXT,
             .regionCount = 1,
             .pRegions = &region_info,
         };
@@ -4466,7 +4465,6 @@ static int vulkan_transfer_host(AVHWFramesContext *hwfc, AVFrame *hwf,
         };
         VkCopyImageToMemoryInfoEXT copy_info = {
             .sType = VK_STRUCTURE_TYPE_COPY_IMAGE_TO_MEMORY_INFO_EXT,
-            .flags = VK_HOST_IMAGE_COPY_MEMCPY_EXT,
             .regionCount = 1,
             .pRegions = &region_info,
         };
@@ -4476,6 +4474,7 @@ static int vulkan_transfer_host(AVHWFramesContext *hwfc, AVFrame *hwf,
             get_plane_wh(&p_w, &p_h, swf->format, swf->width, swf->height, i);
 
             region_info.pHostPointer = swf->data[i];
+            region_info.memoryRowLength = swf->linesize[i];
             region_info.imageSubresource.aspectMask = ff_vk_aspect_flag(hwf, i);
             region_info.imageExtent = (VkExtent3D){ p_w, p_h, 1 };
             copy_info.srcImage = hwf_vk->img[img_idx];
-- 
2.52.0


From adf025b4f2001964a318d8ebdbebaad089427359 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Sun, 30 Nov 2025 07:38:31 +0100
Subject: [PATCH 160/304] tests/checkasm: fix check for 32-bit Windows build
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With --disable-asm, ARCH_X86_32 is set to 0, but we still build the
checkasm binary. Update the check so it is config.h agnostic.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 tests/checkasm/checkasm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
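
As a rough illustration of a config.h-agnostic check, the sketch below derives the answer purely from compiler-predefined macros (`__i386__` for GCC/Clang, `_M_IX86` for MSVC). `DEMO_IS_X86_32` and `demo_is_x86_32` are hypothetical names used only here.

```c
/* Relies only on macros the compiler predefines for the target,
 * so the result does not change with --disable-asm. */
#if defined(__i386__) || defined(_M_IX86)
#define DEMO_IS_X86_32 1
#else
#define DEMO_IS_X86_32 0
#endif

/* Returns 1 when compiled for 32-bit x86, 0 otherwise. */
int demo_is_x86_32(void)
{
    return DEMO_IS_X86_32;
}
```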

diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index bd33aba263..05f74ca16b 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -44,7 +44,7 @@
 
 #ifdef _WIN32
 #include <windows.h>
-#if ARCH_X86_32
+#if defined(__i386__) || defined(_M_IX86)
 #include <setjmp.h>
 typedef jmp_buf checkasm_context;
 #define checkasm_save_context() checkasm_handle_signal(setjmp(checkasm_context_buf))
-- 
2.52.0


From 2e7756097dbacb8c43b973cb0a503ac70d5fe86c Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 30 Nov 2025 23:10:31 +0100
Subject: [PATCH 161/304] hwcontext_vulkan: fix VkImageToMemoryCopyEXT.sType

It was copy pasted from the upload path.
Somehow, it was missed, despite god knows how many validation layer runs.
---
 libavutil/hwcontext_vulkan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index aac7768033..c3322bafbd 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -4458,7 +4458,7 @@ static int vulkan_transfer_host(AVHWFramesContext *hwfc, AVFrame *hwf,
         }
     } else {
         VkImageToMemoryCopyEXT region_info = {
-            .sType = VK_STRUCTURE_TYPE_MEMORY_TO_IMAGE_COPY_EXT,
+            .sType = VK_STRUCTURE_TYPE_IMAGE_TO_MEMORY_COPY_EXT,
             .imageSubresource = {
                 .layerCount = 1,
             },
-- 
2.52.0


From 36e8917e9b6fd627fa6c0df2f86ed9f3df57b997 Mon Sep 17 00:00:00 2001
From: llyyr <llyyr.public@gmail.com>
Date: Sat, 29 Nov 2025 17:03:17 +0530
Subject: [PATCH 162/304] avutil/hwcontext_d3d12va: use hwdev context for
 logging

This fixes a warning about av_log being called with a NULL AVClass, which
is also an API violation.

Fixes: https://trac.ffmpeg.org/ticket/11335
---
 libavutil/hwcontext_d3d12va.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/hwcontext_d3d12va.c b/libavutil/hwcontext_d3d12va.c
index b80227b56c..164f15a285 100644
--- a/libavutil/hwcontext_d3d12va.c
+++ b/libavutil/hwcontext_d3d12va.c
@@ -746,7 +746,7 @@ static int d3d12va_device_create(AVHWDeviceContext *hwdev, const char *device,
             DXGI_ADAPTER_DESC desc;
             hr = IDXGIAdapter2_GetDesc(pAdapter, &desc);
             if (!FAILED(hr)) {
-                av_log(ctx, AV_LOG_INFO, "Using device %04x:%04x (%ls).\n",
+                av_log(hwdev, AV_LOG_INFO, "Using device %04x:%04x (%ls).\n",
                        desc.VendorId, desc.DeviceId, desc.Description);
             }
         }
@@ -754,7 +754,7 @@ static int d3d12va_device_create(AVHWDeviceContext *hwdev, const char *device,
         hr = priv->create_device((IUnknown *)pAdapter, D3D_FEATURE_LEVEL_12_0, &IID_ID3D12Device, (void **)&ctx->device);
         D3D12_OBJECT_RELEASE(pAdapter);
         if (FAILED(hr)) {
-            av_log(ctx, AV_LOG_ERROR, "Failed to create Direct 3D 12 device (%lx)\n", (long)hr);
+            av_log(hwdev, AV_LOG_ERROR, "Failed to create Direct 3D 12 device (%lx)\n", (long)hr);
             return AVERROR_UNKNOWN;
         }
     }
-- 
2.52.0


From 39ecf5d030f12fc0649abfb74c239cf521ca5e8b Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Mon, 1 Dec 2025 15:40:40 +0100
Subject: [PATCH 163/304] vulkan: fix host copy stride

memoryRowLength is in texels, not bytes
---
 libavutil/hwcontext_vulkan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
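
The unit conversion can be sketched on its own. This is a minimal example under assumptions: `demo_row_length_texels` is a hypothetical helper, `linesize_bytes` plays the role of `swf->linesize[i]`, and `step_bytes` plays the role of `desc->comp[i].step` (bytes between two horizontally adjacent pixels of the plane).

```c
#include <stdint.h>

/* Convert a row pitch in bytes to a row length in texels.
 * Returns 0 for invalid input (hypothetical convention for this sketch). */
uint32_t demo_row_length_texels(int linesize_bytes, int step_bytes)
{
    if (linesize_bytes <= 0 || step_bytes <= 0)
        return 0;
    return (uint32_t)(linesize_bytes / step_bytes);
}
```

For example, a packed 4-byte-per-pixel plane with a 7680-byte pitch gives 1920 texels per row, while a 1-byte-per-sample plane keeps the same number in both units.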

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index c3322bafbd..89bf39bb3f 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -4382,6 +4382,7 @@ static int vulkan_transfer_host(AVHWFramesContext *hwfc, AVFrame *hwf,
     FFVulkanFunctions *vk = &p->vkctx.vkfn;
 
     AVVkFrame *hwf_vk = (AVVkFrame *)hwf->data[0];
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(swf->format);
     const int planes = av_pix_fmt_count_planes(swf->format);
     const int nb_images = ff_vk_count_images(hwf_vk);
 
@@ -4474,7 +4475,7 @@ static int vulkan_transfer_host(AVHWFramesContext *hwfc, AVFrame *hwf,
             get_plane_wh(&p_w, &p_h, swf->format, swf->width, swf->height, i);
 
             region_info.pHostPointer = swf->data[i];
-            region_info.memoryRowLength = swf->linesize[i];
+            region_info.memoryRowLength = swf->linesize[i] / desc->comp[i].step;
             region_info.imageSubresource.aspectMask = ff_vk_aspect_flag(hwf, i);
             region_info.imageExtent = (VkExtent3D){ p_w, p_h, 1 };
             copy_info.srcImage = hwf_vk->img[img_idx];
-- 
2.52.0


From 57e470b2cca9ca6eca72427a5e662f6ab0fd0a18 Mon Sep 17 00:00:00 2001
From: Thomas Gritzan <phygon@gmx.de>
Date: Thu, 27 Nov 2025 02:30:25 +0100
Subject: [PATCH 164/304] libavdevice/decklink: add support for DeckLink SDK
 14.3

This patch adds support for DeckLink SDK 14.3 and newer by using
the legacy interfaces in the header <DeckLinkAPI_v14_2_1.h>.

The missing QueryInterface implementations are also provided.
---
 libavdevice/decklink_common.cpp | 13 +++++---
 libavdevice/decklink_common.h   | 31 +++++++++++++++++--
 libavdevice/decklink_dec.cpp    | 55 ++++++++++++++++++++++++++-------
 libavdevice/decklink_enc.cpp    | 46 +++++++++++++++------------
 4 files changed, 107 insertions(+), 38 deletions(-)
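
The version gate used by this patch compares against 0x0e030000 for SDK 14.3. The sketch below assumes the SDK packs the major version into the top byte and the minor version into the next one, which matches that constant but is not stated in the patch. `DEMO_DECKLINK_VERSION` and `demo_needs_legacy_header` are hypothetical names.

```c
#include <stdint.h>

/* Assumed packing: major in bits 31..24, minor in bits 23..16. */
#define DEMO_DECKLINK_VERSION(major, minor) \
    (((uint32_t)(major) << 24) | ((uint32_t)(minor) << 16))

/* Returns nonzero when the SDK is 14.3 or newer, i.e. when the
 * legacy _v14_2_1 interfaces come from a separate header. */
int demo_needs_legacy_header(uint32_t api_version)
{
    return api_version >= DEMO_DECKLINK_VERSION(14, 3);
}
```

On older SDKs the gate is false, which is the case where the patch maps the `_v14_2_1` names back to the plain interface names with macros.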

diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
index 47de7ef6b0..fe187cd2c9 100644
--- a/libavdevice/decklink_common.cpp
+++ b/libavdevice/decklink_common.cpp
@@ -25,7 +25,12 @@ extern "C" {
 #include "libavformat/internal.h"
 }
 
+#include <DeckLinkAPIVersion.h>
 #include <DeckLinkAPI.h>
+#if BLACKMAGIC_DECKLINK_API_VERSION >= 0x0e030000
+#include <DeckLinkAPI_v14_2_1.h>
+#endif
+
 #ifdef _WIN32
 #include <DeckLinkAPI_i.c>
 #else
@@ -512,8 +517,8 @@ int ff_decklink_list_devices(AVFormatContext *avctx,
         return AVERROR(EIO);
 
     while (ret == 0 && iter->Next(&dl) == S_OK) {
-        IDeckLinkOutput *output_config;
-        IDeckLinkInput *input_config;
+        IDeckLinkOutput_v14_2_1 *output_config;
+        IDeckLinkInput_v14_2_1 *input_config;
         const char *display_name = NULL;
         const char *unique_name = NULL;
         AVDeviceInfo *new_device = NULL;
@@ -527,14 +532,14 @@ int ff_decklink_list_devices(AVFormatContext *avctx,
             goto next;
 
         if (show_outputs) {
-            if (dl->QueryInterface(IID_IDeckLinkOutput, (void **)&output_config) == S_OK) {
+            if (dl->QueryInterface(IID_IDeckLinkOutput_v14_2_1, (void **)&output_config) == S_OK) {
                 output_config->Release();
                 add = 1;
             }
         }
 
         if (show_inputs) {
-            if (dl->QueryInterface(IID_IDeckLinkInput, (void **)&input_config) == S_OK) {
+            if (dl->QueryInterface(IID_IDeckLinkInput_v14_2_1, (void **)&input_config) == S_OK) {
                 input_config->Release();
                 add = 1;
             }
diff --git a/libavdevice/decklink_common.h b/libavdevice/decklink_common.h
index 6b32dc2d09..095b438bce 100644
--- a/libavdevice/decklink_common.h
+++ b/libavdevice/decklink_common.h
@@ -29,6 +29,23 @@
 #define IDeckLinkProfileAttributes IDeckLinkAttributes
 #endif
 
+#if BLACKMAGIC_DECKLINK_API_VERSION < 0x0e030000
+#define IDeckLinkInput_v14_2_1 IDeckLinkInput
+#define IDeckLinkInputCallback_v14_2_1 IDeckLinkInputCallback
+#define IDeckLinkMemoryAllocator_v14_2_1 IDeckLinkMemoryAllocator
+#define IDeckLinkOutput_v14_2_1 IDeckLinkOutput
+#define IDeckLinkVideoFrame_v14_2_1 IDeckLinkVideoFrame
+#define IDeckLinkVideoInputFrame_v14_2_1 IDeckLinkVideoInputFrame
+#define IDeckLinkVideoOutputCallback_v14_2_1 IDeckLinkVideoOutputCallback
+#define IID_IDeckLinkInput_v14_2_1 IID_IDeckLinkInput
+#define IID_IDeckLinkInputCallback_v14_2_1 IID_IDeckLinkInputCallback
+#define IID_IDeckLinkMemoryAllocator_v14_2_1 IID_IDeckLinkMemoryAllocator
+#define IID_IDeckLinkOutput_v14_2_1 IID_IDeckLinkOutput
+#define IID_IDeckLinkVideoFrame_v14_2_1 IID_IDeckLinkVideoFrame
+#define IID_IDeckLinkVideoInputFrame_v14_2_1 IID_IDeckLinkVideoInputFrame
+#define IID_IDeckLinkVideoOutputCallback_v14_2_1 IID_IDeckLinkVideoOutputCallback
+#endif
+
 extern "C" {
 #include "libavutil/mem.h"
 #include "libavcodec/packet_internal.h"
@@ -76,6 +93,16 @@ static char *dup_cfstring_to_utf8(CFStringRef w)
 #define DECKLINK_FREE(s) free((void *) s)
 #endif
 
+#ifdef _WIN32
+#include <guiddef.h>    // REFIID, IsEqualIID()
+#define DECKLINK_IsEqualIID IsEqualIID
+#else
+static inline bool DECKLINK_IsEqualIID(const REFIID& riid1, const REFIID& riid2)
+{
+    return memcmp(&riid1, &riid2, sizeof(REFIID)) == 0;
+}
+#endif
+
 class decklink_output_callback;
 class decklink_input_callback;
 
@@ -93,8 +120,8 @@ typedef struct DecklinkPacketQueue {
 struct decklink_ctx {
     /* DeckLink SDK interfaces */
     IDeckLink *dl;
-    IDeckLinkOutput *dlo;
-    IDeckLinkInput *dli;
+    IDeckLinkOutput_v14_2_1 *dlo;
+    IDeckLinkInput_v14_2_1 *dli;
     IDeckLinkConfiguration *cfg;
     IDeckLinkProfileAttributes *attr;
     decklink_output_callback *output_callback;
diff --git a/libavdevice/decklink_dec.cpp b/libavdevice/decklink_dec.cpp
index 418701e4e0..8830779990 100644
--- a/libavdevice/decklink_dec.cpp
+++ b/libavdevice/decklink_dec.cpp
@@ -31,7 +31,11 @@ extern "C" {
 #include "libavformat/internal.h"
 }
 
+#include <DeckLinkAPIVersion.h>
 #include <DeckLinkAPI.h>
+#if BLACKMAGIC_DECKLINK_API_VERSION >= 0x0e030000
+#include <DeckLinkAPI_v14_2_1.h>
+#endif
 
 extern "C" {
 #include "config.h"
@@ -105,7 +109,7 @@ static VANCLineNumber vanc_line_numbers[] = {
     {bmdModeUnknown, 0, -1, -1, -1}
 };
 
-class decklink_allocator : public IDeckLinkMemoryAllocator
+class decklink_allocator : public IDeckLinkMemoryAllocator_v14_2_1
 {
 public:
         decklink_allocator(): _refs(1) { }
@@ -129,7 +133,21 @@ public:
         virtual HRESULT STDMETHODCALLTYPE Decommit() { return S_OK; }
 
         // IUnknown methods
-        virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID iid, LPVOID *ppv) { return E_NOINTERFACE; }
+        virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, LPVOID *ppv)
+        {
+            if (DECKLINK_IsEqualIID(riid, IID_IUnknown)) {
+                *ppv = static_cast<IUnknown*>(this);
+            } else if (DECKLINK_IsEqualIID(riid, IID_IDeckLinkMemoryAllocator_v14_2_1)) {
+                *ppv = static_cast<IDeckLinkMemoryAllocator_v14_2_1*>(this);
+            } else {
+                *ppv = NULL;
+                return E_NOINTERFACE;
+            }
+
+            AddRef();
+            return S_OK;
+        }
+
         virtual ULONG   STDMETHODCALLTYPE AddRef(void) { return ++_refs; }
         virtual ULONG   STDMETHODCALLTYPE Release(void)
         {
@@ -472,7 +490,7 @@ skip_packet:
 }
 
 
-static void handle_klv(AVFormatContext *avctx, decklink_ctx *ctx, IDeckLinkVideoInputFrame *videoFrame, int64_t pts)
+static void handle_klv(AVFormatContext *avctx, decklink_ctx *ctx, IDeckLinkVideoInputFrame_v14_2_1 *videoFrame, int64_t pts)
 {
     const uint8_t KLV_DID = 0x44;
     const uint8_t KLV_IN_VANC_SDID = 0x04;
@@ -574,17 +592,30 @@ static void handle_klv(AVFormatContext *avctx, decklink_ctx *ctx, IDeckLinkVideo
     }
 }
 
-class decklink_input_callback : public IDeckLinkInputCallback
+class decklink_input_callback : public IDeckLinkInputCallback_v14_2_1
 {
 public:
         explicit decklink_input_callback(AVFormatContext *_avctx);
         ~decklink_input_callback();
 
-        virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID iid, LPVOID *ppv) { return E_NOINTERFACE; }
+        virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, LPVOID *ppv)
+        {
+            if (DECKLINK_IsEqualIID(riid, IID_IUnknown)) {
+                *ppv = static_cast<IUnknown*>(this);
+            } else if (DECKLINK_IsEqualIID(riid, IID_IDeckLinkInputCallback_v14_2_1)) {
+                *ppv = static_cast<IDeckLinkInputCallback_v14_2_1*>(this);
+            } else {
+                *ppv = NULL;
+                return E_NOINTERFACE;
+            }
+
+            AddRef();
+            return S_OK;
+        }
         virtual ULONG STDMETHODCALLTYPE AddRef(void);
         virtual ULONG STDMETHODCALLTYPE  Release(void);
         virtual HRESULT STDMETHODCALLTYPE VideoInputFormatChanged(BMDVideoInputFormatChangedEvents, IDeckLinkDisplayMode*, BMDDetectedVideoInputFormatFlags);
-        virtual HRESULT STDMETHODCALLTYPE VideoInputFrameArrived(IDeckLinkVideoInputFrame*, IDeckLinkAudioInputPacket*);
+        virtual HRESULT STDMETHODCALLTYPE VideoInputFrameArrived(IDeckLinkVideoInputFrame_v14_2_1*, IDeckLinkAudioInputPacket*);
 
 private:
         std::atomic<int>  _refs;
@@ -593,7 +624,7 @@ private:
         int no_video;
         int64_t initial_video_pts;
         int64_t initial_audio_pts;
-        IDeckLinkVideoInputFrame* last_video_frame;
+        IDeckLinkVideoInputFrame_v14_2_1* last_video_frame;
 };
 
 decklink_input_callback::decklink_input_callback(AVFormatContext *_avctx) : _refs(1)
@@ -625,7 +656,7 @@ ULONG decklink_input_callback::Release(void)
     return ret;
 }
 
-static int64_t get_pkt_pts(IDeckLinkVideoInputFrame *videoFrame,
+static int64_t get_pkt_pts(IDeckLinkVideoInputFrame_v14_2_1 *videoFrame,
                            IDeckLinkAudioInputPacket *audioFrame,
                            int64_t wallclock,
                            int64_t abs_wallclock,
@@ -679,7 +710,7 @@ static int64_t get_pkt_pts(IDeckLinkVideoInputFrame *videoFrame,
     return pts;
 }
 
-static int get_bmd_timecode(AVFormatContext *avctx, AVTimecode *tc, AVRational frame_rate, BMDTimecodeFormat tc_format, IDeckLinkVideoInputFrame *videoFrame)
+static int get_bmd_timecode(AVFormatContext *avctx, AVTimecode *tc, AVRational frame_rate, BMDTimecodeFormat tc_format, IDeckLinkVideoInputFrame_v14_2_1 *videoFrame)
 {
     IDeckLinkTimecode *timecode;
     int ret = AVERROR(ENOENT);
@@ -701,7 +732,7 @@ static int get_bmd_timecode(AVFormatContext *avctx, AVTimecode *tc, AVRational f
     return ret;
 }
 
-static int get_frame_timecode(AVFormatContext *avctx, decklink_ctx *ctx, AVTimecode *tc, IDeckLinkVideoInputFrame *videoFrame)
+static int get_frame_timecode(AVFormatContext *avctx, decklink_ctx *ctx, AVTimecode *tc, IDeckLinkVideoInputFrame_v14_2_1 *videoFrame)
 {
     AVRational frame_rate = ctx->video_st->r_frame_rate;
     int ret;
@@ -726,7 +757,7 @@ static int get_frame_timecode(AVFormatContext *avctx, decklink_ctx *ctx, AVTimec
 }
 
 HRESULT decklink_input_callback::VideoInputFrameArrived(
-    IDeckLinkVideoInputFrame *videoFrame, IDeckLinkAudioInputPacket *audioFrame)
+    IDeckLinkVideoInputFrame_v14_2_1 *videoFrame, IDeckLinkAudioInputPacket *audioFrame)
 {
     void *frameBytes;
     void *audioFrameBytes;
@@ -1141,7 +1172,7 @@ av_cold int ff_decklink_read_header(AVFormatContext *avctx)
         goto error;
 
     /* Get input device. */
-    if (ctx->dl->QueryInterface(IID_IDeckLinkInput, (void **) &ctx->dli) != S_OK) {
+    if (ctx->dl->QueryInterface(IID_IDeckLinkInput_v14_2_1, (void **) &ctx->dli) != S_OK) {
         av_log(avctx, AV_LOG_ERROR, "Could not open input device from '%s'\n",
                avctx->url);
         ret = AVERROR(EIO);
diff --git a/libavdevice/decklink_enc.cpp b/libavdevice/decklink_enc.cpp
index 195f005c17..d2e246c818 100644
--- a/libavdevice/decklink_enc.cpp
+++ b/libavdevice/decklink_enc.cpp
@@ -28,7 +28,11 @@ extern "C" {
 #include "libavformat/internal.h"
 }
 
+#include <DeckLinkAPIVersion.h>
 #include <DeckLinkAPI.h>
+#if BLACKMAGIC_DECKLINK_API_VERSION >= 0x0e030000
+#include <DeckLinkAPI_v14_2_1.h>
+#endif
 
 extern "C" {
 #include "libavformat/avformat.h"
@@ -47,18 +51,8 @@ extern "C" {
 #include "libklvanc/pixels.h"
 #endif
 
-#ifdef _WIN32
-#include <guiddef.h>
-#else
-/* There is no guiddef.h in Linux builds, so we provide our own IsEqualIID() */
-static bool IsEqualIID(REFIID riid1, REFIID riid2)
-{
-    return memcmp(&riid1, &riid2, sizeof(REFIID)) == 0;
-}
-#endif
-
 /* DeckLink callback class declaration */
-class decklink_frame : public IDeckLinkVideoFrame
+class decklink_frame : public IDeckLinkVideoFrame_v14_2_1
 {
 public:
     decklink_frame(struct decklink_ctx *ctx, AVFrame *avframe, AVCodecID codec_id, int height, int width) :
@@ -123,10 +117,10 @@ public:
     }
     virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, LPVOID *ppv)
     {
-        if (IsEqualIID(riid, IID_IUnknown)) {
+        if (DECKLINK_IsEqualIID(riid, IID_IUnknown)) {
             *ppv = static_cast<IUnknown*>(this);
-        } else if (IsEqualIID(riid, IID_IDeckLinkVideoFrame)) {
-            *ppv = static_cast<IDeckLinkVideoFrame*>(this);
+        } else if (DECKLINK_IsEqualIID(riid, IID_IDeckLinkVideoFrame_v14_2_1)) {
+            *ppv = static_cast<IDeckLinkVideoFrame_v14_2_1*>(this);
         } else {
             *ppv = NULL;
             return E_NOINTERFACE;
@@ -135,7 +129,6 @@ public:
         AddRef();
         return S_OK;
     }
-
     virtual ULONG   STDMETHODCALLTYPE AddRef(void)                            { return ++_refs; }
     virtual ULONG   STDMETHODCALLTYPE Release(void)
     {
@@ -162,10 +155,10 @@ private:
     std::atomic<int>  _refs;
 };
 
-class decklink_output_callback : public IDeckLinkVideoOutputCallback
+class decklink_output_callback : public IDeckLinkVideoOutputCallback_v14_2_1
 {
 public:
-    virtual HRESULT STDMETHODCALLTYPE ScheduledFrameCompleted(IDeckLinkVideoFrame *_frame, BMDOutputFrameCompletionResult result)
+    virtual HRESULT STDMETHODCALLTYPE ScheduledFrameCompleted(IDeckLinkVideoFrame_v14_2_1 *_frame, BMDOutputFrameCompletionResult result)
     {
         decklink_frame *frame = static_cast<decklink_frame *>(_frame);
         struct decklink_ctx *ctx = frame->_ctx;
@@ -183,7 +176,20 @@ public:
         return S_OK;
     }
     virtual HRESULT STDMETHODCALLTYPE ScheduledPlaybackHasStopped(void)       { return S_OK; }
-    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID iid, LPVOID *ppv) { return E_NOINTERFACE; }
+    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, LPVOID *ppv)
+    {
+        if (DECKLINK_IsEqualIID(riid, IID_IUnknown)) {
+            *ppv = static_cast<IUnknown*>(this);
+        } else if (DECKLINK_IsEqualIID(riid, IID_IDeckLinkVideoOutputCallback_v14_2_1)) {
+            *ppv = static_cast<IDeckLinkVideoOutputCallback_v14_2_1*>(this);
+        } else {
+            *ppv = NULL;
+            return E_NOINTERFACE;
+        }
+
+        AddRef();
+        return S_OK;
+    }
     virtual ULONG   STDMETHODCALLTYPE AddRef(void)                            { return 1; }
     virtual ULONG   STDMETHODCALLTYPE Release(void)                           { return 1; }
 };
@@ -763,7 +769,7 @@ static int decklink_write_video_packet(AVFormatContext *avctx, AVPacket *pkt)
         ctx->first_pts = pkt->pts;
 
     /* Schedule frame for playback. */
-    hr = ctx->dlo->ScheduleVideoFrame((class IDeckLinkVideoFrame *) frame,
+    hr = ctx->dlo->ScheduleVideoFrame(frame,
                                       pkt->pts * ctx->bmd_tb_num,
                                       ctx->bmd_tb_num, ctx->bmd_tb_den);
     /* Pass ownership to DeckLink, or release on failure */
@@ -898,7 +904,7 @@ av_cold int ff_decklink_write_header(AVFormatContext *avctx)
         return ret;
 
     /* Get output device. */
-    if (ctx->dl->QueryInterface(IID_IDeckLinkOutput, (void **) &ctx->dlo) != S_OK) {
+    if (ctx->dl->QueryInterface(IID_IDeckLinkOutput_v14_2_1, (void **) &ctx->dlo) != S_OK) {
         av_log(avctx, AV_LOG_ERROR, "Could not open output device from '%s'\n",
                avctx->url);
         ret = AVERROR(EIO);
-- 
2.52.0


From 90d55ff32389631a31c0e45fcc13b9ab633caea3 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Fri, 28 Nov 2025 11:17:05 +0800
Subject: [PATCH 165/304] fftools/ffmpeg: add force key frame by scdet metadata
 support

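Add a scd_metadata mode to -force_key_frames. It forces a key frame
whenever the current frame carries a lavfi.scd.time metadata entry,
as set by filters such as scdet.

For example: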

./ffmpeg -hwaccel videotoolbox \
	-i input.mp4 -c:a copy \
	-vf scdet=threshold=10 \
	-c:v h264_videotoolbox \
	-force_key_frames scd_metadata \
	-g 1000 -t 30 output.mp4
---
 doc/ffmpeg.texi           | 9 +++++++++
 fftools/ffmpeg.h          | 2 ++
 fftools/ffmpeg_enc.c      | 3 +++
 fftools/ffmpeg_mux_init.c | 2 ++
 4 files changed, 16 insertions(+)
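
Note: the check this patch adds to forced_kf_apply() can be sketched in
isolation as below. This is only an illustration, not the code in the
patch: Entry, meta_get(), should_force_keyframe() and KF_MODE_OTHER are
hypothetical stand-ins, and the lookup is a simplified exact-match
replacement for av_dict_get(frame->metadata, "lavfi.scd.time", NULL, 0).
Only KF_FORCE_SCD_METADATA = 3 and the key name come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one metadata entry attached to a frame. */
typedef struct Entry {
    const char *key;
    const char *value;
} Entry;

enum {
    KF_MODE_OTHER         = 0, /* hypothetical placeholder for any other mode */
    KF_FORCE_SCD_METADATA = 3, /* value added to fftools/ffmpeg.h by this patch */
};

/* Simplified exact-match lookup; stands in for av_dict_get(). */
static const Entry *meta_get(const Entry *meta, size_t n, const char *key)
{
    assert(key != NULL);
    for (size_t i = 0; i < n; i++)
        if (!strcmp(meta[i].key, key))
            return &meta[i];
    return NULL;
}

/* Returns 1 when the scd_metadata mode would force a key frame. */
static int should_force_keyframe(int kf_type, const Entry *meta, size_t n)
{
    return kf_type == KF_FORCE_SCD_METADATA &&
           meta_get(meta, n, "lavfi.scd.time") != NULL;
}
```

Only the presence of the key matters in this sketch; its value is never
parsed. A frame duplicated after scdet keeps the same entry and so would
be forced again, which is the case the documentation hunk warns about.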

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index 3daf2f7ec2..2dae6632bc 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -1665,6 +1665,7 @@ Force video tag/fourcc. This is an alias for @code{-tag:v}.
 @item -force_key_frames[:@var{stream_specifier}] @var{time}[,@var{time}...] (@emph{output,per-stream})
 @item -force_key_frames[:@var{stream_specifier}] expr:@var{expr} (@emph{output,per-stream})
 @item -force_key_frames[:@var{stream_specifier}] source (@emph{output,per-stream})
+@item -force_key_frames[:@var{stream_specifier}] scd_metadata (@emph{output,per-stream})
 
 @var{force_key_frames} can take arguments of the following form:
 
@@ -1728,6 +1729,14 @@ the current frame being encoded is marked as a key frame in its source.
 In cases where this particular source frame has to be dropped,
 enforce the next available frame to become a key frame instead.
 
+@item scd_metadata
+If the argument is @code{scd_metadata}, ffmpeg will force a key frame if
+the current frame contains a metadata entry with the key @code{lavfi.scd.time}.
+The metadata can be added by filters like @code{scdet} and @code{scdet_vulkan}.
+Avoid inserting filters that duplicate frames after @code{scdet}, as this can
+cause duplicate metadata for multiple frames and repeated insertion of key
+frames.
+
 @end table
 
 Note that forcing too many keyframes is very harmful for the lookahead
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index cc2ea1a56e..7720dd9c59 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -602,6 +602,8 @@ enum {
 #if FFMPEG_OPT_FORCE_KF_SOURCE_NO_DROP
     KF_FORCE_SOURCE_NO_DROP = 2,
 #endif
+    // force keyframe if lavfi.scd.time metadata is set
+    KF_FORCE_SCD_METADATA = 3,
 };
 
 typedef struct KeyframeForceCtx {
diff --git a/fftools/ffmpeg_enc.c b/fftools/ffmpeg_enc.c
index 8f07a10848..0f7d961472 100644
--- a/fftools/ffmpeg_enc.c
+++ b/fftools/ffmpeg_enc.c
@@ -768,6 +768,9 @@ static enum AVPictureType forced_kf_apply(void *logctx, KeyframeForceCtx *kf,
         }
     } else if (kf->type == KF_FORCE_SOURCE && (frame->flags & AV_FRAME_FLAG_KEY)) {
         goto force_keyframe;
+    } else if (kf->type == KF_FORCE_SCD_METADATA &&
+               av_dict_get(frame->metadata, "lavfi.scd.time", NULL, 0)) {
+        goto force_keyframe;
     }
 
     return AV_PICTURE_TYPE_NONE;
diff --git a/fftools/ffmpeg_mux_init.c b/fftools/ffmpeg_mux_init.c
index bcbbee9126..194a87875d 100644
--- a/fftools/ffmpeg_mux_init.c
+++ b/fftools/ffmpeg_mux_init.c
@@ -3279,6 +3279,8 @@ static int process_forced_keyframes(Muxer *mux, const OptionsContext *o)
                    "-force_key_frames is deprecated, use just 'source'\n");
             ost->kf.type = KF_FORCE_SOURCE;
 #endif
+        } else if (!strcmp(forced_keyframes, "scd_metadata")) {
+            ost->kf.type = KF_FORCE_SCD_METADATA;
         } else {
             int ret = parse_forced_key_frames(ost, &ost->kf, mux, forced_keyframes);
             if (ret < 0)
-- 
2.52.0


From 8d47b22486e2abccc5e84f09901492a2ee13c56b Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Sun, 30 Nov 2025 16:55:02 +0800
Subject: [PATCH 166/304] tests/fate/ffmpeg: add test for -force_key_frames
 scd_metadata

---
 tests/fate/ffmpeg.mak                        |   9 +-
 tests/ref/fate/force_key_frames-scd_metadata | 397 +++++++++++++++++++
 2 files changed, 404 insertions(+), 2 deletions(-)
 create mode 100644 tests/ref/fate/force_key_frames-scd_metadata

diff --git a/tests/fate/ffmpeg.mak b/tests/fate/ffmpeg.mak
index 57028a7936..c0da2da7f8 100644
--- a/tests/fate/ffmpeg.mak
+++ b/tests/fate/ffmpeg.mak
@@ -40,8 +40,13 @@ fate-force_key_frames-source-dup: CMD = framecrc -i $(TARGET_SAMPLES)/h264/intra
   -c:v mpeg2video -g 400 -sc_threshold 99999 \
   -force_key_frames source -r 39 -force_fps -strict experimental
 
-FATE_SAMPLES_FFMPEG-$(call ENCDEC, MPEG2VIDEO H264, FRAMECRC H264, H264_PARSER CROP_FILTER DRAWBOX_FILTER PIPE_PROTOCOL) += \
-    fate-force_key_frames-source fate-force_key_frames-source-drop fate-force_key_frames-source-dup
+fate-force_key_frames-scd_metadata: CMD = framecrc -i $(TARGET_SAMPLES)/h264/intra_refresh.h264 \
+  -vf scdet=threshold=10,crop=2:2,drawbox=color=black:t=fill \
+  -c:v mpeg2video -g 400 \
+  -force_key_frames scd_metadata
+
+FATE_SAMPLES_FFMPEG-$(call ENCDEC, MPEG2VIDEO H264, FRAMECRC H264, H264_PARSER CROP_FILTER DRAWBOX_FILTER SCDET_FILTER PIPE_PROTOCOL) += \
+    fate-force_key_frames-source fate-force_key_frames-source-drop fate-force_key_frames-source-dup fate-force_key_frames-scd_metadata
 
 # Tests that the video is properly autorotated using the contained
 # display matrix and that the generated file does not contain
diff --git a/tests/ref/fate/force_key_frames-scd_metadata b/tests/ref/fate/force_key_frames-scd_metadata
new file mode 100644
index 0000000000..f5757bf6c8
--- /dev/null
+++ b/tests/ref/fate/force_key_frames-scd_metadata
@@ -0,0 +1,397 @@
+#tb 0: 1/25
+#media_type 0: video
+#codec_id 0: mpeg2video
+#dimensions 0: 2x2
+#sar 0: 0/1
+0,         -1,          0,        1,       57, 0x7db00eb7, S=1, Quality stats,        8, 0x05ec00be
+0,          0,          1,        1,       24, 0x4f1c0660, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          1,          2,        1,       24, 0x53dc06a0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          2,          3,        1,       24, 0x589c06e0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          3,          4,        1,       24, 0x4a700621, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          4,          5,        1,       24, 0x4f300661, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          5,          6,        1,       24, 0x53f006a1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          6,          7,        1,       24, 0x58b006e1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          7,          8,        1,       24, 0x4a840622, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          8,          9,        1,       24, 0x4f440662, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,          9,         10,        1,       24, 0x540406a2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         10,         11,        1,       24, 0x58c406e2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         11,         12,        1,       24, 0x4a980623, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         12,         13,        1,       24, 0x4f580663, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         13,         14,        1,       24, 0x541806a3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         14,         15,        1,       24, 0x58d806e3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         15,         16,        1,       24, 0x4aac0624, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         16,         17,        1,       24, 0x4f6c0664, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         17,         18,        1,       24, 0x542c06a4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         18,         19,        1,       24, 0x58ec06e4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         19,         20,        1,       24, 0x4ac00625, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         20,         21,        1,       24, 0x4f800665, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         21,         22,        1,       24, 0x544006a5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         22,         23,        1,       24, 0x590006e5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         23,         24,        1,       24, 0x4ad40626, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         24,         25,        1,       24, 0x4f940666, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         25,         26,        1,       24, 0x545406a6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         26,         27,        1,       24, 0x591406e6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         27,         28,        1,       24, 0x4ae80627, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         28,         29,        1,       24, 0x4fa80667, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         29,         30,        1,       24, 0x546806a7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         30,         31,        1,       24, 0x592806e7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         31,         32,        1,       24, 0x4afc0628, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         32,         33,        1,       24, 0x4fbc0668, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         33,         34,        1,       24, 0x547c06a8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         34,         35,        1,       24, 0x593c06e8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         35,         36,        1,       24, 0x4b100629, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         36,         37,        1,       24, 0x4fd00669, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         37,         38,        1,       24, 0x549006a9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         38,         39,        1,       24, 0x595006e9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         39,         40,        1,       24, 0x4b24062a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         40,         41,        1,       24, 0x4fe4066a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         41,         42,        1,       24, 0x54a406aa, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         42,         43,        1,       24, 0x596406ea, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         43,         44,        1,       24, 0x4b38062b, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         44,         45,        1,       24, 0x4ff8066b, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         45,         46,        1,       24, 0x54b806ab, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         46,         47,        1,       24, 0x597806eb, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         47,         48,        1,       24, 0x4b4c062c, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         48,         49,        1,       24, 0x500c066c, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         49,         50,        1,       24, 0x54cc06ac, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         50,         51,        1,       24, 0x598c06ec, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         51,         52,        1,       24, 0x4b60062d, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         52,         53,        1,       24, 0x5020066d, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         53,         54,        1,       24, 0x54e006ad, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         54,         55,        1,       24, 0x59a006ed, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         55,         56,        1,       24, 0x4b74062e, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         56,         57,        1,       24, 0x5034066e, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         57,         58,        1,       24, 0x54f406ae, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         58,         59,        1,       24, 0x59b406ee, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         59,         60,        1,       24, 0x4b88062f, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         60,         61,        1,       24, 0x5048066f, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         61,         62,        1,       24, 0x550806af, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         62,         63,        1,       24, 0x59c806ef, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         63,         64,        1,       24, 0x4b9c0630, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         64,         65,        1,       24, 0x505c0670, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         65,         66,        1,       24, 0x551c06b0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         66,         67,        1,       24, 0x59dc06f0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         67,         68,        1,       24, 0x4bb00631, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         68,         69,        1,       24, 0x50700671, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         69,         70,        1,       24, 0x553006b1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         70,         71,        1,       24, 0x59f006f1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         71,         72,        1,       24, 0x4bc40632, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         72,         73,        1,       24, 0x50840672, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         73,         74,        1,       24, 0x554406b2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         74,         75,        1,       24, 0x5a0406f2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         75,         76,        1,       24, 0x4bd80633, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         76,         77,        1,       24, 0x50980673, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         77,         78,        1,       24, 0x555806b3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         78,         79,        1,       24, 0x5a1806f3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         79,         80,        1,       24, 0x4bec0634, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         80,         81,        1,       24, 0x50ac0674, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         81,         82,        1,       24, 0x556c06b4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         82,         83,        1,       24, 0x5a2c06f4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         83,         84,        1,       24, 0x4c000635, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         84,         85,        1,       24, 0x50c00675, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         85,         86,        1,       24, 0x558006b5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         86,         87,        1,       24, 0x5a4006f5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         87,         88,        1,       24, 0x4c140636, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         88,         89,        1,       24, 0x50d40676, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         89,         90,        1,       24, 0x559406b6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         90,         91,        1,       24, 0x5a5406f6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         91,         92,        1,       24, 0x4c280637, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         92,         93,        1,       24, 0x50e80677, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         93,         94,        1,       24, 0x55a806b7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         94,         95,        1,       24, 0x5a6806f7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         95,         96,        1,       24, 0x4c3c0638, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         96,         97,        1,       24, 0x50fc0678, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         97,         98,        1,       24, 0x55bc06b8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         98,         99,        1,       24, 0x5a7c06f8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,         99,        100,        1,       24, 0x4c500639, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        100,        101,        1,       24, 0x51100679, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        101,        102,        1,       24, 0x55d006b9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        102,        103,        1,       24, 0x5a9006f9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        103,        104,        1,       24, 0x4c64063a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        104,        105,        1,       24, 0x5124067a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        105,        106,        1,       24, 0x55e406ba, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        106,        107,        1,       24, 0x5aa406fa, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        107,        108,        1,       57, 0x85a40efb, S=1, Quality stats,        8, 0x05ec00be
+0,        108,        109,        1,       24, 0x4f1c0660, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        109,        110,        1,       24, 0x53dc06a0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        110,        111,        1,       24, 0x589c06e0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        111,        112,        1,       24, 0x4a700621, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        112,        113,        1,       24, 0x4f300661, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        113,        114,        1,       24, 0x53f006a1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        114,        115,        1,       24, 0x58b006e1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        115,        116,        1,       24, 0x4a840622, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        116,        117,        1,       24, 0x4f440662, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        117,        118,        1,       24, 0x540406a2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        118,        119,        1,       24, 0x58c406e2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        119,        120,        1,       24, 0x4a980623, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        120,        121,        1,       24, 0x4f580663, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        121,        122,        1,       24, 0x541806a3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        122,        123,        1,       24, 0x58d806e3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        123,        124,        1,       24, 0x4aac0624, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        124,        125,        1,       24, 0x4f6c0664, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        125,        126,        1,       24, 0x542c06a4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        126,        127,        1,       24, 0x58ec06e4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        127,        128,        1,       24, 0x4ac00625, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        128,        129,        1,       24, 0x4f800665, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        129,        130,        1,       24, 0x544006a5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        130,        131,        1,       24, 0x590006e5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        131,        132,        1,       24, 0x4ad40626, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        132,        133,        1,       24, 0x4f940666, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        133,        134,        1,       24, 0x545406a6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        134,        135,        1,       24, 0x591406e6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        135,        136,        1,       24, 0x4ae80627, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        136,        137,        1,       24, 0x4fa80667, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        137,        138,        1,       24, 0x546806a7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        138,        139,        1,       24, 0x592806e7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        139,        140,        1,       24, 0x4afc0628, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        140,        141,        1,       24, 0x4fbc0668, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        141,        142,        1,       24, 0x547c06a8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        142,        143,        1,       24, 0x593c06e8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        143,        144,        1,       24, 0x4b100629, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        144,        145,        1,       24, 0x4fd00669, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        145,        146,        1,       24, 0x549006a9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        146,        147,        1,       24, 0x595006e9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        147,        148,        1,       24, 0x4b24062a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        148,        149,        1,       24, 0x4fe4066a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        149,        150,        1,       24, 0x54a406aa, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        150,        151,        1,       24, 0x596406ea, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        151,        152,        1,       57, 0x8c8d0f38, S=1, Quality stats,        8, 0x05ec00be
+0,        152,        153,        1,       24, 0x4f1c0660, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        153,        154,        1,       24, 0x53dc06a0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        154,        155,        1,       24, 0x589c06e0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        155,        156,        1,       24, 0x4a700621, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        156,        157,        1,       24, 0x4f300661, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        157,        158,        1,       24, 0x53f006a1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        158,        159,        1,       24, 0x58b006e1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        159,        160,        1,       24, 0x4a840622, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        160,        161,        1,       24, 0x4f440662, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        161,        162,        1,       24, 0x540406a2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        162,        163,        1,       24, 0x58c406e2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        163,        164,        1,       24, 0x4a980623, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        164,        165,        1,       24, 0x4f580663, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        165,        166,        1,       24, 0x541806a3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        166,        167,        1,       24, 0x58d806e3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        167,        168,        1,       24, 0x4aac0624, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        168,        169,        1,       24, 0x4f6c0664, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        169,        170,        1,       24, 0x542c06a4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        170,        171,        1,       24, 0x58ec06e4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        171,        172,        1,       24, 0x4ac00625, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        172,        173,        1,       24, 0x4f800665, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        173,        174,        1,       24, 0x544006a5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        174,        175,        1,       24, 0x590006e5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        175,        176,        1,       24, 0x4ad40626, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        176,        177,        1,       24, 0x4f940666, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        177,        178,        1,       24, 0x545406a6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        178,        179,        1,       24, 0x591406e6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        179,        180,        1,       24, 0x4ae80627, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        180,        181,        1,       24, 0x4fa80667, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        181,        182,        1,       24, 0x546806a7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        182,        183,        1,       24, 0x592806e7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        183,        184,        1,       24, 0x4afc0628, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        184,        185,        1,       24, 0x4fbc0668, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        185,        186,        1,       24, 0x547c06a8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        186,        187,        1,       24, 0x593c06e8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        187,        188,        1,       24, 0x4b100629, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        188,        189,        1,       24, 0x4fd00669, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        189,        190,        1,       24, 0x549006a9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        190,        191,        1,       24, 0x595006e9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        191,        192,        1,       24, 0x4b24062a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        192,        193,        1,       24, 0x4fe4066a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        193,        194,        1,       24, 0x54a406aa, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        194,        195,        1,       24, 0x596406ea, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        195,        196,        1,       24, 0x4b38062b, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        196,        197,        1,       24, 0x4ff8066b, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        197,        198,        1,       24, 0x54b806ab, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        198,        199,        1,       24, 0x597806eb, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        199,        200,        1,       24, 0x4b4c062c, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        200,        201,        1,       24, 0x500c066c, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        201,        202,        1,       24, 0x54cc06ac, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        202,        203,        1,       24, 0x598c06ec, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        203,        204,        1,       24, 0x4b60062d, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        204,        205,        1,       24, 0x5020066d, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        205,        206,        1,       24, 0x54e006ad, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        206,        207,        1,       24, 0x59a006ed, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        207,        208,        1,       24, 0x4b74062e, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        208,        209,        1,       24, 0x5034066e, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        209,        210,        1,       24, 0x54f406ae, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        210,        211,        1,       24, 0x59b406ee, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        211,        212,        1,       24, 0x4b88062f, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        212,        213,        1,       24, 0x5048066f, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        213,        214,        1,       24, 0x550806af, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        214,        215,        1,       24, 0x59c806ef, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        215,        216,        1,       24, 0x4b9c0630, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        216,        217,        1,       24, 0x505c0670, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        217,        218,        1,       24, 0x551c06b0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        218,        219,        1,       24, 0x59dc06f0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        219,        220,        1,       24, 0x4bb00631, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        220,        221,        1,       24, 0x50700671, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        221,        222,        1,       24, 0x553006b1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        222,        223,        1,       24, 0x59f006f1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        223,        224,        1,       24, 0x4bc40632, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        224,        225,        1,       24, 0x50840672, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        225,        226,        1,       24, 0x554406b2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        226,        227,        1,       24, 0x5a0406f2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        227,        228,        1,       24, 0x4bd80633, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        228,        229,        1,       24, 0x50980673, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        229,        230,        1,       24, 0x555806b3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        230,        231,        1,       24, 0x5a1806f3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        231,        232,        1,       24, 0x4bec0634, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        232,        233,        1,       24, 0x50ac0674, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        233,        234,        1,       24, 0x556c06b4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        234,        235,        1,       24, 0x5a2c06f4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        235,        236,        1,       24, 0x4c000635, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        236,        237,        1,       57, 0x7b1c0e9e, S=1, Quality stats,        8, 0x05ec00be
+0,        237,        238,        1,       24, 0x4f1c0660, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        238,        239,        1,       24, 0x53dc06a0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        239,        240,        1,       24, 0x589c06e0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        240,        241,        1,       24, 0x4a700621, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        241,        242,        1,       24, 0x4f300661, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        242,        243,        1,       24, 0x53f006a1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        243,        244,        1,       24, 0x58b006e1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        244,        245,        1,       24, 0x4a840622, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        245,        246,        1,       24, 0x4f440662, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        246,        247,        1,       24, 0x540406a2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        247,        248,        1,       24, 0x58c406e2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        248,        249,        1,       24, 0x4a980623, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        249,        250,        1,       24, 0x4f580663, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        250,        251,        1,       24, 0x541806a3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        251,        252,        1,       24, 0x58d806e3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        252,        253,        1,       24, 0x4aac0624, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        253,        254,        1,       24, 0x4f6c0664, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        254,        255,        1,       24, 0x542c06a4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        255,        256,        1,       24, 0x58ec06e4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        256,        257,        1,       24, 0x4ac00625, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        257,        258,        1,       24, 0x4f800665, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        258,        259,        1,       24, 0x544006a5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        259,        260,        1,       24, 0x590006e5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        260,        261,        1,       24, 0x4ad40626, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        261,        262,        1,       24, 0x4f940666, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        262,        263,        1,       24, 0x545406a6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        263,        264,        1,       24, 0x591406e6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        264,        265,        1,       24, 0x4ae80627, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        265,        266,        1,       24, 0x4fa80667, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        266,        267,        1,       24, 0x546806a7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        267,        268,        1,       24, 0x592806e7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        268,        269,        1,       24, 0x4afc0628, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        269,        270,        1,       24, 0x4fbc0668, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        270,        271,        1,       24, 0x547c06a8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        271,        272,        1,       24, 0x593c06e8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        272,        273,        1,       24, 0x4b100629, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        273,        274,        1,       24, 0x4fd00669, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        274,        275,        1,       24, 0x549006a9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        275,        276,        1,       24, 0x595006e9, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        276,        277,        1,       24, 0x4b24062a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        277,        278,        1,       24, 0x4fe4066a, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        278,        279,        1,       24, 0x54a406aa, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        279,        280,        1,       24, 0x596406ea, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        280,        281,        1,       24, 0x4b38062b, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        281,        282,        1,       24, 0x4ff8066b, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        282,        283,        1,       24, 0x54b806ab, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        283,        284,        1,       24, 0x597806eb, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        284,        285,        1,       24, 0x4b4c062c, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        285,        286,        1,       24, 0x500c066c, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        286,        287,        1,       24, 0x54cc06ac, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        287,        288,        1,       24, 0x598c06ec, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        288,        289,        1,       24, 0x4b60062d, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        289,        290,        1,       24, 0x5020066d, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        290,        291,        1,       24, 0x54e006ad, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        291,        292,        1,       24, 0x59a006ed, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        292,        293,        1,       24, 0x4b74062e, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        293,        294,        1,       24, 0x5034066e, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        294,        295,        1,       24, 0x54f406ae, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        295,        296,        1,       24, 0x59b406ee, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        296,        297,        1,       24, 0x4b88062f, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        297,        298,        1,       24, 0x5048066f, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        298,        299,        1,       24, 0x550806af, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        299,        300,        1,       24, 0x59c806ef, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        300,        301,        1,       24, 0x4b9c0630, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        301,        302,        1,       24, 0x505c0670, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        302,        303,        1,       24, 0x551c06b0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        303,        304,        1,       24, 0x59dc06f0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        304,        305,        1,       24, 0x4bb00631, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        305,        306,        1,       24, 0x50700671, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        306,        307,        1,       24, 0x553006b1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        307,        308,        1,       24, 0x59f006f1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        308,        309,        1,       24, 0x4bc40632, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        309,        310,        1,       24, 0x50840672, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        310,        311,        1,       24, 0x554406b2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        311,        312,        1,       24, 0x5a0406f2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        312,        313,        1,       24, 0x4bd80633, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        313,        314,        1,       24, 0x50980673, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        314,        315,        1,       24, 0x555806b3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        315,        316,        1,       24, 0x5a1806f3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        316,        317,        1,       24, 0x4bec0634, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        317,        318,        1,       24, 0x50ac0674, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        318,        319,        1,       24, 0x556c06b4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        319,        320,        1,       24, 0x5a2c06f4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        320,        321,        1,       24, 0x4c000635, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        321,        322,        1,       24, 0x50c00675, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        322,        323,        1,       24, 0x558006b5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        323,        324,        1,       24, 0x5a4006f5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        324,        325,        1,       24, 0x4c140636, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        325,        326,        1,       24, 0x50d40676, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        326,        327,        1,       24, 0x559406b6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        327,        328,        1,       24, 0x5a5406f6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        328,        329,        1,       24, 0x4c280637, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        329,        330,        1,       24, 0x50e80677, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        330,        331,        1,       24, 0x55a806b7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        331,        332,        1,       24, 0x5a6806f7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        332,        333,        1,       24, 0x4c3c0638, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        333,        334,        1,       24, 0x50fc0678, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        334,        335,        1,       24, 0x55bc06b8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        335,        336,        1,       24, 0x5a7c06f8, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        336,        337,        1,       57, 0x899c0f1e, S=1, Quality stats,        8, 0x05ec00be
+0,        337,        338,        1,       24, 0x4f1c0660, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        338,        339,        1,       24, 0x53dc06a0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        339,        340,        1,       24, 0x589c06e0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        340,        341,        1,       24, 0x4a700621, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        341,        342,        1,       24, 0x4f300661, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        342,        343,        1,       24, 0x53f006a1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        343,        344,        1,       24, 0x58b006e1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        344,        345,        1,       24, 0x4a840622, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        345,        346,        1,       24, 0x4f440662, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        346,        347,        1,       24, 0x540406a2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        347,        348,        1,       24, 0x58c406e2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        348,        349,        1,       24, 0x4a980623, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        349,        350,        1,       24, 0x4f580663, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        350,        351,        1,       24, 0x541806a3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        351,        352,        1,       24, 0x58d806e3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        352,        353,        1,       24, 0x4aac0624, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        353,        354,        1,       24, 0x4f6c0664, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        354,        355,        1,       24, 0x542c06a4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        355,        356,        1,       24, 0x58ec06e4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        356,        357,        1,       24, 0x4ac00625, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        357,        358,        1,       24, 0x4f800665, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        358,        359,        1,       24, 0x544006a5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        359,        360,        1,       24, 0x590006e5, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        360,        361,        1,       24, 0x4ad40626, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        361,        362,        1,       24, 0x4f940666, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        362,        363,        1,       24, 0x545406a6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        363,        364,        1,       24, 0x591406e6, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        364,        365,        1,       24, 0x4ae80627, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        365,        366,        1,       24, 0x4fa80667, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        366,        367,        1,       24, 0x546806a7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        367,        368,        1,       24, 0x592806e7, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        368,        369,        1,       57, 0x9b930fc1, S=1, Quality stats,        8, 0x05ec00be
+0,        369,        370,        1,       24, 0x4f1c0660, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        370,        371,        1,       24, 0x53dc06a0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        371,        372,        1,       24, 0x589c06e0, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        372,        373,        1,       24, 0x4a700621, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        373,        374,        1,       24, 0x4f300661, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        374,        375,        1,       24, 0x53f006a1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        375,        376,        1,       24, 0x58b006e1, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        376,        377,        1,       24, 0x4a840622, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        377,        378,        1,       24, 0x4f440662, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        378,        379,        1,       24, 0x540406a2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        379,        380,        1,       24, 0x58c406e2, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        380,        381,        1,       24, 0x4a980623, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        381,        382,        1,       24, 0x4f580663, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        382,        383,        1,       24, 0x541806a3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        383,        384,        1,       24, 0x58d806e3, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        384,        385,        1,       24, 0x4aac0624, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        385,        386,        1,       24, 0x4f6c0664, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        386,        387,        1,       24, 0x542c06a4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        387,        388,        1,       24, 0x58ec06e4, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        388,        389,        1,       24, 0x4ac00625, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        389,        390,        1,       24, 0x4f800665, F=0x0, S=1, Quality stats,        8, 0x076800ee
+0,        390,        391,        1,       24, 0x544006a5, F=0x0, S=1, Quality stats,        8, 0x076800ee
-- 
2.52.0


From 6f4fb515952901794f26109581d71aaca6c5b375 Mon Sep 17 00:00:00 2001
From: yuanhecai <yuanhecai@loongson.cn>
Date: Mon, 17 Nov 2025 17:32:53 +0800
Subject: [PATCH 167/304] avcodec: fix checkasm-hpeldsp failed on LA

---
 libavcodec/loongarch/hpeldsp_init_loongarch.c |    2 +-
 libavcodec/loongarch/hpeldsp_lasx.c           | 1192 ++++-------------
 libavcodec/loongarch/hpeldsp_lasx.h           |    4 +-
 3 files changed, 231 insertions(+), 967 deletions(-)
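
Reviewer note (not part of the commit): the commit message has no body, so
here is a minimal scalar sketch of what the rewritten
ff_put_no_rnd_pixels16_x2_8_lasx loop below appears to compute. It assumes
the "no_rnd" horizontal half-pel case is a truncating average of each pixel
with its right-hand neighbour, matching the use of __lasx_xvavg_bu in the
patch. The name put_no_rnd_pixels16_x2_ref is hypothetical and does not exist
in FFmpeg; the assert encodes the precondition stated in the new comment
(h is a positive multiple of 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, for illustration only (not in the patch). */
static void put_no_rnd_pixels16_x2_ref(uint8_t *block, const uint8_t *pixels,
                                       ptrdiff_t line_size, int h)
{
    /* Precondition taken from the comment added by the patch. */
    assert(h > 0 && (h & 3) == 0);

    for (int y = 0; y < h; y++) {
        /* Truncating average of pixel x and pixel x + 1 (no rounding bias).
         * Note that each row reads 17 source bytes. */
        for (int x = 0; x < 16; x++)
            block[x] = (uint8_t)((pixels[x] + pixels[x + 1]) >> 1);
        block  += line_size;
        pixels += line_size;
    }
}
```

A reference like this could be compared byte for byte against the SIMD output
for h = 4, 8 and 16, which is roughly what checkasm-hpeldsp does against the
C implementation.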

diff --git a/libavcodec/loongarch/hpeldsp_init_loongarch.c b/libavcodec/loongarch/hpeldsp_init_loongarch.c
index 1690be5438..ffdfb69bba 100644
--- a/libavcodec/loongarch/hpeldsp_init_loongarch.c
+++ b/libavcodec/loongarch/hpeldsp_init_loongarch.c
@@ -45,6 +45,6 @@ void ff_hpeldsp_init_loongarch(HpelDSPContext *c, int flags)
         c->put_no_rnd_pixels_tab[1][0] = ff_put_pixels8_8_lasx;
         c->put_no_rnd_pixels_tab[1][1] = ff_put_no_rnd_pixels8_x2_8_lasx;
         c->put_no_rnd_pixels_tab[1][2] = ff_put_no_rnd_pixels8_y2_8_lasx;
-        c->put_no_rnd_pixels_tab[1][3] = ff_put_no_rnd_pixels8_xy2_8_lasx;
+        c->put_no_rnd_pixels_tab[1][3] = ff_put_no_rnd_pixels8_xy2_8_lsx;
     }
 }
diff --git a/libavcodec/loongarch/hpeldsp_lasx.c b/libavcodec/loongarch/hpeldsp_lasx.c
index dd2ae173da..68cab715b9 100644
--- a/libavcodec/loongarch/hpeldsp_lasx.c
+++ b/libavcodec/loongarch/hpeldsp_lasx.c
@@ -192,61 +192,33 @@ void ff_put_pixels8_8_lasx(uint8_t *block, const uint8_t *pixels,
     );
 }
 
+/**
+ * For width 16, h is always a positive multiple of 4.
+ * The function processes 4 rows per iteration.
+ */
 void ff_put_pixels16_8_lsx(uint8_t *block, const uint8_t *pixels,
                            ptrdiff_t line_size, int h)
 {
-    int h_8 = h >> 3;
-    int res = h & 7;
-    ptrdiff_t stride2, stride3, stride4;
+    int h_4 = h >> 2;
+    ptrdiff_t stride2 = line_size << 1;
+    ptrdiff_t stride3 = stride2 + line_size;
+    ptrdiff_t stride4 = line_size << 2;
+    __m128i src0, src1, src2, src3;
 
-    __asm__ volatile (
-        "beqz     %[h_8],                           2f            \n\t"
-        "slli.d   %[stride2],    %[stride],         1             \n\t"
-        "add.d    %[stride3],    %[stride2],        %[stride]     \n\t"
-        "slli.d   %[stride4],    %[stride2],        1             \n\t"
-        "1:                                                       \n\t"
-        "vld      $vr0,          %[src],            0x0           \n\t"
-        "vldx     $vr1,          %[src],            %[stride]     \n\t"
-        "vldx     $vr2,          %[src],            %[stride2]    \n\t"
-        "vldx     $vr3,          %[src],            %[stride3]    \n\t"
-        "add.d    %[src],        %[src],            %[stride4]    \n\t"
-        "vld      $vr4,          %[src],            0x0           \n\t"
-        "vldx     $vr5,          %[src],            %[stride]     \n\t"
-        "vldx     $vr6,          %[src],            %[stride2]    \n\t"
-        "vldx     $vr7,          %[src],            %[stride3]    \n\t"
-        "add.d    %[src],        %[src],            %[stride4]    \n\t"
+    for (int i = 0; i < h_4; i++) {
+        src0 = __lsx_vld(pixels, 0);
+        src1 = __lsx_vldx(pixels, line_size);
+        src2 = __lsx_vldx(pixels, stride2);
+        src3 = __lsx_vldx(pixels, stride3);
 
-        "addi.d   %[h_8],        %[h_8],            -1            \n\t"
+        __lsx_vst(src0, block, 0);
+        __lsx_vstx(src1, block, line_size);
+        __lsx_vstx(src2, block, stride2);
+        __lsx_vstx(src3, block, stride3);
 
-        "vst      $vr0,          %[dst],            0x0           \n\t"
-        "vstx     $vr1,          %[dst],            %[stride]     \n\t"
-        "vstx     $vr2,          %[dst],            %[stride2]    \n\t"
-        "vstx     $vr3,          %[dst],            %[stride3]    \n\t"
-        "add.d    %[dst],        %[dst],            %[stride4]    \n\t"
-        "vst      $vr4,          %[dst],            0x0           \n\t"
-        "vstx     $vr5,          %[dst],            %[stride]     \n\t"
-        "vstx     $vr6,          %[dst],            %[stride2]    \n\t"
-        "vstx     $vr7,          %[dst],            %[stride3]    \n\t"
-        "add.d    %[dst],        %[dst],            %[stride4]    \n\t"
-        "bnez     %[h_8],        1b                               \n\t"
-
-        "2:                                                       \n\t"
-        "beqz     %[res],        4f                               \n\t"
-        "3:                                                       \n\t"
-        "vld      $vr0,          %[src],            0x0           \n\t"
-        "add.d    %[src],        %[src],            %[stride]     \n\t"
-        "addi.d   %[res],        %[res],            -1            \n\t"
-        "vst      $vr0,          %[dst],            0x0           \n\t"
-        "add.d    %[dst],        %[dst],            %[stride]     \n\t"
-        "bnez     %[res],        3b                               \n\t"
-        "4:                                                       \n\t"
-        : [dst]"+&r"(block),          [src]"+&r"(pixels),
-          [h_8]"+&r"(h_8),            [res]"+&r"(res),
-          [stride2]"=&r"(stride2),    [stride3]"=&r"(stride3),
-          [stride4]"=&r"(stride4)
-        : [stride]"r"(line_size)
-        : "memory"
-    );
+        pixels += stride4;
+        block += stride4;
+    }
 }
 
 void ff_put_pixels8_x2_8_lasx(uint8_t *block, const uint8_t *pixels,
@@ -277,961 +249,253 @@ void ff_put_pixels16_y2_8_lasx(uint8_t *block, const uint8_t *pixels,
                           line_size, line_size, h);
 }
 
-static void common_hz_bil_no_rnd_16x16_lasx(const uint8_t *src,
-                                            int32_t src_stride,
-                                            uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t *_src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x -1);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5,
-              src4, 0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 2);
-    __lasx_xvstelm_d(src1, dst, 8, 3);
-    dst += dst_stride;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
-              0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 2);
-    __lasx_xvstelm_d(src1, dst, 8, 3);
-    dst += dst_stride;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
-              0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 2);
-    __lasx_xvstelm_d(src1, dst, 8, 3);
-    dst += dst_stride;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
-              0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 2);
-    __lasx_xvstelm_d(src1, dst, 8, 3);
-}
-
-static void common_hz_bil_no_rnd_8x16_lasx(const uint8_t *src,
-                                           int32_t src_stride,
-                                           uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
-              0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 2);
-    __lasx_xvstelm_d(src1, dst, 8, 3);
-    dst += dst_stride;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
-              0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src1, dst, 0, 2);
-    __lasx_xvstelm_d(src1, dst, 8, 3);
-}
-
+/**
+ * For width 16, h is always a positive multiple of 4.
+ * The function processes 4 rows per iteration.
+ */
 void ff_put_no_rnd_pixels16_x2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                       ptrdiff_t line_size, int h)
 {
-    if (h == 16) {
-        common_hz_bil_no_rnd_16x16_lasx(pixels, line_size, block, line_size);
-    } else if (h == 8) {
-        common_hz_bil_no_rnd_8x16_lasx(pixels, line_size, block, line_size);
+    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
+    int32_t h_4 = h >> 2;
+    int32_t stride2x = line_size << 1;
+    int32_t stride4x = line_size << 2;
+    int32_t stride3x = stride2x + line_size;
+    uint8_t* _src = (uint8_t*)pixels + 1;
+
+    for (int i = 0; i < h_4; i++) {
+        src0 = __lasx_xvld(pixels, 0);
+        DUP2_ARG2(__lasx_xvldx, pixels, line_size, pixels, stride2x, src1, src2);
+        src3 = __lasx_xvldx(pixels, stride3x);
+        src4 = __lasx_xvld(_src, 0);
+        DUP2_ARG2(__lasx_xvldx, _src, line_size, _src, stride2x, src5, src6);
+        src7 = __lasx_xvldx(_src, stride3x);
+        DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
+                  0x20, src7, src6, 0x20, src0, src1, src2, src3);
+        src0 = __lasx_xvavg_bu(src0, src2);
+        src1 = __lasx_xvavg_bu(src1, src3);
+        __lasx_xvstelm_d(src0, block, 0, 0);
+        __lasx_xvstelm_d(src0, block, 8, 1);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 2);
+        __lasx_xvstelm_d(src0, block, 8, 3);
+        block += line_size;
+        __lasx_xvstelm_d(src1, block, 0, 0);
+        __lasx_xvstelm_d(src1, block, 8, 1);
+        block += line_size;
+        __lasx_xvstelm_d(src1, block, 0, 2);
+        __lasx_xvstelm_d(src1, block, 8, 3);
+        block += line_size;
+
+        _src += stride4x;
+        pixels += stride4x;
     }
 }
 
-static void common_vt_bil_no_rnd_16x16_lasx(const uint8_t *src,
-                                            int32_t src_stride,
-                                            uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7, src8;
-    __m256i src9, src10, src11, src12, src13, src14, src15, src16;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src8 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src9, src10);
-    src11 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src12 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src13, src14);
-    src15 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src16 = __lasx_xvld(_src, 0);
-
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2,
-              0x20, src4, src3, 0x20, src0, src1, src2, src3);
-    DUP4_ARG3(__lasx_xvpermi_q, src5, src4, 0x20, src6, src5, 0x20, src7, src6,
-              0x20, src8, src7, 0x20, src4, src5, src6, src7);
-    DUP4_ARG3(__lasx_xvpermi_q, src9, src8, 0x20, src10, src9, 0x20, src11,
-              src10, 0x20, src12, src11, 0x20, src8, src9, src10, src11);
-    DUP4_ARG3(__lasx_xvpermi_q, src13, src12, 0x20, src14, src13, 0x20, src15,
-              src14, 0x20, src16, src15, 0x20, src12, src13, src14, src15);
-    DUP4_ARG2(__lasx_xvavg_bu, src0, src1, src2, src3, src4, src5, src6, src7,
-              src0, src2, src4, src6);
-    DUP4_ARG2(__lasx_xvavg_bu, src8, src9, src10, src11, src12, src13, src14,
-              src15, src8, src10, src12, src14);
-
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src2, dst, 0, 0);
-    __lasx_xvstelm_d(src2, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src2, dst, 0, 2);
-    __lasx_xvstelm_d(src2, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src4, dst, 0, 0);
-    __lasx_xvstelm_d(src4, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src4, dst, 0, 2);
-    __lasx_xvstelm_d(src4, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src6, dst, 0, 0);
-    __lasx_xvstelm_d(src6, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src6, dst, 0, 2);
-    __lasx_xvstelm_d(src6, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src8, dst, 0, 0);
-    __lasx_xvstelm_d(src8, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src8, dst, 0, 2);
-    __lasx_xvstelm_d(src8, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src10, dst, 0, 0);
-    __lasx_xvstelm_d(src10, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src10, dst, 0, 2);
-    __lasx_xvstelm_d(src10, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src12, dst, 0, 0);
-    __lasx_xvstelm_d(src12, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src12, dst, 0, 2);
-    __lasx_xvstelm_d(src12, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src14, dst, 0, 0);
-    __lasx_xvstelm_d(src14, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src14, dst, 0, 2);
-    __lasx_xvstelm_d(src14, dst, 8, 3);
-}
-
-static void common_vt_bil_no_rnd_8x16_lasx(const uint8_t *src,
-                                           int32_t src_stride,
-                                           uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7, src8;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src8 = __lasx_xvld(_src, 0);
-
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2,
-              0x20, src4, src3, 0x20, src0, src1, src2, src3);
-    DUP4_ARG3(__lasx_xvpermi_q, src5, src4, 0x20, src6, src5, 0x20, src7, src6,
-              0x20, src8, src7, 0x20, src4, src5, src6, src7);
-    DUP4_ARG2(__lasx_xvavg_bu, src0, src1, src2, src3, src4, src5, src6, src7,
-              src0, src2, src4, src6);
-
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src0, dst, 0, 2);
-    __lasx_xvstelm_d(src0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src2, dst, 0, 0);
-    __lasx_xvstelm_d(src2, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src2, dst, 0, 2);
-    __lasx_xvstelm_d(src2, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src4, dst, 0, 0);
-    __lasx_xvstelm_d(src4, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src4, dst, 0, 2);
-    __lasx_xvstelm_d(src4, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src6, dst, 0, 0);
-    __lasx_xvstelm_d(src6, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(src6, dst, 0, 2);
-    __lasx_xvstelm_d(src6, dst, 8, 3);
-}
-
+/**
+ * For width 16, h is always a positive multiple of 4.
+ * The function processes 4 rows per iteration.
+ */
 void ff_put_no_rnd_pixels16_y2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                       ptrdiff_t line_size, int h)
 {
-    if (h == 16) {
-        common_vt_bil_no_rnd_16x16_lasx(pixels, line_size, block, line_size);
-    } else if (h == 8) {
-        common_vt_bil_no_rnd_8x16_lasx(pixels, line_size, block, line_size);
+    __m256i src0, src1, src2, src3, src4;
+    int32_t stride2x = line_size << 1;
+    int32_t stride4x = line_size << 2;
+    int32_t stride3x = stride2x + line_size;
+    uint8_t* _src = (uint8_t*)pixels;
+    int32_t h_4 = h >> 2;
+
+    for (int i = 0; i < h_4; i++) {
+        src0 = __lasx_xvld(_src, 0);
+        DUP2_ARG2(__lasx_xvldx, _src, line_size, _src, stride2x, src1, src2);
+        src3 = __lasx_xvldx(_src, stride3x);
+        _src += stride4x;
+        src4 = __lasx_xvld(_src, 0);
+
+        DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2,
+                  0x20, src4, src3, 0x20, src0, src1, src2, src3);
+        DUP2_ARG2(__lasx_xvavg_bu, src0, src1, src2, src3, src0, src2);
+
+        __lasx_xvstelm_d(src0, block, 0, 0);
+        __lasx_xvstelm_d(src0, block, 8, 1);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 2);
+        __lasx_xvstelm_d(src0, block, 8, 3);
+        block += line_size;
+        __lasx_xvstelm_d(src2, block, 0, 0);
+        __lasx_xvstelm_d(src2, block, 8, 1);
+        block += line_size;
+        __lasx_xvstelm_d(src2, block, 0, 2);
+        __lasx_xvstelm_d(src2, block, 8, 3);
+        block += line_size;
     }
 }
 
-static void common_hv_bil_no_rnd_16x16_lasx(const uint8_t *src,
-                                            int32_t src_stride,
-                                            uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7, src8, src9;
-    __m256i src10, src11, src12, src13, src14, src15, src16, src17;
-    __m256i sum0, sum1, sum2, sum3, sum4, sum5, sum6, sum7;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (1 - src_stride_4x);
-    src9 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src10, src11);
-    src12 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src13 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src14, src15);
-    src16 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP2_ARG2(__lasx_xvld, _src, 0, _src, 1, src8, src17);
-
-    DUP4_ARG3(__lasx_xvpermi_q, src0, src4, 0x02, src1, src5, 0x02, src2,
-              src6, 0x02, src3, src7, 0x02, src0, src1, src2, src3);
-    DUP4_ARG3(__lasx_xvpermi_q, src4, src8, 0x02, src9, src13, 0x02, src10,
-              src14, 0x02, src11, src15, 0x02, src4, src5, src6, src7);
-    DUP2_ARG3(__lasx_xvpermi_q, src12, src16, 0x02, src13, src17, 0x02,
-              src8, src9);
-    DUP4_ARG2(__lasx_xvilvl_h, src5, src0, src6, src1, src7, src2, src8, src3,
-              sum0, sum2, sum4, sum6);
-    DUP4_ARG2(__lasx_xvilvh_h, src5, src0, src6, src1, src7, src2, src8, src3,
-              sum1, sum3, sum5, sum7);
-    src8 = __lasx_xvilvl_h(src9, src4);
-    src9 = __lasx_xvilvh_h(src9, src4);
-
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum0, sum0, sum1, sum1, sum2, sum2,
-              sum3, sum3, src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum4, sum4, sum5, sum5, sum6, sum6,
-              sum7, sum7, src4, src5, src6, src7);
-    DUP2_ARG2(__lasx_xvhaddw_hu_bu, src8, src8, src9, src9, src8, src9);
-
-    DUP4_ARG2(__lasx_xvadd_h, src0, src2, src1, src3, src2, src4, src3, src5,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvadd_h, src4, src6, src5, src7, src6, src8, src7, src9,
-              sum4, sum5, sum6, sum7);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum0, 1, sum1, 1, sum2, 1, sum3, 1,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum4, 1, sum5, 1, sum6, 1, sum7, 1,
-              sum4, sum5, sum6, sum7);
-    DUP4_ARG3(__lasx_xvsrani_b_h, sum1, sum0, 2, sum3, sum2, 2, sum5, sum4, 2,
-              sum7, sum6, 2, sum0, sum1, sum2, sum3);
-    __lasx_xvstelm_d(sum0, dst, 0, 0);
-    __lasx_xvstelm_d(sum0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum1, dst, 0, 0);
-    __lasx_xvstelm_d(sum1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum2, dst, 0, 0);
-    __lasx_xvstelm_d(sum2, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum3, dst, 0, 0);
-    __lasx_xvstelm_d(sum3, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum0, dst, 0, 2);
-    __lasx_xvstelm_d(sum0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum1, dst, 0, 2);
-    __lasx_xvstelm_d(sum1, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum2, dst, 0, 2);
-    __lasx_xvstelm_d(sum2, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum3, dst, 0, 2);
-    __lasx_xvstelm_d(sum3, dst, 8, 3);
-    dst += dst_stride;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (1 - src_stride_4x);
-    src9 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src10, src11);
-    src12 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src13 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src14, src15);
-    src16 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP2_ARG2(__lasx_xvld, _src, 0, _src, 1, src8, src17);
-
-    DUP4_ARG3(__lasx_xvpermi_q, src0, src4, 0x02, src1, src5, 0x02, src2, src6, 0x02,
-              src3, src7, 0x02, src0, src1, src2, src3);
-    DUP4_ARG3(__lasx_xvpermi_q, src4, src8, 0x02, src9, src13, 0x02, src10, src14, 0x02,
-              src11, src15, 0x02, src4, src5, src6, src7);
-    DUP2_ARG3(__lasx_xvpermi_q, src12, src16, 0x02, src13, src17, 0x02, src8, src9);
-
-    DUP4_ARG2(__lasx_xvilvl_h, src5, src0, src6, src1, src7, src2, src8, src3,
-              sum0, sum2, sum4, sum6);
-    DUP4_ARG2(__lasx_xvilvh_h, src5, src0, src6, src1, src7, src2, src8, src3,
-              sum1, sum3, sum5, sum7);
-    src8 = __lasx_xvilvl_h(src9, src4);
-    src9 = __lasx_xvilvh_h(src9, src4);
-
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum0, sum0, sum1, sum1, sum2, sum2,
-              sum3, sum3, src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum4, sum4, sum5, sum5, sum6, sum6,
-              sum7, sum7, src4, src5, src6, src7);
-    DUP2_ARG2(__lasx_xvhaddw_hu_bu, src8, src8, src9, src9, src8, src9);
-
-    DUP4_ARG2(__lasx_xvadd_h, src0, src2, src1, src3, src2, src4, src3, src5,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvadd_h, src4, src6, src5, src7, src6, src8, src7, src9,
-              sum4, sum5, sum6, sum7);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum0, 1, sum1, 1, sum2, 1, sum3, 1,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum4, 1, sum5, 1, sum6, 1, sum7, 1,
-              sum4, sum5, sum6, sum7);
-    DUP4_ARG3(__lasx_xvsrani_b_h, sum1, sum0, 2, sum3, sum2, 2, sum5, sum4, 2,
-              sum7, sum6, 2, sum0, sum1, sum2, sum3);
-    __lasx_xvstelm_d(sum0, dst, 0, 0);
-    __lasx_xvstelm_d(sum0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum1, dst, 0, 0);
-    __lasx_xvstelm_d(sum1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum2, dst, 0, 0);
-    __lasx_xvstelm_d(sum2, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum3, dst, 0, 0);
-    __lasx_xvstelm_d(sum3, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum0, dst, 0, 2);
-    __lasx_xvstelm_d(sum0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum1, dst, 0, 2);
-    __lasx_xvstelm_d(sum1, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum2, dst, 0, 2);
-    __lasx_xvstelm_d(sum2, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum3, dst, 0, 2);
-    __lasx_xvstelm_d(sum3, dst, 8, 3);
-}
-
-static void common_hv_bil_no_rnd_8x16_lasx(const uint8_t *src,
-                                           int32_t src_stride,
-                                           uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7, src8, src9;
-    __m256i src10, src11, src12, src13, src14, src15, src16, src17;
-    __m256i sum0, sum1, sum2, sum3, sum4, sum5, sum6, sum7;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (1 - src_stride_4x);
-    src9 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src10, src11);
-    src12 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src13 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src14, src15);
-    src16 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP2_ARG2(__lasx_xvld, _src, 0, _src, 1, src8, src17);
-
-    DUP4_ARG3(__lasx_xvpermi_q, src0, src4, 0x02, src1, src5, 0x02, src2,
-              src6, 0x02, src3, src7, 0x02, src0, src1, src2, src3);
-    DUP4_ARG3(__lasx_xvpermi_q, src4, src8, 0x02, src9, src13, 0x02, src10,
-              src14, 0x02, src11, src15, 0x02, src4, src5, src6, src7);
-    DUP2_ARG3(__lasx_xvpermi_q, src12, src16, 0x02, src13, src17, 0x02, src8, src9);
-
-    DUP4_ARG2(__lasx_xvilvl_h, src5, src0, src6, src1, src7, src2, src8, src3,
-              sum0, sum2, sum4, sum6);
-    DUP4_ARG2(__lasx_xvilvh_h, src5, src0, src6, src1, src7, src2, src8, src3,
-              sum1, sum3, sum5, sum7);
-    src8 = __lasx_xvilvl_h(src9, src4);
-    src9 = __lasx_xvilvh_h(src9, src4);
-
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum0, sum0, sum1, sum1, sum2, sum2,
-              sum3, sum3, src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum4, sum4, sum5, sum5, sum6, sum6,
-              sum7, sum7, src4, src5, src6, src7);
-    DUP2_ARG2(__lasx_xvhaddw_hu_bu, src8, src8, src9, src9, src8, src9);
-
-    DUP4_ARG2(__lasx_xvadd_h, src0, src2, src1, src3, src2, src4, src3, src5,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvadd_h, src4, src6, src5, src7, src6, src8, src7, src9,
-              sum4, sum5, sum6, sum7);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum0, 1, sum1, 1, sum2, 1, sum3, 1,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum4, 1, sum5, 1, sum6, 1, sum7, 1,
-              sum4, sum5, sum6, sum7);
-    DUP4_ARG3(__lasx_xvsrani_b_h, sum1, sum0, 2, sum3, sum2, 2, sum5, sum4, 2,
-              sum7, sum6, 2, sum0, sum1, sum2, sum3);
-    __lasx_xvstelm_d(sum0, dst, 0, 0);
-    __lasx_xvstelm_d(sum0, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum1, dst, 0, 0);
-    __lasx_xvstelm_d(sum1, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum2, dst, 0, 0);
-    __lasx_xvstelm_d(sum2, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum3, dst, 0, 0);
-    __lasx_xvstelm_d(sum3, dst, 8, 1);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum0, dst, 0, 2);
-    __lasx_xvstelm_d(sum0, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum1, dst, 0, 2);
-    __lasx_xvstelm_d(sum1, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum2, dst, 0, 2);
-    __lasx_xvstelm_d(sum2, dst, 8, 3);
-    dst += dst_stride;
-    __lasx_xvstelm_d(sum3, dst, 0, 2);
-    __lasx_xvstelm_d(sum3, dst, 8, 3);
-}
-
 void ff_put_no_rnd_pixels16_xy2_8_lasx(uint8_t *block,
                                        const uint8_t *pixels,
                                        ptrdiff_t line_size, int h)
 {
-    if (h == 16) {
-        common_hv_bil_no_rnd_16x16_lasx(pixels, line_size, block, line_size);
-    } else if (h == 8) {
-        common_hv_bil_no_rnd_8x16_lasx(pixels, line_size, block, line_size);
+    __m256i src0, src1, src2, src3;
+    __m256i sum0, sum1, sum2;
+    src0 = __lasx_xvld(pixels, 0);
+    src1 = __lasx_xvld(pixels, 1);
+    src2 = __lasx_vext2xv_hu_bu(src0);
+    src3 = __lasx_vext2xv_hu_bu(src1);
+    sum0 = __lasx_xvadd_h(src2, src3);
+    sum0 = __lasx_xvaddi_hu(sum0, 1);
+
+    for (int i = 0; i < h; i++) {
+        pixels += line_size;
+        src0 = __lasx_xvld(pixels, 0);
+        src1 = __lasx_xvld(pixels, 1);
+
+        src2 = __lasx_vext2xv_hu_bu(src0);
+        src3 = __lasx_vext2xv_hu_bu(src1);
+        sum1 = __lasx_xvadd_h(src2, src3);
+        sum2 = __lasx_xvadd_h(sum0, sum1);
+        sum2 = __lasx_xvsrani_b_h(sum2, sum2, 2);
+
+        sum0 = __lasx_xvaddi_hu(sum1, 1);
+        __lasx_xvstelm_d(sum2, block, 0, 0);
+        __lasx_xvstelm_d(sum2, block, 8, 3);
+
+        block += line_size;
     }
 }
 
-static void common_hz_bil_no_rnd_8x8_lasx(const uint8_t *src,
-                                          int32_t src_stride,
-                                          uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
-    __m256i src8, src9, src10, src11, src12, src13, src14, src15;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t dst_stride_2x = dst_stride << 1;
-    int32_t dst_stride_4x = dst_stride << 2;
-    int32_t dst_stride_3x = dst_stride_2x + dst_stride;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (1 - src_stride_4x);
-    src8 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src9, src10);
-    src11 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src12 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src13, src14);
-    src15 = __lasx_xvldx(_src, src_stride_3x);
-
-    DUP4_ARG2(__lasx_xvpickev_d, src1, src0, src3, src2, src5, src4, src7,
-              src6, src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvpickev_d, src9, src8, src11, src10, src13, src12, src15,
-              src14, src4, src5, src6, src7);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4,
-              0x20, src7, src6, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src2);
-    src1 = __lasx_xvavg_bu(src1, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst + dst_stride, 0, 1);
-    __lasx_xvstelm_d(src0, dst + dst_stride_2x, 0, 2);
-    __lasx_xvstelm_d(src0, dst + dst_stride_3x, 0, 3);
-    dst += dst_stride_4x;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst + dst_stride, 0, 1);
-    __lasx_xvstelm_d(src1, dst + dst_stride_2x, 0, 2);
-    __lasx_xvstelm_d(src1, dst + dst_stride_3x, 0, 3);
-}
-
-static void common_hz_bil_no_rnd_4x8_lasx(const uint8_t *src,
-                                          int32_t src_stride,
-                                          uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    int32_t dst_stride_2x = dst_stride << 1;
-    int32_t dst_stride_3x = dst_stride_2x + dst_stride;
-    uint8_t *_src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    DUP4_ARG2(__lasx_xvpickev_d, src1, src0, src3, src2, src5, src4, src7, src6,
-              src0, src1, src2, src3);
-    DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src1);
-    src0 = __lasx_xvavg_bu(src0, src1);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst + dst_stride, 0, 1);
-    __lasx_xvstelm_d(src0, dst + dst_stride_2x, 0, 2);
-    __lasx_xvstelm_d(src0, dst + dst_stride_3x, 0, 3);
-}
-
+/**
+ * For width 8, h is always a positive multiple of 4.
+ * The function processes 4 rows per iteration.
+ */
 void ff_put_no_rnd_pixels8_x2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                      ptrdiff_t line_size, int h)
 {
-    if (h == 8) {
-        common_hz_bil_no_rnd_8x8_lasx(pixels, line_size, block, line_size);
-    } else if (h == 4) {
-        common_hz_bil_no_rnd_4x8_lasx(pixels, line_size, block, line_size);
+    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
+    int32_t stride2x = line_size << 1;
+    int32_t stride3x = stride2x + line_size;
+    int32_t stride4x = line_size << 2;
+    uint8_t *_src = (uint8_t*)pixels + 1;
+    int32_t h_4 = h >> 2;
+
+    for (int i = 0; i < h_4; i++) {
+        src0 = __lasx_xvld(pixels, 0);
+        DUP2_ARG2(__lasx_xvldx, pixels, line_size, pixels, stride2x, src1, src2);
+        src3 = __lasx_xvldx(pixels, stride3x);
+        src4 = __lasx_xvld(_src, 0);
+        DUP2_ARG2(__lasx_xvldx, _src, line_size, _src, stride2x, src5, src6);
+        src7 = __lasx_xvldx(_src, stride3x);
+        DUP4_ARG2(__lasx_xvpickev_d, src1, src0, src3, src2, src5, src4, src7, src6,
+                  src0, src1, src2, src3);
+        DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src1);
+        src0 = __lasx_xvavg_bu(src0, src1);
+        __lasx_xvstelm_d(src0, block, 0, 0);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 1);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 2);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 3);
+        block += line_size;
+
+        pixels += stride4x;
+        _src += stride4x;
     }
 }
 
-static void common_vt_bil_no_rnd_8x8_lasx(const uint8_t *src, int32_t src_stride,
-                                          uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7, src8;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t dst_stride_2x = dst_stride << 1;
-    int32_t dst_stride_4x = dst_stride << 2;
-    int32_t dst_stride_3x = dst_stride_2x + dst_stride;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src8 = __lasx_xvld(_src, 0);
-
-    DUP4_ARG2(__lasx_xvpickev_d, src1, src0, src2, src1, src3, src2, src4, src3,
-              src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvpickev_d, src5, src4, src6, src5, src7, src6, src8, src7,
-              src4, src5, src6, src7);
-    DUP4_ARG3(__lasx_xvpermi_q, src2, src0, 0x20, src3, src1, 0x20, src6, src4,
-              0x20, src7, src5, 0x20, src0, src1, src2, src3);
-    src0 = __lasx_xvavg_bu(src0, src1);
-    src1 = __lasx_xvavg_bu(src2, src3);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst + dst_stride, 0, 1);
-    __lasx_xvstelm_d(src0, dst + dst_stride_2x, 0, 2);
-    __lasx_xvstelm_d(src0, dst + dst_stride_3x, 0, 3);
-    dst += dst_stride_4x;
-    __lasx_xvstelm_d(src1, dst, 0, 0);
-    __lasx_xvstelm_d(src1, dst + dst_stride, 0, 1);
-    __lasx_xvstelm_d(src1, dst + dst_stride_2x, 0, 2);
-    __lasx_xvstelm_d(src1, dst + dst_stride_3x, 0, 3);
-}
-
-static void common_vt_bil_no_rnd_4x8_lasx(const uint8_t *src, int32_t src_stride,
-                                          uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t dst_stride_2x = dst_stride << 1;
-    int32_t dst_stride_3x = dst_stride_2x + dst_stride;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP4_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, _src,
-              src_stride_3x, _src, src_stride_4x, src1, src2, src3, src4);
-    DUP4_ARG2(__lasx_xvpickev_d, src1, src0, src2, src1, src3, src2, src4, src3,
-              src0, src1, src2, src3);
-    DUP2_ARG3(__lasx_xvpermi_q, src2, src0, 0x20, src3, src1, 0x20, src0, src1);
-    src0 = __lasx_xvavg_bu(src0, src1);
-    __lasx_xvstelm_d(src0, dst, 0, 0);
-    __lasx_xvstelm_d(src0, dst + dst_stride, 0, 1);
-    __lasx_xvstelm_d(src0, dst + dst_stride_2x, 0, 2);
-    __lasx_xvstelm_d(src0, dst + dst_stride_3x, 0, 3);
-}
-
+/**
+ * For width 8, h is always a positive multiple of 4.
+ * The function processes 4 rows per iteration.
+ */
 void ff_put_no_rnd_pixels8_y2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                      ptrdiff_t line_size, int h)
 {
-    if (h == 8) {
-        common_vt_bil_no_rnd_8x8_lasx(pixels, line_size, block, line_size);
-    } else if (h == 4) {
-        common_vt_bil_no_rnd_4x8_lasx(pixels, line_size, block, line_size);
-    }
-}
+    __m256i src0, src1, src2, src3, src4;
+    int32_t stride2x = line_size << 1;
+    int32_t stride4x = line_size << 2;
+    int32_t stride3x = stride2x + line_size;
+    uint8_t* _src = (uint8_t*)pixels;
+    int32_t h_4 = h >> 2;
 
-static void common_hv_bil_no_rnd_8x8_lasx(const uint8_t *src, int32_t src_stride,
-                                          uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
-    __m256i src8, src9, src10, src11, src12, src13, src14, src15, src16, src17;
-    __m256i sum0, sum1, sum2, sum3;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t dst_stride_2x = dst_stride << 1;
-    int32_t dst_stride_4x = dst_stride << 2;
-    int32_t dst_stride_3x = dst_stride_2x + dst_stride;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-    src7 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (1 - src_stride_4x);
-    src9 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src10, src11);
-    src12 = __lasx_xvldx(_src, src_stride_3x);
-    _src += src_stride_4x;
-    src13 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-              src14, src15);
-    src16 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP2_ARG2(__lasx_xvld, _src, 0, _src, 1, src8, src17);
-
-    DUP4_ARG2(__lasx_xvilvl_b, src9, src0, src10, src1, src11, src2, src12, src3,
-              src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvilvl_b, src13, src4, src14, src5, src15, src6, src16, src7,
-              src4, src5, src6, src7);
-    src8 = __lasx_xvilvl_b(src17, src8);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2,
-              0x20, src4, src3, 0x20, src0, src1, src2, src3);
-    DUP4_ARG3(__lasx_xvpermi_q, src5, src4, 0x20, src6, src5, 0x20, src7, src6,
-              0x20, src8, src7, 0x20, src4, src5, src6, src7);
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, src0, src0, src1, src1, src2, src2,
-              src3, src3, src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, src4, src4, src5, src5, src6, src6,
-              src7, src7, src4, src5, src6, src7);
-    DUP4_ARG2(__lasx_xvadd_h, src0, src1, src2, src3, src4, src5, src6, src7,
-              sum0, sum1, sum2, sum3);
-    DUP4_ARG2(__lasx_xvaddi_hu, sum0, 1, sum1, 1, sum2, 1, sum3, 1,
-              sum0, sum1, sum2, sum3);
-    DUP2_ARG3(__lasx_xvsrani_b_h, sum1, sum0, 2, sum3, sum2, 2, sum0, sum1);
-    __lasx_xvstelm_d(sum0, dst, 0, 0);
-    __lasx_xvstelm_d(sum0, dst + dst_stride, 0, 2);
-    __lasx_xvstelm_d(sum0, dst + dst_stride_2x, 0, 1);
-    __lasx_xvstelm_d(sum0, dst + dst_stride_3x, 0, 3);
-    dst += dst_stride_4x;
-    __lasx_xvstelm_d(sum1, dst, 0, 0);
-    __lasx_xvstelm_d(sum1, dst + dst_stride, 0, 2);
-    __lasx_xvstelm_d(sum1, dst + dst_stride_2x, 0, 1);
-    __lasx_xvstelm_d(sum1, dst + dst_stride_3x, 0, 3);
-}
-
-static void common_hv_bil_no_rnd_4x8_lasx(const uint8_t *src, int32_t src_stride,
-                                          uint8_t *dst, int32_t dst_stride)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7;
-    __m256i src8, src9, sum0, sum1;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t dst_stride_2x = dst_stride << 1;
-    int32_t dst_stride_3x = dst_stride_2x + dst_stride;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t *_src = (uint8_t*)src;
-
-    src0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-    src3 = __lasx_xvldx(_src, src_stride_3x);
-    _src += 1;
-    src5 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src6, src7);
-    src8 = __lasx_xvldx(_src, src_stride_3x);
-    _src += (src_stride_4x - 1);
-    DUP2_ARG2(__lasx_xvld, _src, 0, _src, 1, src4, src9);
-
-    DUP4_ARG2(__lasx_xvilvl_b, src5, src0, src6, src1, src7, src2, src8, src3,
-              src0, src1, src2, src3);
-    src4 = __lasx_xvilvl_b(src9, src4);
-    DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2,
-              0x20, src4, src3, 0x20, src0, src1, src2, src3);
-    DUP4_ARG2(__lasx_xvhaddw_hu_bu, src0, src0, src1, src1, src2, src2,
-              src3, src3, src0, src1, src2, src3);
-    DUP2_ARG2(__lasx_xvadd_h, src0, src1, src2, src3, sum0, sum1);
-    sum0 = __lasx_xvaddi_hu(sum0, 1);
-    sum1 = __lasx_xvaddi_hu(sum1, 1);
-    sum0 = __lasx_xvsrani_b_h(sum1, sum0, 2);
-    __lasx_xvstelm_d(sum0, dst, 0, 0);
-    __lasx_xvstelm_d(sum0, dst + dst_stride, 0, 2);
-    __lasx_xvstelm_d(sum0, dst + dst_stride_2x, 0, 1);
-    __lasx_xvstelm_d(sum0, dst + dst_stride_3x, 0, 3);
-}
-
-void ff_put_no_rnd_pixels8_xy2_8_lasx(uint8_t *block, const uint8_t *pixels,
-                                      ptrdiff_t line_size, int h)
-{
-    if (h == 8) {
-        common_hv_bil_no_rnd_8x8_lasx(pixels, line_size, block, line_size);
-    } else if (h == 4) {
-        common_hv_bil_no_rnd_4x8_lasx(pixels, line_size, block, line_size);
-    }
-}
-
-static void common_hv_bil_16w_lasx(const uint8_t *src, int32_t src_stride,
-                                   uint8_t *dst, int32_t dst_stride,
-                                   uint8_t height)
-{
-    __m256i src0, src1, src2, src3, src4, src5, src6, src7, src8, src9;
-    __m256i src10, src11, src12, src13, src14, src15, src16, src17;
-    __m256i sum0, sum1, sum2, sum3, sum4, sum5, sum6, sum7;
-    uint8_t loop_cnt;
-    int32_t src_stride_2x = src_stride << 1;
-    int32_t src_stride_4x = src_stride << 2;
-    int32_t src_stride_3x = src_stride_2x + src_stride;
-    uint8_t* _src = (uint8_t*)src;
-
-    for (loop_cnt = (height >> 3); loop_cnt--;) {
+    for (int i = 0; i < h_4; i++) {
         src0 = __lasx_xvld(_src, 0);
-        DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src1, src2);
-        src3 = __lasx_xvldx(_src, src_stride_3x);
-        _src += src_stride_4x;
-        src4 = __lasx_xvld(_src, 0);
-        DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x, src5, src6);
-        src7 = __lasx_xvldx(_src, src_stride_3x);
-        _src += (1 - src_stride_4x);
-        src9 = __lasx_xvld(_src, 0);
-        DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-                  src10, src11);
-        src12 = __lasx_xvldx(_src, src_stride_3x);
-        _src += src_stride_4x;
-        src13 = __lasx_xvld(_src, 0);
-        DUP2_ARG2(__lasx_xvldx, _src, src_stride, _src, src_stride_2x,
-                  src14, src15);
-        src16 = __lasx_xvldx(_src, src_stride_3x);
-        _src += (src_stride_4x - 1);
-        DUP2_ARG2(__lasx_xvld, _src, 0, _src, 1, src8, src17);
+        DUP4_ARG2(__lasx_xvldx, _src, line_size, _src, stride2x, _src,
+                  stride3x, _src, stride4x, src1, src2, src3, src4);
+        DUP4_ARG2(__lasx_xvpickev_d, src1, src0, src2, src1, src3, src2, src4, src3,
+                  src0, src1, src2, src3);
+        DUP2_ARG3(__lasx_xvpermi_q, src2, src0, 0x20, src3, src1, 0x20, src0, src1);
+        src0 = __lasx_xvavg_bu(src0, src1);
+        __lasx_xvstelm_d(src0, block, 0, 0);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 1);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 2);
+        block += line_size;
+        __lasx_xvstelm_d(src0, block, 0, 3);
+        block += line_size;
 
-        DUP4_ARG3(__lasx_xvpermi_q, src0, src4, 0x02, src1, src5, 0x02, src2,
-                  src6, 0x02, src3, src7, 0x02, src0, src1, src2, src3);
-        DUP4_ARG3(__lasx_xvpermi_q, src4, src8, 0x02, src9, src13, 0x02, src10,
-                  src14, 0x02, src11, src15, 0x02, src4, src5, src6, src7);
-        DUP2_ARG3(__lasx_xvpermi_q, src12, src16, 0x02, src13, src17, 0x02,
-                   src8, src9);
+        _src += stride4x;
+    }
+}
 
-        DUP4_ARG2(__lasx_xvilvl_h, src5, src0, src6, src1, src7, src2, src8,
-                  src3, sum0, sum2, sum4, sum6);
-        DUP4_ARG2(__lasx_xvilvh_h, src5, src0, src6, src1, src7, src2, src8,
-                  src3, sum1, sum3, sum5, sum7);
-        src8 = __lasx_xvilvl_h(src9, src4);
-        src9 = __lasx_xvilvh_h(src9, src4);
+void ff_put_no_rnd_pixels8_xy2_8_lsx(uint8_t *block, const uint8_t *pixels,
+                                     ptrdiff_t line_size, int h)
+{
+    __m128i src0, src1, src2, src3;
+    __m128i sum0, sum1, sum2;
 
-        DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum0, sum0, sum1, sum1, sum2, sum2,
-                  sum3, sum3, src0, src1, src2, src3);
-        DUP4_ARG2(__lasx_xvhaddw_hu_bu, sum4, sum4, sum5, sum5, sum6, sum6,
-                  sum7, sum7, src4, src5, src6, src7);
-        DUP2_ARG2(__lasx_xvhaddw_hu_bu, src8, src8, src9, src9, src8, src9);
+    src0 = __lsx_vld(pixels, 0);
+    src1 = __lsx_vld(pixels, 1);
+    src2 = __lsx_vsllwil_hu_bu(src0, 0);
+    src3 = __lsx_vsllwil_hu_bu(src1, 0);
+    sum0 = __lsx_vadd_h(src2, src3);
+    sum0 = __lsx_vaddi_hu(sum0, 1);
 
-        DUP4_ARG2(__lasx_xvadd_h, src0, src2, src1, src3, src2, src4, src3,
-                  src5, sum0, sum1, sum2, sum3);
-        DUP4_ARG2(__lasx_xvadd_h, src4, src6, src5, src7, src6, src8, src7,
-                  src9, sum4, sum5, sum6, sum7);
-        DUP4_ARG3(__lasx_xvsrarni_b_h, sum1, sum0, 2, sum3, sum2, 2, sum5,
-                  sum4, 2, sum7, sum6, 2, sum0, sum1, sum2, sum3);
-        __lasx_xvstelm_d(sum0, dst, 0, 0);
-        __lasx_xvstelm_d(sum0, dst, 8, 1);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum1, dst, 0, 0);
-        __lasx_xvstelm_d(sum1, dst, 8, 1);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum2, dst, 0, 0);
-        __lasx_xvstelm_d(sum2, dst, 8, 1);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum3, dst, 0, 0);
-        __lasx_xvstelm_d(sum3, dst, 8, 1);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum0, dst, 0, 2);
-        __lasx_xvstelm_d(sum0, dst, 8, 3);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum1, dst, 0, 2);
-        __lasx_xvstelm_d(sum1, dst, 8, 3);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum2, dst, 0, 2);
-        __lasx_xvstelm_d(sum2, dst, 8, 3);
-        dst += dst_stride;
-        __lasx_xvstelm_d(sum3, dst, 0, 2);
-        __lasx_xvstelm_d(sum3, dst, 8, 3);
-        dst += dst_stride;
+    for (int i = 0; i < h; i++) {
+        pixels += line_size;
+        src0 = __lsx_vld(pixels, 0);
+        src1 = __lsx_vld(pixels, 1);
+        src2 = __lsx_vsllwil_hu_bu(src0, 0);
+        src3 = __lsx_vsllwil_hu_bu(src1, 0);
+        sum1 = __lsx_vadd_h(src2, src3);
+        sum2 = __lsx_vadd_h(sum0, sum1);
+        sum2 = __lsx_vsrani_b_h(sum2, sum2, 2);
+
+        sum0 = __lsx_vaddi_hu(sum1, 1);
+        __lsx_vstelm_d(sum2, block, 0, 0);
+
+        block += line_size;
     }
 }
 
 void ff_put_pixels16_xy2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                 ptrdiff_t line_size, int h)
 {
-    common_hv_bil_16w_lasx(pixels, line_size, block, line_size, h);
+    __m256i src0, src1, src2, src3;
+    __m256i sum0, sum1, sum2;
+
+    src0 = __lasx_xvld(pixels, 0);
+    src1 = __lasx_xvld(pixels, 1);
+    src2 = __lasx_vext2xv_hu_bu(src0);
+    src3 = __lasx_vext2xv_hu_bu(src1);
+    sum0 = __lasx_xvadd_h(src2, src3);
+    sum0 = __lasx_xvaddi_hu(sum0, 2);
+
+    for (int i = 0; i < h; i++) {
+        pixels += line_size;
+        src0 = __lasx_xvld(pixels, 0);
+        src1 = __lasx_xvld(pixels, 1);
+
+        src2 = __lasx_vext2xv_hu_bu(src0);
+        src3 = __lasx_vext2xv_hu_bu(src1);
+        sum1 = __lasx_xvadd_h(src2, src3);
+        sum2 = __lasx_xvadd_h(sum0, sum1);
+        sum2 = __lasx_xvsrani_b_h(sum2, sum2, 2);
+        sum0 = __lasx_xvaddi_hu(sum1, 2);
+        __lasx_xvstelm_d(sum2, block, 0, 0);
+        __lasx_xvstelm_d(sum2, block, 8, 3);
+        block += line_size;
+    }
 }
 
 static void common_hv_bil_8w_lasx(const uint8_t *src, int32_t src_stride,
diff --git a/libavcodec/loongarch/hpeldsp_lasx.h b/libavcodec/loongarch/hpeldsp_lasx.h
index 2e035eade8..df3987d308 100644
--- a/libavcodec/loongarch/hpeldsp_lasx.h
+++ b/libavcodec/loongarch/hpeldsp_lasx.h
@@ -49,8 +49,8 @@ void ff_put_no_rnd_pixels8_x2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                      ptrdiff_t line_size, int h);
 void ff_put_no_rnd_pixels8_y2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                      ptrdiff_t line_size, int h);
-void ff_put_no_rnd_pixels8_xy2_8_lasx(uint8_t *block, const uint8_t *pixels,
-                                      ptrdiff_t line_size, int h);
+void ff_put_no_rnd_pixels8_xy2_8_lsx(uint8_t *block, const uint8_t *pixels,
+                                     ptrdiff_t line_size, int h);
 void ff_put_pixels8_xy2_8_lasx(uint8_t *block, const uint8_t *pixels,
                                ptrdiff_t line_size, int h);
 void ff_put_pixels16_xy2_8_lasx(uint8_t *block, const uint8_t *pixels,
-- 
2.52.0
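
Reviewer note (not part of the patch): the LSX/LASX xy2 functions above carry the horizontal pair sum of the previous row in `sum0`, add a bias to it (1 in the no_rnd function, 2 in the rounding one), add the current row's pair sum and shift right by 2. A minimal scalar sketch of the same arithmetic, useful as a reference when comparing SIMD output, could look like this. The name `put_pixels8_xy2_ref` and the `bias` parameter are assumptions made for illustration and do not exist in FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an 8-pixel-wide xy2 half-pel filter (hypothetical helper).
 * bias = 2 models the rounding variant, bias = 1 the no_rnd variant.
 * Reads h + 1 rows of 9 bytes each from pixels. */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                ptrdiff_t line_size, int h, int bias)
{
    uint16_t prev[8]; /* pair sums of the previous row, bias already added */

    for (int x = 0; x < 8; x++)
        prev[x] = (uint16_t)(pixels[x] + pixels[x + 1] + bias);

    for (int y = 0; y < h; y++) {
        pixels += line_size;
        for (int x = 0; x < 8; x++) {
            uint16_t cur = (uint16_t)(pixels[x] + pixels[x + 1]);
            block[x] = (uint8_t)((prev[x] + cur) >> 2);
            prev[x]  = (uint16_t)(cur + bias); /* reused for the next row */
        }
        block += line_size;
    }
}
```

Reusing the previous row's sum means each source row is loaded and widened once, which is the same design choice the vector loops make.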


From f77fe16afb173117c417af875af3bc481b25719d Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 14 Nov 2025 11:24:45 +0100
Subject: [PATCH 168/304] avcodec/mpegvideo_unquantize: Constify MPVContext
 pointee

Also use MPVContext instead of MpegEncContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/arm/mpegvideo_arm.c     |  4 ++--
 libavcodec/arm/mpegvideo_armv5te.c |  8 ++++----
 libavcodec/mips/h263dsp_mips.h     |  6 +++---
 libavcodec/mips/mpegvideo_mips.h   | 20 ++++++++++----------
 libavcodec/mips/mpegvideo_mmi.c    | 20 ++++++++++----------
 libavcodec/mips/mpegvideo_msa.c    |  6 +++---
 libavcodec/mpeg4videodec.h         |  6 +++---
 libavcodec/mpegvideo.h             | 12 ++++++------
 libavcodec/mpegvideo_unquantize.c  | 28 ++++++++++++++--------------
 libavcodec/mpegvideo_unquantize.h  | 26 +++++++++++++-------------
 libavcodec/neon/mpegvideo.c        |  4 ++--
 libavcodec/ppc/mpegvideo_altivec.c |  4 ++--
 libavcodec/x86/mpegvideo.c         | 24 ++++++++++++------------
 13 files changed, 84 insertions(+), 84 deletions(-)
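
Reviewer note (not part of the patch): the change relies on a forward typedef so that function-pointer members can name `const MPVContext *` before the struct body is complete. A minimal sketch of that pattern follows; `DemoContext`, `demo_unquantize_fn` and `demo_unquantize` are hypothetical names, and the loop body is a placeholder rather than any real unquantizer.

```c
#include <stdint.h>

/* Forward typedef: enough to declare pointers before the struct is defined. */
typedef struct DemoContext DemoContext;

typedef void (*demo_unquantize_fn)(const DemoContext *s, int16_t *block,
                                   int n, int qscale);

struct DemoContext {
    int block_last_index[6];
    demo_unquantize_fn unquantize; /* callback takes the context read-only */
};

static void demo_unquantize(const DemoContext *s, int16_t *block,
                            int n, int qscale)
{
    /* s is const here: an assignment through s would not compile. */
    for (int i = 0; i <= s->block_last_index[n]; i++)
        block[i] = (int16_t)(block[i] * qscale);
}
```

With the pointee constified, the compiler rejects any callback that tries to modify the context, which is the property the signature change documents.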

diff --git a/libavcodec/arm/mpegvideo_arm.c b/libavcodec/arm/mpegvideo_arm.c
index 5c96c9df2c..cb109cd832 100644
--- a/libavcodec/arm/mpegvideo_arm.c
+++ b/libavcodec/arm/mpegvideo_arm.c
@@ -41,9 +41,9 @@ CHECK_OFFSET(MpegEncContext, inter_scantable.raster_end,
 CHECK_OFFSET(MpegEncContext, h263_aic,         H263_AIC);
 #endif
 
-void ff_dct_unquantize_h263_inter_neon(MpegEncContext *s, int16_t *block,
+void ff_dct_unquantize_h263_inter_neon(const MPVContext *s, int16_t *block,
                                        int n, int qscale);
-void ff_dct_unquantize_h263_intra_neon(MpegEncContext *s, int16_t *block,
+void ff_dct_unquantize_h263_intra_neon(const MPVContext *s, int16_t *block,
                                        int n, int qscale);
 
 av_cold void ff_mpv_unquantize_init_arm(MPVUnquantDSPContext *s, int bitexact)
diff --git a/libavcodec/arm/mpegvideo_armv5te.c b/libavcodec/arm/mpegvideo_armv5te.c
index 2737f68643..3a6d015767 100644
--- a/libavcodec/arm/mpegvideo_armv5te.c
+++ b/libavcodec/arm/mpegvideo_armv5te.c
@@ -50,8 +50,8 @@ static inline void dct_unquantize_h263_helper_c(int16_t *block, int qmul, int qa
 }
 #endif
 
-static void dct_unquantize_h263_intra_armv5te(MpegEncContext *s,
-                                  int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_intra_armv5te(const MPVContext *s,
+                                              int16_t *block, int n, int qscale)
 {
     int level, qmul, qadd;
     int nCoeffs;
@@ -79,8 +79,8 @@ static void dct_unquantize_h263_intra_armv5te(MpegEncContext *s,
     block[0] = level;
 }
 
-static void dct_unquantize_h263_inter_armv5te(MpegEncContext *s,
-                                  int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_inter_armv5te(const MPVContext *s,
+                                              int16_t *block, int n, int qscale)
 {
     int qmul, qadd;
     int nCoeffs;
diff --git a/libavcodec/mips/h263dsp_mips.h b/libavcodec/mips/h263dsp_mips.h
index d4de2233a7..5ea9fcbb88 100644
--- a/libavcodec/mips/h263dsp_mips.h
+++ b/libavcodec/mips/h263dsp_mips.h
@@ -25,11 +25,11 @@
 
 void ff_h263_h_loop_filter_msa(uint8_t *src, int stride, int q_scale);
 void ff_h263_v_loop_filter_msa(uint8_t *src, int stride, int q_scale);
-void ff_dct_unquantize_mpeg2_inter_msa(MpegEncContext *s, int16_t *block,
+void ff_dct_unquantize_mpeg2_inter_msa(const MPVContext *s, int16_t *block,
                                        int32_t index, int32_t q_scale);
-void ff_dct_unquantize_h263_inter_msa(MpegEncContext *s, int16_t *block,
+void ff_dct_unquantize_h263_inter_msa(const MPVContext *s, int16_t *block,
                                       int32_t index, int32_t q_scale);
-void ff_dct_unquantize_h263_intra_msa(MpegEncContext *s, int16_t *block,
+void ff_dct_unquantize_h263_intra_msa(const MPVContext *s, int16_t *block,
                                       int32_t index, int32_t q_scale);
 int ff_pix_sum_msa(const uint8_t *pix, ptrdiff_t line_size);
 
diff --git a/libavcodec/mips/mpegvideo_mips.h b/libavcodec/mips/mpegvideo_mips.h
index 2a9ea4006e..2544279ac5 100644
--- a/libavcodec/mips/mpegvideo_mips.h
+++ b/libavcodec/mips/mpegvideo_mips.h
@@ -23,16 +23,16 @@
 
 #include "libavcodec/mpegvideo.h"
 
-void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale);
-void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale);
-void ff_dct_unquantize_mpeg1_intra_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale);
-void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale);
-void ff_dct_unquantize_mpeg2_intra_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale);
+void ff_dct_unquantize_h263_intra_mmi(const MPVContext *s, int16_t *block,
+                                      int n, int qscale);
+void ff_dct_unquantize_h263_inter_mmi(const MPVContext *s, int16_t *block,
+                                      int n, int qscale);
+void ff_dct_unquantize_mpeg1_intra_mmi(const MPVContext *s, int16_t *block,
+                                       int n, int qscale);
+void ff_dct_unquantize_mpeg1_inter_mmi(const MPVContext *s, int16_t *block,
+                                       int n, int qscale);
+void ff_dct_unquantize_mpeg2_intra_mmi(const MPVContext *s, int16_t *block,
+                                       int n, int qscale);
 void ff_denoise_dct_mmi(int16_t block[64], int sum[64], const uint16_t offset[64]);
 
 #endif /* AVCODEC_MIPS_MPEGVIDEO_MIPS_H */
diff --git a/libavcodec/mips/mpegvideo_mmi.c b/libavcodec/mips/mpegvideo_mmi.c
index 87d4aafd8c..90bd90c147 100644
--- a/libavcodec/mips/mpegvideo_mmi.c
+++ b/libavcodec/mips/mpegvideo_mmi.c
@@ -25,8 +25,8 @@
 #include "mpegvideo_mips.h"
 #include "libavutil/mips/mmiutils.h"
 
-void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale)
+void ff_dct_unquantize_h263_intra_mmi(const MPVContext *s, int16_t *block,
+                                      int n, int qscale)
 {
     int64_t level, nCoeffs;
     double ftmp[6];
@@ -101,8 +101,8 @@ void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block,
     block[0] = level;
 }
 
-void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale)
+void ff_dct_unquantize_h263_inter_mmi(const MPVContext *s, int16_t *block,
+                                      int n, int qscale)
 {
     int64_t nCoeffs;
     double ftmp[6];
@@ -160,8 +160,8 @@ void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block,
     );
 }
 
-void ff_dct_unquantize_mpeg1_intra_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale)
+void ff_dct_unquantize_mpeg1_intra_mmi(const MPVContext *s, int16_t *block,
+                                       int n, int qscale)
 {
     int64_t nCoeffs;
     const uint16_t *quant_matrix;
@@ -254,8 +254,8 @@ void ff_dct_unquantize_mpeg1_intra_mmi(MpegEncContext *s, int16_t *block,
     block[0] = block0;
 }
 
-void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale)
+void ff_dct_unquantize_mpeg1_inter_mmi(const MPVContext *s, int16_t *block,
+                                       int n, int qscale)
 {
     int64_t nCoeffs;
     const uint16_t *quant_matrix;
@@ -342,8 +342,8 @@ void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block,
     );
 }
 
-void ff_dct_unquantize_mpeg2_intra_mmi(MpegEncContext *s, int16_t *block,
-        int n, int qscale)
+void ff_dct_unquantize_mpeg2_intra_mmi(const MPVContext *s, int16_t *block,
+                                       int n, int qscale)
 {
     uint64_t nCoeffs;
     const uint16_t *quant_matrix;
diff --git a/libavcodec/mips/mpegvideo_msa.c b/libavcodec/mips/mpegvideo_msa.c
index cd4adc0f77..a870a2cd79 100644
--- a/libavcodec/mips/mpegvideo_msa.c
+++ b/libavcodec/mips/mpegvideo_msa.c
@@ -194,7 +194,7 @@ static int32_t mpeg2_dct_unquantize_inter_msa(int16_t *block,
     return sum_res;
 }
 
-void ff_dct_unquantize_h263_intra_msa(MpegEncContext *s,
+void ff_dct_unquantize_h263_intra_msa(const MPVContext *s,
                                       int16_t *block, int32_t index,
                                       int32_t qscale)
 {
@@ -219,7 +219,7 @@ void ff_dct_unquantize_h263_intra_msa(MpegEncContext *s,
     h263_dct_unquantize_msa(block, qmul, qadd, nCoeffs, 1);
 }
 
-void ff_dct_unquantize_h263_inter_msa(MpegEncContext *s,
+void ff_dct_unquantize_h263_inter_msa(const MPVContext *s,
                                       int16_t *block, int32_t index,
                                       int32_t qscale)
 {
@@ -236,7 +236,7 @@ void ff_dct_unquantize_h263_inter_msa(MpegEncContext *s,
     h263_dct_unquantize_msa(block, qmul, qadd, nCoeffs, 0);
 }
 
-void ff_dct_unquantize_mpeg2_inter_msa(MpegEncContext *s,
+void ff_dct_unquantize_mpeg2_inter_msa(const MPVContext *s,
                                        int16_t *block, int32_t index,
                                        int32_t qscale)
 {
diff --git a/libavcodec/mpeg4videodec.h b/libavcodec/mpeg4videodec.h
index aafde454ea..2eafa1ef8b 100644
--- a/libavcodec/mpeg4videodec.h
+++ b/libavcodec/mpeg4videodec.h
@@ -93,11 +93,11 @@ typedef struct Mpeg4DecContext {
 
     Mpeg4VideoDSPContext mdsp;
 
-    void (*dct_unquantize_mpeg2_inter)(MpegEncContext *s,
+    void (*dct_unquantize_mpeg2_inter)(const MPVContext *s,
                                        int16_t *block, int n, int qscale);
-    void (*dct_unquantize_mpeg2_intra)(MpegEncContext *s,
+    void (*dct_unquantize_mpeg2_intra)(const MPVContext *s,
                                        int16_t *block, int n, int qscale);
-    void (*dct_unquantize_h263_intra)(MpegEncContext *s,
+    void (*dct_unquantize_h263_intra)(const MPVContext *s,
                                       int16_t *block, int n, int qscale);
 
     union {
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index cb4b99acd3..e21ce5164d 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -57,6 +57,8 @@ enum OutputFormat {
     FMT_SPEEDHQ,
 };
 
+typedef struct MpegEncContext MPVContext;
+
 /**
  * MpegEncContext.
  */
@@ -271,10 +273,10 @@ typedef struct MpegEncContext {
     int interlaced_dct;
     int first_field;         ///< is 1 for the first field of a field picture 0 otherwise
 
-    void (*dct_unquantize_intra)(struct MpegEncContext *s, // unquantizer to use (MPEG-4 can use both)
-                           int16_t *block/*align 16*/, int n, int qscale);
-    void (*dct_unquantize_inter)(struct MpegEncContext *s, // unquantizer to use (MPEG-4 can use both)
-                           int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_intra)(const MPVContext *s, // unquantizer to use (MPEG-4 can use both)
+                                 int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_inter)(const MPVContext *s, // unquantizer to use (MPEG-4 can use both)
+                                 int16_t *block/*align 16*/, int n, int qscale);
 
     /* flag to indicate a reinitialization is required, e.g. after
      * a frame size change */
@@ -286,8 +288,6 @@ typedef struct MpegEncContext {
     ERContext er;
 } MpegEncContext;
 
-typedef MpegEncContext MPVContext;
-
 /**
  * Set the given MpegEncContext to common defaults (same for encoding
  * and decoding).  The changed fields will not depend upon the prior
diff --git a/libavcodec/mpegvideo_unquantize.c b/libavcodec/mpegvideo_unquantize.c
index 213e37a514..06c29d0753 100644
--- a/libavcodec/mpegvideo_unquantize.c
+++ b/libavcodec/mpegvideo_unquantize.c
@@ -33,8 +33,8 @@
 #include "mpegvideodata.h"
 #include "mpegvideo_unquantize.h"
 
-static void dct_unquantize_mpeg1_intra_c(MpegEncContext *s,
-                                   int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg1_intra_c(const MPVContext *s,
+                                         int16_t *block, int n, int qscale)
 {
     int i, level, nCoeffs;
     const uint16_t *quant_matrix;
@@ -62,8 +62,8 @@ static void dct_unquantize_mpeg1_intra_c(MpegEncContext *s,
     }
 }
 
-static void dct_unquantize_mpeg1_inter_c(MpegEncContext *s,
-                                   int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg1_inter_c(const MPVContext *s,
+                                         int16_t *block, int n, int qscale)
 {
     int i, level, nCoeffs;
     const uint16_t *quant_matrix;
@@ -91,8 +91,8 @@ static void dct_unquantize_mpeg1_inter_c(MpegEncContext *s,
     }
 }
 
-static void dct_unquantize_mpeg2_intra_c(MpegEncContext *s,
-                                   int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg2_intra_c(const MPVContext *s,
+                                         int16_t *block, int n, int qscale)
 {
     int i, level, nCoeffs;
     const uint16_t *quant_matrix;
@@ -120,8 +120,8 @@ static void dct_unquantize_mpeg2_intra_c(MpegEncContext *s,
     }
 }
 
-static void dct_unquantize_mpeg2_intra_bitexact(MpegEncContext *s,
-                                   int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg2_intra_bitexact(const MPVContext *s,
+                                                int16_t *block, int n, int qscale)
 {
     int i, level, nCoeffs;
     const uint16_t *quant_matrix;
@@ -153,8 +153,8 @@ static void dct_unquantize_mpeg2_intra_bitexact(MpegEncContext *s,
     block[63]^=sum&1;
 }
 
-static void dct_unquantize_mpeg2_inter_c(MpegEncContext *s,
-                                   int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg2_inter_c(const MPVContext *s,
+                                         int16_t *block, int n, int qscale)
 {
     int i, level, nCoeffs;
     const uint16_t *quant_matrix;
@@ -186,8 +186,8 @@ static void dct_unquantize_mpeg2_inter_c(MpegEncContext *s,
     block[63]^=sum&1;
 }
 
-static void dct_unquantize_h263_intra_c(MpegEncContext *s,
-                                  int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_intra_c(const MPVContext *s,
+                                        int16_t *block, int n, int qscale)
 {
     int i, level, qmul, qadd;
     int nCoeffs;
@@ -220,8 +220,8 @@ static void dct_unquantize_h263_intra_c(MpegEncContext *s,
     }
 }
 
-static void dct_unquantize_h263_inter_c(MpegEncContext *s,
-                                  int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_inter_c(const MPVContext *s,
+                                        int16_t *block, int n, int qscale)
 {
     int i, level, qmul, qadd;
     int nCoeffs;
diff --git a/libavcodec/mpegvideo_unquantize.h b/libavcodec/mpegvideo_unquantize.h
index 3e6d8aedf7..1a43f467c6 100644
--- a/libavcodec/mpegvideo_unquantize.h
+++ b/libavcodec/mpegvideo_unquantize.h
@@ -29,21 +29,21 @@
 
 #include "config.h"
 
-typedef struct MpegEncContext MpegEncContext;
+typedef struct MpegEncContext MPVContext;
 
 typedef struct MPVUnquantDSPContext {
-    void (*dct_unquantize_mpeg1_intra)(struct MpegEncContext *s,
-                           int16_t *block/*align 16*/, int n, int qscale);
-    void (*dct_unquantize_mpeg1_inter)(struct MpegEncContext *s,
-                           int16_t *block/*align 16*/, int n, int qscale);
-    void (*dct_unquantize_mpeg2_intra)(struct MpegEncContext *s,
-                           int16_t *block/*align 16*/, int n, int qscale);
-    void (*dct_unquantize_mpeg2_inter)(struct MpegEncContext *s,
-                           int16_t *block/*align 16*/, int n, int qscale);
-    void (*dct_unquantize_h263_intra)(struct MpegEncContext *s,
-                           int16_t *block/*align 16*/, int n, int qscale);
-    void (*dct_unquantize_h263_inter)(struct MpegEncContext *s,
-                           int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_mpeg1_intra)(const MPVContext *s,
+                                       int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_mpeg1_inter)(const MPVContext *s,
+                                       int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_mpeg2_intra)(const MPVContext *s,
+                                       int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_mpeg2_inter)(const MPVContext *s,
+                                       int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_h263_intra)(const MPVContext *s,
+                                      int16_t *block/*align 16*/, int n, int qscale);
+    void (*dct_unquantize_h263_inter)(const MPVContext *s,
+                                      int16_t *block/*align 16*/, int n, int qscale);
 } MPVUnquantDSPContext;
 
 #if !ARCH_MIPS
diff --git a/libavcodec/neon/mpegvideo.c b/libavcodec/neon/mpegvideo.c
index a0276ad808..fdc57d3876 100644
--- a/libavcodec/neon/mpegvideo.c
+++ b/libavcodec/neon/mpegvideo.c
@@ -84,7 +84,7 @@ static void inline ff_dct_unquantize_h263_neon(int qscale, int qadd, int nCoeffs
     vst1_s16(block, d0s16);
 }
 
-static void dct_unquantize_h263_inter_neon(MpegEncContext *s, int16_t *block,
+static void dct_unquantize_h263_inter_neon(const MPVContext *s, int16_t *block,
                                            int n, int qscale)
 {
     int nCoeffs = s->inter_scantable.raster_end[s->block_last_index[n]];
@@ -93,7 +93,7 @@ static void dct_unquantize_h263_inter_neon(MpegEncContext *s, int16_t *block,
     ff_dct_unquantize_h263_neon(qscale, qadd, nCoeffs + 1, block);
 }
 
-static void dct_unquantize_h263_intra_neon(MpegEncContext *s, int16_t *block,
+static void dct_unquantize_h263_intra_neon(const MPVContext *s, int16_t *block,
                                            int n, int qscale)
 {
     int qadd;
diff --git a/libavcodec/ppc/mpegvideo_altivec.c b/libavcodec/ppc/mpegvideo_altivec.c
index 26e98acfb8..ad3a783a87 100644
--- a/libavcodec/ppc/mpegvideo_altivec.c
+++ b/libavcodec/ppc/mpegvideo_altivec.c
@@ -40,8 +40,8 @@
 
 /* AltiVec version of dct_unquantize_h263
    this code assumes `block' is 16 bytes-aligned */
-static void dct_unquantize_h263_altivec(MpegEncContext *s,
-                                 int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_altivec(const MPVContext *s,
+                                        int16_t *block, int n, int qscale)
 {
     int i, qmul, qadd;
     int nCoeffs;
diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index 8632acd412..4c3299362e 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -30,8 +30,8 @@
 
 #if HAVE_MMX_INLINE
 
-static void dct_unquantize_h263_intra_mmx(MpegEncContext *s,
-                                  int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_intra_mmx(const MPVContext *s,
+                                          int16_t *block, int n, int qscale)
 {
     x86_reg level, qmul, qadd, nCoeffs;
 
@@ -105,8 +105,8 @@ __asm__ volatile(
 }
 
 
-static void dct_unquantize_h263_inter_mmx(MpegEncContext *s,
-                                  int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_inter_mmx(const MPVContext *s,
+                                          int16_t *block, int n, int qscale)
 {
     x86_reg qmul, qadd, nCoeffs;
 
@@ -166,8 +166,8 @@ __asm__ volatile(
         );
 }
 
-static void dct_unquantize_mpeg1_intra_mmx(MpegEncContext *s,
-                                     int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg1_intra_mmx(const MPVContext *s,
+                                           int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
@@ -235,8 +235,8 @@ __asm__ volatile(
     block[0]= block0;
 }
 
-static void dct_unquantize_mpeg1_inter_mmx(MpegEncContext *s,
-                                     int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg1_inter_mmx(const MPVContext *s,
+                                           int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
@@ -301,8 +301,8 @@ __asm__ volatile(
         );
 }
 
-static void dct_unquantize_mpeg2_intra_mmx(MpegEncContext *s,
-                                     int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg2_intra_mmx(const MPVContext *s,
+                                           int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
@@ -369,8 +369,8 @@ __asm__ volatile(
         //Note, we do not do mismatch control for intra as errors cannot accumulate
 }
 
-static void dct_unquantize_mpeg2_inter_mmx(MpegEncContext *s,
-                                     int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg2_inter_mmx(const MPVContext *s,
+                                           int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
-- 
2.52.0


From caacf0ad2cf4f683f26644dde63c595b8ff1a817 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 19 Nov 2025 11:51:03 +0100
Subject: [PATCH 169/304] avcodec/ppc/mpegvideo_altivec: Split intra/inter
 unquantizing

Don't use a single function that checks mb_intra. Forgotten
in d50635cd247e17fe16c63219b9ae80d45a8185b1.
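
For illustration only, here is a minimal scalar sketch of the split this
patch performs: one shared kernel plus thin intra and inter wrappers.
The sketch_* names and the flattened parameters (dc_scale, aic) are
hypothetical stand-ins for the MPVContext fields. Unlike the AltiVec
loop, which works in groups of eight, it stops exactly at nb_coeffs.

```c
#include <assert.h>
#include <stdint.h>

/* Shared kernel: nonzero levels become level * qmul +/- qadd,
 * zeros stay zero. Scalar stand-in for the vector loop. */
static inline void sketch_unquant_h263(int16_t *block, int nb_coeffs,
                                       int qadd, int qmul)
{
    assert(block && nb_coeffs >= 0 && nb_coeffs <= 63);
    for (int i = 0; i <= nb_coeffs; i++) {
        int level = block[i];
        if (level > 0)
            block[i] = level * qmul + qadd;
        else if (level < 0)
            block[i] = level * qmul - qadd;
    }
}

/* Intra wrapper: the DC coefficient is handled separately and written
 * back after the kernel has run over the whole block. */
static void sketch_unquant_h263_intra(int16_t *block, int nb_coeffs,
                                      int qscale, int dc_scale, int aic)
{
    int qadd   = aic ? 0 : (qscale - 1) | 1;
    int qmul   = qscale << 1;
    int block0 = aic ? block[0] : block[0] * dc_scale;

    sketch_unquant_h263(block, nb_coeffs, qadd, qmul);
    block[0] = block0; /* undo the kernel's change to the DC value */
}

/* Inter wrapper: no DC special case at all. */
static void sketch_unquant_h263_inter(int16_t *block, int nb_coeffs,
                                      int qscale)
{
    sketch_unquant_h263(block, nb_coeffs, (qscale - 1) | 1, qscale << 1);
}
```

With qscale 5 (qmul 10, qadd 5) the levels {3, -2, 0, 1} become
{35, -25, 0, 15} in the inter case and {24, -25, 0, 15} in the intra
case with dc_scale 8 and aic 0.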

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/ppc/mpegvideo_altivec.c | 71 ++++++++++++++----------------
 1 file changed, 33 insertions(+), 38 deletions(-)

diff --git a/libavcodec/ppc/mpegvideo_altivec.c b/libavcodec/ppc/mpegvideo_altivec.c
index ad3a783a87..7b54de3d91 100644
--- a/libavcodec/ppc/mpegvideo_altivec.c
+++ b/libavcodec/ppc/mpegvideo_altivec.c
@@ -40,41 +40,14 @@
 
 /* AltiVec version of dct_unquantize_h263
    this code assumes `block' is 16 bytes-aligned */
-static void dct_unquantize_h263_altivec(const MPVContext *s,
-                                        int16_t *block, int n, int qscale)
+static av_always_inline
+void dct_unquantize_h263_altivec(int16_t *block, int nb_coeffs, int qadd, int qmul)
 {
-    int i, qmul, qadd;
-    int nCoeffs;
-
-    qadd = (qscale - 1) | 1;
-    qmul = qscale << 1;
-
-    if (s->mb_intra) {
-        if (!s->h263_aic) {
-            if (n < 4)
-                block[0] = block[0] * s->y_dc_scale;
-            else
-                block[0] = block[0] * s->c_dc_scale;
-        }else
-            qadd = 0;
-        i = 1;
-        if (s->ac_pred)
-            nCoeffs = 63;
-        else
-            nCoeffs = s->intra_scantable.raster_end[s->block_last_index[n]];
-    } else {
-        i = 0;
-        av_assert2(s->block_last_index[n]>=0);
-        nCoeffs = s->inter_scantable.raster_end[s->block_last_index[n]];
-    }
-
-    {
         register const vector signed short vczero = (const vector signed short)vec_splat_s16(0);
         DECLARE_ALIGNED(16, short, qmul8) = qmul;
         DECLARE_ALIGNED(16, short, qadd8) = qadd;
         register vector signed short blockv, qmulv, qaddv, nqaddv, temp1;
         register vector bool short blockv_null, blockv_neg;
-        register short backup_0 = block[0];
 
         qmulv = vec_splat((vec_s16)vec_lde(0, &qmul8), 0);
         qaddv = vec_splat((vec_s16)vec_lde(0, &qadd8), 0);
@@ -82,7 +55,7 @@ static void dct_unquantize_h263_altivec(const MPVContext *s,
 
         // vectorize all the 16 bytes-aligned blocks
         // of 8 elements
-        for (register int j = 0; j <= nCoeffs ; j += 8) {
+        for (register int j = 0; j <= nb_coeffs; j += 8) {
             blockv = vec_ld(j << 1, block);
             blockv_neg = vec_cmplt(blockv, vczero);
             blockv_null = vec_cmpeq(blockv, vczero);
@@ -94,14 +67,36 @@ static void dct_unquantize_h263_altivec(const MPVContext *s,
             blockv = vec_sel(temp1, blockv, blockv_null);
             vec_st(blockv, j << 1, block);
         }
-
-        if (i == 1) {
-            // cheat. this avoid special-casing the first iteration
-            block[0] = backup_0;
-        }
-    }
 }
 
+static void dct_unquantize_h263_intra_altivec(const MPVContext *s,
+                                              int16_t *block, int n, int qscale)
+{
+    int qadd = (qscale - 1) | 1;
+    int qmul = qscale << 1;
+    int block0 = block[0];
+    if (!s->h263_aic) {
+        block0 *= n < 4 ? s->y_dc_scale : s->c_dc_scale;
+    } else
+        qadd = 0;
+    int nb_coeffs = s->ac_pred ? 63 : s->intra_scantable.raster_end[s->block_last_index[n]];
+
+    dct_unquantize_h263_altivec(block, nb_coeffs, qadd, qmul);
+
+    // cheat. this avoids special-casing the first iteration
+    block[0] = block0;
+}
+
+static void dct_unquantize_h263_inter_altivec(const MPVContext *s,
+                                              int16_t *block, int n, int qscale)
+{
+    int qadd = (qscale - 1) | 1;
+    int qmul = qscale << 1;
+    av_assert2(s->block_last_index[n]>=0);
+    int nb_coeffs = s->inter_scantable.raster_end[s->block_last_index[n]];
+
+    dct_unquantize_h263_altivec(block, nb_coeffs, qadd, qmul);
+}
 #endif /* HAVE_ALTIVEC */
 
 av_cold void ff_mpv_unquantize_init_ppc(MPVUnquantDSPContext *s, int bitexact)
@@ -110,7 +105,7 @@ av_cold void ff_mpv_unquantize_init_ppc(MPVUnquantDSPContext *s, int bitexact)
     if (!PPC_ALTIVEC(av_get_cpu_flags()))
         return;
 
-    s->dct_unquantize_h263_intra = dct_unquantize_h263_altivec;
-    s->dct_unquantize_h263_inter = dct_unquantize_h263_altivec;
+    s->dct_unquantize_h263_intra = dct_unquantize_h263_intra_altivec;
+    s->dct_unquantize_h263_inter = dct_unquantize_h263_inter_altivec;
 #endif /* HAVE_ALTIVEC */
 }
-- 
2.52.0


From d12c84b682579354fc0376a40c85c6f60b086816 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 19 Nov 2025 12:00:08 +0100
Subject: [PATCH 170/304] avcodec/ppc/mpegvideo_altivec: Reindent after the
 previous commit

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/ppc/mpegvideo_altivec.c | 44 +++++++++++++++---------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/libavcodec/ppc/mpegvideo_altivec.c b/libavcodec/ppc/mpegvideo_altivec.c
index 7b54de3d91..71894e760b 100644
--- a/libavcodec/ppc/mpegvideo_altivec.c
+++ b/libavcodec/ppc/mpegvideo_altivec.c
@@ -43,30 +43,30 @@
 static av_always_inline
 void dct_unquantize_h263_altivec(int16_t *block, int nb_coeffs, int qadd, int qmul)
 {
-        register const vector signed short vczero = (const vector signed short)vec_splat_s16(0);
-        DECLARE_ALIGNED(16, short, qmul8) = qmul;
-        DECLARE_ALIGNED(16, short, qadd8) = qadd;
-        register vector signed short blockv, qmulv, qaddv, nqaddv, temp1;
-        register vector bool short blockv_null, blockv_neg;
+    register const vector signed short vczero = (const vector signed short)vec_splat_s16(0);
+    DECLARE_ALIGNED(16, short, qmul8) = qmul;
+    DECLARE_ALIGNED(16, short, qadd8) = qadd;
+    register vector signed short blockv, qmulv, qaddv, nqaddv, temp1;
+    register vector bool short blockv_null, blockv_neg;
 
-        qmulv = vec_splat((vec_s16)vec_lde(0, &qmul8), 0);
-        qaddv = vec_splat((vec_s16)vec_lde(0, &qadd8), 0);
-        nqaddv = vec_sub(vczero, qaddv);
+    qmulv = vec_splat((vec_s16)vec_lde(0, &qmul8), 0);
+    qaddv = vec_splat((vec_s16)vec_lde(0, &qadd8), 0);
+    nqaddv = vec_sub(vczero, qaddv);
 
-        // vectorize all the 16 bytes-aligned blocks
-        // of 8 elements
-        for (register int j = 0; j <= nb_coeffs; j += 8) {
-            blockv = vec_ld(j << 1, block);
-            blockv_neg = vec_cmplt(blockv, vczero);
-            blockv_null = vec_cmpeq(blockv, vczero);
-            // choose between +qadd or -qadd as the third operand
-            temp1 = vec_sel(qaddv, nqaddv, blockv_neg);
-            // multiply & add (block{i,i+7} * qmul [+-] qadd)
-            temp1 = vec_mladd(blockv, qmulv, temp1);
-            // put 0 where block[{i,i+7} used to have 0
-            blockv = vec_sel(temp1, blockv, blockv_null);
-            vec_st(blockv, j << 1, block);
-        }
+    // vectorize all the 16 bytes-aligned blocks
+    // of 8 elements
+    for (register int j = 0; j <= nb_coeffs; j += 8) {
+        blockv = vec_ld(j << 1, block);
+        blockv_neg = vec_cmplt(blockv, vczero);
+        blockv_null = vec_cmpeq(blockv, vczero);
+        // choose between +qadd or -qadd as the third operand
+        temp1 = vec_sel(qaddv, nqaddv, blockv_neg);
+        // multiply & add (block{i,i+7} * qmul [+-] qadd)
+        temp1 = vec_mladd(blockv, qmulv, temp1);
+        // put 0 where block[{i,i+7} used to have 0
+        blockv = vec_sel(temp1, blockv, blockv_null);
+        vec_st(blockv, j << 1, block);
+    }
 }
 
 static void dct_unquantize_h263_intra_altivec(const MPVContext *s,
-- 
2.52.0


From 840a4799c315629d183651bf57ea58d7302fd9b3 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 28 Nov 2025 16:58:44 +0100
Subject: [PATCH 171/304] avcodec/{arm,neon}/mpegvideo: Use intra scantable to
 unquant H263 intra
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Forgotten in 70a7df049c411d9247eb6075720c84196c3e55e8.

Using the wrong scantable matters for codecs for which both scantables
can differ, namely the MPEG-4 decoder and the WMV1/2 codecs.

For WMV1 it can lead to wrong output in case the IDCT permutation
is FF_IDCT_PERM_PARTTRANS, because in this case the entries of
the intra scantable's raster end are not always <= the corresponding
entries of the inter scantable's raster end when the former is
initialized via ff_wmv1_scantable[1] and the latter via ff_wmv1_scantable[0].
FF_IDCT_PERM_PARTTRANS is used iff the Neon IDCT is used (for both arm
and aarch64).* Said IDCT is not used during FATE, so that this issue
went unnoticed.

WMV2 uses the same scantables, but uses a custom IDCT which always
uses FF_IDCT_PERM_NONE, for which the inter scantable's raster end
is never smaller than the intra scantable's, so that the output is
always correct for it.

The scantable for MPEG-4 can change mid-stream (for the decoder),
but since c41818dc5dc14eb944761204e7b0ac179a6dcd1a only the intra
scantable is updated, so that both scantables can get out of sync.
In such a case the unquantize intra functions could unquantize
an incorrect number of coefficients.

Using raster_end of the wrong scantable can also lead to an
unnecessarily large number of coefficients being unquantized.

*: FF_IDCT_PERM_SIMPLE and FF_IDCT_PERM_TRANSPOSE would also not work,
but they are not used at all by arm and aarch64.
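
The WMV1 case can be reproduced with a small standalone sketch. It
assumes that raster_end[i] is the running maximum of the permuted scan
positions and that FF_IDCT_PERM_PARTTRANS maps i to
(i & 0x24) | ((i & 3) << 3) | ((i >> 3) & 3); both are my reading of
libavcodec, not something this patch defines. The sketch_* and perm_*
names are hypothetical; the two tables are the first two WMV1 scan
orders as they appear elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t wmv1_scan[2][64] = {
    { 0x00, 0x08, 0x01, 0x02, 0x09, 0x10, 0x18, 0x11,
      0x0A, 0x03, 0x04, 0x0B, 0x12, 0x19, 0x20, 0x28,
      0x30, 0x38, 0x29, 0x21, 0x1A, 0x13, 0x0C, 0x05,
      0x06, 0x0D, 0x14, 0x1B, 0x22, 0x31, 0x39, 0x3A,
      0x32, 0x2A, 0x23, 0x1C, 0x15, 0x0E, 0x07, 0x0F,
      0x16, 0x1D, 0x24, 0x2B, 0x33, 0x3B, 0x3C, 0x34,
      0x2C, 0x25, 0x1E, 0x17, 0x1F, 0x26, 0x2D, 0x35,
      0x3D, 0x3E, 0x36, 0x2E, 0x27, 0x2F, 0x37, 0x3F, },
    { 0x00, 0x08, 0x01, 0x02, 0x09, 0x10, 0x18, 0x11,
      0x0A, 0x03, 0x04, 0x0B, 0x12, 0x19, 0x20, 0x28,
      0x21, 0x30, 0x1A, 0x13, 0x0C, 0x05, 0x06, 0x0D,
      0x14, 0x1B, 0x22, 0x29, 0x38, 0x31, 0x39, 0x2A,
      0x23, 0x1C, 0x15, 0x0E, 0x07, 0x0F, 0x16, 0x1D,
      0x24, 0x2B, 0x32, 0x3A, 0x33, 0x3B, 0x2C, 0x25,
      0x1E, 0x17, 0x1F, 0x26, 0x2D, 0x34, 0x3C, 0x35,
      0x3D, 0x2E, 0x27, 0x2F, 0x36, 0x3E, 0x37, 0x3F, },
};

typedef struct SketchScanTable {
    uint8_t permutated[64];
    uint8_t raster_end[64];
} SketchScanTable;

/* Identity permutation (the FF_IDCT_PERM_NONE case). */
static uint8_t perm_none(int i) { return i; }

/* Assumed form of the PARTTRANS permutation, see the text above. */
static uint8_t perm_parttrans(int i)
{
    return (i & 0x24) | ((i & 3) << 3) | ((i >> 3) & 3);
}

/* raster_end[i] = highest permuted position among scan entries 0..i */
static void sketch_init_scantable(SketchScanTable *st, const uint8_t *scan,
                                  uint8_t (*perm)(int))
{
    int end = -1;
    assert(st && scan && perm);
    for (int i = 0; i < 64; i++) {
        st->permutated[i] = perm(scan[i]);
        if (st->permutated[i] > end)
            end = st->permutated[i];
        st->raster_end[i] = end;
    }
}

/* First scan index at which the intra raster end exceeds the inter
 * raster end, or -1 if that never happens. */
static int first_intra_exceeds_inter(uint8_t (*perm)(int))
{
    SketchScanTable intra, inter;
    sketch_init_scantable(&intra, wmv1_scan[1], perm);
    sketch_init_scantable(&inter, wmv1_scan[0], perm);
    for (int i = 0; i < 64; i++)
        if (intra.raster_end[i] > inter.raster_end[i])
            return i;
    return -1;
}
```

Under these assumptions first_intra_exceeds_inter(perm_none) returns -1,
while first_intra_exceeds_inter(perm_parttrans) returns 16: at that
index the intra raster end is 0x28 and the inter raster end only 0x22,
so an intra function using the inter table would stop too early.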

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/arm/asm-offsets.h       | 1 +
 libavcodec/arm/mpegvideo_arm.c     | 2 ++
 libavcodec/arm/mpegvideo_armv5te.c | 2 +-
 libavcodec/arm/mpegvideo_neon.S    | 2 +-
 libavcodec/mpegvideo.h             | 2 +-
 libavcodec/neon/mpegvideo.c        | 2 +-
 6 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/libavcodec/arm/asm-offsets.h b/libavcodec/arm/asm-offsets.h
index a2174b0a08..67e1f2ff6d 100644
--- a/libavcodec/arm/asm-offsets.h
+++ b/libavcodec/arm/asm-offsets.h
@@ -28,5 +28,6 @@
 #define BLOCK_LAST_INDEX         0x10
 #define H263_AIC                 0x40
 #define INTER_SCANTAB_RASTER_END 0x88
+#define INTRA_SCANTAB_RASTER_END 0x10c
 
 #endif /* AVCODEC_ARM_ASM_OFFSETS_H */
diff --git a/libavcodec/arm/mpegvideo_arm.c b/libavcodec/arm/mpegvideo_arm.c
index cb109cd832..593e998181 100644
--- a/libavcodec/arm/mpegvideo_arm.c
+++ b/libavcodec/arm/mpegvideo_arm.c
@@ -38,6 +38,8 @@ CHECK_OFFSET(MpegEncContext, ac_pred,          AC_PRED);
 CHECK_OFFSET(MpegEncContext, block_last_index, BLOCK_LAST_INDEX);
 CHECK_OFFSET(MpegEncContext, inter_scantable.raster_end,
              INTER_SCANTAB_RASTER_END);
+CHECK_OFFSET(MpegEncContext, intra_scantable.raster_end,
+             INTRA_SCANTAB_RASTER_END);
 CHECK_OFFSET(MpegEncContext, h263_aic,         H263_AIC);
 #endif
 
diff --git a/libavcodec/arm/mpegvideo_armv5te.c b/libavcodec/arm/mpegvideo_armv5te.c
index 3a6d015767..b2790b48fe 100644
--- a/libavcodec/arm/mpegvideo_armv5te.c
+++ b/libavcodec/arm/mpegvideo_armv5te.c
@@ -73,7 +73,7 @@ static void dct_unquantize_h263_intra_armv5te(const MPVContext *s,
     if(s->ac_pred)
         nCoeffs=63;
     else
-        nCoeffs= s->inter_scantable.raster_end[ s->block_last_index[n] ];
+        nCoeffs = s->intra_scantable.raster_end[s->block_last_index[n]];
 
     ff_dct_unquantize_h263_armv5te(block, qmul, qadd, nCoeffs + 1);
     block[0] = level;
diff --git a/libavcodec/arm/mpegvideo_neon.S b/libavcodec/arm/mpegvideo_neon.S
index 1889d7a912..c7a35ea267 100644
--- a/libavcodec/arm/mpegvideo_neon.S
+++ b/libavcodec/arm/mpegvideo_neon.S
@@ -77,7 +77,7 @@ function ff_dct_unquantize_h263_intra_neon, export=1
         push            {r4-r6,lr}
         add             r12, r0,  #BLOCK_LAST_INDEX
         ldr             r6,  [r0, #AC_PRED]
-        add             lr,  r0,  #INTER_SCANTAB_RASTER_END
+        add             lr,  r0,  #INTRA_SCANTAB_RASTER_END
         cmp             r6,  #0
         it              ne
         movne           r12, #63
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index e21ce5164d..758bf57ab9 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -72,11 +72,11 @@ typedef struct MpegEncContext {
 
     /* scantables */
     ScanTable inter_scantable; ///< if inter == intra then intra should be used to reduce the cache usage
+    ScanTable intra_scantable;
 
     /* WARNING: changes above this line require updates to hardcoded
      *          offsets used in ASM. */
 
-    ScanTable intra_scantable;
     uint8_t permutated_intra_h_scantable[64];
     uint8_t permutated_intra_v_scantable[64];
 
diff --git a/libavcodec/neon/mpegvideo.c b/libavcodec/neon/mpegvideo.c
index fdc57d3876..3427dbe427 100644
--- a/libavcodec/neon/mpegvideo.c
+++ b/libavcodec/neon/mpegvideo.c
@@ -112,7 +112,7 @@ static void dct_unquantize_h263_intra_neon(const MPVContext *s, int16_t *block,
     if (s->ac_pred) {
         nCoeffs = 63;
     } else {
-        nCoeffs = s->inter_scantable.raster_end[s->block_last_index[n]];
+        nCoeffs = s->intra_scantable.raster_end[s->block_last_index[n]];
         if (nCoeffs <= 0)
             return;
     }
-- 
2.52.0


From 3e29f976105a5f898ed3a89f2f31ad4627f4b35b Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 28 Nov 2025 22:25:39 +0100
Subject: [PATCH 172/304] tests/checkasm: Add mpegvideo unquantize test

This adds a test for the mpegvideo unquantize functions.

It has been written in order to be able to easily bench
these functions. It should be noted that the random input
fed to the tested functions is not necessarily representative
of the stuff actually occurring in the wild. So benchmarks should
be taken with a grain of salt; but comparisons between two functions
that do not depend on branch predictions are valid (the usecase
for this is to port the x86 mmx functions to use xmm registers).

During testing I have found a bug in the arm/aarch64 neon optimizations
when using the LIBMPEG2 permutation (used by FF_IDCT_INT): The code
seems to be based on the presumption that the remainder of the number
of coefficients to process is always <= 4 mod 16. The test therefore
sometimes fails for these arches.

Hint: I am not certain that 16 bits are enough for the intermediate
values of all the computations involved; e.g. both FLV and MPEG-4
escape values can go beyond that after the corresponding
multiplications. The input in this test is nevertheless designed
to fit into 16 bits.
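
As a rough check of the last claim for the H.263 case: the test draws
levels as 10-bit signed values (-512..511) and qscale from 1..31, so
the worst case is 511 * 62 + 31 = 31713 and -512 * 62 - 31 = -31775,
both inside int16_t. The helper below is a hypothetical sketch that
only restates this arithmetic; it is not part of the test.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if level * qmul +/- qadd stays within int16_t for every
 * qscale in 1..31 and every level representable in `bits` signed bits,
 * 0 otherwise. qadd and qmul are derived as in the H.263 functions. */
static int h263_fits_int16(int bits)
{
    assert(bits >= 2 && bits <= 16);
    int max_level = (1 << (bits - 1)) - 1;
    int min_level = -(1 << (bits - 1));

    for (int qscale = 1; qscale <= 31; qscale++) {
        int qadd = (qscale - 1) | 1;
        int qmul = qscale << 1;
        if (max_level * qmul + qadd > INT16_MAX ||
            min_level * qmul - qadd < INT16_MIN)
            return 0;
    }
    return 1;
}
```

h263_fits_int16(10) yields 1, whereas one extra bit of level range,
h263_fits_int16(11), already yields 0, which is why larger escape
values would not fit.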

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/checkasm/Makefile               |   1 +
 tests/checkasm/checkasm.c             |   3 +
 tests/checkasm/checkasm.h             |   1 +
 tests/checkasm/mpegvideo_unquantize.c | 273 ++++++++++++++++++++++++++
 tests/fate/checkasm.mak               |   1 +
 5 files changed, 279 insertions(+)
 create mode 100644 tests/checkasm/mpegvideo_unquantize.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 3762c0d83b..b9c8adb21f 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -19,6 +19,7 @@ AVCODECOBJS-$(CONFIG_LLVIDDSP)          += llviddsp.o
 AVCODECOBJS-$(CONFIG_LLVIDENCDSP)       += llviddspenc.o
 AVCODECOBJS-$(CONFIG_LPC)               += lpc.o
 AVCODECOBJS-$(CONFIG_ME_CMP)            += motion.o
+AVCODECOBJS-$(CONFIG_MPEGVIDEO)         += mpegvideo_unquantize.o
 AVCODECOBJS-$(CONFIG_MPEGVIDEOENCDSP)   += mpegvideoencdsp.o
 AVCODECOBJS-$(CONFIG_QPELDSP)           += qpeldsp.o
 AVCODECOBJS-$(CONFIG_VC1DSP)            += vc1dsp.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 8c64684fa3..a899967937 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -215,6 +215,9 @@ static const struct {
     #if CONFIG_ME_CMP
         { "motion", checkasm_check_motion },
     #endif
+    #if CONFIG_MPEGVIDEO
+        { "mpegvideo_unquantize", checkasm_check_mpegvideo_unquantize },
+    #endif
     #if CONFIG_MPEGVIDEOENCDSP
         { "mpegvideoencdsp", checkasm_check_mpegvideoencdsp },
     #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 05f74ca16b..ec075c4763 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -123,6 +123,7 @@ void checkasm_check_llviddsp(void);
 void checkasm_check_llviddspenc(void);
 void checkasm_check_lpc(void);
 void checkasm_check_motion(void);
+void checkasm_check_mpegvideo_unquantize(void);
 void checkasm_check_mpegvideoencdsp(void);
 void checkasm_check_nlmeans(void);
 void checkasm_check_opusdsp(void);
diff --git a/tests/checkasm/mpegvideo_unquantize.c b/tests/checkasm/mpegvideo_unquantize.c
new file mode 100644
index 0000000000..837606e60e
--- /dev/null
+++ b/tests/checkasm/mpegvideo_unquantize.c
@@ -0,0 +1,273 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <assert.h>
+#include <stddef.h>
+
+#include "config.h"
+
+#include "checkasm.h"
+
+#include "libavcodec/idctdsp.h"
+#include "libavcodec/mathops.h"
+#include "libavcodec/mpegvideo.h"
+#include "libavcodec/mpegvideodata.h"
+#include "libavcodec/mpegvideo_unquantize.h"
+
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+
+#define randomize_struct(TYPE, s) do {                    \
+    static_assert(!(_Alignof(TYPE) % 4),                  \
+                  "can't use aligned stores");            \
+    unsigned char *ptr = (unsigned char*)s;               \
+    for (size_t i = 0; i < (sizeof(*s) & ~3); i += 4)     \
+        AV_WN32A(ptr + i, rnd());                         \
+    for (size_t i = sizeof(*s) & ~3; i < sizeof(*s); ++i) \
+        ptr[i] = rnd();                                   \
+   } while (0)
+
+enum TestType {
+    H263,
+    MPEG1,
+    MPEG2,
+};
+
+static void init_idct_scantable(MPVContext *const s, int intra_scantable)
+{
+    static const enum idct_permutation_type permutation_types[] = {
+        FF_IDCT_PERM_NONE,
+        FF_IDCT_PERM_LIBMPEG2,
+#if ARCH_X86_32 && HAVE_X86ASM
+        FF_IDCT_PERM_SIMPLE,
+#endif
+#if ARCH_PPC || ARCH_X86
+        FF_IDCT_PERM_TRANSPOSE,
+#endif
+#if ARCH_ARM || ARCH_AARCH64
+        FF_IDCT_PERM_PARTTRANS,
+#endif
+#if ARCH_X86 && HAVE_X86ASM
+        FF_IDCT_PERM_SSE2,
+#endif
+    };
+    // Copied here to avoid #ifs.
+    static const uint8_t ff_wmv1_scantable[][64] = {
+    { 0x00, 0x08, 0x01, 0x02, 0x09, 0x10, 0x18, 0x11,
+      0x0A, 0x03, 0x04, 0x0B, 0x12, 0x19, 0x20, 0x28,
+      0x30, 0x38, 0x29, 0x21, 0x1A, 0x13, 0x0C, 0x05,
+      0x06, 0x0D, 0x14, 0x1B, 0x22, 0x31, 0x39, 0x3A,
+      0x32, 0x2A, 0x23, 0x1C, 0x15, 0x0E, 0x07, 0x0F,
+      0x16, 0x1D, 0x24, 0x2B, 0x33, 0x3B, 0x3C, 0x34,
+      0x2C, 0x25, 0x1E, 0x17, 0x1F, 0x26, 0x2D, 0x35,
+      0x3D, 0x3E, 0x36, 0x2E, 0x27, 0x2F, 0x37, 0x3F, },
+    { 0x00, 0x08, 0x01, 0x02, 0x09, 0x10, 0x18, 0x11,
+      0x0A, 0x03, 0x04, 0x0B, 0x12, 0x19, 0x20, 0x28,
+      0x21, 0x30, 0x1A, 0x13, 0x0C, 0x05, 0x06, 0x0D,
+      0x14, 0x1B, 0x22, 0x29, 0x38, 0x31, 0x39, 0x2A,
+      0x23, 0x1C, 0x15, 0x0E, 0x07, 0x0F, 0x16, 0x1D,
+      0x24, 0x2B, 0x32, 0x3A, 0x33, 0x3B, 0x2C, 0x25,
+      0x1E, 0x17, 0x1F, 0x26, 0x2D, 0x34, 0x3C, 0x35,
+      0x3D, 0x2E, 0x27, 0x2F, 0x36, 0x3E, 0x37, 0x3F, },
+    { 0x00, 0x01, 0x08, 0x02, 0x03, 0x09, 0x10, 0x18,
+      0x11, 0x0A, 0x04, 0x05, 0x0B, 0x12, 0x19, 0x20,
+      0x28, 0x30, 0x21, 0x1A, 0x13, 0x0C, 0x06, 0x07,
+      0x0D, 0x14, 0x1B, 0x22, 0x29, 0x38, 0x31, 0x39,
+      0x2A, 0x23, 0x1C, 0x15, 0x0E, 0x0F, 0x16, 0x1D,
+      0x24, 0x2B, 0x32, 0x3A, 0x33, 0x2C, 0x25, 0x1E,
+      0x17, 0x1F, 0x26, 0x2D, 0x34, 0x3B, 0x3C, 0x35,
+      0x2E, 0x27, 0x2F, 0x36, 0x3D, 0x3E, 0x37, 0x3F, },
+    { 0x00, 0x08, 0x10, 0x01, 0x18, 0x20, 0x28, 0x09,
+      0x02, 0x03, 0x0A, 0x11, 0x19, 0x30, 0x38, 0x29,
+      0x21, 0x1A, 0x12, 0x0B, 0x04, 0x05, 0x0C, 0x13,
+      0x1B, 0x22, 0x31, 0x39, 0x32, 0x2A, 0x23, 0x1C,
+      0x14, 0x0D, 0x06, 0x07, 0x0E, 0x15, 0x1D, 0x24,
+      0x2B, 0x33, 0x3A, 0x3B, 0x34, 0x2C, 0x25, 0x1E,
+      0x16, 0x0F, 0x17, 0x1F, 0x26, 0x2D, 0x3C, 0x35,
+      0x2E, 0x27, 0x2F, 0x36, 0x3D, 0x3E, 0x37, 0x3F, }
+    };
+
+    static const uint8_t *const scantables[] = {
+        ff_alternate_vertical_scan,
+        ff_alternate_horizontal_scan,
+        ff_zigzag_direct,
+        ff_wmv1_scantable[0],
+        ff_wmv1_scantable[1],
+        ff_wmv1_scantable[2],
+        ff_wmv1_scantable[3],
+    };
+    static const uint8_t *scantable = NULL;
+    static enum idct_permutation_type idct_permutation;
+
+    if (!scantable) {
+        scantable        = scantables[rnd() % FF_ARRAY_ELEMS(scantables)];
+        idct_permutation = permutation_types[rnd() % FF_ARRAY_ELEMS(permutation_types)];
+    }
+    ff_init_scantable_permutation(s->idsp.idct_permutation, idct_permutation);
+    ff_init_scantable(s->idsp.idct_permutation,
+                      intra_scantable ? &s->intra_scantable : &s->inter_scantable,
+                      scantable);
+}
+
+static void init_h263_test(MPVContext *const s, int16_t block[64],
+                           int last_nonzero_coeff, int qscale, int intra)
+{
+    const uint8_t *permutation = s->inter_scantable.permutated;
+    if (intra) {
+        permutation = s->intra_scantable.permutated;
+        block[0]    = rnd() & 511;
+        static int h263_aic = -1, ac_pred;
+        if (h263_aic < 0) {
+            h263_aic = rnd() & 1;
+            ac_pred  = rnd() & 1;
+        }
+        s->h263_aic = h263_aic;
+        s->ac_pred  = ac_pred;
+        if (s->ac_pred)
+            last_nonzero_coeff = 63;
+    }
+    for (int i = intra; i <= last_nonzero_coeff; ++i) {
+        int random = rnd();
+        if (random & 1)
+            continue;
+        random >>= 1;
+        // Select level so that the multiplication fits into 16 bits.
+        // FIXME: The FLV and MPEG-4 decoders can have escape values exceeding this.
+        block[permutation[i]] = sign_extend(random, 10);
+    }
+}
+
+static void init_mpeg12_test(MPVContext *const s, int16_t block[64],
+                             int last_nonzero_coeff, int qscale, int intra,
+                             enum TestType type)
+{
+    uint16_t *matrix = intra ? s->intra_matrix : s->inter_matrix;
+
+    if (type == MPEG2)
+        qscale = s->q_scale_type ? ff_mpeg2_non_linear_qscale[qscale] : qscale << 1;
+
+    for (int i = 0; i < 64; ++i)
+        matrix[i] = 1 + rnd() % 254;
+
+    const uint8_t *permutation = s->intra_scantable.permutated;
+    if (intra) {
+        block[0] = (int8_t)rnd();
+        for (int i = 1; i <= last_nonzero_coeff; ++i) {
+            int j = permutation[i];
+            unsigned random = rnd();
+            if (random & 1)
+                continue;
+            random >>= 1;
+            // Select level so that the multiplication does not overflow
+            // an int16_t and so that it is within the possible range
+            // (-2048..2047). FIXME: It seems that this need not be fulfilled
+            // in practice for the MPEG-4 decoder at least.
+            int limit = FFMIN(INT16_MAX / (qscale * matrix[j]), 2047);
+            block[j] = random % (2 * limit + 1) - limit;
+        }
+    } else {
+        for (int i = 0; i <= last_nonzero_coeff; ++i) {
+            int j = permutation[i];
+            unsigned random = rnd();
+            if (random & 1)
+                continue;
+            random >>= 1;
+            int limit = FFMIN((INT16_MAX / (qscale * matrix[j]) - 1) / 2, 2047);
+            block[j] = random % (2 * limit + 1) - limit;
+        }
+    }
+}
+
+void checkasm_check_mpegvideo_unquantize(void)
+{
+    static const struct {
+        const char *name;
+        size_t offset;
+        int intra, intra_scantable;
+        enum TestType type;
+    } tests[] = {
+#define TEST(NAME, INTRA, INTRA_SCANTABLE, TYPE)                         \
+    { .name = #NAME, .offset = offsetof(MPVUnquantDSPContext, NAME),     \
+      .intra = INTRA, .intra_scantable = INTRA_SCANTABLE, .type = TYPE }
+        TEST(dct_unquantize_mpeg1_intra, 1, 1, MPEG1),
+        TEST(dct_unquantize_mpeg1_inter, 0, 1, MPEG1),
+        TEST(dct_unquantize_mpeg2_intra, 1, 1, MPEG2),
+        TEST(dct_unquantize_mpeg2_inter, 0, 1, MPEG2),
+        TEST(dct_unquantize_h263_intra,  1, 1, H263),
+        TEST(dct_unquantize_h263_inter,  0, 0, H263),
+    };
+    MPVUnquantDSPContext unquant_dsp_ctx;
+    int q_scale_type = rnd() & 1;
+
+    ff_mpv_unquantize_init(&unquant_dsp_ctx, 1 /* bitexact */, q_scale_type);
+    declare_func_emms(AV_CPU_FLAG_MMX, void, MPVContext *s, int16_t *block, int n, int qscale);
+
+    for (size_t i = 0; i < FF_ARRAY_ELEMS(tests); ++i) {
+        void (*func)(MPVContext *s, int16_t *block, int n, int qscale) =
+            *(void (**)(MPVContext *, int16_t *, int, int))((char*)&unquant_dsp_ctx + tests[i].offset);
+        if (check_func(func, "%s", tests[i].name)) {
+            MPVContext new, ref;
+            DECLARE_ALIGNED(16, int16_t, block_new)[64];
+            DECLARE_ALIGNED(16, int16_t, block_ref)[64];
+            static int block_last_index = -1;
+
+            randomize_struct(MPVContext, &ref);
+
+            ref.q_scale_type = q_scale_type;
+
+            init_idct_scantable(&ref, tests[i].intra_scantable);
+
+            if (block_last_index < 0)
+                block_last_index = rnd() % 64;
+
+            memset(block_ref, 0, sizeof(block_ref));
+
+            if (tests[i].intra) {
+                // Less restricted than real dc_scale values
+                ref.y_dc_scale = 1 + rnd() % 64;
+                ref.c_dc_scale = 1 + rnd() % 64;
+            }
+
+            static int qscale = 0;
+
+            if (qscale == 0)
+                qscale = 1 + rnd() % 31;
+
+            if (tests[i].type == H263)
+                init_h263_test(&ref, block_ref, block_last_index, qscale,
+                               tests[i].intra);
+            else
+                init_mpeg12_test(&ref, block_ref, block_last_index, qscale,
+                                 tests[i].intra, tests[i].type);
+
+            int n = rnd() % 6;
+            ref.block_last_index[n] = block_last_index;
+
+            memcpy(&new, &ref, sizeof(new));
+            memcpy(block_new, block_ref, sizeof(block_new));
+
+            call_ref(&ref, block_ref, n, qscale);
+            call_new(&new, block_new, n, qscale);
+
+            if (memcmp(&ref, &new, sizeof(new)) || memcmp(block_new, block_ref, sizeof(block_new)))
+                fail();
+
+            bench_new(&new, block_new, n, qscale);
+        }
+    }
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index f182efde46..48edd17bf2 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -39,6 +39,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp                                 \
                 fate-checkasm-llviddspenc                               \
                 fate-checkasm-lpc                                       \
                 fate-checkasm-motion                                    \
+                fate-checkasm-mpegvideo_unquantize                      \
                 fate-checkasm-mpegvideoencdsp                           \
                 fate-checkasm-opusdsp                                   \
                 fate-checkasm-pixblockdsp                               \
-- 
2.52.0


From d84148e8ed8906c2f9e334e1aaf43e4995e2db2e Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 29 Nov 2025 01:05:51 +0100
Subject: [PATCH 173/304] avcodec/{arm,neon}/mpegvideo: Fix h263 unquantize
 functions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

These functions currently operate on the assumption that the number
of coefficients to process is always of the form 16k+m with m<=4 or >8.
Yet this is not true when the IDCT permutation is of type FF_IDCT_PERM_LIBMPEG2
(i.e. when FF_IDCT_INT is in use).
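
The effect of the tail width can be seen in a simplified model. This is
not the NEON code: it assumes a main loop that handles 16 coefficients
per iteration and keeps going while more than 8 remain, followed by a
single tail step of either 4 (before this patch) or 8 (after it)
coefficients. The function names are hypothetical.

```c
#include <assert.h>

/* Number of coefficients touched for a request of n coefficients when
 * the tail step handles `tail` coefficients (4 or 8 in this model). */
static int coeffs_covered(int n, int tail)
{
    int covered = 0;
    assert(n >= 0 && (tail == 4 || tail == 8));

    if (n > tail) {
        do {
            covered += 16;
            n       -= 16;
        } while (n > 8);
    }
    if (n > 0)
        covered += tail;
    return covered;
}

/* Returns 1 if every request size from 1 to 64 is fully covered. */
static int all_sizes_covered(int tail)
{
    for (int n = 1; n <= 64; n++)
        if (coeffs_covered(n, tail) < n)
            return 0;
    return 1;
}
```

In this model a request of 21 = 16 + 5 coefficients covers only 20 with
a tail of 4 but 24 with a tail of 8; all_sizes_covered(4) is 0 and
all_sizes_covered(8) is 1.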

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/arm/mpegvideo_neon.S | 18 +++++++++---------
 libavcodec/neon/mpegvideo.c     | 22 ++++++++--------------
 2 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/libavcodec/arm/mpegvideo_neon.S b/libavcodec/arm/mpegvideo_neon.S
index c7a35ea267..7e42bdf6c5 100644
--- a/libavcodec/arm/mpegvideo_neon.S
+++ b/libavcodec/arm/mpegvideo_neon.S
@@ -36,7 +36,7 @@ function ff_dct_unquantize_h263_neon, export=1
         vdup.16         q15, r0                 @ qmul
         vdup.16         q14, r2                 @ qadd
         vneg.s16        q13, q14
-        cmp             r3,  #4
+        cmp             r3,  #8
         mov             r0,  r1
         ble             2f
 1:
@@ -62,14 +62,14 @@ function ff_dct_unquantize_h263_neon, export=1
         cmp             r3,  #8
         bgt             1b
 2:
-        vld1.16         {d0},     [r0,:64]
-        vclt.s16        d3,  d0,  #0
-        vceq.s16        d1,  d0,  #0
-        vmul.s16        d2,  d0,  d30
-        vbsl            d3,  d26, d28
-        vadd.s16        d2,  d2,  d3
-        vbif            d0,  d2,  d1
-        vst1.16         {d0},     [r1,:64]
+        vld1.16         {q0},     [r0,:128]
+        vclt.s16        q3,  q0,  #0
+        vceq.s16        q1,  q0,  #0
+        vmul.s16        q2,  q0,  q15
+        vbsl            q3,  q13, q14
+        vadd.s16        q2,  q2,  q3
+        vbif            q0,  q2,  q1
+        vst1.16         {q0},     [r1,:128]
         bx              lr
 endfunc
 
diff --git a/libavcodec/neon/mpegvideo.c b/libavcodec/neon/mpegvideo.c
index 3427dbe427..44e9b70303 100644
--- a/libavcodec/neon/mpegvideo.c
+++ b/libavcodec/neon/mpegvideo.c
@@ -39,12 +39,7 @@ static void inline ff_dct_unquantize_h263_neon(int qscale, int qadd, int nCoeffs
 {
     int16x8_t q0s16, q2s16, q3s16, q8s16, q10s16, q11s16, q13s16;
     int16x8_t q14s16, q15s16, qzs16;
-    int16x4_t d0s16, d2s16, d3s16, dzs16;
     uint16x8_t q1u16, q9u16;
-    uint16x4_t d1u16;
-
-    dzs16 = vdup_n_s16(0);
-    qzs16 = vdupq_n_s16(0);
 
     q15s16 = vdupq_n_s16(qscale << 1);
     q14s16 = vdupq_n_s16(qadd);
@@ -73,15 +68,14 @@ static void inline ff_dct_unquantize_h263_neon(int qscale, int qadd, int nCoeffs
     if (nCoeffs <= 0)
         return;
 
-    d0s16 = vld1_s16(block);
-    d3s16 = vreinterpret_s16_u16(vclt_s16(d0s16, dzs16));
-    d1u16 = vceq_s16(d0s16, dzs16);
-    d2s16 = vmul_s16(d0s16, vget_high_s16(q15s16));
-    d3s16 = vbsl_s16(vreinterpret_u16_s16(d3s16),
-                     vget_high_s16(q13s16), vget_high_s16(q14s16));
-    d2s16 = vadd_s16(d2s16, d3s16);
-    d0s16 = vbsl_s16(d1u16, d0s16, d2s16);
-    vst1_s16(block, d0s16);
+    q0s16 = vld1q_s16(block);
+    q3s16 = vreinterpretq_s16_u16(vcltq_s16(q0s16, qzs16));
+    q1u16 = vceqq_s16(q0s16, qzs16);
+    q2s16 = vmulq_s16(q0s16, q15s16);
+    q3s16 = vbslq_s16(vreinterpretq_u16_s16(q3s16), q13s16, q14s16);
+    q2s16 = vaddq_s16(q2s16, q3s16);
+    q0s16 = vbslq_s16(q1u16, q0s16, q2s16);
+    vst1q_s16(block, q0s16);
 }
 
 static void dct_unquantize_h263_inter_neon(const MPVContext *s, int16_t *block,
-- 
2.52.0


From 84f8fbc6e729632ae45c14399f9729c2a56819ad Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 29 Nov 2025 01:17:08 +0100
Subject: [PATCH 174/304] avcodec/mpegvideo: Move ff_init_scantable() to
 mpegvideo_unquantize.c

This is necessary so that the mpegvideo_unquantize checkasm test
does not pull mpegvideo.o and then all of libavcodec into checkasm.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/mpegvideo.c            | 15 ---------------
 libavcodec/mpegvideo_unquantize.c | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/libavcodec/mpegvideo.c b/libavcodec/mpegvideo.c
index a137fe31db..7ca2c8f701 100644
--- a/libavcodec/mpegvideo.c
+++ b/libavcodec/mpegvideo.c
@@ -42,7 +42,6 @@
 #include "mpegutils.h"
 #include "mpegvideo.h"
 #include "mpegvideodata.h"
-#include "mpegvideo_unquantize.h"
 #include "libavutil/refstruct.h"
 
 
@@ -79,20 +78,6 @@ static av_cold void dsp_init(MpegEncContext *s)
     }
 }
 
-av_cold void ff_init_scantable(const uint8_t *permutation, ScanTable *st,
-                               const uint8_t *src_scantable)
-{
-    st->scantable = src_scantable;
-
-    for (int i = 0, end = -1; i < 64; i++) {
-        int j = src_scantable[i];
-        st->permutated[i] = permutation[j];
-        if (permutation[j] > end)
-            end = permutation[j];
-        st->raster_end[i] = end;
-    }
-}
-
 av_cold void ff_mpv_idct_init(MpegEncContext *s)
 {
     if (s->codec_id == AV_CODEC_ID_MPEG4)
diff --git a/libavcodec/mpegvideo_unquantize.c b/libavcodec/mpegvideo_unquantize.c
index 06c29d0753..9297c80b47 100644
--- a/libavcodec/mpegvideo_unquantize.c
+++ b/libavcodec/mpegvideo_unquantize.c
@@ -33,6 +33,20 @@
 #include "mpegvideodata.h"
 #include "mpegvideo_unquantize.h"
 
+av_cold void ff_init_scantable(const uint8_t *permutation, ScanTable *st,
+                               const uint8_t *src_scantable)
+{
+    st->scantable = src_scantable;
+
+    for (int i = 0, end = -1; i < 64; i++) {
+        int j = src_scantable[i];
+        st->permutated[i] = permutation[j];
+        if (permutation[j] > end)
+            end = permutation[j];
+        st->raster_end[i] = end;
+    }
+}
+
 static void dct_unquantize_mpeg1_intra_c(const MPVContext *s,
                                          int16_t *block, int n, int qscale)
 {
-- 
2.52.0


From 96408234957fb253d2778385bd0295ca14d8b4a2 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sat, 29 Nov 2025 22:23:50 +0100
Subject: [PATCH 175/304] avcodec/x86/mpegvideo: Use correct inline assembly
 constraints

The H.263 unquantize functions modified an input parameter.
(And they did so since this code was added in
7f3f5ec87bcbf244fce49ffdb476d4ae6e523af6. I am surprised
that this didn't cause issues, particularly with the intra function.)
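
For illustration only (not part of this patch), a minimal sketch of the
rule involved: in GCC-style extended asm, an operand that the assembly
changes must be declared read-write ("+r") instead of being listed as an
input. The helper name advance_offset and the non-x86 fallback are
hypothetical; the loop tail mirrors the "add $16 / jng 1b" pattern used in
this file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: steps a non-positive byte offset up in units of 16
 * until it becomes positive, like the tail of the unquantize loops. */
static intptr_t advance_offset(intptr_t offset)
{
    assert(offset <= 0);    /* the loops in this file start at or below zero */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile(
        "1:                 \n\t"
        "add     $16, %0    \n\t"
        "jng     1b         \n\t"
        : "+r"(offset)      /* read-write: the asm modifies this register */
        :                   /* no input-only operands */
        : "cc");
#else
    /* Portable fallback with the same behaviour (assumption for non-x86). */
    do {
        offset += 16;
    } while (offset <= 0);
#endif
    return offset;
}
```

Had offset been listed as an input ("r"), the compiler would be entitled
to assume the register still holds its original value after the asm block.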

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideo.c | 64 +++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index 4c3299362e..38dcd8fc6e 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -33,9 +33,8 @@
 static void dct_unquantize_h263_intra_mmx(const MPVContext *s,
                                           int16_t *block, int n, int qscale)
 {
-    x86_reg level, qmul, qadd, nCoeffs;
-
-    qmul = qscale << 1;
+    x86_reg qmul = (unsigned)qscale << 1;
+    int level, qadd;
 
     av_assert2(s->block_last_index[n]>=0 || s->h263_aic);
 
@@ -49,16 +48,15 @@ static void dct_unquantize_h263_intra_mmx(const MPVContext *s,
         qadd = 0;
         level= block[0];
     }
-    if(s->ac_pred)
-        nCoeffs=63;
-    else
-        nCoeffs= s->intra_scantable.raster_end[ s->block_last_index[n] ];
+    x86_reg offset = s->ac_pred ? 63 << 1 : s->intra_scantable.raster_end[s->block_last_index[n]] << 1;
 
 __asm__ volatile(
-                "movd %1, %%mm6                 \n\t" //qmul
+                "movd          %k1, %%mm6       \n\t" //qmul
+                "lea      (%2, %0), %1          \n\t"
+                "neg            %0              \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
-                "movd %2, %%mm5                 \n\t" //qadd
+                "movd           %3, %%mm5       \n\t" //qadd
                 "pxor %%mm7, %%mm7              \n\t"
                 "packssdw %%mm5, %%mm5          \n\t"
                 "packssdw %%mm5, %%mm5          \n\t"
@@ -66,14 +64,14 @@ __asm__ volatile(
                 "pxor %%mm4, %%mm4              \n\t"
                 ".p2align 4                     \n\t"
                 "1:                             \n\t"
-                "movq (%0, %3), %%mm0           \n\t"
-                "movq 8(%0, %3), %%mm1          \n\t"
+                "movq     (%1, %0), %%mm0       \n\t"
+                "movq    8(%1, %0), %%mm1       \n\t"
 
                 "pmullw %%mm6, %%mm0            \n\t"
                 "pmullw %%mm6, %%mm1            \n\t"
 
-                "movq (%0, %3), %%mm2           \n\t"
-                "movq 8(%0, %3), %%mm3          \n\t"
+                "movq     (%1, %0), %%mm2       \n\t"
+                "movq    8(%1, %0), %%mm3       \n\t"
 
                 "pcmpgtw %%mm4, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
                 "pcmpgtw %%mm4, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
@@ -93,12 +91,13 @@ __asm__ volatile(
                 "pandn %%mm2, %%mm0             \n\t"
                 "pandn %%mm3, %%mm1             \n\t"
 
-                "movq %%mm0, (%0, %3)           \n\t"
-                "movq %%mm1, 8(%0, %3)          \n\t"
+                "movq        %%mm0, (%1, %0)    \n\t"
+                "movq        %%mm1, 8(%1, %0)   \n\t"
 
-                "add $16, %3                    \n\t"
+                "add           $16, %0          \n\t"
                 "jng 1b                         \n\t"
-                ::"r" (block+nCoeffs), "rm"(qmul), "rm" (qadd), "r" (2*(-nCoeffs))
+                : "+r"(offset), "+r"(qmul)
+                : "r" (block), "rm" (qadd)
                 : "memory"
         );
         block[0]= level;
@@ -108,20 +107,20 @@ __asm__ volatile(
 static void dct_unquantize_h263_inter_mmx(const MPVContext *s,
                                           int16_t *block, int n, int qscale)
 {
-    x86_reg qmul, qadd, nCoeffs;
-
-    qmul = qscale << 1;
-    qadd = (qscale - 1) | 1;
+    int qmul = qscale << 1;
+    int qadd = (qscale - 1) | 1;
 
     av_assert2(s->block_last_index[n]>=0 || s->h263_aic);
 
-    nCoeffs= s->inter_scantable.raster_end[ s->block_last_index[n] ];
+    x86_reg offset = s->inter_scantable.raster_end[s->block_last_index[n]] << 1;
 
 __asm__ volatile(
-                "movd %1, %%mm6                 \n\t" //qmul
+                "movd           %2, %%mm6       \n\t" //qmul
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
-                "movd %2, %%mm5                 \n\t" //qadd
+                "movd           %3, %%mm5       \n\t" //qadd
+                "add            %1, %0          \n\t"
+                "neg            %1              \n\t"
                 "pxor %%mm7, %%mm7              \n\t"
                 "packssdw %%mm5, %%mm5          \n\t"
                 "packssdw %%mm5, %%mm5          \n\t"
@@ -129,14 +128,14 @@ __asm__ volatile(
                 "pxor %%mm4, %%mm4              \n\t"
                 ".p2align 4                     \n\t"
                 "1:                             \n\t"
-                "movq (%0, %3), %%mm0           \n\t"
-                "movq 8(%0, %3), %%mm1          \n\t"
+                "movq     (%0, %1), %%mm0       \n\t"
+                "movq    8(%0, %1), %%mm1       \n\t"
 
                 "pmullw %%mm6, %%mm0            \n\t"
                 "pmullw %%mm6, %%mm1            \n\t"
 
-                "movq (%0, %3), %%mm2           \n\t"
-                "movq 8(%0, %3), %%mm3          \n\t"
+                "movq     (%0, %1), %%mm2       \n\t"
+                "movq    8(%0, %1), %%mm3       \n\t"
 
                 "pcmpgtw %%mm4, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
                 "pcmpgtw %%mm4, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
@@ -156,12 +155,13 @@ __asm__ volatile(
                 "pandn %%mm2, %%mm0             \n\t"
                 "pandn %%mm3, %%mm1             \n\t"
 
-                "movq %%mm0, (%0, %3)           \n\t"
-                "movq %%mm1, 8(%0, %3)          \n\t"
+                "movq        %%mm0, (%0, %1)    \n\t"
+                "movq        %%mm1, 8(%0, %1)   \n\t"
 
-                "add $16, %3                    \n\t"
+                "add           $16, %1          \n\t"
                 "jng 1b                         \n\t"
-                ::"r" (block+nCoeffs), "rm"(qmul), "rm" (qadd), "r" (2*(-nCoeffs))
+                : "+r" (block), "+r" (offset)
+                : "rm"(qmul), "rm" (qadd)
                 : "memory"
         );
 }
-- 
2.52.0


From ac44dbb58062fd7c6db0ef44668fabef09d7fa36 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 7 Oct 2025 10:35:08 +0200
Subject: [PATCH 176/304] avcodec/x86/mpegvideo: Improve unquantizing MPEG-2
 intra blocks

Unquantizing involves calculating
    (block[j] * qscale * quant_matrix[j]) / 16
where / rounds towards zero. Arithmetic right shifts
naturally round towards -inf, so the earlier code
calculated the absolute value first, then used a right-shift
and then negated the result if necessary.

This commit uses a different procedure: It biases the product
for negative values of block[j] by 0xf. The combination of
this and the arithmetic right shift is the same as rounding
towards zero.
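
A scalar sketch of the bias trick, for illustration only (the function
names are hypothetical). It tests the sign of the product directly, whereas
the patch derives the bias from block[j], which is equivalent as long as
the scale factor is non-negative. It also assumes that >> on a negative
int is an arithmetic shift, which the C standard leaves
implementation-defined.

```c
#include <assert.h>

/* Division by 16 with rounding towards zero, done as bias + shift.
 * Assumes >> on a negative int is an arithmetic shift (GCC, Clang). */
static int div16_toward_zero(int product)
{
    int bias = product < 0 ? 0xf : 0;   /* bias only negative values */
    return (product + bias) >> 4;
}

/* Hypothetical self-check: compares against C's '/' (which truncates
 * towards zero) over the whole signed 16-bit range. */
static void check_div16(void)
{
    for (int v = -32768; v <= 32767; v++)
        assert(div16_toward_zero(v) == v / 16);
}
```

Without the bias, -17 >> 4 would give -2 instead of the desired -1.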

Furthermore, a write-only store to mm7 has been removed.

Benchmarks:
dct_unquantize_mpeg2_intra_c:                          214.3 ( 1.00x)
dct_unquantize_mpeg2_intra_mmx (old):                   43.0 ( 4.98x)
dct_unquantize_mpeg2_intra_mmx (new):                   28.4 ( 7.56x)

(The bitexact flag and the test for correctness have been removed
from checkasm for the benchmarks.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideo.c | 38 ++++++++++++--------------------------
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index 38dcd8fc6e..d1614eb1eb 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -321,8 +321,6 @@ static void dct_unquantize_mpeg2_intra_mmx(const MPVContext *s,
         block0 = block[0] * s->c_dc_scale;
     quant_matrix = s->intra_matrix;
 __asm__ volatile(
-                "pcmpeqw %%mm7, %%mm7           \n\t"
-                "psrlw $15, %%mm7               \n\t"
                 "movd %2, %%mm6                 \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
@@ -335,30 +333,18 @@ __asm__ volatile(
                 "movq 8(%1, %%"FF_REG_a"), %%mm5\n\t"
                 "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
                 "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
-                "pxor %%mm2, %%mm2              \n\t"
-                "pxor %%mm3, %%mm3              \n\t"
-                "pcmpgtw %%mm0, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
-                "pcmpgtw %%mm1, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t" // abs(block[i])
-                "psubw %%mm3, %%mm1             \n\t" // abs(block[i])
-                "pmullw %%mm4, %%mm0            \n\t" // abs(block[i])*q
-                "pmullw %%mm5, %%mm1            \n\t" // abs(block[i])*q
-                "pxor %%mm4, %%mm4              \n\t"
-                "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw (%0, %%"FF_REG_a"), %%mm4 \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%0, %%"FF_REG_a"), %%mm5\n\t" // block[i] == 0 ? -1 : 0
-                "psraw $4, %%mm0                \n\t"
-                "psraw $4, %%mm1                \n\t"
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t"
-                "psubw %%mm3, %%mm1             \n\t"
-                "pandn %%mm0, %%mm4             \n\t"
-                "pandn %%mm1, %%mm5             \n\t"
-                "movq %%mm4, (%0, %%"FF_REG_a") \n\t"
-                "movq %%mm5, 8(%0, %%"FF_REG_a")\n\t"
+                "movq %%mm0, %%mm2              \n\t"
+                "movq %%mm1, %%mm3              \n\t"
+                "psrlw $12, %%mm2               \n\t" // block[i] < 0 ? 0xf : 0
+                "psrlw $12, %%mm3               \n\t" // (block[i] is in the -2048..2047 range)
+                "pmullw %%mm4, %%mm0            \n\t" // block[i]*q
+                "pmullw %%mm5, %%mm1            \n\t" // block[i]*q
+                "paddw %%mm2, %%mm0             \n\t" // bias negative block[i]
+                "paddw %%mm3, %%mm1             \n\t" // so that a right-shift
+                "psraw $4, %%mm0                \n\t" // is equivalent to divide
+                "psraw $4, %%mm1                \n\t" // with rounding towards zero
+                "movq %%mm0, (%0, %%"FF_REG_a") \n\t"
+                "movq %%mm1, 8(%0, %%"FF_REG_a")\n\t"
 
                 "add $16, %%"FF_REG_a"          \n\t"
                 "jng 1b                         \n\t"
-- 
2.52.0


From a9b0a4ee618b43cf530e1dcd41889474d8c34ad8 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 3 Nov 2025 19:17:16 +0100
Subject: [PATCH 177/304] avcodec/x86/mpegvideo: Don't duplicate register

Currently several inline ASM blocks use a value as
an input and rax as a clobber register. The input value
is just moved into the register, which then serves as the loop
counter. This is wasteful, as one can just use the value's
register directly as the loop counter.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideo.c | 119 +++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 59 deletions(-)

diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index d1614eb1eb..aa15e2b32a 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -183,19 +183,19 @@ static void dct_unquantize_mpeg1_intra_mmx(const MPVContext *s,
         block0 = block[0] * s->c_dc_scale;
     /* XXX: only MPEG-1 */
     quant_matrix = s->intra_matrix;
+    x86_reg offset = -2 * nCoeffs;
 __asm__ volatile(
                 "pcmpeqw %%mm7, %%mm7           \n\t"
                 "psrlw $15, %%mm7               \n\t"
-                "movd %2, %%mm6                 \n\t"
+                "movd %3, %%mm6                 \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
-                "mov %3, %%"FF_REG_a"           \n\t"
                 ".p2align 4                     \n\t"
                 "1:                             \n\t"
-                "movq (%0, %%"FF_REG_a"), %%mm0 \n\t"
-                "movq 8(%0, %%"FF_REG_a"), %%mm1\n\t"
-                "movq (%1, %%"FF_REG_a"), %%mm4 \n\t"
-                "movq 8(%1, %%"FF_REG_a"), %%mm5\n\t"
+                "movq (%1, %0), %%mm0           \n\t"
+                "movq 8(%1, %0), %%mm1          \n\t"
+                "movq (%2, %0), %%mm4           \n\t"
+                "movq 8(%2, %0), %%mm5          \n\t"
                 "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
                 "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
                 "pxor %%mm2, %%mm2              \n\t"
@@ -210,8 +210,8 @@ __asm__ volatile(
                 "pmullw %%mm5, %%mm1            \n\t" // abs(block[i])*q
                 "pxor %%mm4, %%mm4              \n\t"
                 "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw (%0, %%"FF_REG_a"), %%mm4 \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%0, %%"FF_REG_a"), %%mm5\n\t" // block[i] == 0 ? -1 : 0
+                "pcmpeqw (%1, %0), %%mm4        \n\t" // block[i] == 0 ? -1 : 0
+                "pcmpeqw 8(%1, %0), %%mm5       \n\t" // block[i] == 0 ? -1 : 0
                 "psraw $3, %%mm0                \n\t"
                 "psraw $3, %%mm1                \n\t"
                 "psubw %%mm7, %%mm0             \n\t"
@@ -224,13 +224,14 @@ __asm__ volatile(
                 "psubw %%mm3, %%mm1             \n\t"
                 "pandn %%mm0, %%mm4             \n\t"
                 "pandn %%mm1, %%mm5             \n\t"
-                "movq %%mm4, (%0, %%"FF_REG_a") \n\t"
-                "movq %%mm5, 8(%0, %%"FF_REG_a")\n\t"
+                "movq %%mm4, (%1, %0)           \n\t"
+                "movq %%mm5, 8(%1, %0)          \n\t"
 
-                "add $16, %%"FF_REG_a"          \n\t"
+                "add $16, %0                    \n\t"
                 "js 1b                          \n\t"
-                ::"r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale), "g" (-2*nCoeffs)
-                : "%"FF_REG_a, "memory"
+                : "+r" (offset)
+                : "r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale)
+                : "memory"
         );
     block[0]= block0;
 }
@@ -246,19 +247,19 @@ static void dct_unquantize_mpeg1_inter_mmx(const MPVContext *s,
     nCoeffs= s->intra_scantable.raster_end[ s->block_last_index[n] ]+1;
 
         quant_matrix = s->inter_matrix;
+    x86_reg offset = -2 * nCoeffs;
 __asm__ volatile(
                 "pcmpeqw %%mm7, %%mm7           \n\t"
                 "psrlw $15, %%mm7               \n\t"
-                "movd %2, %%mm6                 \n\t"
+                "movd %3, %%mm6                 \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
-                "mov %3, %%"FF_REG_a"           \n\t"
                 ".p2align 4                     \n\t"
                 "1:                             \n\t"
-                "movq (%0, %%"FF_REG_a"), %%mm0 \n\t"
-                "movq 8(%0, %%"FF_REG_a"), %%mm1\n\t"
-                "movq (%1, %%"FF_REG_a"), %%mm4 \n\t"
-                "movq 8(%1, %%"FF_REG_a"), %%mm5\n\t"
+                "movq (%1, %0), %%mm0           \n\t"
+                "movq 8(%1, %0), %%mm1          \n\t"
+                "movq (%2, %0), %%mm4           \n\t"
+                "movq 8(%2, %0), %%mm5          \n\t"
                 "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
                 "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
                 "pxor %%mm2, %%mm2              \n\t"
@@ -277,8 +278,8 @@ __asm__ volatile(
                 "pmullw %%mm5, %%mm1            \n\t" // (abs(block[i])*2 + 1)*q
                 "pxor %%mm4, %%mm4              \n\t"
                 "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw (%0, %%"FF_REG_a"), %%mm4 \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%0, %%"FF_REG_a"), %%mm5\n\t" // block[i] == 0 ? -1 : 0
+                "pcmpeqw (%1, %0), %%mm4        \n\t" // block[i] == 0 ? -1 : 0
+                "pcmpeqw 8(%1, %0), %%mm5       \n\t" // block[i] == 0 ? -1 : 0
                 "psraw $4, %%mm0                \n\t"
                 "psraw $4, %%mm1                \n\t"
                 "psubw %%mm7, %%mm0             \n\t"
@@ -291,13 +292,14 @@ __asm__ volatile(
                 "psubw %%mm3, %%mm1             \n\t"
                 "pandn %%mm0, %%mm4             \n\t"
                 "pandn %%mm1, %%mm5             \n\t"
-                "movq %%mm4, (%0, %%"FF_REG_a") \n\t"
-                "movq %%mm5, 8(%0, %%"FF_REG_a")\n\t"
+                "movq %%mm4, (%1, %0)           \n\t"
+                "movq %%mm5, 8(%1, %0)          \n\t"
 
-                "add $16, %%"FF_REG_a"          \n\t"
+                "add $16, %0                    \n\t"
                 "js 1b                          \n\t"
-                ::"r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale), "g" (-2*nCoeffs)
-                : "%"FF_REG_a, "memory"
+                : "+r" (offset)
+                : "r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale)
+                : "memory"
         );
 }
 
@@ -320,17 +322,17 @@ static void dct_unquantize_mpeg2_intra_mmx(const MPVContext *s,
     else
         block0 = block[0] * s->c_dc_scale;
     quant_matrix = s->intra_matrix;
+    x86_reg offset = -2 * nCoeffs;
 __asm__ volatile(
-                "movd %2, %%mm6                 \n\t"
+                "movd %3, %%mm6                 \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
-                "mov %3, %%"FF_REG_a"           \n\t"
                 ".p2align 4                     \n\t"
                 "1:                             \n\t"
-                "movq (%0, %%"FF_REG_a"), %%mm0 \n\t"
-                "movq 8(%0, %%"FF_REG_a"), %%mm1\n\t"
-                "movq (%1, %%"FF_REG_a"), %%mm4 \n\t"
-                "movq 8(%1, %%"FF_REG_a"), %%mm5\n\t"
+                "movq (%1, %0), %%mm0           \n\t"
+                "movq 8(%1, %0), %%mm1          \n\t"
+                "movq (%2, %0), %%mm4           \n\t"
+                "movq 8(%2, %0), %%mm5          \n\t"
                 "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
                 "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
                 "movq %%mm0, %%mm2              \n\t"
@@ -343,13 +345,14 @@ __asm__ volatile(
                 "paddw %%mm3, %%mm1             \n\t" // so that a right-shift
                 "psraw $4, %%mm0                \n\t" // is equivalent to divide
                 "psraw $4, %%mm1                \n\t" // with rounding towards zero
-                "movq %%mm0, (%0, %%"FF_REG_a") \n\t"
-                "movq %%mm1, 8(%0, %%"FF_REG_a")\n\t"
+                "movq %%mm0, (%1, %0)           \n\t"
+                "movq %%mm1, 8(%1, %0)          \n\t"
 
-                "add $16, %%"FF_REG_a"          \n\t"
+                "add $16, %0                    \n\t"
                 "jng 1b                         \n\t"
-                ::"r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale), "g" (-2*nCoeffs)
-                : "%"FF_REG_a, "memory"
+                : "+r" (offset)
+                : "r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale)
+                : "memory"
         );
     block[0]= block0;
         //Note, we do not do mismatch control for intra as errors cannot accumulate
@@ -358,30 +361,27 @@ __asm__ volatile(
 static void dct_unquantize_mpeg2_inter_mmx(const MPVContext *s,
                                            int16_t *block, int n, int qscale)
 {
-    x86_reg nCoeffs;
-    const uint16_t *quant_matrix;
-
     av_assert2(s->block_last_index[n]>=0);
 
-    if (s->q_scale_type) qscale = ff_mpeg2_non_linear_qscale[qscale];
-    else                 qscale <<= 1;
+    x86_reg qscale2 = s->q_scale_type ? ff_mpeg2_non_linear_qscale[qscale] : (unsigned)qscale << 1;
+    x86_reg offset  = s->intra_scantable.raster_end[s->block_last_index[n]] << 1;
+    const void *quant_matrix = (const char*)s->inter_matrix + offset;
 
-    nCoeffs= s->intra_scantable.raster_end[ s->block_last_index[n] ];
 
-        quant_matrix = s->inter_matrix;
 __asm__ volatile(
+                "movd          %k1, %%mm6      \n\t"
+                "lea      (%2, %0), %1         \n\t"
+                "neg            %0             \n\t"
                 "pcmpeqw %%mm7, %%mm7           \n\t"
                 "psrlq $48, %%mm7               \n\t"
-                "movd %2, %%mm6                 \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
                 "packssdw %%mm6, %%mm6          \n\t"
-                "mov %3, %%"FF_REG_a"           \n\t"
                 ".p2align 4                     \n\t"
                 "1:                             \n\t"
-                "movq (%0, %%"FF_REG_a"), %%mm0 \n\t"
-                "movq 8(%0, %%"FF_REG_a"), %%mm1\n\t"
-                "movq (%1, %%"FF_REG_a"), %%mm4 \n\t"
-                "movq 8(%1, %%"FF_REG_a"), %%mm5\n\t"
+                "movq     (%1, %0), %%mm0      \n\t"
+                "movq    8(%1, %0), %%mm1      \n\t"
+                "movq     (%3, %0), %%mm4      \n\t"
+                "movq    8(%3, %0), %%mm5      \n\t"
                 "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
                 "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
                 "pxor %%mm2, %%mm2              \n\t"
@@ -400,8 +400,8 @@ __asm__ volatile(
                 "paddw %%mm5, %%mm1             \n\t" // (abs(block[i])*2 + 1)*q
                 "pxor %%mm4, %%mm4              \n\t"
                 "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw (%0, %%"FF_REG_a"), %%mm4 \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%0, %%"FF_REG_a"), %%mm5\n\t" // block[i] == 0 ? -1 : 0
+                "pcmpeqw  (%1, %0), %%mm4      \n\t" // block[i] == 0 ? -1 : 0
+                "pcmpeqw 8(%1, %0), %%mm5      \n\t" // block[i] == 0 ? -1 : 0
                 "psrlw $5, %%mm0                \n\t"
                 "psrlw $5, %%mm1                \n\t"
                 "pxor %%mm2, %%mm0              \n\t"
@@ -412,12 +412,12 @@ __asm__ volatile(
                 "pandn %%mm1, %%mm5             \n\t"
                 "pxor %%mm4, %%mm7              \n\t"
                 "pxor %%mm5, %%mm7              \n\t"
-                "movq %%mm4, (%0, %%"FF_REG_a") \n\t"
-                "movq %%mm5, 8(%0, %%"FF_REG_a")\n\t"
+                "movq        %%mm4, (%1, %0)   \n\t"
+                "movq        %%mm5, 8(%1, %0)  \n\t"
 
-                "add $16, %%"FF_REG_a"          \n\t"
+                "add           $16, %0          \n\t"
                 "jng 1b                         \n\t"
-                "movd 124(%0, %3), %%mm0        \n\t"
+                "movd      124(%2), %%mm0      \n\t"
                 "movq %%mm7, %%mm6              \n\t"
                 "psrlq $32, %%mm7               \n\t"
                 "pxor %%mm6, %%mm7              \n\t"
@@ -427,10 +427,11 @@ __asm__ volatile(
                 "pslld $31, %%mm7               \n\t"
                 "psrlq $15, %%mm7               \n\t"
                 "pxor %%mm7, %%mm0              \n\t"
-                "movd %%mm0, 124(%0, %3)        \n\t"
+                "movd        %%mm0, 124(%2)    \n\t"
 
-                ::"r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale), "r" (-2*nCoeffs)
-                : "%"FF_REG_a, "memory"
+                : "+r"(offset), "+r" (qscale2)
+                : "r" (block), "r"(quant_matrix)
+                : "memory"
         );
 }
 
-- 
2.52.0


From 5f4aace9c497f2b9ff55a9d4a0414e5f9e29ed89 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 4 Nov 2025 07:53:09 +0100
Subject: [PATCH 178/304] avcodec/x86/mpegvideo: Port
 dct_unquantize_h263_{intra,inter}_mmx to SSSE3

It benefits from wider registers and psignw.
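
A scalar model of the per-coefficient arithmetic, for illustration only
(helper names are hypothetical, and results are assumed to fit in 16 bits).
psignw yields sgn(block[i])*qadd, which is zero for zero coefficients, so
no separate zero mask is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of psignw + pmullw + paddw for one coefficient:
 * level * qmul + sgn(level) * qadd; a zero level stays zero.
 * The result is assumed to fit in 16 bits. */
static int16_t unquant_h263_coeff(int16_t level, int qmul, int qadd)
{
    int sign = (level > 0) - (level < 0);   /* -1, 0 or +1 */
    return (int16_t)(level * qmul + sign * qadd);
}

/* Hypothetical wrapper applying the model to the first n coefficients. */
static void unquant_h263_block(int16_t *block, int n, int qmul, int qadd)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        block[i] = unquant_h263_coeff(block[i], qmul, qadd);
}
```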

Benchmarks:
dct_unquantize_h263_inter_c:                            88.3 ( 1.00x)
dct_unquantize_h263_inter_mmx:                          24.7 ( 3.58x)
dct_unquantize_h263_inter_ssse3:                         9.3 ( 9.47x)
dct_unquantize_h263_intra_c:                            93.7 ( 1.00x)
dct_unquantize_h263_intra_mmx:                          30.6 ( 3.06x)
dct_unquantize_h263_intra_ssse3:                        16.5 ( 5.69x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideo.c | 146 ++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 84 deletions(-)

diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index aa15e2b32a..82a29d1bcf 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -30,8 +30,13 @@
 
 #if HAVE_MMX_INLINE
 
-static void dct_unquantize_h263_intra_mmx(const MPVContext *s,
-                                          int16_t *block, int n, int qscale)
+#define SPLATW(reg) "punpcklwd    %%" #reg ", %%" #reg "\n\t" \
+                    "pshufd   $0, %%" #reg ", %%" #reg "\n\t"
+
+#if HAVE_SSSE3_INLINE
+
+static void dct_unquantize_h263_intra_ssse3(const MPVContext *s,
+                                            int16_t *block, int n, int qscale)
 {
     x86_reg qmul = (unsigned)qscale << 1;
     int level, qadd;
@@ -51,61 +56,45 @@ static void dct_unquantize_h263_intra_mmx(const MPVContext *s,
     x86_reg offset = s->ac_pred ? 63 << 1 : s->intra_scantable.raster_end[s->block_last_index[n]] << 1;
 
 __asm__ volatile(
-                "movd          %k1, %%mm6       \n\t" //qmul
-                "lea      (%2, %0), %1          \n\t"
-                "neg            %0              \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "movd           %3, %%mm5       \n\t" //qadd
-                "pxor %%mm7, %%mm7              \n\t"
-                "packssdw %%mm5, %%mm5          \n\t"
-                "packssdw %%mm5, %%mm5          \n\t"
-                "psubw %%mm5, %%mm7             \n\t"
-                "pxor %%mm4, %%mm4              \n\t"
-                ".p2align 4                     \n\t"
-                "1:                             \n\t"
-                "movq     (%1, %0), %%mm0       \n\t"
-                "movq    8(%1, %0), %%mm1       \n\t"
+                "movd          %k1, %%xmm0     \n\t" //qmul
+                "lea      (%2, %0), %1         \n\t"
+                "neg            %0             \n\t"
+                "movd           %3, %%xmm1     \n\t" //qadd
+                SPLATW(xmm0)
+                SPLATW(xmm1)
 
-                "pmullw %%mm6, %%mm0            \n\t"
-                "pmullw %%mm6, %%mm1            \n\t"
+                ".p2align 4                    \n\t"
+                "1:                            \n\t"
+                "movdqa   (%1, %0), %%xmm2     \n\t"
+                "movdqa 16(%1, %0), %%xmm3     \n\t"
 
-                "movq     (%1, %0), %%mm2       \n\t"
-                "movq    8(%1, %0), %%mm3       \n\t"
+                "movdqa     %%xmm1, %%xmm4     \n\t"
+                "movdqa     %%xmm1, %%xmm5     \n\t"
 
-                "pcmpgtw %%mm4, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
-                "pcmpgtw %%mm4, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
+                "psignw     %%xmm2, %%xmm4     \n\t" // sgn(block[i])*qadd
+                "psignw     %%xmm3, %%xmm5     \n\t" // sgn(block[i])*qadd
 
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
+                "pmullw     %%xmm0, %%xmm2     \n\t"
+                "pmullw     %%xmm0, %%xmm3     \n\t"
 
-                "paddw %%mm7, %%mm0             \n\t"
-                "paddw %%mm7, %%mm1             \n\t"
+                "paddw      %%xmm4, %%xmm2     \n\t"
+                "paddw      %%xmm5, %%xmm3     \n\t"
 
-                "pxor %%mm0, %%mm2              \n\t"
-                "pxor %%mm1, %%mm3              \n\t"
+                "movdqa     %%xmm2, (%1, %0)   \n\t"
+                "movdqa     %%xmm3, 16(%1, %0) \n\t"
 
-                "pcmpeqw %%mm7, %%mm0           \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw %%mm7, %%mm1           \n\t" // block[i] == 0 ? -1 : 0
-
-                "pandn %%mm2, %%mm0             \n\t"
-                "pandn %%mm3, %%mm1             \n\t"
-
-                "movq        %%mm0, (%1, %0)    \n\t"
-                "movq        %%mm1, 8(%1, %0)   \n\t"
-
-                "add           $16, %0          \n\t"
-                "jng 1b                         \n\t"
+                "add           $32, %0         \n\t"
+                "jng            1b             \n\t"
                 : "+r"(offset), "+r"(qmul)
                 : "r" (block), "rm" (qadd)
-                : "memory"
+                : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5",) "memory"
         );
         block[0]= level;
 }
 
 
-static void dct_unquantize_h263_inter_mmx(const MPVContext *s,
-                                          int16_t *block, int n, int qscale)
+static void dct_unquantize_h263_inter_ssse3(const MPVContext *s,
+                                            int16_t *block, int n, int qscale)
 {
     int qmul = qscale << 1;
     int qadd = (qscale - 1) | 1;
@@ -115,56 +104,41 @@ static void dct_unquantize_h263_inter_mmx(const MPVContext *s,
     x86_reg offset = s->inter_scantable.raster_end[s->block_last_index[n]] << 1;
 
 __asm__ volatile(
-                "movd           %2, %%mm6       \n\t" //qmul
-                "packssdw %%mm6, %%mm6          \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "movd           %3, %%mm5       \n\t" //qadd
-                "add            %1, %0          \n\t"
-                "neg            %1              \n\t"
-                "pxor %%mm7, %%mm7              \n\t"
-                "packssdw %%mm5, %%mm5          \n\t"
-                "packssdw %%mm5, %%mm5          \n\t"
-                "psubw %%mm5, %%mm7             \n\t"
-                "pxor %%mm4, %%mm4              \n\t"
-                ".p2align 4                     \n\t"
-                "1:                             \n\t"
-                "movq     (%0, %1), %%mm0       \n\t"
-                "movq    8(%0, %1), %%mm1       \n\t"
+                "movd           %2, %%xmm0     \n\t" //qmul
+                "movd           %3, %%xmm1     \n\t" //qadd
+                "add            %1, %0         \n\t"
+                "neg            %1             \n\t"
+                SPLATW(xmm0)
+                SPLATW(xmm1)
 
-                "pmullw %%mm6, %%mm0            \n\t"
-                "pmullw %%mm6, %%mm1            \n\t"
+                ".p2align 4                    \n\t"
+                "1:                            \n\t"
+                "movdqa   (%0, %1), %%xmm2     \n\t"
+                "movdqa 16(%0, %1), %%xmm3     \n\t"
 
-                "movq     (%0, %1), %%mm2       \n\t"
-                "movq    8(%0, %1), %%mm3       \n\t"
+                "movdqa     %%xmm1, %%xmm4     \n\t"
+                "movdqa     %%xmm1, %%xmm5     \n\t"
 
-                "pcmpgtw %%mm4, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
-                "pcmpgtw %%mm4, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
+                "psignw     %%xmm2, %%xmm4     \n\t" // sgn(block[i])*qadd
+                "psignw     %%xmm3, %%xmm5     \n\t" // sgn(block[i])*qadd
 
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
+                "pmullw     %%xmm0, %%xmm2     \n\t"
+                "pmullw     %%xmm0, %%xmm3     \n\t"
 
-                "paddw %%mm7, %%mm0             \n\t"
-                "paddw %%mm7, %%mm1             \n\t"
+                "paddw      %%xmm4, %%xmm2     \n\t"
+                "paddw      %%xmm5, %%xmm3     \n\t"
 
-                "pxor %%mm0, %%mm2              \n\t"
-                "pxor %%mm1, %%mm3              \n\t"
+                "movdqa     %%xmm2, (%0, %1)   \n\t"
+                "movdqa     %%xmm3, 16(%0, %1) \n\t"
 
-                "pcmpeqw %%mm7, %%mm0           \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw %%mm7, %%mm1           \n\t" // block[i] == 0 ? -1 : 0
-
-                "pandn %%mm2, %%mm0             \n\t"
-                "pandn %%mm3, %%mm1             \n\t"
-
-                "movq        %%mm0, (%0, %1)    \n\t"
-                "movq        %%mm1, 8(%0, %1)   \n\t"
-
-                "add           $16, %1          \n\t"
-                "jng 1b                         \n\t"
+                "add           $32, %1         \n\t"
+                "jng 1b                        \n\t"
                 : "+r" (block), "+r" (offset)
                 : "rm"(qmul), "rm" (qadd)
-                : "memory"
+                : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5",) "memory"
         );
 }
+#endif
 
 static void dct_unquantize_mpeg1_intra_mmx(const MPVContext *s,
                                            int16_t *block, int n, int qscale)
@@ -443,13 +417,17 @@ av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
     int cpu_flags = av_get_cpu_flags();
 
     if (INLINE_MMX(cpu_flags)) {
-        s->dct_unquantize_h263_intra = dct_unquantize_h263_intra_mmx;
-        s->dct_unquantize_h263_inter = dct_unquantize_h263_inter_mmx;
         s->dct_unquantize_mpeg1_intra = dct_unquantize_mpeg1_intra_mmx;
         s->dct_unquantize_mpeg1_inter = dct_unquantize_mpeg1_inter_mmx;
         if (!bitexact)
             s->dct_unquantize_mpeg2_intra = dct_unquantize_mpeg2_intra_mmx;
         s->dct_unquantize_mpeg2_inter = dct_unquantize_mpeg2_inter_mmx;
     }
+#if HAVE_SSSE3_INLINE
+    if (INLINE_SSSE3(cpu_flags)) {
+        s->dct_unquantize_h263_intra  = dct_unquantize_h263_intra_ssse3;
+        s->dct_unquantize_h263_inter  = dct_unquantize_h263_inter_ssse3;
+    }
+#endif /* HAVE_SSSE3_INLINE */
 #endif /* HAVE_MMX_INLINE */
 }
-- 
2.52.0
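
Note on the H.263 hunks above (an illustrative sketch, not part of the patch): each 16-bit lane of the new loop computes block[i]*qmul + sgn(block[i])*qadd, so a zero coefficient stays zero without the separate pcmpeqw/pandn masking that the MMX version needed. The scalar model below assumes the documented psignw, pmullw and paddw lane semantics. All helper names are hypothetical, and the conversions back to int16_t rely on the usual two's-complement wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret the low 16 bits as a signed lane (two's-complement wrap). */
static int16_t wrap16(uint32_t v)
{
    return (int16_t)(uint16_t)v;
}

/* psignw lane: a is negated, zeroed or kept according to the sign of b. */
static int16_t psignw_lane(int16_t a, int16_t b)
{
    if (b < 0)
        return wrap16(0u - (uint32_t)(uint16_t)a);
    return b == 0 ? 0 : a;
}

/* pmullw lane: low 16 bits of the product. */
static int16_t pmullw_lane(int16_t a, int16_t b)
{
    return wrap16((uint32_t)(uint16_t)a * (uint16_t)b);
}

/* paddw lane: wrapping 16-bit addition. */
static int16_t paddw_lane(int16_t a, int16_t b)
{
    return wrap16((uint32_t)(uint16_t)a + (uint16_t)b);
}

/* Hypothetical scalar model of the loop body applied to n coefficients. */
static void h263_unquant_model(int16_t *block, int n, int qmul, int qadd)
{
    assert(block && n >= 0);
    for (int i = 0; i < n; i++) {
        int16_t bias = psignw_lane((int16_t)qadd, block[i]); /* sgn(block[i])*qadd */
        int16_t prod = pmullw_lane(block[i], (int16_t)qmul); /* block[i]*qmul */
        block[i] = paddw_lane(prod, bias);
    }
}
```

With qscale = 5 (so qmul = 10 and qadd = 5), the inputs 3, -3 and 0 map to 35, -35 and 0 in this model.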


From e9fe3b5cc0b64d6589d44deb4ea1cd2eb2fdb2e0 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 3 Nov 2025 19:45:49 +0100
Subject: [PATCH 179/304] avcodec/x86/mpegvideo: Port MPEG-1 unquantize
 functions to SSSE3

Benefits from wider registers as well as from pabsw and psignw.
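
For illustration only (not part of the commit): a scalar sketch of one lane of the two MPEG-1 loops, next to the sign-mask idiom that the MMX code used in place of pabsw. The function names are hypothetical, and the sketch assumes that every intermediate product fits in a signed 16-bit lane, which it asserts rather than proves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sign-mask idiom used by the MMX code: mask = x < 0 ? -1 : 0,
 * then abs(x) = (x ^ mask) - mask. pabsw does this in one instruction. */
static int abs_via_mask(int16_t x)
{
    int mask = x < 0 ? -1 : 0;
    return (x ^ mask) - mask;
}

/* One lane of the MPEG-1 loops; inter != 0 selects the inter formula. */
static int16_t mpeg1_unquant_lane(int16_t level, int qscale, int qmat, int inter)
{
    int q   = qscale * qmat;                 /* pmullw: qscale*quant_matrix[i] */
    int mag = abs(level);                    /* pabsw */
    int v   = inter ? (2 * mag + 1) * q : mag * q;

    /* Assumption of this sketch: nothing overflows a 16-bit lane. */
    assert(q >= 0 && q <= INT16_MAX && v <= INT16_MAX);

    v >>= inter ? 4 : 3;                     /* psraw */
    v = (v - 1) | 1;                         /* psubw + por: make the result odd */
    return (int16_t)(level < 0 ? -v : level > 0 ? v : 0);   /* psignw */
}
```

Because psignw zeroes a lane whose sign source is zero, the model needs no separate "block[i] == 0" mask, which is what the removed pcmpeqw/pandn pairs provided.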

Benchmarks:
dct_unquantize_mpeg1_inter_c:                          343.0 ( 1.00x)
dct_unquantize_mpeg1_inter_mmx:                         50.6 ( 6.78x)
dct_unquantize_mpeg1_inter_ssse3:                       17.2 (19.94x)
dct_unquantize_mpeg1_intra_c:                          352.1 ( 1.00x)
dct_unquantize_mpeg1_intra_mmx:                         48.8 ( 7.22x)
dct_unquantize_mpeg1_intra_ssse3:                       19.5 (18.03x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/mpegvideo.h     |  10 ++-
 libavcodec/x86/mpegvideo.c | 171 ++++++++++++++++---------------------
 2 files changed, 78 insertions(+), 103 deletions(-)
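
Note on the mpegvideo.h change (a sketch, not part of the patch): movdqa requires its memory operand to be 16-byte aligned, and the MPEG-1 loops read the quantization matrix with movdqa starting at quant_matrix itself, which is why the matrices gain DECLARE_ALIGNED(16, ...). The example below uses C11 alignas on a hypothetical struct as a stand-in for FFmpeg's macro and only demonstrates the alignment check.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the context struct. */
typedef struct DemoContext {
    int some_field;                         /* pushes the matrices off offset 0 */
    alignas(16) uint16_t intra_matrix[64];
    alignas(16) uint16_t inter_matrix[64];
} DemoContext;

/* True if p could be used as a movdqa memory operand. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

static int matrices_are_aligned(const DemoContext *c)
{
    assert(c);
    return is_aligned16(c->intra_matrix) && is_aligned16(c->inter_matrix);
}
```

The member alignment only helps if the enclosing object is itself allocated with at least 16-byte alignment.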

diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index 758bf57ab9..6aff5fbcd0 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -38,6 +38,8 @@
 #include "qpeldsp.h"
 #include "videodsp.h"
 
+#include "libavutil/mem_internal.h"
+
 #define MAX_THREADS 32
 
 /**
@@ -202,10 +204,10 @@ typedef struct MpegEncContext {
     int *mb_index2xy;        ///< mb_index -> mb_x + mb_y*mb_stride
 
     /** matrix transmitted in the bitstream */
-    uint16_t intra_matrix[64];
-    uint16_t chroma_intra_matrix[64];
-    uint16_t inter_matrix[64];
-    uint16_t chroma_inter_matrix[64];
+    DECLARE_ALIGNED(16, uint16_t, intra_matrix)[64];
+    DECLARE_ALIGNED(16, uint16_t, chroma_intra_matrix)[64];
+    DECLARE_ALIGNED(16, uint16_t, inter_matrix)[64];
+    DECLARE_ALIGNED(16, uint16_t, chroma_inter_matrix)[64];
 
     /* error concealment / resync */
     int resync_mb_x;                 ///< x position of last resync marker
diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index 82a29d1bcf..01048df47d 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -138,10 +138,9 @@ __asm__ volatile(
                 : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5",) "memory"
         );
 }
-#endif
 
-static void dct_unquantize_mpeg1_intra_mmx(const MPVContext *s,
-                                           int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg1_intra_ssse3(const MPVContext *s,
+                                             int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
@@ -159,59 +158,45 @@ static void dct_unquantize_mpeg1_intra_mmx(const MPVContext *s,
     quant_matrix = s->intra_matrix;
     x86_reg offset = -2 * nCoeffs;
 __asm__ volatile(
-                "pcmpeqw %%mm7, %%mm7           \n\t"
-                "psrlw $15, %%mm7               \n\t"
-                "movd %3, %%mm6                 \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                ".p2align 4                     \n\t"
-                "1:                             \n\t"
-                "movq (%1, %0), %%mm0           \n\t"
-                "movq 8(%1, %0), %%mm1          \n\t"
-                "movq (%2, %0), %%mm4           \n\t"
-                "movq 8(%2, %0), %%mm5          \n\t"
-                "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
-                "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
-                "pxor %%mm2, %%mm2              \n\t"
-                "pxor %%mm3, %%mm3              \n\t"
-                "pcmpgtw %%mm0, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
-                "pcmpgtw %%mm1, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t" // abs(block[i])
-                "psubw %%mm3, %%mm1             \n\t" // abs(block[i])
-                "pmullw %%mm4, %%mm0            \n\t" // abs(block[i])*q
-                "pmullw %%mm5, %%mm1            \n\t" // abs(block[i])*q
-                "pxor %%mm4, %%mm4              \n\t"
-                "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw (%1, %0), %%mm4        \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%1, %0), %%mm5       \n\t" // block[i] == 0 ? -1 : 0
-                "psraw $3, %%mm0                \n\t"
-                "psraw $3, %%mm1                \n\t"
-                "psubw %%mm7, %%mm0             \n\t"
-                "psubw %%mm7, %%mm1             \n\t"
-                "por %%mm7, %%mm0               \n\t"
-                "por %%mm7, %%mm1               \n\t"
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t"
-                "psubw %%mm3, %%mm1             \n\t"
-                "pandn %%mm0, %%mm4             \n\t"
-                "pandn %%mm1, %%mm5             \n\t"
-                "movq %%mm4, (%1, %0)           \n\t"
-                "movq %%mm5, 8(%1, %0)          \n\t"
+                "movd           %3, %%xmm6     \n\t"
+                "pcmpeqw    %%xmm7, %%xmm7     \n\t"
+                "psrlw         $15, %%xmm7     \n\t"
+                SPLATW(xmm6)
+                ".p2align 4                    \n\t"
+                "1:                            \n\t"
+                "movdqa   (%2, %0), %%xmm4     \n\t"
+                "movdqa 16(%2, %0), %%xmm5     \n\t"
+                "movdqa   (%1, %0), %%xmm0     \n\t"
+                "movdqa 16(%1, %0), %%xmm1     \n\t"
+                "pmullw     %%xmm6, %%xmm4     \n\t" // q=qscale*quant_matrix[i]
+                "pmullw     %%xmm6, %%xmm5     \n\t" // q=qscale*quant_matrix[i]
+                "pabsw      %%xmm0, %%xmm2     \n\t" // abs(block[i])
+                "pabsw      %%xmm1, %%xmm3     \n\t" // abs(block[i])
+                "pmullw     %%xmm4, %%xmm2     \n\t" // abs(block[i])*q
+                "pmullw     %%xmm5, %%xmm3     \n\t" // abs(block[i])*q
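+                "psraw          $3, %%xmm2     \n\t" // (abs(block[i])*q) >> 3
+                "psraw          $3, %%xmm3     \n\t" // (abs(block[i])*q) >> 3
+                "psubw      %%xmm7, %%xmm2     \n\t" // subtract 1
+                "psubw      %%xmm7, %%xmm3     \n\t" // subtract 1
+                "por        %%xmm7, %%xmm2     \n\t" // set the lowest bit: (x - 1) | 1
+                "por        %%xmm7, %%xmm3     \n\t" // set the lowest bit: (x - 1) | 1
+                "psignw     %%xmm0, %%xmm2     \n\t" // apply sgn(block[i]); 0 stays 0
+                "psignw     %%xmm1, %%xmm3     \n\t" // apply sgn(block[i]); 0 stays 0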
+                "movdqa     %%xmm2, (%1, %0)   \n\t"
+                "movdqa     %%xmm3, 16(%1, %0) \n\t"
 
-                "add $16, %0                    \n\t"
-                "js 1b                          \n\t"
+                "add           $32, %0         \n\t"
+                "js 1b                         \n\t"
                 : "+r" (offset)
                 : "r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale)
-                : "memory"
+                : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",)
+                  "memory"
         );
     block[0]= block0;
 }
 
-static void dct_unquantize_mpeg1_inter_mmx(const MPVContext *s,
-                                           int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg1_inter_ssse3(const MPVContext *s,
+                                             int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
@@ -223,60 +208,48 @@ static void dct_unquantize_mpeg1_inter_mmx(const MPVContext *s,
         quant_matrix = s->inter_matrix;
     x86_reg offset = -2 * nCoeffs;
 __asm__ volatile(
-                "pcmpeqw %%mm7, %%mm7           \n\t"
-                "psrlw $15, %%mm7               \n\t"
-                "movd %3, %%mm6                 \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                ".p2align 4                     \n\t"
-                "1:                             \n\t"
-                "movq (%1, %0), %%mm0           \n\t"
-                "movq 8(%1, %0), %%mm1          \n\t"
-                "movq (%2, %0), %%mm4           \n\t"
-                "movq 8(%2, %0), %%mm5          \n\t"
-                "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
-                "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
-                "pxor %%mm2, %%mm2              \n\t"
-                "pxor %%mm3, %%mm3              \n\t"
-                "pcmpgtw %%mm0, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
-                "pcmpgtw %%mm1, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t" // abs(block[i])
-                "psubw %%mm3, %%mm1             \n\t" // abs(block[i])
-                "paddw %%mm0, %%mm0             \n\t" // abs(block[i])*2
-                "paddw %%mm1, %%mm1             \n\t" // abs(block[i])*2
-                "paddw %%mm7, %%mm0             \n\t" // abs(block[i])*2 + 1
-                "paddw %%mm7, %%mm1             \n\t" // abs(block[i])*2 + 1
-                "pmullw %%mm4, %%mm0            \n\t" // (abs(block[i])*2 + 1)*q
-                "pmullw %%mm5, %%mm1            \n\t" // (abs(block[i])*2 + 1)*q
-                "pxor %%mm4, %%mm4              \n\t"
-                "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw (%1, %0), %%mm4        \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%1, %0), %%mm5       \n\t" // block[i] == 0 ? -1 : 0
-                "psraw $4, %%mm0                \n\t"
-                "psraw $4, %%mm1                \n\t"
-                "psubw %%mm7, %%mm0             \n\t"
-                "psubw %%mm7, %%mm1             \n\t"
-                "por %%mm7, %%mm0               \n\t"
-                "por %%mm7, %%mm1               \n\t"
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t"
-                "psubw %%mm3, %%mm1             \n\t"
-                "pandn %%mm0, %%mm4             \n\t"
-                "pandn %%mm1, %%mm5             \n\t"
-                "movq %%mm4, (%1, %0)           \n\t"
-                "movq %%mm5, 8(%1, %0)          \n\t"
+                "movd           %3, %%xmm6     \n\t"
+                "pcmpeqw    %%xmm7, %%xmm7     \n\t"
+                "psrlw         $15, %%xmm7     \n\t"
+                SPLATW(xmm6)
+                ".p2align 4                    \n\t"
+                "1:                            \n\t"
+                "movdqa   (%2, %0), %%xmm4     \n\t"
+                "movdqa 16(%2, %0), %%xmm5     \n\t"
+                "movdqa   (%1, %0), %%xmm0     \n\t"
+                "movdqa 16(%1, %0), %%xmm1     \n\t"
+                "pmullw     %%xmm6, %%xmm4     \n\t" // q=qscale*quant_matrix[i]
+                "pmullw     %%xmm6, %%xmm5     \n\t" // q=qscale*quant_matrix[i]
+                "pabsw      %%xmm0, %%xmm2     \n\t" // abs(block[i])
+                "pabsw      %%xmm1, %%xmm3     \n\t" // abs(block[i])
+                "paddw      %%xmm2, %%xmm2     \n\t" // abs(block[i])*2
+                "paddw      %%xmm3, %%xmm3     \n\t" // abs(block[i])*2
+                "paddw      %%xmm7, %%xmm2     \n\t" // abs(block[i])*2 + 1
+                "paddw      %%xmm7, %%xmm3     \n\t" // abs(block[i])*2 + 1
+                "pmullw     %%xmm4, %%xmm2     \n\t" // (abs(block[i])*2 + 1)*q
+                "pmullw     %%xmm5, %%xmm3     \n\t" // (abs(block[i])*2 + 1)*q
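+                "psraw          $4, %%xmm2     \n\t" // ((abs(block[i])*2 + 1)*q) >> 4
+                "psraw          $4, %%xmm3     \n\t" // ((abs(block[i])*2 + 1)*q) >> 4
+                "psubw      %%xmm7, %%xmm2     \n\t" // subtract 1
+                "psubw      %%xmm7, %%xmm3     \n\t" // subtract 1
+                "por        %%xmm7, %%xmm2     \n\t" // set the lowest bit: (x - 1) | 1
+                "por        %%xmm7, %%xmm3     \n\t" // set the lowest bit: (x - 1) | 1
+                "psignw     %%xmm0, %%xmm2     \n\t" // apply sgn(block[i]); 0 stays 0
+                "psignw     %%xmm1, %%xmm3     \n\t" // apply sgn(block[i]); 0 stays 0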
+                "movdqa     %%xmm2, (%1, %0)   \n\t"
+                "movdqa     %%xmm3, 16(%1, %0) \n\t"
 
-                "add $16, %0                    \n\t"
-                "js 1b                          \n\t"
+                "add           $32, %0         \n\t"
+                "js 1b                         \n\t"
                 : "+r" (offset)
                 : "r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale)
-                : "memory"
+                : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",)
+                  "memory"
         );
 }
 
+#endif /* HAVE_SSSE3_INLINE */
+
 static void dct_unquantize_mpeg2_intra_mmx(const MPVContext *s,
                                            int16_t *block, int n, int qscale)
 {
@@ -417,8 +390,6 @@ av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
     int cpu_flags = av_get_cpu_flags();
 
     if (INLINE_MMX(cpu_flags)) {
-        s->dct_unquantize_mpeg1_intra = dct_unquantize_mpeg1_intra_mmx;
-        s->dct_unquantize_mpeg1_inter = dct_unquantize_mpeg1_inter_mmx;
         if (!bitexact)
             s->dct_unquantize_mpeg2_intra = dct_unquantize_mpeg2_intra_mmx;
         s->dct_unquantize_mpeg2_inter = dct_unquantize_mpeg2_inter_mmx;
@@ -427,6 +398,8 @@ av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
     if (INLINE_SSSE3(cpu_flags)) {
         s->dct_unquantize_h263_intra  = dct_unquantize_h263_intra_ssse3;
         s->dct_unquantize_h263_inter  = dct_unquantize_h263_inter_ssse3;
+        s->dct_unquantize_mpeg1_intra = dct_unquantize_mpeg1_intra_ssse3;
+        s->dct_unquantize_mpeg1_inter = dct_unquantize_mpeg1_inter_ssse3;
     }
 #endif /* HAVE_SSSE3_INLINE */
 #endif /* HAVE_MMX_INLINE */
-- 
2.52.0


From 75930d907abd03757f8b69102db4f29fd59aa20f Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 4 Nov 2025 06:45:12 +0100
Subject: [PATCH 180/304] avcodec/x86/mpegvideo: Port
 dct_unquantize_mpeg2_inter_mmx to SSSE3

Benefits from wider registers, pabsw and psignw.

Benchmarks:
dct_unquantize_mpeg2_inter_c:                          131.2 ( 1.00x)
dct_unquantize_mpeg2_inter_mmx:                         50.2 ( 2.62x)
dct_unquantize_mpeg2_inter_ssse3:                       20.5 ( 6.38x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideo.c | 109 +++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 60 deletions(-)
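
Note on the mismatch control (an illustrative sketch, not part of the patch): xmm7 starts with only its low word set to 0xffff, accumulates the XOR of every stored coefficient, and is folded down to a single word after the loop; bit 0 of that word is then XORed into block[63]. The scalar model below uses hypothetical helper names and assumes that coefficients past the last coded index are zero, so XORing all 64 entries gives the same parity. The second function is an equivalent sum-based formulation, included only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* XOR-based model: eight 16-bit lanes, folded to one parity bit. */
static void mismatch_xor_model(int16_t block[64])
{
    uint16_t lanes[8] = { 0xFFFF };          /* pcmpeqw + psrldq $14 */
    uint16_t acc = 0;

    assert(block);
    for (int i = 0; i < 64; i++)             /* pxor inside the loop */
        lanes[i & 7] ^= (uint16_t)block[i];
    for (int i = 0; i < 8; i++)              /* movhlps/pshufd/pshuflw + pxor */
        acc ^= lanes[i];
    block[63] ^= acc & 1;                    /* pslld/psrld + final pxor */
}

/* Sum-based model: the low bit of a sum equals the XOR of the low bits. */
static void mismatch_sum_model(int16_t block[64])
{
    int sum = -1;

    assert(block);
    for (int i = 0; i < 64; i++)
        sum += block[i];
    block[63] ^= sum & 1;
}
```

Only bit 0 of the accumulator matters, which is why a plain XOR can replace the addition.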

diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index 01048df47d..576f8f320f 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -305,8 +305,10 @@ __asm__ volatile(
         //Note, we do not do mismatch control for intra as errors cannot accumulate
 }
 
-static void dct_unquantize_mpeg2_inter_mmx(const MPVContext *s,
-                                           int16_t *block, int n, int qscale)
+#if HAVE_SSSE3_INLINE
+
+static void dct_unquantize_mpeg2_inter_ssse3(const MPVContext *s,
+                                             int16_t *block, int n, int qscale)
 {
     av_assert2(s->block_last_index[n]>=0);
 
@@ -316,72 +318,59 @@ static void dct_unquantize_mpeg2_inter_mmx(const MPVContext *s,
 
 
 __asm__ volatile(
-                "movd          %k1, %%mm6      \n\t"
+                "movd          %k1, %%xmm6     \n\t"
                 "lea      (%2, %0), %1         \n\t"
                 "neg            %0             \n\t"
-                "pcmpeqw %%mm7, %%mm7           \n\t"
-                "psrlq $48, %%mm7               \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                ".p2align 4                     \n\t"
-                "1:                             \n\t"
-                "movq     (%1, %0), %%mm0      \n\t"
-                "movq    8(%1, %0), %%mm1      \n\t"
-                "movq     (%3, %0), %%mm4      \n\t"
-                "movq    8(%3, %0), %%mm5      \n\t"
-                "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
-                "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
-                "pxor %%mm2, %%mm2              \n\t"
-                "pxor %%mm3, %%mm3              \n\t"
-                "pcmpgtw %%mm0, %%mm2           \n\t" // block[i] < 0 ? -1 : 0
-                "pcmpgtw %%mm1, %%mm3           \n\t" // block[i] < 0 ? -1 : 0
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t" // abs(block[i])
-                "psubw %%mm3, %%mm1             \n\t" // abs(block[i])
-                "paddw %%mm0, %%mm0             \n\t" // abs(block[i])*2
-                "paddw %%mm1, %%mm1             \n\t" // abs(block[i])*2
-                "pmullw %%mm4, %%mm0            \n\t" // abs(block[i])*2*q
-                "pmullw %%mm5, %%mm1            \n\t" // abs(block[i])*2*q
-                "paddw %%mm4, %%mm0             \n\t" // (abs(block[i])*2 + 1)*q
-                "paddw %%mm5, %%mm1             \n\t" // (abs(block[i])*2 + 1)*q
-                "pxor %%mm4, %%mm4              \n\t"
-                "pxor %%mm5, %%mm5              \n\t" // FIXME slow
-                "pcmpeqw  (%1, %0), %%mm4      \n\t" // block[i] == 0 ? -1 : 0
-                "pcmpeqw 8(%1, %0), %%mm5      \n\t" // block[i] == 0 ? -1 : 0
-                "psrlw $5, %%mm0                \n\t"
-                "psrlw $5, %%mm1                \n\t"
-                "pxor %%mm2, %%mm0              \n\t"
-                "pxor %%mm3, %%mm1              \n\t"
-                "psubw %%mm2, %%mm0             \n\t"
-                "psubw %%mm3, %%mm1             \n\t"
-                "pandn %%mm0, %%mm4             \n\t"
-                "pandn %%mm1, %%mm5             \n\t"
-                "pxor %%mm4, %%mm7              \n\t"
-                "pxor %%mm5, %%mm7              \n\t"
-                "movq        %%mm4, (%1, %0)   \n\t"
-                "movq        %%mm5, 8(%1, %0)  \n\t"
+                SPLATW(xmm6)
+                "pcmpeqw    %%xmm7, %%xmm7     \n\t" // all ones
+                "psrldq        $14, %%xmm7     \n\t" // keep only the low word: initial parity for mismatch control
+                ".p2align 4                    \n\t"
+                "1:                            \n\t"
+                "movdqa   (%3, %0), %%xmm4     \n\t"
+                "movdqa 16(%3, %0), %%xmm5     \n\t"
+                "movdqa   (%1, %0), %%xmm0     \n\t"
+                "movdqa 16(%1, %0), %%xmm1     \n\t"
+                "pmullw     %%xmm6, %%xmm4     \n\t" // q=qscale*quant_matrix[i]
+                "pmullw     %%xmm6, %%xmm5     \n\t" // q=qscale*quant_matrix[i]
+                "pabsw      %%xmm0, %%xmm2     \n\t" // abs(block[i])
+                "pabsw      %%xmm1, %%xmm3     \n\t" // abs(block[i])
+                "paddw      %%xmm2, %%xmm2     \n\t" // abs(block[i])*2
+                "paddw      %%xmm3, %%xmm3     \n\t" // abs(block[i])*2
+                "pmullw     %%xmm4, %%xmm2     \n\t" // abs(block[i])*2*q
+                "pmullw     %%xmm5, %%xmm3     \n\t" // abs(block[i])*2*q
+                "paddw      %%xmm4, %%xmm2     \n\t" // (abs(block[i])*2 + 1)*q
+                "paddw      %%xmm5, %%xmm3     \n\t" // (abs(block[i])*2 + 1)*q
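+                "psrlw          $5, %%xmm2     \n\t" // ((abs(block[i])*2 + 1)*q) >> 5
+                "psrlw          $5, %%xmm3     \n\t" // ((abs(block[i])*2 + 1)*q) >> 5
+                "psignw     %%xmm0, %%xmm2     \n\t" // apply sgn(block[i]); 0 stays 0
+                "psignw     %%xmm1, %%xmm3     \n\t" // apply sgn(block[i]); 0 stays 0
+                "movdqa     %%xmm2, (%1, %0)   \n\t"
+                "movdqa     %%xmm3, 16(%1, %0) \n\t"
+                "pxor       %%xmm2, %%xmm7     \n\t" // accumulate XOR of the outputs
+                "pxor       %%xmm3, %%xmm7     \n\t" // for mismatch control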
 
-                "add           $16, %0          \n\t"
-                "jng 1b                         \n\t"
-                "movd      124(%2), %%mm0      \n\t"
-                "movq %%mm7, %%mm6              \n\t"
-                "psrlq $32, %%mm7               \n\t"
-                "pxor %%mm6, %%mm7              \n\t"
-                "movq %%mm7, %%mm6              \n\t"
-                "psrlq $16, %%mm7               \n\t"
-                "pxor %%mm6, %%mm7              \n\t"
-                "pslld $31, %%mm7               \n\t"
-                "psrlq $15, %%mm7               \n\t"
-                "pxor %%mm7, %%mm0              \n\t"
-                "movd        %%mm0, 124(%2)    \n\t"
+                "add           $32, %0         \n\t"
+                "jng 1b                        \n\t"
+                "movd      124(%2), %%xmm0     \n\t" // block[62], block[63]
+                "movhlps    %%xmm7, %%xmm6     \n\t" // fold 128 -> 64 bits
+                "pxor       %%xmm6, %%xmm7     \n\t"
+                "pshufd $1, %%xmm7, %%xmm6     \n\t" // fold 64 -> 32 bits
+                "pxor       %%xmm6, %%xmm7     \n\t"
+                "pshuflw $1, %%xmm7, %%xmm6    \n\t" // fold 32 -> 16 bits
+                "pxor       %%xmm6, %%xmm7     \n\t"
+                "pslld         $31, %%xmm7     \n\t" // isolate the parity bit
+                "psrld         $15, %%xmm7     \n\t" // move it to bit 16, the LSB of block[63]
+                "pxor       %%xmm7, %%xmm0     \n\t"
+                "movd       %%xmm0, 124(%2)    \n\t"
 
                 : "+r"(offset), "+r" (qscale2)
                 : "r" (block), "r"(quant_matrix)
-                : "memory"
+                : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",)
+                  "memory"
         );
 }
 
+#endif /* HAVE_SSSE3_INLINE */
 #endif /* HAVE_MMX_INLINE */
 
 av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
@@ -392,7 +381,6 @@ av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
     if (INLINE_MMX(cpu_flags)) {
         if (!bitexact)
             s->dct_unquantize_mpeg2_intra = dct_unquantize_mpeg2_intra_mmx;
-        s->dct_unquantize_mpeg2_inter = dct_unquantize_mpeg2_inter_mmx;
     }
 #if HAVE_SSSE3_INLINE
     if (INLINE_SSSE3(cpu_flags)) {
@@ -400,6 +388,7 @@ av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
         s->dct_unquantize_h263_inter  = dct_unquantize_h263_inter_ssse3;
         s->dct_unquantize_mpeg1_intra = dct_unquantize_mpeg1_intra_ssse3;
         s->dct_unquantize_mpeg1_inter = dct_unquantize_mpeg1_inter_ssse3;
+        s->dct_unquantize_mpeg2_inter = dct_unquantize_mpeg2_inter_ssse3;
     }
 #endif /* HAVE_SSSE3_INLINE */
 #endif /* HAVE_MMX_INLINE */
-- 
2.52.0


From 0b52c830ef6910e58c2b316bbe66a2732f3b5fb1 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 4 Nov 2025 06:48:19 +0100
Subject: [PATCH 181/304] avcodec/mpegvideo: Port
 dct_unquantize_mpeg2_intra_mmx to SSE2

Benefits from wider registers.

Benchmarks:
dct_unquantize_mpeg2_intra_c:                          228.2 ( 1.00x)
dct_unquantize_mpeg2_intra_mmx:                         28.2 ( 8.10x)
dct_unquantize_mpeg2_intra_sse2:                        18.4 (12.37x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/mpegvideo.c            | 68 +++++++++++++--------------
 tests/checkasm/mpegvideo_unquantize.c |  2 +-
 2 files changed, 35 insertions(+), 35 deletions(-)
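
Note on the rounding in this function (an illustrative sketch, not part of the patch): as the comments on the replaced MMX lines describe, the loop adds a bias of 15 to the products of negative coefficients before the arithmetic shift by 4, which turns the shift's rounding toward minus infinity into rounding toward zero. The bias comes from a logical shift of block[i] by 12, which yields 0xf exactly when block[i] is negative, given the -2048..2047 range. The scalar model below uses hypothetical names and assumes that block[i]*q fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right by 4 (floor division by 16), written portably. */
static int asr4(int v)
{
    return v >= 0 ? v >> 4 : -((-v + 15) >> 4);
}

/* One lane of the MPEG-2 intra loop; q = qscale*quant_matrix[i]. */
static int16_t mpeg2_intra_lane(int16_t level, int q)
{
    assert(level >= -2048 && level <= 2047 && q >= 0);

    int bias = (uint16_t)level >> 12;       /* psrlw $12: 0xf if level < 0, else 0 */
    int prod = level * q;                   /* pmullw */

    /* Assumption of this sketch: the product does not overflow a lane. */
    assert(prod >= INT16_MIN && prod <= INT16_MAX);

    return (int16_t)asr4(prod + bias);      /* paddw + psraw $4 */
}
```

In this model the result matches C's truncating division of block[i]*q by 16 for every input in range.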

diff --git a/libavcodec/x86/mpegvideo.c b/libavcodec/x86/mpegvideo.c
index 576f8f320f..7c137cf75e 100644
--- a/libavcodec/x86/mpegvideo.c
+++ b/libavcodec/x86/mpegvideo.c
@@ -28,7 +28,7 @@
 #include "libavcodec/mpegvideodata.h"
 #include "libavcodec/mpegvideo_unquantize.h"
 
-#if HAVE_MMX_INLINE
+#if HAVE_SSE2_INLINE
 
 #define SPLATW(reg) "punpcklwd    %%" #reg ", %%" #reg "\n\t" \
                     "pshufd   $0, %%" #reg ", %%" #reg "\n\t"
@@ -250,8 +250,8 @@ __asm__ volatile(
 
 #endif /* HAVE_SSSE3_INLINE */
 
-static void dct_unquantize_mpeg2_intra_mmx(const MPVContext *s,
-                                           int16_t *block, int n, int qscale)
+static void dct_unquantize_mpeg2_intra_sse2(const MPVContext *s,
+                                            int16_t *block, int n, int qscale)
 {
     x86_reg nCoeffs;
     const uint16_t *quant_matrix;
@@ -271,35 +271,35 @@ static void dct_unquantize_mpeg2_intra_mmx(const MPVContext *s,
     quant_matrix = s->intra_matrix;
     x86_reg offset = -2 * nCoeffs;
 __asm__ volatile(
-                "movd %3, %%mm6                 \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                "packssdw %%mm6, %%mm6          \n\t"
-                ".p2align 4                     \n\t"
-                "1:                             \n\t"
-                "movq (%1, %0), %%mm0           \n\t"
-                "movq 8(%1, %0), %%mm1          \n\t"
-                "movq (%2, %0), %%mm4           \n\t"
-                "movq 8(%2, %0), %%mm5          \n\t"
-                "pmullw %%mm6, %%mm4            \n\t" // q=qscale*quant_matrix[i]
-                "pmullw %%mm6, %%mm5            \n\t" // q=qscale*quant_matrix[i]
-                "movq %%mm0, %%mm2              \n\t"
-                "movq %%mm1, %%mm3              \n\t"
-                "psrlw $12, %%mm2               \n\t" // block[i] < 0 ? 0xf : 0
-                "psrlw $12, %%mm3               \n\t" // (block[i] is in the -2048..2047 range)
-                "pmullw %%mm4, %%mm0            \n\t" // block[i]*q
-                "pmullw %%mm5, %%mm1            \n\t" // block[i]*q
-                "paddw %%mm2, %%mm0             \n\t" // bias negative block[i]
-                "paddw %%mm3, %%mm1             \n\t" // so that a right-shift
-                "psraw $4, %%mm0                \n\t" // is equivalent to divide
-                "psraw $4, %%mm1                \n\t" // with rounding towards zero
-                "movq %%mm0, (%1, %0)           \n\t"
-                "movq %%mm1, 8(%1, %0)          \n\t"
+                "movd           %3, %%xmm6     \n\t"
+                SPLATW(xmm6)
+                ".p2align 4                    \n\t"
+                "1:                            \n\t"
+                "movdqa   (%1, %0), %%xmm0     \n\t"
+                "movdqa 16(%1, %0), %%xmm1     \n\t"
+                "movdqa   (%2, %0), %%xmm4     \n\t"
+                "movdqa 16(%2, %0), %%xmm5     \n\t"
+                "pmullw     %%xmm6, %%xmm4     \n\t" // q=qscale*quant_matrix[i]
+                "pmullw     %%xmm6, %%xmm5     \n\t" // q=qscale*quant_matrix[i]
+                "movdqa     %%xmm0, %%xmm2     \n\t"
+                "movdqa     %%xmm1, %%xmm3     \n\t"
+                "psrlw         $12, %%xmm2     \n\t" // block[i] < 0 ? 0xf : 0
+                "psrlw         $12, %%xmm3     \n\t" // (block[i] is in the -2048..2047 range)
+                "pmullw     %%xmm4, %%xmm0     \n\t" // block[i]*q
+                "pmullw     %%xmm5, %%xmm1     \n\t" // block[i]*q
+                "paddw      %%xmm2, %%xmm0     \n\t" // bias negative block[i]
+                "paddw      %%xmm3, %%xmm1     \n\t" // so that a right-shift
+                "psraw          $4, %%xmm0     \n\t" // is equivalent to divide
+                "psraw          $4, %%xmm1     \n\t" // with rounding towards zero
+                "movdqa     %%xmm0, (%1, %0)   \n\t"
+                "movdqa     %%xmm1, 16(%1, %0) \n\t"
 
-                "add $16, %0                    \n\t"
-                "jng 1b                         \n\t"
+                "add           $32, %0         \n\t"
+                "jng 1b                        \n\t"
                 : "+r" (offset)
                 : "r" (block+nCoeffs), "r"(quant_matrix+nCoeffs), "rm" (qscale)
-                : "memory"
+                : XMM_CLOBBERS("%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6",)
+                  "memory"
         );
     block[0]= block0;
         //Note, we do not do mismatch control for intra as errors cannot accumulate
@@ -371,16 +371,16 @@ __asm__ volatile(
 }
 
 #endif /* HAVE_SSSE3_INLINE */
-#endif /* HAVE_MMX_INLINE */
+#endif /* HAVE_SSE2_INLINE */
 
 av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
 {
-#if HAVE_MMX_INLINE
+#if HAVE_SSE2_INLINE
     int cpu_flags = av_get_cpu_flags();
 
-    if (INLINE_MMX(cpu_flags)) {
+    if (INLINE_SSE2(cpu_flags)) {
         if (!bitexact)
-            s->dct_unquantize_mpeg2_intra = dct_unquantize_mpeg2_intra_mmx;
+            s->dct_unquantize_mpeg2_intra = dct_unquantize_mpeg2_intra_sse2;
     }
 #if HAVE_SSSE3_INLINE
     if (INLINE_SSSE3(cpu_flags)) {
@@ -391,5 +391,5 @@ av_cold void ff_mpv_unquantize_init_x86(MPVUnquantDSPContext *s, int bitexact)
         s->dct_unquantize_mpeg2_inter = dct_unquantize_mpeg2_inter_ssse3;
     }
 #endif /* HAVE_SSSE3_INLINE */
-#endif /* HAVE_MMX_INLINE */
+#endif /* HAVE_SSE2_INLINE */
 }
diff --git a/tests/checkasm/mpegvideo_unquantize.c b/tests/checkasm/mpegvideo_unquantize.c
index 837606e60e..220a743a96 100644
--- a/tests/checkasm/mpegvideo_unquantize.c
+++ b/tests/checkasm/mpegvideo_unquantize.c
@@ -215,7 +215,7 @@ void checkasm_check_mpegvideo_unquantize(void)
     int q_scale_type = rnd() & 1;
 
     ff_mpv_unquantize_init(&unquant_dsp_ctx, 1 /* bitexact */, q_scale_type);
-    declare_func_emms(AV_CPU_FLAG_MMX, void, MPVContext *s, int16_t *block, int n, int qscale);
+    declare_func(void, MPVContext *s, int16_t *block, int n, int qscale);
 
     for (size_t i = 0; i < FF_ARRAY_ELEMS(tests); ++i) {
         void (*func)(MPVContext *s, int16_t *block, int n, int qscale) =
-- 
2.52.0


From e7099d061dd4a09577076b913f09e41c2c81db54 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 3 Dec 2025 00:41:00 +0100
Subject: [PATCH 182/304] libavutil/internal: Remove {SIZE,PTRDIFF}_SPECIFIER
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Possible since 222127418bbdee3247eb9c02bb8cf31991e32241.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/alsdec.c                 |  4 ++--
 libavcodec/av1dec.c                 |  2 +-
 libavcodec/cbs.c                    |  2 +-
 libavcodec/cbs_apv.c                |  2 +-
 libavcodec/cbs_av1.c                |  8 ++++----
 libavcodec/cbs_vp9.c                |  2 +-
 libavcodec/d3d12va_encode.c         |  2 +-
 libavcodec/d3d12va_encode_av1.c     |  2 +-
 libavcodec/decode.c                 |  6 +++---
 libavcodec/dnxhddec.c               |  2 +-
 libavcodec/dvdsubenc.c              |  2 +-
 libavcodec/g2meet.c                 |  2 +-
 libavcodec/h264_ps.c                |  2 +-
 libavcodec/h264_slice.c             |  4 ++--
 libavcodec/hapenc.c                 |  2 +-
 libavcodec/hq_hqa.c                 |  4 ++--
 libavcodec/jpeg2000dec.c            |  6 +++---
 libavcodec/libaomenc.c              | 10 +++++-----
 libavcodec/libilbc.c                |  3 +--
 libavcodec/libvpxenc.c              |  6 +++---
 libavcodec/mjpegdec.c               |  6 +++---
 libavcodec/mjpegenc_common.c        |  2 +-
 libavcodec/mpeg12dec.c              |  2 +-
 libavcodec/mpeg12enc.c              |  2 +-
 libavcodec/mpegpicture.c            |  2 +-
 libavcodec/mpegvideo_enc.c          |  2 +-
 libavcodec/mscc.c                   |  2 +-
 libavcodec/vulkan_av1.c             |  2 +-
 libavcodec/vulkan_encode_av1.c      |  2 +-
 libavcodec/vulkan_encode_h264.c     |  2 +-
 libavcodec/vulkan_encode_h265.c     |  2 +-
 libavcodec/vulkan_h264.c            |  2 +-
 libavcodec/vulkan_hevc.c            |  2 +-
 libavcodec/vulkan_vp9.c             |  2 +-
 libavcodec/zmbv.c                   |  8 ++++----
 libavfilter/af_ashowinfo.c          |  2 +-
 libavfilter/f_graphmonitor.c        |  2 +-
 libavfilter/vf_avgblur_opencl.c     |  2 +-
 libavfilter/vf_convolution_opencl.c |  4 ++--
 libavfilter/vf_lcevc.c              |  4 ++--
 libavfilter/vf_neighbor_opencl.c    |  3 +--
 libavfilter/vf_program_opencl.c     |  3 +--
 libavfilter/vf_showinfo.c           | 12 +++++-------
 libavfilter/vf_unsharp_opencl.c     |  3 +--
 libavformat/ape.c                   |  2 +-
 libavformat/dump.c                  |  9 +++------
 libavformat/framecrcenc.c           |  2 +-
 libavformat/hashenc.c               |  2 +-
 libavformat/http.c                  |  2 +-
 libavformat/id3v2.c                 |  2 +-
 libavformat/mmsh.c                  |  4 ++--
 libavformat/mmst.c                  |  4 ++--
 libavformat/mpjpegdec.c             |  2 +-
 libavformat/oggparsevorbis.c        |  2 +-
 libavformat/rtpdec_xiph.c           |  4 ++--
 libavformat/sdp.c                   |  2 +-
 libavformat/supenc.c                |  4 ++--
 libavutil/hwcontext_opencl.c        |  2 +-
 libavutil/hwcontext_vulkan.c        |  2 +-
 libavutil/internal.h                |  3 ---
 libavutil/tests/channel_layout.c    |  3 +--
 libavutil/tests/imgutils.c          |  4 ++--
 libavutil/tests/side_data_array.c   |  2 +-
 libavutil/vulkan.c                  |  2 +-
 64 files changed, 97 insertions(+), 110 deletions(-)
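
Note on the replacement specifiers: a small sketch of the standard C99 length modifiers this patch switches to, `%zu` for `size_t` and `%td` for `ptrdiff_t`. The function `format_sizes` is a hypothetical illustration and does not exist in FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats a size_t and a ptrdiff_t with the standard C99 length modifiers.
 * Returns what snprintf returns: the number of characters the full string
 * needs, excluding the terminating NUL. */
static int format_sizes(char *buf, size_t cap, size_t size, ptrdiff_t diff)
{
    assert(buf && cap > 0);
    return snprintf(buf, cap, "size=%zu diff=%td", size, diff);
}
```

The same two modifiers work in any printf-style format string, which is what lets the `av_log()` calls below use plain string literals instead of concatenating a macro.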

diff --git a/libavcodec/alsdec.c b/libavcodec/alsdec.c
index fc55aa2687..09c1b8db9e 100644
--- a/libavcodec/alsdec.c
+++ b/libavcodec/alsdec.c
@@ -1343,13 +1343,13 @@ static int revert_channel_correlation(ALSDecContext *ctx, ALSBlockData *bd,
             if (ch[dep].time_diff_sign) {
                 t      = -t;
                 if (begin < t) {
-                    av_log(ctx->avctx, AV_LOG_ERROR, "begin %"PTRDIFF_SPECIFIER" smaller than time diff index %d.\n", begin, t);
+                    av_log(ctx->avctx, AV_LOG_ERROR, "begin %td smaller than time diff index %d.\n", begin, t);
                     return AVERROR_INVALIDDATA;
                 }
                 begin -= t;
             } else {
                 if (end < t) {
-                    av_log(ctx->avctx, AV_LOG_ERROR, "end %"PTRDIFF_SPECIFIER" smaller than time diff index %d.\n", end, t);
+                    av_log(ctx->avctx, AV_LOG_ERROR, "end %td smaller than time diff index %d.\n", end, t);
                     return AVERROR_INVALIDDATA;
                 }
                 end   -= t;
diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
index d4ceb5ef09..1dffc7c1b9 100644
--- a/libavcodec/av1dec.c
+++ b/libavcodec/av1dec.c
@@ -1450,7 +1450,7 @@ static int av1_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
             break;
         default:
             av_log(avctx, AV_LOG_DEBUG,
-                   "Unknown obu type: %d (%"SIZE_SPECIFIER" bits).\n",
+                   "Unknown obu type: %d (%zu bits).\n",
                    unit->type, unit->data_size);
         }
 
diff --git a/libavcodec/cbs.c b/libavcodec/cbs.c
index cea6e6db21..41c8184434 100644
--- a/libavcodec/cbs.c
+++ b/libavcodec/cbs.c
@@ -376,7 +376,7 @@ static int cbs_write_unit_data(CodedBitstreamContext *ctx,
         if (ret < 0) {
             av_log(ctx->log_ctx, AV_LOG_ERROR, "Unable to allocate a "
                    "sufficiently large write buffer (last attempt "
-                   "%"SIZE_SPECIFIER" bytes).\n", ctx->write_buffer_size);
+                   "%zu bytes).\n", ctx->write_buffer_size);
             return ret;
         }
     }
diff --git a/libavcodec/cbs_apv.c b/libavcodec/cbs_apv.c
index 7a06244c69..be52c2e511 100644
--- a/libavcodec/cbs_apv.c
+++ b/libavcodec/cbs_apv.c
@@ -201,7 +201,7 @@ static int cbs_apv_split_fragment(CodedBitstreamContext *ctx,
 
         if (size < 8) {
             av_log(ctx->log_ctx, AV_LOG_ERROR, "Invalid PBU: "
-                   "fragment too short (%"SIZE_SPECIFIER" bytes).\n",
+                   "fragment too short (%zu bytes).\n",
                    size);
             err = AVERROR_INVALIDDATA;
             goto fail;
diff --git a/libavcodec/cbs_av1.c b/libavcodec/cbs_av1.c
index 75601c934b..d05352c738 100644
--- a/libavcodec/cbs_av1.c
+++ b/libavcodec/cbs_av1.c
@@ -714,7 +714,7 @@ static int cbs_av1_split_fragment(CodedBitstreamContext *ctx,
 
     if (INT_MAX / 8 < size) {
         av_log(ctx->log_ctx, AV_LOG_ERROR, "Invalid fragment: "
-               "too large (%"SIZE_SPECIFIER" bytes).\n", size);
+               "too large (%zu bytes).\n", size);
         err = AVERROR_INVALIDDATA;
         goto fail;
     }
@@ -765,7 +765,7 @@ static int cbs_av1_split_fragment(CodedBitstreamContext *ctx,
         if (obu_header.obu_has_size_field) {
             if (get_bits_left(&gbc) < 8) {
                 av_log(ctx->log_ctx, AV_LOG_ERROR, "Invalid OBU: fragment "
-                       "too short (%"SIZE_SPECIFIER" bytes).\n", size);
+                       "too short (%zu bytes).\n", size);
                 err = AVERROR_INVALIDDATA;
                 goto fail;
             }
@@ -782,7 +782,7 @@ static int cbs_av1_split_fragment(CodedBitstreamContext *ctx,
 
         if (size < obu_length) {
             av_log(ctx->log_ctx, AV_LOG_ERROR, "Invalid OBU length: "
-                   "%"PRIu64", but only %"SIZE_SPECIFIER" bytes remaining in fragment.\n",
+                   "%"PRIu64", but only %zu bytes remaining in fragment.\n",
                    obu_length, size);
             err = AVERROR_INVALIDDATA;
             goto fail;
@@ -868,7 +868,7 @@ static int cbs_av1_read_unit(CodedBitstreamContext *ctx,
     } else {
         if (unit->data_size < 1 + obu->header.obu_extension_flag) {
             av_log(ctx->log_ctx, AV_LOG_ERROR, "Invalid OBU length: "
-                   "unit too short (%"SIZE_SPECIFIER").\n", unit->data_size);
+                   "unit too short (%zu).\n", unit->data_size);
             return AVERROR_INVALIDDATA;
         }
         obu->obu_size = unit->data_size - 1 - obu->header.obu_extension_flag;
diff --git a/libavcodec/cbs_vp9.c b/libavcodec/cbs_vp9.c
index 9e9b37e527..37015f5c77 100644
--- a/libavcodec/cbs_vp9.c
+++ b/libavcodec/cbs_vp9.c
@@ -415,7 +415,7 @@ static int cbs_vp9_split_fragment(CodedBitstreamContext *ctx,
         }
         if (pos + index_size != frag->data_size) {
             av_log(ctx->log_ctx, AV_LOG_WARNING, "Extra padding at "
-                   "end of superframe: %"SIZE_SPECIFIER" bytes.\n",
+                   "end of superframe: %zu bytes.\n",
                    frag->data_size - (pos + index_size));
         }
 
diff --git a/libavcodec/d3d12va_encode.c b/libavcodec/d3d12va_encode.c
index d28447c3c9..5a82a42c5f 100644
--- a/libavcodec/d3d12va_encode.c
+++ b/libavcodec/d3d12va_encode.c
@@ -707,7 +707,7 @@ static int d3d12va_encode_get_coded_data(AVCodecContext *avctx,
         goto end;
 
     total_size += pic->header_size;
-    av_log(avctx, AV_LOG_DEBUG, "Output buffer size %"SIZE_SPECIFIER"\n", total_size);
+    av_log(avctx, AV_LOG_DEBUG, "Output buffer size %zu\n", total_size);
 
     hr = ID3D12Resource_Map(pic->output_buffer, 0, NULL, (void **)&mapped_data);
     if (FAILED(hr)) {
diff --git a/libavcodec/d3d12va_encode_av1.c b/libavcodec/d3d12va_encode_av1.c
index e7a115a2ee..34e803db98 100644
--- a/libavcodec/d3d12va_encode_av1.c
+++ b/libavcodec/d3d12va_encode_av1.c
@@ -406,7 +406,7 @@ static int d3d12va_encode_av1_get_coded_data(AVCodecContext *avctx,
     av_log(avctx, AV_LOG_DEBUG, "Tile group extra size: %d bytes.\n", tile_group_extra_size);
 
     total_size += (pic->header_size + tile_group_extra_size + av1_pic_hd_size);
-    av_log(avctx, AV_LOG_DEBUG, "Output buffer size %"SIZE_SPECIFIER"\n", total_size);
+    av_log(avctx, AV_LOG_DEBUG, "Output buffer size %zu\n", total_size);
 
     hr = ID3D12Resource_Map(pic->output_buffer, 0, NULL, (void **)&mapped_data);
     if (FAILED(hr)) {
diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 7089114e98..f0646901c8 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -747,8 +747,8 @@ static int apply_cropping(AVCodecContext *avctx, AVFrame *frame)
         (frame->crop_top + frame->crop_bottom) >= frame->height) {
         av_log(avctx, AV_LOG_WARNING,
                "Invalid cropping information set by a decoder: "
-               "%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER" "
-               "(frame size %dx%d). This is a bug, please report it\n",
+               "%zu/%zu/%zu/%zu (frame size %dx%d). "
+               "This is a bug, please report it\n",
                frame->crop_left, frame->crop_right, frame->crop_top, frame->crop_bottom,
                frame->width, frame->height);
         frame->crop_left   = 0;
@@ -2270,7 +2270,7 @@ int ff_copy_palette(void *dst, const AVPacket *src, void *logctx)
         return 1;
     } else if (pal) {
         av_log(logctx, AV_LOG_ERROR,
-               "Palette size %"SIZE_SPECIFIER" is wrong\n", size);
+               "Palette size %zu is wrong\n", size);
     }
     return 0;
 }
diff --git a/libavcodec/dnxhddec.c b/libavcodec/dnxhddec.c
index fe0809a5f5..7ec61f9ebd 100644
--- a/libavcodec/dnxhddec.c
+++ b/libavcodec/dnxhddec.c
@@ -333,7 +333,7 @@ static int dnxhd_decode_header(DNXHDContext *ctx, AVFrame *frame,
 
     if (ctx->mb_height > FF_ARRAY_ELEMS(ctx->mb_scan_index)) {
         av_log(ctx->avctx, AV_LOG_ERROR,
-               "mb_height too big (%d > %"SIZE_SPECIFIER").\n", ctx->mb_height, FF_ARRAY_ELEMS(ctx->mb_scan_index));
+               "mb_height too big (%d > %zu).\n", ctx->mb_height, FF_ARRAY_ELEMS(ctx->mb_scan_index));
         return AVERROR_INVALIDDATA;
     }
 
diff --git a/libavcodec/dvdsubenc.c b/libavcodec/dvdsubenc.c
index 00bab35988..2a5463b8d8 100644
--- a/libavcodec/dvdsubenc.c
+++ b/libavcodec/dvdsubenc.c
@@ -411,7 +411,7 @@ static int dvdsub_encode(AVCodecContext *avctx,
     qq = outbuf;
     bytestream_put_be16(&qq, q - outbuf);
 
-    av_log(NULL, AV_LOG_DEBUG, "subtitle_packet size=%"PTRDIFF_SPECIFIER"\n", q - outbuf);
+    av_log(NULL, AV_LOG_DEBUG, "subtitle_packet size=%td\n", q - outbuf);
     ret = q - outbuf;
 
 fail:
diff --git a/libavcodec/g2meet.c b/libavcodec/g2meet.c
index f952a06f12..a4900641e7 100644
--- a/libavcodec/g2meet.c
+++ b/libavcodec/g2meet.c
@@ -890,7 +890,7 @@ static int epic_jb_decode_tile(G2MContext *c, int tile_x, int tile_y,
     }
 
     if (src_size < els_dsize) {
-        av_log(avctx, AV_LOG_ERROR, "ePIC: data too short, needed %"SIZE_SPECIFIER", got %"SIZE_SPECIFIER"\n",
+        av_log(avctx, AV_LOG_ERROR, "ePIC: data too short, needed %zu, got %zu\n",
                els_dsize, src_size);
         return AVERROR_INVALIDDATA;
     }
diff --git a/libavcodec/h264_ps.c b/libavcodec/h264_ps.c
index 3a3cad7de7..ac204172cb 100644
--- a/libavcodec/h264_ps.c
+++ b/libavcodec/h264_ps.c
@@ -715,7 +715,7 @@ int ff_h264_decode_picture_parameter_set(GetBitContext *gb, AVCodecContext *avct
     pps->data_size = get_bits_bytesize(gb, 1);
     if (pps->data_size > sizeof(pps->data)) {
         av_log(avctx, AV_LOG_DEBUG, "Truncating likely oversized PPS "
-               "(%"SIZE_SPECIFIER" > %"SIZE_SPECIFIER")\n",
+               "(%zu > %zu)\n",
                pps->data_size, sizeof(pps->data));
         pps->data_size = sizeof(pps->data);
     }
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index 7e53e38cca..d91f04fa67 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -2639,10 +2639,10 @@ static int decode_slice(struct AVCodecContext *avctx, void *arg)
                 goto finish;
             }
             if (sl->cabac.bytestream > sl->cabac.bytestream_end + 2 )
-                av_log(h->avctx, AV_LOG_DEBUG, "bytestream overread %"PTRDIFF_SPECIFIER"\n", sl->cabac.bytestream_end - sl->cabac.bytestream);
+                av_log(h->avctx, AV_LOG_DEBUG, "bytestream overread %td\n", sl->cabac.bytestream_end - sl->cabac.bytestream);
             if (ret < 0 || sl->cabac.bytestream > sl->cabac.bytestream_end + 4) {
                 av_log(h->avctx, AV_LOG_ERROR,
-                       "error while decoding MB %d %d, bytestream %"PTRDIFF_SPECIFIER"\n",
+                       "error while decoding MB %d %d, bytestream %td\n",
                        sl->mb_x, sl->mb_y,
                        sl->cabac.bytestream_end - sl->cabac.bytestream);
                 er_add_slice(sl, sl->resync_mb_x, sl->resync_mb_y, sl->mb_x,
diff --git a/libavcodec/hapenc.c b/libavcodec/hapenc.c
index a0b199906d..6882b1e563 100644
--- a/libavcodec/hapenc.c
+++ b/libavcodec/hapenc.c
@@ -121,7 +121,7 @@ static int hap_compress_frame(AVCodecContext *avctx, uint8_t *dst)
         /* If there is no gain from snappy, just use the raw texture. */
         if (chunk->compressed_size >= chunk->uncompressed_size) {
             av_log(avctx, AV_LOG_VERBOSE,
-                   "Snappy buffer bigger than uncompressed (%"SIZE_SPECIFIER" >= %"SIZE_SPECIFIER" bytes).\n",
+                   "Snappy buffer bigger than uncompressed (%zu >= %zu bytes).\n",
                    chunk->compressed_size, chunk->uncompressed_size);
             memcpy(chunk_dst, chunk_src, chunk->uncompressed_size);
             chunk->compressor = HAP_COMP_NONE;
diff --git a/libavcodec/hq_hqa.c b/libavcodec/hq_hqa.c
index c813b923b6..61242678b2 100644
--- a/libavcodec/hq_hqa.c
+++ b/libavcodec/hq_hqa.c
@@ -177,7 +177,7 @@ static int hq_decode_frame(HQContext *ctx, AVFrame *pic, GetByteContext *gbc,
             slice_off[slice] >= slice_off[slice + 1] ||
             slice_off[slice + 1] > data_size) {
             av_log(ctx->avctx, AV_LOG_ERROR,
-                   "Invalid slice size %"SIZE_SPECIFIER".\n", data_size);
+                   "Invalid slice size %zu.\n", data_size);
             break;
         }
         init_get_bits(&gb, src + slice_off[slice],
@@ -304,7 +304,7 @@ static int hqa_decode_frame(HQContext *ctx, AVFrame *pic, GetByteContext *gbc, s
             slice_off[slice] >= slice_off[slice + 1] ||
             slice_off[slice + 1] > data_size) {
             av_log(ctx->avctx, AV_LOG_ERROR,
-                   "Invalid slice size %"SIZE_SPECIFIER".\n", data_size);
+                   "Invalid slice size %zu.\n", data_size);
             break;
         }
         init_get_bits(&gb, src + slice_off[slice],
diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c
index 5d1502fabb..de1a73b92b 100644
--- a/libavcodec/jpeg2000dec.c
+++ b/libavcodec/jpeg2000dec.c
@@ -1476,7 +1476,7 @@ static int jpeg2000_decode_packet(Jpeg2000DecoderContext *s, Jpeg2000Tile *tile,
                 }
                 if (tmp_length > cblk->data_allocated) {
                     avpriv_request_sample(s->avctx,
-                                        "Block with lengthinc greater than %"SIZE_SPECIFIER"",
+                                        "Block with lengthinc greater than %zu",
                                         cblk->data_allocated);
                     return AVERROR_PATCHWELCOME;
                 }
@@ -2071,7 +2071,7 @@ static int decode_cblk(const Jpeg2000DecoderContext *s, Jpeg2000CodingStyle *cod
                 return AVERROR_INVALIDDATA;
             }
             if (FFABS(cblk->data + cblk->data_start[term_cnt + 1] - 2 - t1->mqc.bp) > 0) {
-                av_log(s->avctx, AV_LOG_WARNING, "Mid mismatch %"PTRDIFF_SPECIFIER" in pass %d of %d\n",
+                av_log(s->avctx, AV_LOG_WARNING, "Mid mismatch %td in pass %d of %d\n",
                     cblk->data + cblk->data_start[term_cnt + 1] - 2 - t1->mqc.bp,
                     pass_cnt, cblk->npasses);
             }
@@ -2088,7 +2088,7 @@ static int decode_cblk(const Jpeg2000DecoderContext *s, Jpeg2000CodingStyle *cod
     }
 
     if (cblk->data + cblk->length - 2 > t1->mqc.bp) {
-        av_log(s->avctx, AV_LOG_WARNING, "End mismatch %"PTRDIFF_SPECIFIER"\n",
+        av_log(s->avctx, AV_LOG_WARNING, "End mismatch %td\n",
                cblk->data + cblk->length - 2 - t1->mqc.bp);
     }
 
diff --git a/libavcodec/libaomenc.c b/libavcodec/libaomenc.c
index 1e5db4ad8c..be319d0f3c 100644
--- a/libavcodec/libaomenc.c
+++ b/libavcodec/libaomenc.c
@@ -240,7 +240,7 @@ static av_cold void dump_enc_cfg(AVCodecContext *avctx,
            width, "g_pass:",            cfg->g_pass,
            width, "g_lag_in_frames:",   cfg->g_lag_in_frames);
     av_log(avctx, level, "rate control settings\n"
-                         "  %*s%u\n  %*s%d\n  %*s%p(%"SIZE_SPECIFIER")\n  %*s%u\n",
+                         "  %*s%u\n  %*s%d\n  %*s%p(%zu)\n  %*s%u\n",
            width, "rc_dropframe_thresh:", cfg->rc_dropframe_thresh,
            width, "rc_end_usage:",        cfg->rc_end_usage,
            width, "rc_twopass_stats_in:", cfg->rc_twopass_stats_in.buf, cfg->rc_twopass_stats_in.sz,
@@ -896,7 +896,7 @@ static av_cold int aom_init(AVCodecContext *avctx,
         ret                   = av_reallocp(&ctx->twopass_stats.buf, ctx->twopass_stats.sz);
         if (ret < 0) {
             av_log(avctx, AV_LOG_ERROR,
-                   "Stat buffer alloc (%"SIZE_SPECIFIER" bytes) failed\n",
+                   "Stat buffer alloc (%zu bytes) failed\n",
                    ctx->twopass_stats.sz);
             ctx->twopass_stats.sz = 0;
             return ret;
@@ -1091,7 +1091,7 @@ static int storeframe(AVCodecContext *avctx, struct FrameListData *cx_frame,
     int ret = ff_get_encode_buffer(avctx, pkt, cx_frame->sz, 0);
     if (ret < 0) {
         av_log(avctx, AV_LOG_ERROR,
-               "Error getting output packet of size %"SIZE_SPECIFIER".\n", cx_frame->sz);
+               "Error getting output packet of size %zu.\n", cx_frame->sz);
         return ret;
     }
     memcpy(pkt->data, cx_frame->buf, pkt->size);
@@ -1190,7 +1190,7 @@ static int queue_frames(AVCodecContext *avctx, AVPacket *pkt_out)
 
                 if (!cx_frame->buf) {
                     av_log(avctx, AV_LOG_ERROR,
-                           "Data buffer alloc (%"SIZE_SPECIFIER" bytes) failed\n",
+                           "Data buffer alloc (%zu bytes) failed\n",
                            cx_frame->sz);
                     av_freep(&cx_frame);
                     return AVERROR(ENOMEM);
@@ -1352,7 +1352,7 @@ static int aom_encode(AVCodecContext *avctx, AVPacket *pkt,
 
         avctx->stats_out = av_malloc(b64_size);
         if (!avctx->stats_out) {
-            av_log(avctx, AV_LOG_ERROR, "Stat buffer alloc (%"SIZE_SPECIFIER" bytes) failed\n",
+            av_log(avctx, AV_LOG_ERROR, "Stat buffer alloc (%zu bytes) failed\n",
                    b64_size);
             return AVERROR(ENOMEM);
         }
diff --git a/libavcodec/libilbc.c b/libavcodec/libilbc.c
index f496f16db9..9a43e566f5 100644
--- a/libavcodec/libilbc.c
+++ b/libavcodec/libilbc.c
@@ -99,8 +99,7 @@ static int ilbc_decode_frame(AVCodecContext *avctx, AVFrame *frame,
 #if LIBILBC_VERSION_MAJOR < 3
         av_log(avctx, AV_LOG_ERROR, "iLBC frame too short (%u, should be %u)\n",
 #else
-        av_log(avctx, AV_LOG_ERROR, "iLBC frame too short (%u, should be "
-                                    "%"SIZE_SPECIFIER")\n",
+        av_log(avctx, AV_LOG_ERROR, "iLBC frame too short (%u, should be %zu)\n",
 #endif
                buf_size, s->decoder.no_of_bytes);
         return AVERROR_INVALIDDATA;
diff --git a/libavcodec/libvpxenc.c b/libavcodec/libvpxenc.c
index 05c34a6857..ef529abcab 100644
--- a/libavcodec/libvpxenc.c
+++ b/libavcodec/libvpxenc.c
@@ -242,7 +242,7 @@ static av_cold void dump_enc_cfg(AVCodecContext *avctx,
            width, "g_lag_in_frames:",   cfg->g_lag_in_frames);
     av_log(avctx, level, "rate control settings\n"
            "  %*s%u\n  %*s%u\n  %*s%u\n  %*s%u\n"
-           "  %*s%d\n  %*s%p(%"SIZE_SPECIFIER")\n  %*s%u\n",
+           "  %*s%d\n  %*s%p(%zu)\n  %*s%u\n",
            width, "rc_dropframe_thresh:",   cfg->rc_dropframe_thresh,
            width, "rc_resize_allowed:",     cfg->rc_resize_allowed,
            width, "rc_resize_up_thresh:",   cfg->rc_resize_up_thresh,
@@ -1117,7 +1117,7 @@ static av_cold int vpx_init(AVCodecContext *avctx,
         ret = av_reallocp(&ctx->twopass_stats.buf, ctx->twopass_stats.sz);
         if (ret < 0) {
             av_log(avctx, AV_LOG_ERROR,
-                   "Stat buffer alloc (%"SIZE_SPECIFIER" bytes) failed\n",
+                   "Stat buffer alloc (%zu bytes) failed\n",
                    ctx->twopass_stats.sz);
             ctx->twopass_stats.sz = 0;
             return ret;
@@ -1423,7 +1423,7 @@ static int queue_frames(AVCodecContext *avctx, struct vpx_codec_ctx *encoder,
 
                 if (!cx_frame->buf) {
                     av_log(avctx, AV_LOG_ERROR,
-                           "Data buffer alloc (%"SIZE_SPECIFIER" bytes) failed\n",
+                           "Data buffer alloc (%zu bytes) failed\n",
                            cx_frame->sz);
                     av_freep(&cx_frame);
                     return AVERROR(ENOMEM);
diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
index fb39c4e9fd..4d7cdfde12 100644
--- a/libavcodec/mjpegdec.c
+++ b/libavcodec/mjpegdec.c
@@ -2281,7 +2281,7 @@ int ff_mjpeg_find_marker(MJpegDecodeContext *s,
         memset(s->buffer + *unescaped_buf_size, 0,
                AV_INPUT_BUFFER_PADDING_SIZE);
 
-        av_log(s->avctx, AV_LOG_DEBUG, "escaping removed %"PTRDIFF_SPECIFIER" bytes\n",
+        av_log(s->avctx, AV_LOG_DEBUG, "escaping removed %td bytes\n",
                (buf_end - *buf_ptr) - (dst - s->buffer));
     } else if (start_code == SOS && s->ls) {
         const uint8_t *src = *buf_ptr;
@@ -2389,7 +2389,7 @@ redo_for_pal8:
                    start_code, unescaped_buf_size, buf_size);
             return AVERROR_INVALIDDATA;
         }
-        av_log(avctx, AV_LOG_DEBUG, "marker=%x avail_size_in_buf=%"PTRDIFF_SPECIFIER"\n",
+        av_log(avctx, AV_LOG_DEBUG, "marker=%x avail_size_in_buf=%td\n",
                start_code, buf_end - buf_ptr);
 
         ret = init_get_bits8(&s->gb, unescaped_buf_ptr, unescaped_buf_size);
@@ -2859,7 +2859,7 @@ the_end:
     }
 
 the_end_no_picture:
-    av_log(avctx, AV_LOG_DEBUG, "decode frame unused %"PTRDIFF_SPECIFIER" bytes\n",
+    av_log(avctx, AV_LOG_DEBUG, "decode frame unused %td bytes\n",
            buf_end - buf_ptr);
     return buf_ptr - buf;
 }
diff --git a/libavcodec/mjpegenc_common.c b/libavcodec/mjpegenc_common.c
index 21b3b19b93..5effbdbc32 100644
--- a/libavcodec/mjpegenc_common.c
+++ b/libavcodec/mjpegenc_common.c
@@ -145,7 +145,7 @@ int ff_mjpeg_add_icc_profile_size(AVCodecContext *avctx, const AVFrame *frame,
         return 0;
 
     if (sd->size > ICC_MAX_CHUNKS * ICC_CHUNK_SIZE) {
-        av_log(avctx, AV_LOG_ERROR, "Cannot store %"SIZE_SPECIFIER" byte ICC "
+        av_log(avctx, AV_LOG_ERROR, "Cannot store %zu byte ICC "
                "profile: too large for JPEG\n",
                sd->size);
         return AVERROR_INVALIDDATA;
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 1caa8279bd..4c83bcfa90 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -2269,7 +2269,7 @@ static int decode_chunks(AVCodecContext *avctx, AVFrame *picture,
         input_size = buf_end - buf_ptr;
 
         if (avctx->debug & FF_DEBUG_STARTCODE)
-            av_log(avctx, AV_LOG_DEBUG, "%3"PRIX32" at %"PTRDIFF_SPECIFIER" left %d\n",
+            av_log(avctx, AV_LOG_DEBUG, "%3"PRIX32" at %td left %d\n",
                    start_code, buf_ptr - buf, input_size);
 
         /* prepare data for next start code */
diff --git a/libavcodec/mpeg12enc.c b/libavcodec/mpeg12enc.c
index 4b9d186396..3baadbcfeb 100644
--- a/libavcodec/mpeg12enc.c
+++ b/libavcodec/mpeg12enc.c
@@ -481,7 +481,7 @@ static int mpeg1_encode_picture_header(MPVMainEncContext *const m)
                 put_bits(&s->pb, 8, 0xff);                  // marker_bits
             } else {
                 av_log(s->c.avctx, AV_LOG_WARNING,
-                    "Closed Caption size (%"SIZE_SPECIFIER") can not exceed "
+                    "Closed Caption size (%zu) can not exceed "
                     "93 bytes and must be a multiple of 3\n", side_data->size);
             }
         }
diff --git a/libavcodec/mpegpicture.c b/libavcodec/mpegpicture.c
index 6e96389c34..3f32fe8773 100644
--- a/libavcodec/mpegpicture.c
+++ b/libavcodec/mpegpicture.c
@@ -186,7 +186,7 @@ int ff_mpv_pic_check_linesize(void *logctx, const AVFrame *f,
     if ((linesize   &&   linesize != f->linesize[0]) ||
         (uvlinesize && uvlinesize != f->linesize[1])) {
         av_log(logctx, AV_LOG_ERROR, "Stride change unsupported: "
-               "linesize=%"PTRDIFF_SPECIFIER"/%d uvlinesize=%"PTRDIFF_SPECIFIER"/%d)\n",
+               "linesize=%td/%d uvlinesize=%td/%d)\n",
                linesize,   f->linesize[0],
                uvlinesize, f->linesize[1]);
         return AVERROR_PATCHWELCOME;
diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index e1f9623d65..a4f78c25db 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -1365,7 +1365,7 @@ static int load_input_picture(MPVMainEncContext *const m, const AVFrame *pic_arg
         if (s->c.linesize & (STRIDE_ALIGN-1))
             direct = 0;
 
-        ff_dlog(s->c.avctx, "%d %d %"PTRDIFF_SPECIFIER" %"PTRDIFF_SPECIFIER"\n", pic_arg->linesize[0],
+        ff_dlog(s->c.avctx, "%d %d %td %td\n", pic_arg->linesize[0],
                 pic_arg->linesize[1], s->c.linesize, s->c.uvlinesize);
 
         pic = av_refstruct_pool_get(s->c.picture_pool);
diff --git a/libavcodec/mscc.c b/libavcodec/mscc.c
index 708e84a8e1..ba48fc08d9 100644
--- a/libavcodec/mscc.c
+++ b/libavcodec/mscc.c
@@ -188,7 +188,7 @@ inflate_error:
                 s->pal[j] = 0xFF000000 | AV_RL32(pal + j * 4);
         } else if (pal) {
             av_log(avctx, AV_LOG_ERROR,
-                   "Palette size %"SIZE_SPECIFIER" is wrong\n", size);
+                   "Palette size %zu is wrong\n", size);
         }
         memcpy(frame->data[1], s->pal, AVPALETTE_SIZE);
     }
diff --git a/libavcodec/vulkan_av1.c b/libavcodec/vulkan_av1.c
index 788e3cca78..3a52d1f33a 100644
--- a/libavcodec/vulkan_av1.c
+++ b/libavcodec/vulkan_av1.c
@@ -634,7 +634,7 @@ static int vk_av1_end_frame(AVCodecContext *avctx)
         rav[i] = ap->ref_src[i]->f;
     }
 
-    av_log(avctx, AV_LOG_DEBUG, "Decoding frame, %"SIZE_SPECIFIER" bytes, %i tiles\n",
+    av_log(avctx, AV_LOG_DEBUG, "Decoding frame, %zu bytes, %i tiles\n",
            vp->slices_size, ap->av1_pic_info.tileCount);
 
     return ff_vk_decode_frame(avctx, pic->f, vp, rav, rvp);
diff --git a/libavcodec/vulkan_encode_av1.c b/libavcodec/vulkan_encode_av1.c
index 42ba30b302..c280f37a91 100644
--- a/libavcodec/vulkan_encode_av1.c
+++ b/libavcodec/vulkan_encode_av1.c
@@ -1021,7 +1021,7 @@ static int init_base_units(AVCodecContext *avctx)
         if (!data)
             return AVERROR(ENOMEM);
     } else {
-        av_log(avctx, AV_LOG_ERROR, "Unable to get feedback for AV1 sequence header = %"SIZE_SPECIFIER"\n",
+        av_log(avctx, AV_LOG_ERROR, "Unable to get feedback for AV1 sequence header = %zu\n",
                data_size);
         return err;
     }
diff --git a/libavcodec/vulkan_encode_h264.c b/libavcodec/vulkan_encode_h264.c
index 942e911fb7..0327ccba0b 100644
--- a/libavcodec/vulkan_encode_h264.c
+++ b/libavcodec/vulkan_encode_h264.c
@@ -1147,7 +1147,7 @@ static int init_base_units(AVCodecContext *avctx)
         if (!data)
             return AVERROR(ENOMEM);
     } else {
-        av_log(avctx, AV_LOG_ERROR, "Unable to get feedback for H.264 units = %"SIZE_SPECIFIER"\n", data_size);
+        av_log(avctx, AV_LOG_ERROR, "Unable to get feedback for H.264 units = %zu\n", data_size);
         return err;
     }
 
diff --git a/libavcodec/vulkan_encode_h265.c b/libavcodec/vulkan_encode_h265.c
index c30b7e8f93..9cd33abfe3 100644
--- a/libavcodec/vulkan_encode_h265.c
+++ b/libavcodec/vulkan_encode_h265.c
@@ -1316,7 +1316,7 @@ static int init_base_units(AVCodecContext *avctx)
         if (!data)
             return AVERROR(ENOMEM);
     } else {
-        av_log(avctx, AV_LOG_ERROR, "Unable to get feedback for H.265 units = %"SIZE_SPECIFIER"\n", data_size);
+        av_log(avctx, AV_LOG_ERROR, "Unable to get feedback for H.265 units = %zu\n", data_size);
         return err;
     }
 
diff --git a/libavcodec/vulkan_h264.c b/libavcodec/vulkan_h264.c
index bcc5c99b4b..8293f7ece9 100644
--- a/libavcodec/vulkan_h264.c
+++ b/libavcodec/vulkan_h264.c
@@ -556,7 +556,7 @@ static int vk_h264_end_frame(AVCodecContext *avctx)
         rav[i] = hp->ref_src[i]->f;
     }
 
-    av_log(avctx, AV_LOG_DEBUG, "Decoding frame, %"SIZE_SPECIFIER" bytes, %i slices\n",
+    av_log(avctx, AV_LOG_DEBUG, "Decoding frame, %zu bytes, %i slices\n",
            vp->slices_size, hp->h264_pic_info.sliceCount);
 
     return ff_vk_decode_frame(avctx, pic->f, vp, rav, rvp);
diff --git a/libavcodec/vulkan_hevc.c b/libavcodec/vulkan_hevc.c
index 5e15c6b931..f609c4301f 100644
--- a/libavcodec/vulkan_hevc.c
+++ b/libavcodec/vulkan_hevc.c
@@ -921,7 +921,7 @@ static int vk_hevc_end_frame(AVCodecContext *avctx)
         rvp[i] = &rfhp->vp;
     }
 
-    av_log(avctx, AV_LOG_DEBUG, "Decoding frame, %"SIZE_SPECIFIER" bytes, %i slices\n",
+    av_log(avctx, AV_LOG_DEBUG, "Decoding frame, %zu bytes, %i slices\n",
            vp->slices_size, hp->h265_pic_info.sliceSegmentCount);
 
     return ff_vk_decode_frame(avctx, pic->f, vp, rav, rvp);
diff --git a/libavcodec/vulkan_vp9.c b/libavcodec/vulkan_vp9.c
index 7b852a29a5..5b592c1443 100644
--- a/libavcodec/vulkan_vp9.c
+++ b/libavcodec/vulkan_vp9.c
@@ -336,7 +336,7 @@ static int vk_vp9_end_frame(AVCodecContext *avctx)
         rav[i] = ap->ref_src[i]->tf.f;
     }
 
-    av_log(avctx, AV_LOG_VERBOSE, "Decoding frame, %"SIZE_SPECIFIER" bytes\n",
+    av_log(avctx, AV_LOG_VERBOSE, "Decoding frame, %zu bytes\n",
            vp->slices_size);
 
     return ff_vk_decode_frame(avctx, pic->tf.f, vp, rav, rvp);
diff --git a/libavcodec/zmbv.c b/libavcodec/zmbv.c
index 2c09ccbd73..f0bffd8966 100644
--- a/libavcodec/zmbv.c
+++ b/libavcodec/zmbv.c
@@ -150,7 +150,7 @@ static int zmbv_decode_xor_8(ZmbvContext *c)
         prev += c->width * c->bh;
     }
     if (src - c->decomp_buf != c->decomp_len)
-        av_log(c->avctx, AV_LOG_ERROR, "Used %"PTRDIFF_SPECIFIER" of %i bytes\n",
+        av_log(c->avctx, AV_LOG_ERROR, "Used %td of %i bytes\n",
                src-c->decomp_buf, c->decomp_len);
     return 0;
 }
@@ -226,7 +226,7 @@ static int zmbv_decode_xor_16(ZmbvContext *c)
         prev += c->width * c->bh;
     }
     if (src - c->decomp_buf != c->decomp_len)
-        av_log(c->avctx, AV_LOG_ERROR, "Used %"PTRDIFF_SPECIFIER" of %i bytes\n",
+        av_log(c->avctx, AV_LOG_ERROR, "Used %td of %i bytes\n",
                src-c->decomp_buf, c->decomp_len);
     return 0;
 }
@@ -311,7 +311,7 @@ static int zmbv_decode_xor_24(ZmbvContext *c)
         prev += stride * c->bh;
     }
     if (src - c->decomp_buf != c->decomp_len)
-        av_log(c->avctx, AV_LOG_ERROR, "Used %"PTRDIFF_SPECIFIER" of %i bytes\n",
+        av_log(c->avctx, AV_LOG_ERROR, "Used %td of %i bytes\n",
                src-c->decomp_buf, c->decomp_len);
     return 0;
 }
@@ -388,7 +388,7 @@ static int zmbv_decode_xor_32(ZmbvContext *c)
         prev   += c->width * c->bh;
     }
     if (src - c->decomp_buf != c->decomp_len)
-        av_log(c->avctx, AV_LOG_ERROR, "Used %"PTRDIFF_SPECIFIER" of %i bytes\n",
+        av_log(c->avctx, AV_LOG_ERROR, "Used %td of %i bytes\n",
                src-c->decomp_buf, c->decomp_len);
     return 0;
 }
diff --git a/libavfilter/af_ashowinfo.c b/libavfilter/af_ashowinfo.c
index 57f1ebf535..ebcd5c996f 100644
--- a/libavfilter/af_ashowinfo.c
+++ b/libavfilter/af_ashowinfo.c
@@ -168,7 +168,7 @@ static void dump_audio_service_type(AVFilterContext *ctx, AVFrameSideData *sd)
 static void dump_unknown(AVFilterContext *ctx, AVFrameSideData *sd)
 {
     av_log(ctx, AV_LOG_INFO, "unknown side data type: %d, size "
-           "%"SIZE_SPECIFIER" bytes", sd->type, sd->size);
+           "%zu bytes", sd->type, sd->size);
 }
 
 static int filter_frame(AVFilterLink *inlink, AVFrame *buf)
diff --git a/libavfilter/f_graphmonitor.c b/libavfilter/f_graphmonitor.c
index 20cdcce79d..95c34593d2 100644
--- a/libavfilter/f_graphmonitor.c
+++ b/libavfilter/f_graphmonitor.c
@@ -304,7 +304,7 @@ static int draw_items(AVFilterContext *ctx,
         len = snprintf(buffer, sizeof(buffer)-1, " | queue: ");
         drawtext(out, xpos, ypos, buffer, len, s->white);
         xpos += len * 8;
-        len = snprintf(buffer, sizeof(buffer)-1, "%"SIZE_SPECIFIER, frames);
+        len = snprintf(buffer, sizeof(buffer)-1, "%zu", frames);
         drawtext(out, xpos, ypos, buffer, len, frames > 0 ? frames >= 10 ? frames >= 50 ? s->red : s->yellow : s->green : s->white);
         xpos += len * 8;
     }
diff --git a/libavfilter/vf_avgblur_opencl.c b/libavfilter/vf_avgblur_opencl.c
index 790a51ea80..2bd7a899d5 100644
--- a/libavfilter/vf_avgblur_opencl.c
+++ b/libavfilter/vf_avgblur_opencl.c
@@ -222,7 +222,7 @@ static int avgblur_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
                 goto fail;
 
             av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
-                   "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+                   "(%zux%zu).\n",
                    p, global_work[0], global_work[1]);
 
             cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel_horiz, 2, NULL,
diff --git a/libavfilter/vf_convolution_opencl.c b/libavfilter/vf_convolution_opencl.c
index c866022a12..99ef186038 100644
--- a/libavfilter/vf_convolution_opencl.c
+++ b/libavfilter/vf_convolution_opencl.c
@@ -235,7 +235,7 @@ static int convolution_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
                 goto fail;
 
             av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
-                   "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+                   "(%zux%zu).\n",
                    p, global_work[0], global_work[1]);
 
             cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
@@ -264,7 +264,7 @@ static int convolution_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
                     goto fail;
 
                 av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
-                       "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+                       "(%zux%zu).\n",
                        p, global_work[0], global_work[1]);
 
                 cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
diff --git a/libavfilter/vf_lcevc.c b/libavfilter/vf_lcevc.c
index 60fab41fcb..575bbd75fa 100644
--- a/libavfilter/vf_lcevc.c
+++ b/libavfilter/vf_lcevc.c
@@ -110,7 +110,7 @@ static int alloc_base_frame(AVFilterLink *inlink, const AVFrame *in,
     desc.matrixCoefficients = (LCEVC_MatrixCoefficients)in->colorspace;
     desc.transferCharacteristics = (LCEVC_TransferCharacteristics)in->color_trc;
     av_log(ctx, AV_LOG_DEBUG, "in  PTS %"PRId64", %dx%d, "
-                              "%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER", "
+                              "%zu/%zu/%zu/%zu, "
                               "SAR %d:%d\n",
            in->pts, in->width, in->height,
            in->crop_top, in->crop_bottom, in->crop_left, in->crop_right,
@@ -234,7 +234,7 @@ static int generate_output(AVFilterLink *inlink, AVFrame *out)
     out->height = outlink->h = desc.height + out->crop_top + out->crop_bottom;
 
     av_log(ctx, AV_LOG_DEBUG, "out PTS %"PRId64", %dx%d, "
-                              "%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER"/%"SIZE_SPECIFIER", "
+                              "%zu/%zu/%zu/%zu, "
                               "SAR %d:%d, "
                               "hasEnhancement %d, enhanced %d\n",
            out->pts, out->width, out->height,
diff --git a/libavfilter/vf_neighbor_opencl.c b/libavfilter/vf_neighbor_opencl.c
index 38a1b7821e..0b8d7fc998 100644
--- a/libavfilter/vf_neighbor_opencl.c
+++ b/libavfilter/vf_neighbor_opencl.c
@@ -182,8 +182,7 @@ static int neighbor_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
             if (err < 0)
                 goto fail;
 
-            av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
-                   "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+            av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d (%zux%zu).\n",
                    p, global_work[0], global_work[1]);
 
             cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
diff --git a/libavfilter/vf_program_opencl.c b/libavfilter/vf_program_opencl.c
index c0c10045db..0bb8b6e98f 100644
--- a/libavfilter/vf_program_opencl.c
+++ b/libavfilter/vf_program_opencl.c
@@ -145,8 +145,7 @@ static int program_opencl_run(AVFilterContext *avctx)
         if (err < 0)
             goto fail;
 
-        av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
-               "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+        av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d (%zux%zu).\n",
                plane, global_work[0], global_work[1]);
 
         cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
diff --git a/libavfilter/vf_showinfo.c b/libavfilter/vf_showinfo.c
index a85e894ae5..4ac3d45dc0 100644
--- a/libavfilter/vf_showinfo.c
+++ b/libavfilter/vf_showinfo.c
@@ -87,8 +87,7 @@ static void dump_spherical(AVFilterContext *ctx, AVFrame *frame, const AVFrameSi
         size_t l, t, r, b;
         av_spherical_tile_bounds(spherical, frame->width, frame->height,
                                  &l, &t, &r, &b);
-        av_log(ctx, AV_LOG_INFO,
-               "[%"SIZE_SPECIFIER", %"SIZE_SPECIFIER", %"SIZE_SPECIFIER", %"SIZE_SPECIFIER"] ",
+        av_log(ctx, AV_LOG_INFO, "[%zu, %zu, %zu, %zu] ",
                l, t, r, b);
     } else if (spherical->projection == AV_SPHERICAL_CUBEMAP) {
         av_log(ctx, AV_LOG_INFO, "[pad %"PRIu32"] ", spherical->padding);
@@ -426,8 +425,8 @@ static void dump_sei_unregistered_metadata(AVFilterContext *ctx, const AVFrameSi
     ShowInfoContext *s = ctx->priv;
 
     if (sd->size < AV_UUID_LEN) {
-        av_log(ctx, AV_LOG_ERROR, "invalid data(%"SIZE_SPECIFIER" < "
-               "UUID(%d-bytes))\n", sd->size, AV_UUID_LEN);
+        av_log(ctx, AV_LOG_ERROR, "invalid data (%zu < UUID(%d-bytes))\n",
+               sd->size, AV_UUID_LEN);
         return;
     }
 
@@ -874,11 +873,10 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
             break;
         default:
             if (name)
-                av_log(ctx, AV_LOG_INFO,
-                       "(%"SIZE_SPECIFIER" bytes)", sd->size);
+                av_log(ctx, AV_LOG_INFO, "(%zu bytes)", sd->size);
             else
                 av_log(ctx, AV_LOG_WARNING, "unknown side data type %d "
-                       "(%"SIZE_SPECIFIER" bytes)", sd->type, sd->size);
+                       "(%zu bytes)", sd->type, sd->size);
             break;
         }
 
diff --git a/libavfilter/vf_unsharp_opencl.c b/libavfilter/vf_unsharp_opencl.c
index 15853e8db3..8a0db7d975 100644
--- a/libavfilter/vf_unsharp_opencl.c
+++ b/libavfilter/vf_unsharp_opencl.c
@@ -269,8 +269,7 @@ static int unsharp_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
         local_work[0]  = 16;
         local_work[1]  = 16;
 
-        av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
-               "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+        av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d (%zux%zu).\n",
                p, global_work[0], global_work[1]);
 
         cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
diff --git a/libavformat/ape.c b/libavformat/ape.c
index 7e6bf12961..d2fd62902c 100644
--- a/libavformat/ape.c
+++ b/libavformat/ape.c
@@ -247,7 +247,7 @@ static int ape_read_header(AVFormatContext * s)
     }
     if (ape->seektablelength / sizeof(uint32_t) < ape->totalframes) {
         av_log(s, AV_LOG_ERROR,
-               "Number of seek entries is less than number of frames: %"SIZE_SPECIFIER" vs. %"PRIu32"\n",
+               "Number of seek entries is less than number of frames: %zu vs. %"PRIu32"\n",
                ape->seektablelength / sizeof(uint32_t), ape->totalframes);
         return AVERROR_INVALIDDATA;
     }
diff --git a/libavformat/dump.c b/libavformat/dump.c
index 2948189432..734a6e0bbf 100644
--- a/libavformat/dump.c
+++ b/libavformat/dump.c
@@ -402,9 +402,7 @@ static void dump_spherical(void *ctx, int w, int h,
         size_t l, t, r, b;
         av_spherical_tile_bounds(spherical, w, h,
                                  &l, &t, &r, &b);
-        av_log(ctx, log_level,
-               "[%"SIZE_SPECIFIER", %"SIZE_SPECIFIER", %"SIZE_SPECIFIER", %"SIZE_SPECIFIER"] ",
-               l, t, r, b);
+        av_log(ctx, log_level, "[%zu, %zu, %zu, %zu] ", l, t, r, b);
     } else if (spherical->projection == AV_SPHERICAL_CUBEMAP) {
         av_log(ctx, log_level, "[pad %"PRIu32"] ", spherical->padding);
     }
@@ -536,11 +534,10 @@ static void dump_sidedata(void *ctx, const AVPacketSideData *side_data, int nb_s
             break;
         default:
             if (name)
-                av_log(ctx, log_level,
-                       "(%"SIZE_SPECIFIER" bytes)", sd->size);
+                av_log(ctx, log_level, "(%zu bytes)", sd->size);
             else
                 av_log(ctx, log_level, "unknown side data type %d "
-                       "(%"SIZE_SPECIFIER" bytes)", sd->type, sd->size);
+                       "(%zu bytes)", sd->type, sd->size);
             break;
         }
 
diff --git a/libavformat/framecrcenc.c b/libavformat/framecrcenc.c
index 8c35a2122c..25c59132e9 100644
--- a/libavformat/framecrcenc.c
+++ b/libavformat/framecrcenc.c
@@ -170,7 +170,7 @@ static int framecrc_write_packet(struct AVFormatContext *s, AVPacket *pkt)
                 side_data_crc = 0;
             }
 
-            av_strlcatf(buf, sizeof(buf), ", %s, %8"SIZE_SPECIFIER", 0x%08"PRIx32,
+            av_strlcatf(buf, sizeof(buf), ", %s, %8zu, 0x%08"PRIx32,
                         av_packet_side_data_name(sd->type), sd->size, side_data_crc);
         }
     }
diff --git a/libavformat/hashenc.c b/libavformat/hashenc.c
index e4e3244246..b108eca41b 100644
--- a/libavformat/hashenc.c
+++ b/libavformat/hashenc.c
@@ -306,7 +306,7 @@ static int framehash_write_packet(struct AVFormatContext *s, AVPacket *pkt)
             } else
                 av_hash_update(c->hashes[0], pkt->side_data[i].data, pkt->side_data[i].size);
             snprintf(buf, sizeof(buf) - (AV_HASH_MAX_SIZE * 2 + 1),
-                     ", %8"SIZE_SPECIFIER", ", pkt->side_data[i].size);
+                     ", %8zu, ", pkt->side_data[i].size);
             len = strlen(buf);
             av_hash_final_hex(c->hashes[0], buf + len, sizeof(buf) - len);
             avio_write(s->pb, buf, strlen(buf));
diff --git a/libavformat/http.c b/libavformat/http.c
index c4e6292a95..bd25a45636 100644
--- a/libavformat/http.c
+++ b/libavformat/http.c
@@ -625,7 +625,7 @@ static int http_write_reply(URLContext* h, int status_code)
         message_len = snprintf(message, sizeof(message),
                  "HTTP/1.1 %03d %s\r\n"
                  "Content-Type: %s\r\n"
-                 "Content-Length: %"SIZE_SPECIFIER"\r\n"
+                 "Content-Length: %zu\r\n"
                  "%s"
                  "\r\n"
                  "%03d %s\r\n",
diff --git a/libavformat/id3v2.c b/libavformat/id3v2.c
index 3ce2fadce8..9d4a9802a9 100644
--- a/libavformat/id3v2.c
+++ b/libavformat/id3v2.c
@@ -485,7 +485,7 @@ static void read_geobtag(AVFormatContext *s, AVIOContext *pb, int taglen,
 
     new_extra = av_mallocz(sizeof(ID3v2ExtraMeta));
     if (!new_extra) {
-        av_log(s, AV_LOG_ERROR, "Failed to alloc %"SIZE_SPECIFIER" bytes\n",
+        av_log(s, AV_LOG_ERROR, "Failed to alloc %zu bytes\n",
                sizeof(ID3v2ExtraMeta));
         return;
     }
diff --git a/libavformat/mmsh.c b/libavformat/mmsh.c
index aeadb12dab..ddeeafa423 100644
--- a/libavformat/mmsh.c
+++ b/libavformat/mmsh.c
@@ -118,7 +118,7 @@ static int read_data_packet(MMSHContext *mmsh, const int len)
     int res;
     if (len > sizeof(mms->in_buffer)) {
         av_log(NULL, AV_LOG_ERROR,
-               "Data packet length %d exceeds the in_buffer size %"SIZE_SPECIFIER"\n",
+               "Data packet length %d exceeds the in_buffer size %zu\n",
                len, sizeof(mms->in_buffer));
         return AVERROR(EIO);
     }
@@ -193,7 +193,7 @@ static int get_http_header_data(MMSHContext *mmsh)
             if (len) {
                 if (len > sizeof(mms->in_buffer)) {
                     av_log(NULL, AV_LOG_ERROR,
-                           "Other packet len = %d exceed the in_buffer size %"SIZE_SPECIFIER"\n",
+                           "Other packet len = %d exceed the in_buffer size %zu\n",
                            len, sizeof(mms->in_buffer));
                     return AVERROR(EIO);
                 }
diff --git a/libavformat/mmst.c b/libavformat/mmst.c
index e68606086e..59d3f9726e 100644
--- a/libavformat/mmst.c
+++ b/libavformat/mmst.c
@@ -284,7 +284,7 @@ static MMSSCPacketType get_tcp_server_response(MMSTContext *mmst)
             if (length_remaining < 0
                 || length_remaining > sizeof(mms->in_buffer) - 12) {
                 av_log(mms->mms_hd, AV_LOG_ERROR,
-                       "Incoming packet length %d exceeds bufsize %"SIZE_SPECIFIER"\n",
+                       "Incoming packet length %d exceeds bufsize %zu\n",
                        length_remaining, sizeof(mms->in_buffer) - 12);
                 return AVERROR_INVALIDDATA;
             }
@@ -320,7 +320,7 @@ static MMSSCPacketType get_tcp_server_response(MMSTContext *mmst)
             if (length_remaining < 0
                 || length_remaining > sizeof(mms->in_buffer) - 8) {
                 av_log(mms->mms_hd, AV_LOG_ERROR,
-                       "Data length %d is invalid or too large (max=%"SIZE_SPECIFIER")\n",
+                       "Data length %d is invalid or too large (max=%zu)\n",
                        length_remaining, sizeof(mms->in_buffer));
                 return AVERROR_INVALIDDATA;
             }
diff --git a/libavformat/mpjpegdec.c b/libavformat/mpjpegdec.c
index 125b17585e..3b47bccf9d 100644
--- a/libavformat/mpjpegdec.c
+++ b/libavformat/mpjpegdec.c
@@ -193,7 +193,7 @@ static int parse_multipart_header(AVIOContext *pb,
         if (log_ctx)
         av_log(log_ctx,
             AV_LOG_ERROR,
-            "Expected boundary '%s' not found, instead found a line of %"SIZE_SPECIFIER" bytes\n",
+            "Expected boundary '%s' not found, instead found a line of %zu bytes\n",
             expected_boundary,
             strlen(line));
 
diff --git a/libavformat/oggparsevorbis.c b/libavformat/oggparsevorbis.c
index 7c4f7624f8..ed81a431f6 100644
--- a/libavformat/oggparsevorbis.c
+++ b/libavformat/oggparsevorbis.c
@@ -184,7 +184,7 @@ int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
 
     if (p != end)
         av_log(as, AV_LOG_INFO,
-               "%"PTRDIFF_SPECIFIER" bytes of comment header remain\n", end - p);
+               "%td bytes of comment header remain\n", end - p);
     if (n > 0)
         av_log(as, AV_LOG_INFO,
                "truncated comment header, %i comments not found\n", n);
diff --git a/libavformat/rtpdec_xiph.c b/libavformat/rtpdec_xiph.c
index 95f4bdf3a1..23924f2363 100644
--- a/libavformat/rtpdec_xiph.c
+++ b/libavformat/rtpdec_xiph.c
@@ -234,7 +234,7 @@ parse_packed_headers(AVFormatContext *s,
 
     if (packed_headers_end - packed_headers < 9) {
         av_log(s, AV_LOG_ERROR,
-               "Invalid %"PTRDIFF_SPECIFIER" byte packed header.",
+               "Invalid %td byte packed header.",
                packed_headers_end - packed_headers);
         return AVERROR_INVALIDDATA;
     }
@@ -255,7 +255,7 @@ parse_packed_headers(AVFormatContext *s,
     if (packed_headers_end - packed_headers != length ||
         length1 > length || length2 > length - length1) {
         av_log(s, AV_LOG_ERROR,
-               "Bad packed header lengths (%d,%d,%"PTRDIFF_SPECIFIER",%u)\n", length1,
+               "Bad packed header lengths (%d,%d,%td,%u)\n", length1,
                length2, packed_headers_end - packed_headers, length);
         return AVERROR_INVALIDDATA;
     }
diff --git a/libavformat/sdp.c b/libavformat/sdp.c
index 21ada5d1ce..f6798f4bd9 100644
--- a/libavformat/sdp.c
+++ b/libavformat/sdp.c
@@ -235,7 +235,7 @@ static int extradata2psets(AVFormatContext *s, const AVCodecParameters *par,
             sps_end = r1;
         }
         if (!av_base64_encode(p, MAX_PSET_SIZE - (p - psets), r, r1 - r)) {
-            av_log(s, AV_LOG_ERROR, "Cannot Base64-encode %"PTRDIFF_SPECIFIER" %"PTRDIFF_SPECIFIER"!\n",
+            av_log(s, AV_LOG_ERROR, "Cannot Base64-encode %td %td!\n",
                    MAX_PSET_SIZE - (p - psets), r1 - r);
 fail_in_loop:
             av_free(psets);
diff --git a/libavformat/supenc.c b/libavformat/supenc.c
index ebdfc7c939..c664723bc4 100644
--- a/libavformat/supenc.c
+++ b/libavformat/supenc.c
@@ -47,7 +47,7 @@ static int sup_write_packet(AVFormatContext *s, AVPacket *pkt)
         size_t len = AV_RB16(data + 1) + 3;
 
         if (len > size) {
-            av_log(s, AV_LOG_ERROR, "Not enough data, skipping %"SIZE_SPECIFIER" bytes\n",
+            av_log(s, AV_LOG_ERROR, "Not enough data, skipping %zu bytes\n",
                    size);
             return AVERROR_INVALIDDATA;
         }
@@ -64,7 +64,7 @@ static int sup_write_packet(AVFormatContext *s, AVPacket *pkt)
     }
 
     if (size > 0) {
-        av_log(s, AV_LOG_ERROR, "Skipping %"SIZE_SPECIFIER" bytes after last segment in frame\n",
+        av_log(s, AV_LOG_ERROR, "Skipping %zu bytes after last segment in frame\n",
                size);
         return AVERROR_INVALIDDATA;
     }
diff --git a/libavutil/hwcontext_opencl.c b/libavutil/hwcontext_opencl.c
index 46424eb52a..03f46238b2 100644
--- a/libavutil/hwcontext_opencl.c
+++ b/libavutil/hwcontext_opencl.c
@@ -2747,7 +2747,7 @@ static int opencl_map_from_drm_arm(AVHWFramesContext *dst_fc, AVFrame *dst,
                               &fd, desc->objects[i].size, &cle);
         if (!mapping->object_buffers[i]) {
             av_log(dst_fc, AV_LOG_ERROR, "Failed to create CL buffer "
-                   "from object %d (fd %d, size %"SIZE_SPECIFIER") of DRM frame: %d.\n",
+                   "from object %d (fd %d, size %zu) of DRM frame: %d.\n",
                    i, fd, desc->objects[i].size, cle);
             err = AVERROR(EIO);
             goto fail;
diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index 89bf39bb3f..ebbb175b6e 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -1890,7 +1890,7 @@ static int vulkan_device_init(AVHWDeviceContext *ctx)
     av_log(ctx, AV_LOG_VERBOSE, "Alignments:\n");
     av_log(ctx, AV_LOG_VERBOSE, "    optimalBufferCopyRowPitchAlignment: %"PRIu64"\n",
            p->props.properties.limits.optimalBufferCopyRowPitchAlignment);
-    av_log(ctx, AV_LOG_VERBOSE, "    minMemoryMapAlignment:              %"SIZE_SPECIFIER"\n",
+    av_log(ctx, AV_LOG_VERBOSE, "    minMemoryMapAlignment:              %zu\n",
            p->props.properties.limits.minMemoryMapAlignment);
     av_log(ctx, AV_LOG_VERBOSE, "    nonCoherentAtomSize:                %"PRIu64"\n",
            p->props.properties.limits.nonCoherentAtomSize);
diff --git a/libavutil/internal.h b/libavutil/internal.h
index 10ec26e9a0..2a85170795 100644
--- a/libavutil/internal.h
+++ b/libavutil/internal.h
@@ -115,9 +115,6 @@ void avpriv_report_missing_feature(void *avc,
 void avpriv_request_sample(void *avc,
                            const char *msg, ...) av_printf_format(2, 3);
 
-#define PTRDIFF_SPECIFIER "td"
-#define SIZE_SPECIFIER "zu"
-
 #ifdef DEBUG
 #   define ff_dlog(ctx, ...) av_log(ctx, AV_LOG_DEBUG, __VA_ARGS__)
 #else
diff --git a/libavutil/tests/channel_layout.c b/libavutil/tests/channel_layout.c
index e79e4c3611..9185547eaa 100644
--- a/libavutil/tests/channel_layout.c
+++ b/libavutil/tests/channel_layout.c
@@ -25,7 +25,6 @@
 #include "libavutil/bprint.h"
 #include "libavutil/channel_layout.h"
 #include "libavutil/error.h"
-#include "libavutil/internal.h"
 #include "libavutil/macros.h"
 #include "libavutil/mem.h"
 
@@ -41,7 +40,7 @@
     func_name ## _bprint(BPRINT_ARGS ## ARG_ORDER((bp), __VA_ARGS__));     \
     if (strlen((bp)->str) != (bp)->len) {                                  \
         printf("strlen of AVBPrint-string returned by "#func_name"_bprint" \
-               " differs from AVBPrint.len: %"SIZE_SPECIFIER" vs. %u\n",   \
+               " differs from AVBPrint.len: %zu vs. %u\n",                 \
                strlen((bp)->str), (bp)->len);                              \
         break;                                                             \
     }                                                                      \
diff --git a/libavutil/tests/imgutils.c b/libavutil/tests/imgutils.c
index cd363145ad..39684795d4 100644
--- a/libavutil/tests/imgutils.c
+++ b/libavutil/tests/imgutils.c
@@ -44,13 +44,13 @@ static int check_image_fill(enum AVPixelFormat pix_fmt, int w, int h) {
     // Test the output of av_image_fill_plane_sizes()
     printf(", plane_sizes:");
     for (i = 0; i < 4; i++)
-        printf(" %5"SIZE_SPECIFIER, sizes[i]);
+        printf(" %5zu", sizes[i]);
     // Test the output of av_image_fill_pointers()
     for (i = 0; i < 3 && data[i + 1]; i++)
         offsets[i] = data[i + 1] - data[i];
     printf(", plane_offsets:");
     for (i = 0; i < 3; i++)
-        printf(" %5"PTRDIFF_SPECIFIER, offsets[i]);
+        printf(" %5td", offsets[i]);
     printf(", total_size: %d", total_size);
 
     return 0;
diff --git a/libavutil/tests/side_data_array.c b/libavutil/tests/side_data_array.c
index 633e9ee681..fcc6f5e8d0 100644
--- a/libavutil/tests/side_data_array.c
+++ b/libavutil/tests/side_data_array.c
@@ -27,7 +27,7 @@ static void print_entries(const AVFrameSideData **sd, const int nb_sd)
     for (int i = 0; i < nb_sd; i++) {
         const AVFrameSideData *entry = sd[i];
 
-        printf("sd %d (size %"SIZE_SPECIFIER"), %s",
+        printf("sd %d (size %zu), %s",
                i, entry->size, av_frame_side_data_name(entry->type));
 
         if (entry->type != AV_FRAME_DATA_SEI_UNREGISTERED) {
diff --git a/libavutil/vulkan.c b/libavutil/vulkan.c
index c3d8c1d0d9..3015a536c2 100644
--- a/libavutil/vulkan.c
+++ b/libavutil/vulkan.c
@@ -1066,7 +1066,7 @@ int ff_vk_create_buf(FFVulkanContext *s, FFVkBuffer *buf, size_t size,
         .pNext = &ded_req,
     };
 
-    av_log(s, AV_LOG_DEBUG, "Creating a buffer of %"SIZE_SPECIFIER" bytes, "
+    av_log(s, AV_LOG_DEBUG, "Creating a buffer of %zu bytes, "
                             "usage: 0x%x, flags: 0x%x\n",
            size, usage, flags);
 
-- 
2.52.0
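
Note on the conversions above: the removed `SIZE_SPECIFIER` and `PTRDIFF_SPECIFIER` macros expanded to "zu" and "td", the standard C99 length modifiers for `size_t` and `ptrdiff_t`. A minimal sketch of the resulting style follows. The function name `format_sizes` and its message text are hypothetical and not part of FFmpeg.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a size_t and a pointer difference using
 * the C99 length modifiers directly instead of macro concatenation. */
static int format_sizes(char *dst, size_t dst_size, size_t used,
                        const char *start, const char *cur)
{
    ptrdiff_t off = cur - start;   /* pointer subtraction yields ptrdiff_t */

    /* %zu matches size_t, %td matches ptrdiff_t */
    return snprintf(dst, dst_size, "used %zu bytes at offset %td", used, off);
}
```

Writing the modifiers inline keeps the format string in one piece, which is why several of the hunks above could also join previously split string literals.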


From b7d0343d8d8a8d06a88c9f19f8de98912e72fec7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin@martin.st>
Date: Wed, 3 Dec 2025 13:49:39 +0200
Subject: [PATCH 183/304] avcodec/{arm,neon}/mpegvideo: Readd a missed
 initialization

This was accidentally removed in
357fc5243c32300bba91c096488e86558beed4c8.

This fixes test failures when built with Clang and MSVC;
surprisingly, the checkasm test still passed when built with
GCC. Clang and MSVC also warn about the use of the uninitialized
variable, whereas GCC does not.
---
 libavcodec/neon/mpegvideo.c | 2 ++
 1 file changed, 2 insertions(+)
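
Note: a portable scalar sketch of why the zero constant has to be set before use. It assumes the zero vector serves to leave zero coefficients untouched, which matches the usual H.263 dequantization rule (level * 2 * qscale, plus or minus qadd, with zero staying zero). The function name is hypothetical; this is not the NEON code touched by the patch.

```c
#include <stdint.h>

/* Hypothetical scalar analogue of an H.263-style unquantizer. */
static void unquantize_h263_sketch(int16_t *block, int n_coeffs,
                                   int qscale, int qadd)
{
    const int16_t zero = 0;        /* initialized up front, like qzs16 in the patch */
    const int qmul = qscale << 1;

    for (int i = 0; i < n_coeffs; i++) {
        int level = block[i];

        if (level == zero)         /* zero coefficients stay zero */
            continue;
        level = level < zero ? level * qmul - qadd
                             : level * qmul + qadd;
        block[i] = (int16_t)level;
    }
}
```

If `zero` held an indeterminate value, the comparisons would depend on whatever happened to be in the register, which is consistent with the compiler-dependent test results described in the commit message.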

diff --git a/libavcodec/neon/mpegvideo.c b/libavcodec/neon/mpegvideo.c
index 44e9b70303..45ac36df18 100644
--- a/libavcodec/neon/mpegvideo.c
+++ b/libavcodec/neon/mpegvideo.c
@@ -41,6 +41,8 @@ static void inline ff_dct_unquantize_h263_neon(int qscale, int qadd, int nCoeffs
     int16x8_t q14s16, q15s16, qzs16;
     uint16x8_t q1u16, q9u16;
 
+    qzs16 = vdupq_n_s16(0);
+
     q15s16 = vdupq_n_s16(qscale << 1);
     q14s16 = vdupq_n_s16(qadd);
     q13s16 = vnegq_s16(q14s16);
-- 
2.52.0


From 3e89fd9c16fbc9da3973418d8415652ce828b02f Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Wed, 3 Dec 2025 13:05:58 +0100
Subject: [PATCH 184/304] tests/checkasm/mpegvideo_unquantize: Add missing
 const

Fixes this test under UBSan:
runtime error: call to function dct_unquantize_mpeg1_intra_c through pointer to incorrect function type 'void (*)(struct MpegEncContext *, short *, int, int)'
I don't know how I could forget this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/checkasm/mpegvideo_unquantize.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/checkasm/mpegvideo_unquantize.c b/tests/checkasm/mpegvideo_unquantize.c
index 220a743a96..60c61b217b 100644
--- a/tests/checkasm/mpegvideo_unquantize.c
+++ b/tests/checkasm/mpegvideo_unquantize.c
@@ -215,11 +215,11 @@ void checkasm_check_mpegvideo_unquantize(void)
     int q_scale_type = rnd() & 1;
 
     ff_mpv_unquantize_init(&unquant_dsp_ctx, 1 /* bitexact */, q_scale_type);
-    declare_func(void, MPVContext *s, int16_t *block, int n, int qscale);
+    declare_func(void, const MPVContext *s, int16_t *block, int n, int qscale);
 
     for (size_t i = 0; i < FF_ARRAY_ELEMS(tests); ++i) {
-        void (*func)(MPVContext *s, int16_t *block, int n, int qscale) =
-            *(void (**)(MPVContext *, int16_t *, int, int))((char*)&unquant_dsp_ctx + tests[i].offset);
+        void (*func)(const MPVContext *s, int16_t *block, int n, int qscale) =
+            *(void (**)(const MPVContext *, int16_t *, int, int))((char*)&unquant_dsp_ctx + tests[i].offset);
         if (check_func(func, "%s", tests[i].name)) {
             MPVContext new, ref;
             DECLARE_ALIGNED(16, int16_t, block_new)[64];
-- 
2.52.0


From ff7c3762f3c08abf97a39d2375be63555177d2f7 Mon Sep 17 00:00:00 2001
From: Oliver Chang <ochang@google.com>
Date: Wed, 3 Dec 2025 02:57:43 +0000
Subject: [PATCH 185/304] libavcodec/prores_raw: Fix heap-buffer-overflow in
 decode_frame

Fixes a heap-buffer-overflow in `decode_frame` where `header_len` read
from the bitstream was not validated against the remaining bytes in the
input buffer (`gb`). This allowed `gb_hdr` to be initialized with a size
exceeding the actual packet data, leading to an out-of-bounds read.

The fix adds a check to ensure `bytestream2_get_bytes_left(&gb)` is
greater than or equal to `header_len - 2` before initializing `gb_hdr`.

Fixes: https://issues.oss-fuzz.com/issues/439711053
---
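A minimal sketch of the check this patch adds, for reviewers who want to see the pattern in isolation. `ByteReader`, `bytes_left` and `read_header` are hypothetical stand-ins, not FFmpeg's `GetByteContext` API, and the sketch assumes the 16-bit length field counts its own two bytes, as the `header_len - 2` comparison suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte reader; not FFmpeg's GetByteContext. */
typedef struct ByteReader {
    const uint8_t *cur;
    const uint8_t *end;
} ByteReader;

size_t bytes_left(const ByteReader *r)
{
    return (size_t)(r->end - r->cur);
}

/* Reads a 16-bit big-endian length that is assumed to include its own
 * two bytes. On success returns 0 and sets *hdr to the header payload;
 * returns -1 if the length is too small or exceeds the remaining input. */
int read_header(ByteReader *r, int min_len, ByteReader *hdr)
{
    assert(r && hdr);
    if (bytes_left(r) < 2)
        return -1;

    int header_len = (r->cur[0] << 8) | r->cur[1];
    r->cur += 2;

    /* Validate the untrusted length against what is actually left
     * before building a sub-reader from it. */
    if (header_len < min_len || bytes_left(r) < (size_t)(header_len - 2))
        return -1;

    hdr->cur = r->cur;
    hdr->end = r->cur + (header_len - 2);
    return 0;
}
```

With a four-byte input whose length field is 4, the call succeeds and the sub-reader spans two bytes; with a length field of 64 on the same input, it is rejected instead of reading past the end.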
 libavcodec/prores_raw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/prores_raw.c b/libavcodec/prores_raw.c
index 01f1bbd2fb..8be566ed36 100644
--- a/libavcodec/prores_raw.c
+++ b/libavcodec/prores_raw.c
@@ -360,7 +360,7 @@ static int decode_frame(AVCodecContext *avctx,
         return AVERROR_INVALIDDATA;
 
     int header_len = bytestream2_get_be16(&gb);
-    if (header_len < 62)
+    if (header_len < 62 || bytestream2_get_bytes_left(&gb) < header_len - 2)
         return AVERROR_INVALIDDATA;
 
     GetByteContext gb_hdr;
-- 
2.52.0


From 1a995bc1a30edb44ee1f15ffe0824a71ca6b017e Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 3 Dec 2025 21:21:16 +0100
Subject: [PATCH 186/304] hwcontext_vulkan: fix compilation with older header
 versions

---
 libavutil/hwcontext_vulkan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index ebbb175b6e..2fc66a0cea 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -2569,7 +2569,7 @@ static int switch_layout_host(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
     VkResult ret;
     VulkanDevicePriv *p = hwfc->device_ctx->hwctx;
     FFVulkanFunctions *vk = &p->vkctx.vkfn;
-    VkHostImageLayoutTransitionInfo layout_change[AV_NUM_DATA_POINTERS];
+    VkHostImageLayoutTransitionInfoEXT layout_change[AV_NUM_DATA_POINTERS];
     int nb_images = ff_vk_count_images(frame);
 
     VkImageLayout new_layout;
@@ -2585,7 +2585,7 @@ static int switch_layout_host(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
         return AVERROR(ENOTSUP);
 
     for (i = 0; i < nb_images; i++) {
-        layout_change[i] = (VkHostImageLayoutTransitionInfo) {
+        layout_change[i] = (VkHostImageLayoutTransitionInfoEXT) {
             .sType = VK_STRUCTURE_TYPE_HOST_IMAGE_LAYOUT_TRANSITION_INFO,
             .image = frame->img[i],
             .oldLayout = frame->layout[i],
-- 
2.52.0


From df7e9825f8f96c2bf79a49045f666293fc3050c3 Mon Sep 17 00:00:00 2001
From: Jack Lau <jacklau1222gm@gmail.com>
Date: Tue, 18 Nov 2025 06:45:06 +0800
Subject: [PATCH 187/304] avfilter/vf_feedback: fix feedback block

Fix #20940

The feedback filter and its sub-filter both request frames
from each other, causing blocking since 4440e499ba.

The feedback filter should request inputs[1] only once
rather than requesting frames continuously, which causes the blocking.

This patch adds a check, via ff_outlink_frame_wanted(ctx->outputs[1]),
for whether feedback has already requested inputs[1];
if so, it returns and waits for inputs[0], because that means
more input frames are needed to proceed.

Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
---
 libavfilter/vf_feedback.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libavfilter/vf_feedback.c b/libavfilter/vf_feedback.c
index 6667ddfd16..e3633d7d37 100644
--- a/libavfilter/vf_feedback.c
+++ b/libavfilter/vf_feedback.c
@@ -245,6 +245,10 @@ static int activate(AVFilterContext *ctx)
     }
 
     if (!s->feed || ctx->is_disabled) {
+        if (!ctx->is_disabled && ff_outlink_frame_wanted(ctx->outputs[1])) {
+            ff_inlink_request_frame(ctx->inputs[0]);
+            return 0;
+        }
         if (ff_outlink_frame_wanted(ctx->outputs[0])) {
             ff_inlink_request_frame(ctx->inputs[0]);
             if (!ctx->is_disabled)
-- 
2.52.0


From 5bb7a84831b97154638562e6d840c0d3dcf7d236 Mon Sep 17 00:00:00 2001
From: Jack Lau <jacklau1222gm@gmail.com>
Date: Tue, 18 Nov 2025 21:29:43 +0800
Subject: [PATCH 188/304] tests/fate/filter-video: add two feedback tests

- Add fate-filter-feedback-yadif

- Add fate-filter-feedback-hflip

Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
---
 tests/fate/filter-video.mak          |  8 +++++++
 tests/ref/fate/filter-feedback-hflip | 30 ++++++++++++++++++++++++
 tests/ref/fate/filter-feedback-yadif | 35 ++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)
 create mode 100644 tests/ref/fate/filter-feedback-hflip
 create mode 100644 tests/ref/fate/filter-feedback-yadif

diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index 087ba8d9cd..9f3c92a395 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -20,6 +20,11 @@ fate-filter-bwdif10: CMD = framecrc -ec 0 -flags bitexact -idct simple -i $(TARG
 
 FATE_FILTER_SAMPLES-yes += $(FATE_BWDIF-yes)
 
+FATE_FEEDBACK-$(call FILTERDEMDEC, FEEDBACK YADIF, MPEGTS, MPEG2VIDEO) += fate-filter-feedback-yadif
+fate-filter-feedback-yadif: CMD = framecrc -ec 0 -flags bitexact -idct simple -i $(TARGET_SAMPLES)/mpeg2/mpeg2_field_encoding.ts -frames:v 30 -vf "[in][yadifin]feedback=x=0:y=0:w=100:h=100[out][yadifout];[yadifout]yadif[yadifin]"
+
+FATE_FILTER_SAMPLES-yes += $(FATE_FEEDBACK-yes)
+
 FATE_YADIF-$(call FILTERDEMDEC, YADIF, MPEGTS, MPEG2VIDEO) += fate-filter-yadif-mode0 fate-filter-yadif-mode1
 fate-filter-yadif-mode0: CMD = framecrc -ec 0 -flags bitexact -idct simple -i $(TARGET_SAMPLES)/mpeg2/mpeg2_field_encoding.ts -frames:v 30 -vf yadif=0
 fate-filter-yadif-mode1: CMD = framecrc -ec 0 -flags bitexact -idct simple -i $(TARGET_SAMPLES)/mpeg2/mpeg2_field_encoding.ts -frames:v 59 -vf yadif=1
@@ -170,6 +175,9 @@ FATE_FILTER-$(call FILTERFRAMECRC, TESTSRC FORMAT CONCAT SCALE, LAVFI_INDEV FILE
 fate-filter-lavd-scalenorm: tests/data/filtergraphs/scalenorm
 fate-filter-lavd-scalenorm: CMD = framecrc -f lavfi -graph_file $(TARGET_PATH)/tests/data/filtergraphs/scalenorm -i dummy
 
+FATE_FILTER-$(call FILTERFRAMECRC, TESTSRC2 FEEDBACK HFLIP) += fate-filter-feedback-hflip
+fate-filter-feedback-hflip: CMD = framecrc -f lavfi -i testsrc2=d=1 -vf "[in][hflipin]feedback=x=0:y=0:w=100:h=100[out][hflipout];[hflipout]hflip[hflipin]"
+
 FATE_FILTER-$(call FILTERFRAMECRC, FRAMERATE TESTSRC2) += fate-filter-framerate-up fate-filter-framerate-down
 fate-filter-framerate-up: CMD = framecrc -lavfi testsrc2=r=2:d=10,framerate=fps=10 -t 1
 fate-filter-framerate-down: CMD = framecrc -lavfi testsrc2=r=2:d=10,framerate=fps=1 -t 1
diff --git a/tests/ref/fate/filter-feedback-hflip b/tests/ref/fate/filter-feedback-hflip
new file mode 100644
index 0000000000..0c004ee281
--- /dev/null
+++ b/tests/ref/fate/filter-feedback-hflip
@@ -0,0 +1,30 @@
+#tb 0: 1/25
+#media_type 0: video
+#codec_id 0: rawvideo
+#dimensions 0: 320x240
+#sar 0: 1/1
+0,          0,          0,        1,   115200, 0x441d0ff3
+0,          1,          1,        1,   115200, 0x5de73658
+0,          2,          2,        1,   115200, 0x1f727e03
+0,          3,          3,        1,   115200, 0xbf2fa0c4
+0,          4,          4,        1,   115200, 0xee4ed474
+0,          5,          5,        1,   115200, 0x0013f45b
+0,          6,          6,        1,   115200, 0x7c7cf929
+0,          7,          7,        1,   115200, 0xd4cdf824
+0,          8,          8,        1,   115200, 0x1e48fad9
+0,          9,          9,        1,   115200, 0xcbbff220
+0,         10,         10,        1,   115200, 0x1162fcc9
+0,         11,         11,        1,   115200, 0x742ff029
+0,         12,         12,        1,   115200, 0x9901f2e0
+0,         13,         13,        1,   115200, 0xf0a1f828
+0,         14,         14,        1,   115200, 0x585903b8
+0,         15,         15,        1,   115200, 0x2dbf15f6
+0,         16,         16,        1,   115200, 0x0fd011f5
+0,         17,         17,        1,   115200, 0x73ec12cf
+0,         18,         18,        1,   115200, 0x09c910e9
+0,         19,         19,        1,   115200, 0x90811025
+0,         20,         20,        1,   115200, 0xd2d721b1
+0,         21,         21,        1,   115200, 0xfaa309a0
+0,         22,         22,        1,   115200, 0x4c010776
+0,         23,         23,        1,   115200, 0x3560eaac
+0,         24,         24,        1,   115200, 0xeed2d55d
diff --git a/tests/ref/fate/filter-feedback-yadif b/tests/ref/fate/filter-feedback-yadif
new file mode 100644
index 0000000000..841b17a7cd
--- /dev/null
+++ b/tests/ref/fate/filter-feedback-yadif
@@ -0,0 +1,35 @@
+#tb 0: 1/25
+#media_type 0: video
+#codec_id 0: rawvideo
+#dimensions 0: 720x576
+#sar 0: 16/15
+0,          9,          9,        1,   622080, 0x694c9553
+0,         10,         10,        1,   622080, 0x213ec012
+0,         11,         11,        1,   622080, 0x25c5d006
+0,         12,         12,        1,   622080, 0x65b6207d
+0,         13,         13,        1,   622080, 0xa670b9eb
+0,         14,         14,        1,   622080, 0x1e01e2cf
+0,         15,         15,        1,   622080, 0xae927c11
+0,         16,         16,        1,   622080, 0xe05ae9f8
+0,         17,         17,        1,   622080, 0x93b5db69
+0,         18,         18,        1,   622080, 0x25295750
+0,         19,         19,        1,   622080, 0x82ebf126
+0,         20,         20,        1,   622080, 0x32aea022
+0,         21,         21,        1,   622080, 0x55a5851f
+0,         22,         22,        1,   622080, 0x69951dcc
+0,         23,         23,        1,   622080, 0xf4d9c6d0
+0,         24,         24,        1,   622080, 0x5178d1b1
+0,         25,         25,        1,   622080, 0x3fa221a3
+0,         26,         26,        1,   622080, 0x6f5044f4
+0,         27,         27,        1,   622080, 0x691c3234
+0,         28,         28,        1,   622080, 0xe67ac448
+0,         29,         29,        1,   622080, 0x7195350c
+0,         30,         30,        1,   622080, 0x1bb70e7b
+0,         31,         31,        1,   622080, 0xf089397d
+0,         32,         32,        1,   622080, 0x6e2363b6
+0,         33,         33,        1,   622080, 0x84c8548c
+0,         34,         34,        1,   622080, 0x812e5ea9
+0,         35,         35,        1,   622080, 0x7ba238c6
+0,         36,         36,        1,   622080, 0x740317f5
+0,         37,         37,        1,   622080, 0xf5b22ef6
+0,         38,         38,        1,   622080, 0x54e3b3f2
-- 
2.52.0


From e6edef9db03eeca7c0b66428bbbd6f0f9fa71e9e Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 3 Dec 2025 22:34:03 +0100
Subject: [PATCH 189/304] hwcontext_vulkan: fix final error to let old header
 files work

........
---
 libavutil/hwcontext_vulkan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index 2fc66a0cea..a120e6185c 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -2586,7 +2586,7 @@ static int switch_layout_host(AVHWFramesContext *hwfc, FFVkExecPool *ectx,
 
     for (i = 0; i < nb_images; i++) {
         layout_change[i] = (VkHostImageLayoutTransitionInfoEXT) {
-            .sType = VK_STRUCTURE_TYPE_HOST_IMAGE_LAYOUT_TRANSITION_INFO,
+            .sType = VK_STRUCTURE_TYPE_HOST_IMAGE_LAYOUT_TRANSITION_INFO_EXT,
             .image = frame->img[i],
             .oldLayout = frame->layout[i],
             .newLayout = new_layout,
-- 
2.52.0


From df191a8073040fbdfb36b0e13e4713edbcc23b40 Mon Sep 17 00:00:00 2001
From: wutno <aaron@installgentoo.net>
Date: Thu, 20 Nov 2025 16:19:49 -0500
Subject: [PATCH 190/304] avformat/xmv: Handle zero sized packet at end of file

Some XMVs introduce a blank packet at the end of the stream. Previously, we
didn't account for this and returned AVERROR_INVALIDDATA, indicating an issue
with the file. Instead, let's check for this and close out with AVERROR_EOF.
---
 libavformat/xmv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/xmv.c b/libavformat/xmv.c
index c0b402860e..2c5e2bb475 100644
--- a/libavformat/xmv.c
+++ b/libavformat/xmv.c
@@ -412,7 +412,7 @@ static int xmv_fetch_new_packet(AVFormatContext *s)
     AVIOContext     *pb  = s->pb;
     int result;
 
-    if (xmv->this_packet_offset == xmv->next_packet_offset)
+    if (xmv->next_packet_size == 0)
         return AVERROR_EOF;
 
     /* Seek to it */
-- 
2.52.0


From c2048ba35c124a83e9d65ac7fddb3a87de8aed0f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 1 Dec 2025 21:09:34 +0100
Subject: [PATCH 191/304] avdevice/gdigrab: suppress int to pointer cast
 warning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
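A small sketch of the cast pattern, assuming nothing about the Windows headers: `handle_from_string` is a hypothetical helper and `void *` stands in for `HWND`. Going through `intptr_t` first makes the integer the same width as a pointer, which is what silences the warning on targets where `unsigned long long` is wider than a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a numeric handle from a string.
 * void * stands in for HWND here. *ok is set to 1 only if the whole
 * string was consumed as a number. */
void *handle_from_string(const char *s, int *ok)
{
    assert(s && ok);
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    *ok = end != s && *end == '\0';

    /* Narrow to a pointer-sized integer first, then convert to a pointer. */
    return (void *)(intptr_t)v;
}
```

For example, `handle_from_string("0x1234", &ok)` sets `ok` to 1, while a string with trailing characters such as `"12abc"` leaves `ok` at 0.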
 libavdevice/gdigrab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavdevice/gdigrab.c b/libavdevice/gdigrab.c
index 08a41c304b..fa9ba27db0 100644
--- a/libavdevice/gdigrab.c
+++ b/libavdevice/gdigrab.c
@@ -279,7 +279,7 @@ gdigrab_read_header(AVFormatContext *s1)
         char *p;
         name = filename + 5;
 
-        hwnd = (HWND) strtoull(name, &p, 0);
+        hwnd = (HWND)(intptr_t) strtoull(name, &p, 0);
 
         if (p == NULL || p == name || p[0] != '\0')
         {
-- 
2.52.0


From c6861585a4e90e935b343fabd9146c8db5c93eb8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 1 Dec 2025 21:21:25 +0100
Subject: [PATCH 192/304] fftools/cmdutils: use strcpy directly, the length is
 computed already
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

There is no need to scan for the NUL terminator if we write it ourselves.

Fixes: warning: 'strncat' specified bound 10 equals source length [-Wstringop-overflow=]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
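The idea can be sketched outside of cmdutils.c as follows. `append_suffix` is a hypothetical helper, not a function in fftools; it only illustrates that once the prefix length is known, the suffix can be copied straight to that offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<dir><suffix>",
 * or NULL if the allocation fails. The caller frees the result. */
char *append_suffix(const char *dir, const char *suffix)
{
    assert(dir && suffix);
    size_t dir_len = strlen(dir);
    size_t total   = dir_len + strlen(suffix) + 1; /* +1 for the NUL */
    char *out = malloc(total);
    if (!out)
        return NULL;

    memcpy(out, dir, dir_len);
    /* The write position is already known, so there is no need to
     * terminate the prefix and let strncat() search for its end.
     * strcpy() copies the suffix together with its terminator. */
    strcpy(out + dir_len, suffix);
    return out;
}
```

This is safe only because the buffer was sized for both parts up front, which mirrors the `desired_size` computation in the patched code.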
 fftools/cmdutils.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fftools/cmdutils.c b/fftools/cmdutils.c
index e906d4506d..9a365c228d 100644
--- a/fftools/cmdutils.c
+++ b/fftools/cmdutils.c
@@ -962,8 +962,7 @@ FILE *get_preset_file(char *filename, size_t filename_size,
                     datadir, desired_size, sizeof *datadir);
                 if (new_datadir) {
                     datadir = new_datadir;
-                    datadir[datadir_len] = 0;
-                    strncat(datadir, "/ffpresets",  desired_size - 1 - datadir_len);
+                    strcpy(datadir + datadir_len, "/ffpresets");
                     base[2] = datadir;
                 }
             }
-- 
2.52.0


From 549d964e2b661263567e45e6c16395e115d646b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 1 Dec 2025 21:30:26 +0100
Subject: [PATCH 193/304] avutil/vulkan: fix device memory size truncation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

size_t cannot fit VK_WHOLE_SIZE on 32-bit builds.

Fixes: warning: conversion from 'long long unsigned int' to 'size_t' {aka 'unsigned int'} changes value from '18446744073709551615' to '4294967295'

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
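A sketch of the truncation, written without the Vulkan headers so it builds anywhere. `WHOLE_SIZE_SENTINEL` mirrors the value of `VK_WHOLE_SIZE` (`~0ULL`), `uint32_t` models a 32-bit `size_t`, and `uint64_t` models `VkDeviceSize`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for VK_WHOLE_SIZE, which the Vulkan headers define as ~0ULL. */
#define WHOLE_SIZE_SENTINEL (~0ULL)

/* Models a parameter typed as a 32-bit size_t. */
int is_whole_size_32(uint32_t mem_size)
{
    return (uint64_t)mem_size == WHOLE_SIZE_SENTINEL;
}

/* Models a parameter typed as a 64-bit VkDeviceSize. */
int is_whole_size_64(uint64_t mem_size)
{
    return mem_size == WHOLE_SIZE_SENTINEL;
}

/* Passes the sentinel through each parameter type. */
void check_sentinel_round_trip(void)
{
    /* Converting to 32 bits keeps only the low half, so the callee
     * can no longer recognize the sentinel. */
    assert(!is_whole_size_32((uint32_t)WHOLE_SIZE_SENTINEL));
    assert(is_whole_size_64(WHOLE_SIZE_SENTINEL));
}
```

The 32-bit variant receives 4294967295, the value quoted in the compiler warning, rather than the full sentinel.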
 libavutil/vulkan.c | 2 +-
 libavutil/vulkan.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/vulkan.c b/libavutil/vulkan.c
index 3015a536c2..7858e002ed 100644
--- a/libavutil/vulkan.c
+++ b/libavutil/vulkan.c
@@ -1168,7 +1168,7 @@ int ff_vk_map_buffers(FFVulkanContext *s, FFVkBuffer **buf, uint8_t *mem[],
 }
 
 int ff_vk_flush_buffer(FFVulkanContext *s, FFVkBuffer *buf,
-                       size_t offset, size_t mem_size,
+                       VkDeviceSize offset, VkDeviceSize mem_size,
                        int flush)
 {
     VkResult ret;
diff --git a/libavutil/vulkan.h b/libavutil/vulkan.h
index 868788c2a4..29116bcb2c 100644
--- a/libavutil/vulkan.h
+++ b/libavutil/vulkan.h
@@ -527,7 +527,7 @@ int ff_vk_create_buf(FFVulkanContext *s, FFVkBuffer *buf, size_t size,
  * Flush or invalidate a single buffer, with a given size and offset.
  */
 int ff_vk_flush_buffer(FFVulkanContext *s, FFVkBuffer *buf,
-                       size_t offset, size_t mem_size,
+                       VkDeviceSize offset, VkDeviceSize mem_size,
                        int flush);
 
 /**
-- 
2.52.0


From 5e6794ca7d9811d0b2007d4d8d4f1ea2e6b62274 Mon Sep 17 00:00:00 2001
From: Michael Niedermayer <michael@niedermayer.cc>
Date: Tue, 2 Dec 2025 04:36:47 +0100
Subject: [PATCH 194/304] Revert "avformat/rawdec: set framerate in codec
 parameters"

Fixes single-image videos.

This works and creates a single-image video:
./ffmpeg -i lena.pnm /tmp/file.m2v

This fails after 3d96d83a0a009a3bf566b208b14a7d264f17263b:
./ffmpeg -i /tmp/file.m2v /tmp/file.jpg -y

This reverts commit 3d96d83a0a009a3bf566b208b14a7d264f17263b.
---
 libavformat/rawdec.c                          |   2 +-
 tests/ref/fate/cavs-demux                     |   2 +-
 tests/ref/fate/enhanced-flv-hevc-hdr10        |   4 +-
 tests/ref/fate/h264-bsf-dts2pts               | 102 +++++++++---------
 .../ref/fate/h264-conformance-cvpcmnl2_sva_c  |   6 +-
 tests/ref/fate/hevc-bsf-dts2pts-cra           |  68 ++++++------
 tests/ref/fate/hevc-bsf-dts2pts-idr           |  38 +++----
 tests/ref/fate/hevc-bsf-dts2pts-idr-cra       |  38 +++----
 tests/ref/lavf-fate/hevc.mp4                  |   2 +-
 9 files changed, 131 insertions(+), 131 deletions(-)

diff --git a/libavformat/rawdec.c b/libavformat/rawdec.c
index 5cf2764a0d..d0c829dc42 100644
--- a/libavformat/rawdec.c
+++ b/libavformat/rawdec.c
@@ -83,9 +83,9 @@ int ff_raw_video_read_header(AVFormatContext *s)
 
     st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
     st->codecpar->codec_id = ffifmt(s->iformat)->raw_codec_id;
-    st->codecpar->framerate = s1->framerate;
     sti->need_parsing = AVSTREAM_PARSE_FULL_RAW;
 
+    st->avg_frame_rate = s1->framerate;
     avpriv_set_pts_info(st, 64, 1, 1200000);
 
 fail:
diff --git a/tests/ref/fate/cavs-demux b/tests/ref/fate/cavs-demux
index eb16eb1f9d..c4847293ab 100644
--- a/tests/ref/fate/cavs-demux
+++ b/tests/ref/fate/cavs-demux
@@ -58,5 +58,5 @@ packet|codec_type=video|stream_index=0|pts=2240000|pts_time=1.866667|dts=2240000
 packet|codec_type=video|stream_index=0|pts=2280000|pts_time=1.900000|dts=2280000|dts_time=1.900000|duration=40000|duration_time=0.033333|size=67|pos=172185|flags=___|data_hash=CRC32:42484449
 packet|codec_type=video|stream_index=0|pts=2320000|pts_time=1.933333|dts=2320000|dts_time=1.933333|duration=40000|duration_time=0.033333|size=83|pos=172252|flags=___|data_hash=CRC32:a941bdf0
 packet|codec_type=video|stream_index=0|pts=2360000|pts_time=1.966667|dts=2360000|dts_time=1.966667|duration=40000|duration_time=0.033333|size=5417|pos=172335|flags=___|data_hash=CRC32:9d0d503b
-stream|index=0|codec_name=cavs|profile=unknown|codec_type=video|codec_tag_string=[0][0][0][0]|codec_tag=0x0000|width=1280|height=720|coded_width=1280|coded_height=720|has_b_frames=0|sample_aspect_ratio=N/A|display_aspect_ratio=N/A|pix_fmt=yuv420p|level=-99|color_range=unknown|color_space=unknown|color_transfer=unknown|color_primaries=unknown|chroma_location=unspecified|field_order=unknown|refs=1|id=N/A|r_frame_rate=30/1|avg_frame_rate=30/1|time_base=1/1200000|start_pts=N/A|start_time=N/A|duration_ts=N/A|duration=N/A|bit_rate=N/A|max_bit_rate=N/A|bits_per_raw_sample=N/A|nb_frames=N/A|nb_read_frames=N/A|nb_read_packets=60|extradata_size=18|extradata_hash=CRC32:1255d52e|disposition:default=0|disposition:dub=0|disposition:original=0|disposition:comment=0|disposition:lyrics=0|disposition:karaoke=0|disposition:forced=0|disposition:hearing_impaired=0|disposition:visual_impaired=0|disposition:clean_effects=0|disposition:attached_pic=0|disposition:timed_thumbnails=0|disposition:non_diegetic=0|disposition:captions=0|disposition:descriptions=0|disposition:metadata=0|disposition:dependent=0|disposition:still_image=0|disposition:multilayer=0
+stream|index=0|codec_name=cavs|profile=unknown|codec_type=video|codec_tag_string=[0][0][0][0]|codec_tag=0x0000|width=1280|height=720|coded_width=1280|coded_height=720|has_b_frames=0|sample_aspect_ratio=N/A|display_aspect_ratio=N/A|pix_fmt=yuv420p|level=-99|color_range=unknown|color_space=unknown|color_transfer=unknown|color_primaries=unknown|chroma_location=unspecified|field_order=unknown|refs=1|id=N/A|r_frame_rate=30/1|avg_frame_rate=25/1|time_base=1/1200000|start_pts=N/A|start_time=N/A|duration_ts=N/A|duration=N/A|bit_rate=N/A|max_bit_rate=N/A|bits_per_raw_sample=N/A|nb_frames=N/A|nb_read_frames=N/A|nb_read_packets=60|extradata_size=18|extradata_hash=CRC32:1255d52e|disposition:default=0|disposition:dub=0|disposition:original=0|disposition:comment=0|disposition:lyrics=0|disposition:karaoke=0|disposition:forced=0|disposition:hearing_impaired=0|disposition:visual_impaired=0|disposition:clean_effects=0|disposition:attached_pic=0|disposition:timed_thumbnails=0|disposition:non_diegetic=0|disposition:captions=0|disposition:descriptions=0|disposition:metadata=0|disposition:dependent=0|disposition:still_image=0|disposition:multilayer=0
 format|filename=bunny.mp4|nb_streams=1|nb_programs=0|nb_stream_groups=0|format_name=cavsvideo|start_time=N/A|duration=N/A|size=177752|bit_rate=N/A|probe_score=51
diff --git a/tests/ref/fate/enhanced-flv-hevc-hdr10 b/tests/ref/fate/enhanced-flv-hevc-hdr10
index bebcf84fab..525f056d66 100644
--- a/tests/ref/fate/enhanced-flv-hevc-hdr10
+++ b/tests/ref/fate/enhanced-flv-hevc-hdr10
@@ -4,7 +4,7 @@
 #codec_id 0: hevc
 #dimensions 0: 1280x720
 #sar 0: 0/1
-0,          0,          0,        0,    77718, 0xb59c83a5
+0,          0,          0,       40,    77718, 0xb59c83a5
 [FRAME]
 media_type=video
 stream_index=0
@@ -17,7 +17,7 @@ best_effort_timestamp=0
 best_effort_timestamp_time=0.000000
 duration=N/A
 duration_time=N/A
-pkt_pos=439
+pkt_pos=459
 pkt_size=77718
 width=1280
 height=720
diff --git a/tests/ref/fate/h264-bsf-dts2pts b/tests/ref/fate/h264-bsf-dts2pts
index 46c640eab8..f908bb44f5 100644
--- a/tests/ref/fate/h264-bsf-dts2pts
+++ b/tests/ref/fate/h264-bsf-dts2pts
@@ -1,5 +1,5 @@
-451601038b9091014e45660bc98e09dc *tests/data/fate/h264-bsf-dts2pts.mov
-244033 tests/data/fate/h264-bsf-dts2pts.mov
+219edd347ce3151f5b5579d300cd7179 *tests/data/fate/h264-bsf-dts2pts.mov
+243937 tests/data/fate/h264-bsf-dts2pts.mov
 #extradata 0:       26, 0x75e2093d
 #tb 0: 1/1200000
 #media_type 0: video
@@ -7,52 +7,52 @@
 #dimensions 0: 352x288
 #sar 0: 0/1
 0,     -48000,          0,    48000,    13686, 0x5ee9bd4c
-0,          0,     144000,    24000,     9320, 0x17224db1, F=0x0
-0,      24000,     168000,    24000,     8903, 0xe394918b, F=0x0
-0,      48000,      48000,    48000,    10108, 0x98418e7e, F=0x0
-0,      96000,      96000,    24000,     2937, 0x49dccb76, F=0x0
-0,     120000,     120000,    24000,     2604, 0xfc8013cd, F=0x0
-0,     144000,     288000,    24000,     7420, 0xcb4155cd, F=0x0
-0,     168000,     312000,    24000,     5664, 0x060bc948, F=0x0
-0,     192000,     192000,    48000,     4859, 0x0a5a8368, F=0x0
-0,     240000,     240000,    24000,     2883, 0xb9639a19, F=0x0
-0,     264000,     264000,    24000,     2547, 0xba95e99d, F=0x0
-0,     288000,     432000,    24000,     4659, 0x19203a0d, F=0x0
-0,     312000,     456000,    24000,     9719, 0xb500c328, F=0x0
-0,     336000,     336000,    48000,     5078, 0x5359c6b8, F=0x0
-0,     384000,     384000,    48000,     5041, 0x88dfcdf1, F=0x0
-0,     432000,     576000,    48000,     9494, 0x29297319, F=0x0
-0,     480000,     480000,    48000,     4772, 0x80273a60, F=0x0
-0,     528000,     528000,    24000,     3237, 0xd99e742c, F=0x0
-0,     552000,     552000,    24000,     2650, 0xc7cc378a, F=0x0
-0,     576000,     720000,    24000,     6519, 0x142aa357, F=0x0
-0,     600000,     744000,    24000,     5878, 0xe70d7e21, F=0x0
-0,     624000,     624000,    24000,     2648, 0xe58b1c4b, F=0x0
-0,     648000,     648000,    24000,     4522, 0x33ad0882, F=0x0
-0,     672000,     672000,    24000,     3246, 0xdbfa539f, F=0x0
-0,     696000,     696000,    24000,     3027, 0xdb5bf675, F=0x0
-0,     720000,     864000,    48000,     9282, 0x07973603, F=0x0
-0,     768000,     768000,    24000,     2786, 0x14824d92, F=0x0
-0,     792000,     792000,    24000,     2719, 0x00614eef, F=0x0
-0,     816000,     816000,    24000,     2627, 0xe8e91216, F=0x0
-0,     840000,     840000,    24000,     2720, 0xbe974fcc, F=0x0
-0,     864000,    1008000,    48000,     7687, 0x0de01895, F=0x0
-0,     912000,     912000,    48000,     5464, 0x113f954d, F=0x0
-0,     960000,     960000,    24000,     3482, 0x5c90cdae, F=0x0
-0,     984000,     984000,    24000,     2791, 0x4acb702a, F=0x0
-0,    1008000,    1152000,    24000,    11362, 0x13363bdb, F=0x0
-0,    1032000,    1176000,    24000,     2975, 0x99b1e813, F=0x0
-0,    1056000,    1056000,    24000,     2342, 0xe9587867, F=0x0
-0,    1080000,    1080000,    24000,     2634, 0x8d9814fc, F=0x0
-0,    1104000,    1104000,    24000,     2419, 0x033cbb5f, F=0x0
-0,    1128000,    1128000,    24000,     2498, 0x7dd9e476, F=0x0
-0,    1152000,    1296000,    24000,     2668, 0x358e2bd8, F=0x0
-0,    1176000,    1320000,    24000,     9068, 0x3a639927, F=0x0
-0,    1200000,    1200000,    48000,     4939, 0xa5309a8c, F=0x0
-0,    1248000,    1248000,    24000,     2650, 0x2ab82b97, F=0x0
-0,    1272000,    1272000,    24000,     2503, 0xfd97cd4c, F=0x0
-0,    1296000,    1440000,    48000,     5121, 0xaf88e5b8, F=0x0
-0,    1344000,    1344000,    24000,     2643, 0xa1791db0, F=0x0
-0,    1368000,    1368000,    24000,     2637, 0xe1a42510, F=0x0
-0,    1392000,    1392000,    24000,     2633, 0x08430f15, F=0x0
-0,    1416000,    1416000,    24000,     2721, 0xe6756990, F=0x0
+0,          0,     240000,    48000,     9320, 0x17224db1, F=0x0
+0,      48000,     288000,    48000,     8903, 0xe394918b, F=0x0
+0,      96000,      96000,    48000,    10108, 0x98418e7e, F=0x0
+0,     144000,     144000,    48000,     2937, 0x49dccb76, F=0x0
+0,     192000,     192000,    48000,     2604, 0xfc8013cd, F=0x0
+0,     240000,     480000,    48000,     7420, 0xcb4155cd, F=0x0
+0,     288000,     528000,    48000,     5664, 0x060bc948, F=0x0
+0,     336000,     336000,    48000,     4859, 0x0a5a8368, F=0x0
+0,     384000,     384000,    48000,     2883, 0xb9639a19, F=0x0
+0,     432000,     432000,    48000,     2547, 0xba95e99d, F=0x0
+0,     480000,     672000,    48000,     4659, 0x19203a0d, F=0x0
+0,     528000,     696000,    48000,     9719, 0xb500c328, F=0x0
+0,     576000,     576000,    48000,     5078, 0x5359c6b8, F=0x0
+0,     624000,     624000,    48000,     5041, 0x88dfcdf1, F=0x0
+0,     672000,     864000,    48000,     9494, 0x29297319, F=0x0
+0,     720000,     720000,    48000,     4772, 0x80273a60, F=0x0
+0,     768000,     768000,    48000,     3237, 0xd99e742c, F=0x0
+0,     816000,     816000,    48000,     2650, 0xc7cc378a, F=0x0
+0,     864000,    1152000,    48000,     6519, 0x142aa357, F=0x0
+0,     912000,    1176000,    48000,     5878, 0xe70d7e21, F=0x0
+0,     960000,     960000,    48000,     2648, 0xe58b1c4b, F=0x0
+0,    1008000,    1008000,    48000,     4522, 0x33ad0882, F=0x0
+0,    1056000,    1056000,    48000,     3246, 0xdbfa539f, F=0x0
+0,    1104000,    1104000,    48000,     3027, 0xdb5bf675, F=0x0
+0,    1152000,    1392000,    48000,     9282, 0x07973603, F=0x0
+0,    1200000,    1200000,    48000,     2786, 0x14824d92, F=0x0
+0,    1248000,    1248000,    48000,     2719, 0x00614eef, F=0x0
+0,    1296000,    1296000,    48000,     2627, 0xe8e91216, F=0x0
+0,    1344000,    1344000,    48000,     2720, 0xbe974fcc, F=0x0
+0,    1392000,    1584000,    48000,     7687, 0x0de01895, F=0x0
+0,    1440000,    1440000,    48000,     5464, 0x113f954d, F=0x0
+0,    1488000,    1488000,    48000,     3482, 0x5c90cdae, F=0x0
+0,    1536000,    1536000,    48000,     2791, 0x4acb702a, F=0x0
+0,    1584000,    1872000,    48000,    11362, 0x13363bdb, F=0x0
+0,    1632000,    1920000,    48000,     2975, 0x99b1e813, F=0x0
+0,    1680000,    1680000,    48000,     2342, 0xe9587867, F=0x0
+0,    1728000,    1728000,    48000,     2634, 0x8d9814fc, F=0x0
+0,    1776000,    1776000,    48000,     2419, 0x033cbb5f, F=0x0
+0,    1824000,    1824000,    48000,     2498, 0x7dd9e476, F=0x0
+0,    1872000,    2112000,    48000,     2668, 0x358e2bd8, F=0x0
+0,    1920000,    2136000,    48000,     9068, 0x3a639927, F=0x0
+0,    1968000,    1968000,    48000,     4939, 0xa5309a8c, F=0x0
+0,    2016000,    2016000,    48000,     2650, 0x2ab82b97, F=0x0
+0,    2064000,    2064000,    48000,     2503, 0xfd97cd4c, F=0x0
+0,    2112000,    2352000,    48000,     5121, 0xaf88e5b8, F=0x0
+0,    2160000,    2160000,    48000,     2643, 0xa1791db0, F=0x0
+0,    2208000,    2208000,    48000,     2637, 0xe1a42510, F=0x0
+0,    2256000,    2256000,    48000,     2633, 0x08430f15, F=0x0
+0,    2304000,    2304000,    48000,     2721, 0xe6756990, F=0x0
diff --git a/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c b/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c
index 0a0795df41..0303bc24e6 100644
--- a/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c
+++ b/tests/ref/fate/h264-conformance-cvpcmnl2_sva_c
@@ -1,7 +1,7 @@
-#tb 0: 1/50
+#tb 0: 1/25
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 1280x720
 #sar 0: 0/1
-0,          0,          0,        2,  1382400, 0xccbe6bf8
-0,          2,          2,        2,  1382400, 0x49c0cfd7
+0,          0,          0,        1,  1382400, 0xccbe6bf8
+0,          1,          1,        1,  1382400, 0x49c0cfd7
diff --git a/tests/ref/fate/hevc-bsf-dts2pts-cra b/tests/ref/fate/hevc-bsf-dts2pts-cra
index e50aad6f70..4e9e2c5114 100644
--- a/tests/ref/fate/hevc-bsf-dts2pts-cra
+++ b/tests/ref/fate/hevc-bsf-dts2pts-cra
@@ -1,5 +1,5 @@
-f6102684fbf13aa624e6edece38c83f2 *tests/data/fate/hevc-bsf-dts2pts-cra.mov
-102971 tests/data/fate/hevc-bsf-dts2pts-cra.mov
+c3c00fdc637a19fa3d23d37d9974d28d *tests/data/fate/hevc-bsf-dts2pts-cra.mov
+103067 tests/data/fate/hevc-bsf-dts2pts-cra.mov
 #extradata 0:      118, 0x25f51994
 #tb 0: 1/1200000
 #media_type 0: video
@@ -54,35 +54,35 @@ f6102684fbf13aa624e6edece38c83f2 *tests/data/fate/hevc-bsf-dts2pts-cra.mov
 0,    2016000,    2160000,    48000,      892, 0x0d2ab3bc, F=0x0
 0,    2064000,    2112000,    48000,      238, 0x45827561, F=0x0
 0,    2112000,    2208000,    48000,      281, 0x2a3a8e61, F=0x0
-0,    2160000,    2640000,    48000,     4629, 0xf2e0fb0f, F=0x0
-0,    2208000,    2448000,    48000,     1453, 0x6ae5dc98, F=0x0
-0,    2256000,    2352000,    48000,      869, 0x3982ae69, F=0x0
-0,    2304000,    2304000,    48000,      282, 0xd9e28960, F=0x0
-0,    2352000,    2400000,    48000,      259, 0x253a809d, F=0x0
-0,    2400000,    2544000,    48000,      835, 0x83499f30, F=0x0
-0,    2448000,    2496000,    48000,      255, 0xa77b7690, F=0x0
-0,    2496000,    2592000,    48000,      242, 0x83977ccf, F=0x0
-0,    2544000,    3024000,    48000,     5082, 0xba55ee51, F=0x0
-0,    2592000,    2832000,    48000,     1393, 0xc998b442, F=0x0
-0,    2640000,    2736000,    48000,      742, 0x91ab75d2, F=0x0
-0,    2688000,    2688000,    48000,      229, 0xfa326d98, F=0x0
-0,    2736000,    2784000,    48000,      275, 0x49c38226, F=0x0
-0,    2784000,    2928000,    48000,      869, 0xdd05acc4, F=0x0
-0,    2832000,    2880000,    48000,      293, 0xcc9e904f, F=0x0
-0,    2880000,    2976000,    48000,      334, 0x212aa4b1, F=0x0
-0,    2928000,    3408000,    48000,     8539, 0xcccc9eb1
-0,    2976000,    3216000,    48000,     1593, 0x5a351a68, F=0x0
-0,    3024000,    3120000,    48000,     1042, 0xb77d00cc, F=0x0
-0,    3072000,    3072000,    48000,      302, 0xbcdb9750, F=0x0
-0,    3120000,    3168000,    48000,      336, 0xc7b0a55d, F=0x0
-0,    3168000,    3312000,    48000,      875, 0x7e31b046, F=0x0
-0,    3216000,    3264000,    48000,      401, 0xb473bca8, F=0x0
-0,    3264000,    3360000,    48000,      246, 0x43357263, F=0x0
-0,    3312000,    3792000,    48000,     3254, 0x8be44a2d, F=0x0
-0,    3360000,    3600000,    48000,     1151, 0x29d52d14, F=0x0
-0,    3408000,    3504000,    48000,      733, 0x33606982, F=0x0
-0,    3456000,    3456000,    48000,      234, 0xb70a79ff, F=0x0
-0,    3504000,    3552000,    48000,      228, 0x86916848, F=0x0
-0,    3552000,    3696000,    48000,      689, 0xcca34b40, F=0x0
-0,    3600000,    3648000,    48000,      223, 0xa96f6e31, F=0x0
-0,    3648000,    3744000,    48000,      241, 0x7ac17531, F=0x0
+0,    2160000,    2256010,    48000,     4629, 0xf2e0fb0f, F=0x0
+0,    2208000,    2256005,    48000,     1453, 0x6ae5dc98, F=0x0
+0,    2256000,    2256002,        1,      869, 0x3982ae69, F=0x0
+0,    2256001,    2256001,        1,      282, 0xd9e28960, F=0x0
+0,    2256002,    2256004,        2,      259, 0x253a809d, F=0x0
+0,    2256004,    2256007,        1,      835, 0x83499f30, F=0x0
+0,    2256005,    2256006,        1,      255, 0xa77b7690, F=0x0
+0,    2256006,    2256008,        1,      242, 0x83977ccf, F=0x0
+0,    2256007,    2256019,        1,     5082, 0xba55ee51, F=0x0
+0,    2256008,    2256014,        2,     1393, 0xc998b442, F=0x0
+0,    2256010,    2256012,        1,      742, 0x91ab75d2, F=0x0
+0,    2256011,    2256011,        1,      229, 0xfa326d98, F=0x0
+0,    2256012,    2256013,        1,      275, 0x49c38226, F=0x0
+0,    2256013,    2256017,        1,      869, 0xdd05acc4, F=0x0
+0,    2256014,    2256016,        2,      293, 0xcc9e904f, F=0x0
+0,    2256016,    2256018,        1,      334, 0x212aa4b1, F=0x0
+0,    2256017,    2256029,        1,     8539, 0xcccc9eb1
+0,    2256018,    2256024,        1,     1593, 0x5a351a68, F=0x0
+0,    2256019,    2256022,        1,     1042, 0xb77d00cc, F=0x0
+0,    2256020,    2256020,        2,      302, 0xbcdb9750, F=0x0
+0,    2256022,    2256023,        1,      336, 0xc7b0a55d, F=0x0
+0,    2256023,    2256026,        1,      875, 0x7e31b046, F=0x0
+0,    2256024,    2256025,        1,      401, 0xb473bca8, F=0x0
+0,    2256025,    2256028,        1,      246, 0x43357263, F=0x0
+0,    2256026,    2256038,        2,     3254, 0x8be44a2d, F=0x0
+0,    2256028,    2256034,        1,     1151, 0x29d52d14, F=0x0
+0,    2256029,    2256031,        1,      733, 0x33606982, F=0x0
+0,    2256030,    2256030,        1,      234, 0xb70a79ff, F=0x0
+0,    2256031,    2256032,        1,      228, 0x86916848, F=0x0
+0,    2256032,    2256036,        2,      689, 0xcca34b40, F=0x0
+0,    2256034,    2256035,        1,      223, 0xa96f6e31, F=0x0
+0,    2256035,    2256037,        1,      241, 0x7ac17531, F=0x0
diff --git a/tests/ref/fate/hevc-bsf-dts2pts-idr b/tests/ref/fate/hevc-bsf-dts2pts-idr
index adfbe75987..9568a5932c 100644
--- a/tests/ref/fate/hevc-bsf-dts2pts-idr
+++ b/tests/ref/fate/hevc-bsf-dts2pts-idr
@@ -1,5 +1,5 @@
-8f02d31fece6a067a8488d8ca6ee3329 *tests/data/fate/hevc-bsf-dts2pts-idr.mov
-346579 tests/data/fate/hevc-bsf-dts2pts-idr.mov
+368d177821450241820bf3507d74b35a *tests/data/fate/hevc-bsf-dts2pts-idr.mov
+346603 tests/data/fate/hevc-bsf-dts2pts-idr.mov
 #extradata 0:      699, 0x9c810c10
 #tb 0: 1/1200000
 #media_type 0: video
@@ -47,7 +47,7 @@
 0,    1680000,    1824000,    48000,     2208, 0x8e2146e8, F=0x0
 0,    1728000,    1776000,    48000,      900, 0x669fb97b, F=0x0
 0,    1776000,    1872000,    48000,      817, 0x3eb689e1, F=0x0
-0,    1824000,    2304000,    48000,    17919, 0x4e83b301, F=0x0
+0,    1824000,    2256001,    48000,    17919, 0x4e83b301, F=0x0
 0,    1872000,    2112000,    48000,     4911, 0x2fc88e1f, F=0x0
 0,    1920000,    2016000,    48000,     2717, 0xcc1f551e, F=0x0
 0,    1968000,    1968000,    48000,      908, 0x43e1b421, F=0x0
@@ -55,19 +55,19 @@
 0,    2064000,    2208000,    48000,     2857, 0xdeee9614, F=0x0
 0,    2112000,    2160000,    48000,     1039, 0x9cf8f494, F=0x0
 0,    2160000,    2256000,    48000,      975, 0x0f4ec85b, F=0x0
-0,    2208000,    2688000,    48000,    18345, 0xf9dabd3b, F=0x0
-0,    2256000,    2496000,    48000,     5280, 0xc5c33809, F=0x0
-0,    2304000,    2400000,    48000,     2755, 0xda955c75, F=0x0
-0,    2352000,    2352000,    48000,     1027, 0xf652eff3, F=0x0
-0,    2400000,    2448000,    48000,      901, 0xe23cb716, F=0x0
-0,    2448000,    2592000,    48000,     3010, 0x783de5cb, F=0x0
-0,    2496000,    2544000,    48000,     1005, 0x91fcf92f, F=0x0
-0,    2544000,    2640000,    48000,     1131, 0xd88b2920, F=0x0
-0,    2592000,    2592000,    48000,    29018, 0xc9b25608
-0,    2640000,    2880000,    48000,     7436, 0x18344e77, F=0x0
-0,    2688000,    2784000,    48000,     4050, 0x54fcbee0, F=0x0
-0,    2736000,    2736000,    48000,     1412, 0xe249bd86, F=0x0
-0,    2784000,    2832000,        1,      977, 0xa749dc21, F=0x0
-0,    2784001,    2784001,    95999,     2953, 0xba90c008, F=0x0
-0,    2880000,    2928000,        1,     1004, 0xddebea9a, F=0x0
-0,    2880001,    2880001,    48000,     1109, 0x7e081570, F=0x0
+0,    2208000,    2256011,    48000,    18345, 0xf9dabd3b, F=0x0
+0,    2256000,    2256006,        1,     5280, 0xc5c33809, F=0x0
+0,    2256001,    2256004,        1,     2755, 0xda955c75, F=0x0
+0,    2256002,    2256002,        2,     1027, 0xf652eff3, F=0x0
+0,    2256004,    2256005,        1,      901, 0xe23cb716, F=0x0
+0,    2256005,    2256008,        1,     3010, 0x783de5cb, F=0x0
+0,    2256006,    2256007,        1,     1005, 0x91fcf92f, F=0x0
+0,    2256007,    2256010,        1,     1131, 0xd88b2920, F=0x0
+0,    2256008,    2256008,        2,    29018, 0xc9b25608
+0,    2256010,    2256016,        1,     7436, 0x18344e77, F=0x0
+0,    2256011,    2256013,        1,     4050, 0x54fcbee0, F=0x0
+0,    2256012,    2256012,        1,     1412, 0xe249bd86, F=0x0
+0,    2256013,    2256014,        1,      977, 0xa749dc21, F=0x0
+0,    2256014,    2256014,        2,     2953, 0xba90c008, F=0x0
+0,    2256016,    2256017,        1,     1004, 0xddebea9a, F=0x0
+0,    2256017,    2256017,        1,     1109, 0x7e081570, F=0x0
diff --git a/tests/ref/fate/hevc-bsf-dts2pts-idr-cra b/tests/ref/fate/hevc-bsf-dts2pts-idr-cra
index b12f02581f..02e9765a26 100644
--- a/tests/ref/fate/hevc-bsf-dts2pts-idr-cra
+++ b/tests/ref/fate/hevc-bsf-dts2pts-idr-cra
@@ -1,5 +1,5 @@
-ed8d735b9a2780ce8ada61fd31c68886 *tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
-375210 tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
+07a216d6537502705348fea392d5d73d *tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
+375266 tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
 #extradata 0:      648, 0x30a7fa5c
 #tb 0: 1/1200000
 #media_type 0: video
@@ -47,7 +47,7 @@ ed8d735b9a2780ce8ada61fd31c68886 *tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
 0,    1680000,    1824000,    48000,     2674, 0x296fd411, F=0x0
 0,    1728000,    1776000,    48000,     1420, 0x304d6168, F=0x0
 0,    1776000,    1872000,    48000,     1364, 0x48734084, F=0x0
-0,    1824000,    2304000,    48000,    19349, 0xda0944f0, F=0x0
+0,    1824000,    2256001,    48000,    19349, 0xda0944f0, F=0x0
 0,    1872000,    2112000,    48000,     5550, 0x9c346b7b, F=0x0
 0,    1920000,    2016000,    48000,     3180, 0x5368f72b, F=0x0
 0,    1968000,    1968000,    48000,     1475, 0x369b7aae, F=0x0
@@ -55,19 +55,19 @@ ed8d735b9a2780ce8ada61fd31c68886 *tests/data/fate/hevc-bsf-dts2pts-idr-cra.mov
 0,    2064000,    2208000,    48000,     3365, 0x81d511a6, F=0x0
 0,    2112000,    2160000,    48000,     1575, 0xba81b0a5, F=0x0
 0,    2160000,    2256000,    48000,     1538, 0xf1199439, F=0x0
-0,    2208000,    2688000,    48000,    20138, 0x101fb220, F=0x0
-0,    2256000,    2496000,    48000,     5806, 0xcc500158, F=0x0
-0,    2304000,    2400000,    48000,     3267, 0xf8f0d6cd, F=0x0
-0,    2352000,    2352000,    48000,     1557, 0x894faf3e, F=0x0
-0,    2400000,    2448000,    48000,     1482, 0x6b1884b7, F=0x0
-0,    2448000,    2592000,    48000,     3513, 0x81227a59, F=0x0
-0,    2496000,    2544000,    48000,     1576, 0x855eabd8, F=0x0
-0,    2544000,    2640000,    48000,     1668, 0x030ade2e, F=0x0
-0,    2592000,    2592000,    48000,    32088, 0xfadbf5f6
-0,    2640000,    2880000,    48000,     5921, 0x21fb4976, F=0x0
-0,    2688000,    2784000,    48000,     3436, 0x92085cd6, F=0x0
-0,    2736000,    2736000,    48000,     1613, 0x1e0cb7c6, F=0x0
-0,    2784000,    2832000,    48000,     1483, 0x88d18a4e, F=0x0
-0,    2832000,    2976000,    48000,     3540, 0xc57c8e7b, F=0x0
-0,    2880000,    2928000,    48000,     1561, 0x9740a363, F=0x0
-0,    2928000,    3024000,    48000,     1651, 0x18e3d52d, F=0x0
+0,    2208000,    2256011,    48000,    20138, 0x101fb220, F=0x0
+0,    2256000,    2256006,        1,     5806, 0xcc500158, F=0x0
+0,    2256001,    2256004,        1,     3267, 0xf8f0d6cd, F=0x0
+0,    2256002,    2256002,        2,     1557, 0x894faf3e, F=0x0
+0,    2256004,    2256005,        1,     1482, 0x6b1884b7, F=0x0
+0,    2256005,    2256008,        1,     3513, 0x81227a59, F=0x0
+0,    2256006,    2256007,        1,     1576, 0x855eabd8, F=0x0
+0,    2256007,    2256010,        1,     1668, 0x030ade2e, F=0x0
+0,    2256008,    2256008,        2,    32088, 0xfadbf5f6
+0,    2256010,    2256016,        1,     5921, 0x21fb4976, F=0x0
+0,    2256011,    2256013,        1,     3436, 0x92085cd6, F=0x0
+0,    2256012,    2256012,        1,     1613, 0x1e0cb7c6, F=0x0
+0,    2256013,    2256014,        1,     1483, 0x88d18a4e, F=0x0
+0,    2256014,    2256018,        2,     3540, 0xc57c8e7b, F=0x0
+0,    2256016,    2256017,        1,     1561, 0x9740a363, F=0x0
+0,    2256017,    2256019,        1,     1651, 0x18e3d52d, F=0x0
diff --git a/tests/ref/lavf-fate/hevc.mp4 b/tests/ref/lavf-fate/hevc.mp4
index 39dddd49cb..aea5ae8979 100644
--- a/tests/ref/lavf-fate/hevc.mp4
+++ b/tests/ref/lavf-fate/hevc.mp4
@@ -1,3 +1,3 @@
-539b0f6daece6656609de0898a1f3d42 *tests/data/lavf-fate/lavf.hevc.mp4
+37b3a3e84df2350380b05b2af4dc97f5 *tests/data/lavf-fate/lavf.hevc.mp4
 151340 tests/data/lavf-fate/lavf.hevc.mp4
 tests/data/lavf-fate/lavf.hevc.mp4 CRC=0xc0a771de
-- 
2.52.0


From fb56204443ded48374afa0bbf6155d5246d82ee2 Mon Sep 17 00:00:00 2001
From: stevxiao <steven.xiao@amd.com>
Date: Fri, 21 Nov 2025 20:39:22 -0500
Subject: [PATCH 195/304] avcodec/d3d12va_encode: add intra refresh support for
 d3d12va encode

Intra refresh is a technique that gradually refreshes the video by encoding rows or regions as intra macroblocks/CTUs spread over multiple frames, rather than using periodic I-frames.
This provides better error resilience for video streaming while maintaining a more consistent bitrate.

Disable Intra Refresh (This is the default)
ffmpeg -init_hw_device d3d12va -hwaccel d3d12va -hwaccel_output_format d3d12 \
-i input.mp4 \
-c:v h264_d3d12va \
-intra_refresh_mode none \
-intra_refresh_duration 30 \
-g 60 \
output.h264

Enable Intra Refresh
ffmpeg -init_hw_device d3d12va -hwaccel d3d12va -hwaccel_output_format d3d12 \
-i input.mp4 \
-c:v h264_d3d12va \
-intra_refresh_mode row_based \
-intra_refresh_duration 30 \
-g 60 \
output.h264

Parameters
- `-intra_refresh_mode`: Set to `row_based` to enable row-based intra refresh, or `none` to disable
- `-intra_refresh_duration`: Number of frames over which to spread the intra refresh (default: 0 = use GOP size)
- `-g`: GOP size (should typically be larger than intra refresh duration)
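
As a rough illustration of how the refresh cycle advances, the sketch below mirrors the index arithmetic described above: a duration of 0 falls back to the GOP size, and the per-frame index wraps modulo the duration. The helper names `effective_duration` and `next_refresh_index` are hypothetical and are not part of the encoder; the zero-duration assertion is an assumption added for the sketch.

```c
#include <assert.h>

/* Hypothetical helper: a requested duration of 0 means "use the GOP size". */
static unsigned effective_duration(unsigned duration, unsigned gop_size)
{
    return duration ? duration : gop_size;
}

/* Hypothetical helper: advance the frame index within the refresh cycle.
 * The duration must be non-zero, otherwise the modulo is undefined. */
static unsigned next_refresh_index(unsigned index, unsigned duration)
{
    assert(duration > 0);
    return (index + 1) % duration;
}
```

With `-intra_refresh_duration 30`, the index under this sketch runs 0, 1, ..., 29 and then returns to 0, independently of the `-g 60` GOP size.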
---
 configure                        |  2 +
 libavcodec/d3d12va_encode.c      | 98 +++++++++++++++++++++++++++++++-
 libavcodec/d3d12va_encode.h      | 29 +++++++++-
 libavcodec/d3d12va_encode_av1.c  |  3 +-
 libavcodec/d3d12va_encode_h264.c |  2 +-
 libavcodec/d3d12va_encode_hevc.c |  2 +-
 6 files changed, 129 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index 883539e361..04e086c32a 100755
--- a/configure
+++ b/configure
@@ -2615,6 +2615,7 @@ CONFIG_EXTRA="
     cbs_vp8
     cbs_vp9
     celp_math
+    d3d12_intra_refresh
     d3d12va_encode
     deflate_wrapper
     dirac_parse
@@ -6964,6 +6965,7 @@ check_type "windows.h d3d12video.h" "ID3D12VideoEncoder"
 test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_VIDEO feature = D3D12_FEATURE_VIDEO_ENCODER_CODEC" && \
 test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_DATA_VIDEO_ENCODER_RESOURCE_REQUIREMENTS req" && enable d3d12_encoder_feature
 test_code cc "windows.h d3d12video.h" "D3D12_VIDEO_ENCODER_CODEC c = D3D12_VIDEO_ENCODER_CODEC_AV1; (void)c;" && enable d3d12va_av1_headers
+test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_DATA_VIDEO_ENCODER_INTRA_REFRESH_MODE check = { 0 };" && enable d3d12_intra_refresh
 check_type "windows.h" "DPI_AWARENESS_CONTEXT" -D_WIN32_WINNT=0x0A00
 check_type "windows.h security.h schnlsp.h" SecPkgContext_KeyingMaterialInfo -DSECURITY_WIN32
 check_type "d3d9.h dxva2api.h" DXVA2_ConfigPictureDecode -D_WIN32_WINNT=0x0602
diff --git a/libavcodec/d3d12va_encode.c b/libavcodec/d3d12va_encode.c
index 5a82a42c5f..f15df37b96 100644
--- a/libavcodec/d3d12va_encode.c
+++ b/libavcodec/d3d12va_encode.c
@@ -211,10 +211,17 @@ static int d3d12va_encode_issue(AVCodecContext *avctx,
     int barriers_ref_index = 0;
     D3D12_RESOURCE_BARRIER *barriers_ref = NULL;
 
+    D3D12_VIDEO_ENCODER_SEQUENCE_CONTROL_FLAGS seq_flags = D3D12_VIDEO_ENCODER_SEQUENCE_CONTROL_FLAG_NONE;
+
+    // Request intra refresh if enabled
+    if (ctx->intra_refresh.Mode != D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE) {
+        seq_flags |= D3D12_VIDEO_ENCODER_SEQUENCE_CONTROL_FLAG_REQUEST_INTRA_REFRESH;
+    }
+
     D3D12_VIDEO_ENCODER_ENCODEFRAME_INPUT_ARGUMENTS input_args = {
         .SequenceControlDesc = {
-            .Flags = D3D12_VIDEO_ENCODER_SEQUENCE_CONTROL_FLAG_NONE,
-            .IntraRefreshConfig = { 0 },
+            .Flags = seq_flags,
+            .IntraRefreshConfig = ctx->intra_refresh,
             .RateControl = ctx->rc,
             .PictureTargetResolution = ctx->resolution,
             .SelectedLayoutMode = D3D12_VIDEO_ENCODER_FRAME_SUBREGION_LAYOUT_MODE_FULL_FRAME,
@@ -359,7 +366,7 @@ static int d3d12va_encode_issue(AVCodecContext *avctx,
         }
     }
 
-    input_args.PictureControlDesc.IntraRefreshFrameIndex  = 0;
+    input_args.PictureControlDesc.IntraRefreshFrameIndex  = ctx->intra_refresh_frame_index;
     if (base_pic->is_reference)
         input_args.PictureControlDesc.Flags |= D3D12_VIDEO_ENCODER_PICTURE_CONTROL_FLAG_USED_AS_REFERENCE_PICTURE;
 
@@ -547,6 +554,12 @@ static int d3d12va_encode_issue(AVCodecContext *avctx,
 
     pic->fence_value = ctx->sync_ctx.fence_value;
 
+    // Update intra refresh frame index for next frame
+    if (ctx->intra_refresh.Mode != D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE) {
+        ctx->intra_refresh_frame_index =
+            (ctx->intra_refresh_frame_index + 1) % ctx->intra_refresh.IntraRefreshDuration;
+    }
+
     if (d3d12_refs.ppTexture2Ds)
         av_freep(&d3d12_refs.ppTexture2Ds);
 
@@ -1220,6 +1233,81 @@ static int d3d12va_encode_init_gop_structure(AVCodecContext *avctx)
     return 0;
 }
 
+static int d3d12va_encode_init_intra_refresh(AVCodecContext *avctx)
+{
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
+    D3D12VAEncodeContext       *ctx = avctx->priv_data;
+
+    if (ctx->intra_refresh.Mode == D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE)
+        return 0;
+
+    // Check for SDK API availability
+#if CONFIG_D3D12_INTRA_REFRESH
+    HRESULT hr;
+    D3D12_VIDEO_ENCODER_LEVEL_SETTING                    level = { 0 };
+    D3D12_VIDEO_ENCODER_LEVELS_H264                 h264_level = { 0 };
+    D3D12_VIDEO_ENCODER_LEVEL_TIER_CONSTRAINTS_HEVC hevc_level = { 0 };
+#if CONFIG_AV1_D3D12VA_ENCODER
+    D3D12_VIDEO_ENCODER_AV1_LEVEL_TIER_CONSTRAINTS   av1_level = { 0 };
+#endif
+
+    switch (ctx->codec->d3d12_codec) {
+        case D3D12_VIDEO_ENCODER_CODEC_H264:
+            level.DataSize = sizeof(D3D12_VIDEO_ENCODER_LEVELS_H264);
+            level.pH264LevelSetting = &h264_level;
+            break;
+        case D3D12_VIDEO_ENCODER_CODEC_HEVC:
+            level.DataSize = sizeof(D3D12_VIDEO_ENCODER_LEVEL_TIER_CONSTRAINTS_HEVC);
+            level.pHEVCLevelSetting = &hevc_level;
+            break;
+#if CONFIG_AV1_D3D12VA_ENCODER
+        case D3D12_VIDEO_ENCODER_CODEC_AV1:
+            level.DataSize = sizeof(D3D12_VIDEO_ENCODER_AV1_LEVEL_TIER_CONSTRAINTS);
+            level.pAV1LevelSetting = &av1_level;
+            break;
+#endif
+        default:
+            av_assert0(0);
+    }
+
+    D3D12_FEATURE_DATA_VIDEO_ENCODER_INTRA_REFRESH_MODE intra_refresh_support = {
+        .NodeIndex = 0,
+        .Codec = ctx->codec->d3d12_codec,
+        .Profile = ctx->profile->d3d12_profile,
+        .Level = level,
+        .IntraRefreshMode = ctx->intra_refresh.Mode,
+    };
+
+    hr = ID3D12VideoDevice3_CheckFeatureSupport(ctx->video_device3,
+        D3D12_FEATURE_VIDEO_ENCODER_INTRA_REFRESH_MODE,
+        &intra_refresh_support, sizeof(intra_refresh_support));
+
+    if (FAILED(hr) || !intra_refresh_support.IsSupported) {
+        av_log(avctx, AV_LOG_ERROR, "Requested intra refresh mode not supported by driver.\n");
+        return AVERROR(ENOTSUP);
+    }
+#else
+    // Older SDK - validation will occur in init_sequence_params via D3D12_FEATURE_VIDEO_ENCODER_SUPPORT
+    av_log(avctx, AV_LOG_VERBOSE, "Intra refresh explicit check not available in this SDK.\n"
+           "Support will be validated during encoder initialization.\n");
+#endif
+
+    // Set duration: use GOP size if not specified
+    if (ctx->intra_refresh.IntraRefreshDuration == 0) {
+        ctx->intra_refresh.IntraRefreshDuration = base_ctx->gop_size;
+        av_log(avctx, AV_LOG_VERBOSE, "Intra refresh duration set to GOP size: %d\n",
+               ctx->intra_refresh.IntraRefreshDuration);
+    }
+
+    // Initialize frame index
+    ctx->intra_refresh_frame_index = 0;
+
+    av_log(avctx, AV_LOG_VERBOSE, "Intra refresh: mode=%d, duration=%d frames\n",
+           ctx->intra_refresh.Mode, ctx->intra_refresh.IntraRefreshDuration);
+
+    return 0;
+}
+
 static int d3d12va_create_encoder(AVCodecContext *avctx)
 {
     FFHWBaseEncodeContext  *base_ctx     = avctx->priv_data;
@@ -1570,6 +1658,10 @@ int ff_d3d12va_encode_init(AVCodecContext *avctx)
     if (err < 0)
         goto fail;
 
+    err = d3d12va_encode_init_intra_refresh(avctx);
+    if (err < 0)
+        goto fail;
+
     if (!(ctx->codec->flags & FF_HW_FLAG_SLICE_CONTROL) && avctx->slices > 0) {
         av_log(avctx, AV_LOG_WARNING, "Multiple slices were requested "
                "but this codec does not support controlling slices.\n");
diff --git a/libavcodec/d3d12va_encode.h b/libavcodec/d3d12va_encode.h
index b7cfea0582..699d8d2077 100644
--- a/libavcodec/d3d12va_encode.h
+++ b/libavcodec/d3d12va_encode.h
@@ -266,6 +266,17 @@ typedef struct D3D12VAEncodeContext {
     D3D12_VIDEO_ENCODER_LEVEL_SETTING level;
 
     D3D12_VIDEO_ENCODER_PICTURE_CONTROL_SUBREGIONS_LAYOUT_DATA subregions_layout;
+
+    /**
+     * Intra refresh configuration
+     */
+    D3D12_VIDEO_ENCODER_INTRA_REFRESH intra_refresh;
+
+    /**
+     * Current frame index within intra refresh cycle
+     */
+    UINT intra_refresh_frame_index;
+
 } D3D12VAEncodeContext;
 
 typedef struct D3D12VAEncodeType {
@@ -347,11 +358,27 @@ int ff_d3d12va_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt);
 int ff_d3d12va_encode_init(AVCodecContext *avctx);
 int ff_d3d12va_encode_close(AVCodecContext *avctx);
 
+#define D3D12VA_ENCODE_INTRA_REFRESH_MODE(name, mode, desc) \
+    { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_ ## mode }, \
+      0, 0, FLAGS, .unit = "intra_refresh_mode" }
+
 #define D3D12VA_ENCODE_COMMON_OPTIONS \
     { "max_frame_size", \
       "Maximum frame size (in bytes)",\
       OFFSET(common.max_frame_size), AV_OPT_TYPE_INT, \
-      { .i64 = 0 }, 0, INT_MAX / 8, FLAGS }
+      { .i64 = 0 }, 0, INT_MAX / 8, FLAGS }, \
+    { "intra_refresh_mode", \
+      "Set intra refresh mode", \
+      OFFSET(common.intra_refresh.Mode), AV_OPT_TYPE_INT, \
+      { .i64 = D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE }, \
+      D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE, \
+      D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_ROW_BASED, FLAGS, .unit = "intra_refresh_mode" }, \
+    D3D12VA_ENCODE_INTRA_REFRESH_MODE(none, NONE, "Disable intra refresh"), \
+    D3D12VA_ENCODE_INTRA_REFRESH_MODE(row_based, ROW_BASED, "Row-based intra refresh"), \
+    { "intra_refresh_duration", \
+      "Number of frames over which to spread intra refresh (0 = GOP size)", \
+      OFFSET(common.intra_refresh.IntraRefreshDuration), AV_OPT_TYPE_INT, \
+      { .i64 = 0 }, 0, INT_MAX, FLAGS }
 
 #define D3D12VA_ENCODE_RC_MODE(name, desc) \
     { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \
diff --git a/libavcodec/d3d12va_encode_av1.c b/libavcodec/d3d12va_encode_av1.c
index 34e803db98..31d3df33bd 100644
--- a/libavcodec/d3d12va_encode_av1.c
+++ b/libavcodec/d3d12va_encode_av1.c
@@ -559,7 +559,7 @@ static int d3d12va_encode_av1_init_sequence_params(AVCodecContext *avctx)
         .Codec                            = D3D12_VIDEO_ENCODER_CODEC_AV1,
         .InputFormat                      = hwctx->format,
         .RateControl                      = ctx->rc,
-        .IntraRefresh                     = D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE,
+        .IntraRefresh                     = ctx->intra_refresh.Mode,
         .SubregionFrameEncoding           = D3D12_VIDEO_ENCODER_FRAME_SUBREGION_LAYOUT_MODE_FULL_FRAME,
         .ResolutionsListCount             = 1,
         .pResolutionList                  = &ctx->resolution,
@@ -1082,6 +1082,7 @@ static int d3d12va_encode_av1_close(AVCodecContext *avctx)
 #define FLAGS (AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_ENCODING_PARAM)
 static const AVOption d3d12va_encode_av1_options[] = {
     HW_BASE_ENCODE_COMMON_OPTIONS,
+    D3D12VA_ENCODE_COMMON_OPTIONS,
     D3D12VA_ENCODE_RC_OPTIONS,
 
     { "qp", "Constant QP (for P-frames; scaled by qfactor/qoffset for I/B)",
diff --git a/libavcodec/d3d12va_encode_h264.c b/libavcodec/d3d12va_encode_h264.c
index 967544ea24..bcf5a326e5 100644
--- a/libavcodec/d3d12va_encode_h264.c
+++ b/libavcodec/d3d12va_encode_h264.c
@@ -178,7 +178,7 @@ static int d3d12va_encode_h264_init_sequence_params(AVCodecContext *avctx)
         .Codec                            = D3D12_VIDEO_ENCODER_CODEC_H264,
         .InputFormat                      = hwctx->format,
         .RateControl                      = ctx->rc,
-        .IntraRefresh                     = D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE,
+        .IntraRefresh                     = ctx->intra_refresh.Mode,
         .SubregionFrameEncoding           = D3D12_VIDEO_ENCODER_FRAME_SUBREGION_LAYOUT_MODE_FULL_FRAME,
         .ResolutionsListCount             = 1,
         .pResolutionList                  = &ctx->resolution,
diff --git a/libavcodec/d3d12va_encode_hevc.c b/libavcodec/d3d12va_encode_hevc.c
index 01e5b4cb4c..e00ecbb4de 100644
--- a/libavcodec/d3d12va_encode_hevc.c
+++ b/libavcodec/d3d12va_encode_hevc.c
@@ -250,7 +250,7 @@ static int d3d12va_encode_hevc_init_sequence_params(AVCodecContext *avctx)
         .Codec                            = D3D12_VIDEO_ENCODER_CODEC_HEVC,
         .InputFormat                      = hwctx->format,
         .RateControl                      = ctx->rc,
-        .IntraRefresh                     = D3D12_VIDEO_ENCODER_INTRA_REFRESH_MODE_NONE,
+        .IntraRefresh                     = ctx->intra_refresh.Mode,
         .SubregionFrameEncoding           = D3D12_VIDEO_ENCODER_FRAME_SUBREGION_LAYOUT_MODE_FULL_FRAME,
         .ResolutionsListCount             = 1,
         .pResolutionList                  = &ctx->resolution,
-- 
2.52.0


From 9e176d55f4ae26d65c529e870bdf52ce909d382a Mon Sep 17 00:00:00 2001
From: Xia Tao <xiatao@gmail.com>
Date: Thu, 4 Dec 2025 11:49:50 +0800
Subject: [PATCH 196/304] avcodec/wasm/hevc: fix typo in butterfly macro

Signed-off-by: Xia Tao <xiatao@gmail.com>
---
 libavcodec/wasm/hevc/idct.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavcodec/wasm/hevc/idct.c b/libavcodec/wasm/hevc/idct.c
index 828fc26a49..f2b04f307c 100644
--- a/libavcodec/wasm/hevc/idct.c
+++ b/libavcodec/wasm/hevc/idct.c
@@ -279,7 +279,7 @@ void ff_hevc_idct_8x8_10_simd128(int16_t *coeffs, int col_limit)
     x1 += x2;                                   \
     x3 += x2;                                   \
 
-#define bufferfly(e, o, p, m)   \
+#define butterfly(e, o, p, m)   \
     p = wasm_i32x4_add(e, o);   \
     m = wasm_i32x4_sub(e, o);   \
 
@@ -309,10 +309,10 @@ static void tr16_8x4(v128_t in0, v128_t in1, v128_t in2, v128_t in3,
     v30 = wasm_i32x4_add(v30, wasm_i32x4_extmul_high_i16x8(in3, trans[5]));
     v31 = wasm_i32x4_sub(v31, wasm_i32x4_extmul_high_i16x8(in3, trans[4]));
 
-    bufferfly(v24, v28, v16, v23);
-    bufferfly(v25, v29, v17, v22);
-    bufferfly(v26, v30, v18, v21);
-    bufferfly(v27, v31, v19, v20);
+    butterfly(v24, v28, v16, v23);
+    butterfly(v25, v29, v17, v22);
+    butterfly(v26, v30, v18, v21);
+    butterfly(v27, v31, v19, v20);
 
     sp += offset;
     wasm_v128_store(sp, v16); sp += 16;
-- 
2.52.0


From cab3625f985be53ff9e4dfa97e2251051793d865 Mon Sep 17 00:00:00 2001
From: Oliver Chang <ochang@google.com>
Date: Wed, 3 Dec 2025 04:53:09 +0000
Subject: [PATCH 197/304] avcodec/aacdec: Fix heap-use-after-free in USAC
 decoding

A heap-use-after-free vulnerability was identified in
`libavcodec/aac/aacdec.c`.  When `che_configure` frees a
`ChannelElement` (`ac->che[type][id]`), it failed to clear all
references to it in `ac->tag_che_map`.  `ac->tag_che_map` caches
pointers to `ChannelElement`s and can contain cross-type mappings (e.g.,
a `TYPE_SCE` tag mapping to a `TYPE_LFE` element).

In a USAC stream reconfiguration scenario, an LFE element was freed, but
a stale pointer remained in `ac->tag_che_map`. Subsequent calls to
`ff_aac_get_che` returned this dangling pointer, leading to a crash in
`decode_usac_core_coder`.

This commit fixes the issue by iterating over the entire
`ac->tag_che_map` in `che_configure` and clearing any entries that point
to the `ChannelElement` about to be freed, ensuring no dangling pointers
remain.
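
The pattern can be sketched in isolation as follows. This is a simplified model, not the decoder's code: `Ctx`, `Element`, `release_element`, and the table sizes are hypothetical stand-ins for the owning `ac->che` table and the non-owning `ac->tag_che_map` cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_TYPES 4   /* hypothetical sizes, not the decoder's */
#define N_IDS   16

typedef struct Element { int dummy; } Element;

typedef struct Ctx {
    Element *owned[N_TYPES][N_IDS];  /* owning pointers */
    Element *cache[N_TYPES][N_IDS];  /* non-owning, may alias across types */
} Ctx;

/* Free owned[type][id] after clearing every cache slot that points to it,
 * so that no later cache lookup can return a dangling pointer. */
static void release_element(Ctx *c, int type, int id)
{
    assert(type >= 0 && type < N_TYPES && id >= 0 && id < N_IDS);

    Element *victim = c->owned[type][id];
    if (!victim)
        return;

    /* Scan the whole cache: the stale entry may live under another type. */
    for (int i = 0; i < N_TYPES; i++)
        for (int j = 0; j < N_IDS; j++)
            if (c->cache[i][j] == victim)
                c->cache[i][j] = NULL;

    free(victim);
    c->owned[type][id] = NULL;
}
```

Scanning the full cache rather than only the row for `type` is what covers the cross-type mappings mentioned above.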

Fixes: https://issues.oss-fuzz.com/issues/440220467
---
 libavcodec/aac/aacdec.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/libavcodec/aac/aacdec.c b/libavcodec/aac/aacdec.c
index 9b42014ee8..b8d53036d4 100644
--- a/libavcodec/aac/aacdec.c
+++ b/libavcodec/aac/aacdec.c
@@ -164,6 +164,12 @@ static av_cold int che_configure(AACDecContext *ac,
         }
     } else {
         if (ac->che[type][id]) {
+            for (int i = 0; i < FF_ARRAY_ELEMS(ac->tag_che_map); i++) {
+                for (int j = 0; j < MAX_ELEM_ID; j++) {
+                    if (ac->tag_che_map[i][j] == ac->che[type][id])
+                        ac->tag_che_map[i][j] = NULL;
+                }
+            }
             ac->proc.sbr_ctx_close(ac->che[type][id]);
         }
         av_freep(&ac->che[type][id]);
-- 
2.52.0


From 35d435969904af2b8ae9d5aa278ae9c028d70955 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 4 Dec 2025 15:02:40 +0100
Subject: [PATCH 198/304] vulkan_dpx: fix alignment issue

12-bit images apparently require mod-32 alignment for each line.
Go figure.
---
 libavcodec/vulkan/dpx_unpack.comp | 7 +++++--
 libavcodec/vulkan_dpx.c           | 2 --
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/libavcodec/vulkan/dpx_unpack.comp b/libavcodec/vulkan/dpx_unpack.comp
index 5a44de87bf..b04ce5ddc6 100644
--- a/libavcodec/vulkan/dpx_unpack.comp
+++ b/libavcodec/vulkan/dpx_unpack.comp
@@ -44,8 +44,11 @@ i16vec4 parse_packed_in_32(ivec2 pos, int stride)
 #else
 i16vec4 parse_packed_in_32(ivec2 pos, int stride)
 {
-    uint line_off = pos.y*(stride*BITS_PER_COMP*COMPONENTS +
-                           (need_align << 3));
+    uint line_size = stride*BITS_PER_COMP*COMPONENTS;
+    line_size += line_size & 31;
+    line_size += need_align << 3;
+
+    uint line_off = pos.y*line_size;
     uint pix_off = pos.x*BITS_PER_COMP*COMPONENTS;
 
     uint off = (line_off + pix_off >> 5);
diff --git a/libavcodec/vulkan_dpx.c b/libavcodec/vulkan_dpx.c
index 54774f2424..16ac10c5b8 100644
--- a/libavcodec/vulkan_dpx.c
+++ b/libavcodec/vulkan_dpx.c
@@ -402,9 +402,7 @@ static int vk_decode_dpx_init(AVCodecContext *avctx)
 
     switch (dpx->pix_fmt) {
     case AV_PIX_FMT_GRAY10:
-    case AV_PIX_FMT_GRAY12:
     case AV_PIX_FMT_GBRAP10:
-    case AV_PIX_FMT_GBRAP12:
     case AV_PIX_FMT_UYVY422:
     case AV_PIX_FMT_YUV444P:
     case AV_PIX_FMT_YUVA444P:
-- 
2.52.0


From a200f67dde1bcb2a1af08698d9fb1b3944742bfb Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 11:08:14 +0100
Subject: [PATCH 199/304] avcodec/x86/vp8dsp: Remove MMXEXT functions
 overridden by SSSE3

SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm    | 159 +----------------------------------
 libavcodec/x86/vp8dsp_init.c |  37 +-------
 2 files changed, 6 insertions(+), 190 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 231c21ea0d..7b836351e4 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -1,5 +1,5 @@
 ;******************************************************************************
-;* VP8 MMXEXT optimizations
+;* VP8 ASM optimizations
 ;* Copyright (c) 2010 Ronald S. Bultje <rsbultje@gmail.com>
 ;* Copyright (c) 2010 Fiona Glaser <fiona@x264.com>
 ;*
@@ -24,25 +24,6 @@
 
 SECTION_RODATA
 
-fourtap_filter_hw_m: times 4 dw  -6, 123
-                     times 4 dw  12,  -1
-                     times 4 dw  -9,  93
-                     times 4 dw  50,  -6
-                     times 4 dw  -6,  50
-                     times 4 dw  93,  -9
-                     times 4 dw  -1,  12
-                     times 4 dw 123,  -6
-
-sixtap_filter_hw_m:  times 4 dw   2, -11
-                     times 4 dw 108,  36
-                     times 4 dw  -8,   1
-                     times 4 dw   3, -16
-                     times 4 dw  77,  77
-                     times 4 dw -16,   3
-                     times 4 dw   1,  -8
-                     times 4 dw  36, 108
-                     times 4 dw -11,   2
-
 fourtap_filter_hb_m: times 8 db  -6, 123
                      times 8 db  12,  -1
                      times 8 db  -9,  93
@@ -115,8 +96,6 @@ bilinear_filter_vb_m: times 8 db 7, 1
                       times 8 db 1, 7
 
 %if PIC
-%define fourtap_filter_hw  picregq
-%define sixtap_filter_hw   picregq
 %define fourtap_filter_hb  picregq
 %define sixtap_filter_hb   picregq
 %define fourtap_filter_v   picregq
@@ -125,8 +104,6 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %define bilinear_filter_vb picregq
 %define npicregs 1
 %else
-%define fourtap_filter_hw  fourtap_filter_hw_m
-%define sixtap_filter_hw   sixtap_filter_hw_m
 %define fourtap_filter_hb  fourtap_filter_hb_m
 %define sixtap_filter_hb   sixtap_filter_hb_m
 %define fourtap_filter_v   fourtap_filter_v_m
@@ -322,112 +299,6 @@ FILTER_SSSE3 4
 INIT_XMM ssse3
 FILTER_SSSE3 8
 
-; 4x4 block, H-only 4-tap filter
-INIT_MMX mmxext
-cglobal put_vp8_epel4_h4, 6, 6 + npicregs, 0, dst, dststride, src, srcstride, height, mx, picreg
-    shl       mxd, 4
-%if PIC
-    lea   picregq, [fourtap_filter_hw_m]
-%endif
-    movq      mm4, [fourtap_filter_hw+mxq-16] ; set up 4tap filter in words
-    movq      mm5, [fourtap_filter_hw+mxq]
-    movq      mm7, [pw_64]
-    pxor      mm6, mm6
-
-.nextrow:
-    movq      mm1, [srcq-1]                ; (ABCDEFGH) load 8 horizontal pixels
-
-    ; first set of 2 pixels
-    movq      mm2, mm1                     ; byte ABCD..
-    punpcklbw mm1, mm6                     ; byte->word ABCD
-    pshufw    mm0, mm2, 9                  ; byte CDEF..
-    punpcklbw mm0, mm6                     ; byte->word CDEF
-    pshufw    mm3, mm1, 0x94               ; word ABBC
-    pshufw    mm1, mm0, 0x94               ; word CDDE
-    pmaddwd   mm3, mm4                     ; multiply 2px with F0/F1
-    movq      mm0, mm1                     ; backup for second set of pixels
-    pmaddwd   mm1, mm5                     ; multiply 2px with F2/F3
-    paddd     mm3, mm1                     ; finish 1st 2px
-
-    ; second set of 2 pixels, use backup of above
-    punpckhbw mm2, mm6                     ; byte->word EFGH
-    pmaddwd   mm0, mm4                     ; multiply backed up 2px with F0/F1
-    pshufw    mm1, mm2, 0x94               ; word EFFG
-    pmaddwd   mm1, mm5                     ; multiply 2px with F2/F3
-    paddd     mm0, mm1                     ; finish 2nd 2px
-
-    ; merge two sets of 2 pixels into one set of 4, round/clip/store
-    packssdw  mm3, mm0                     ; merge dword->word (4px)
-    paddsw    mm3, mm7                     ; rounding
-    psraw     mm3, 7
-    packuswb  mm3, mm6                     ; clip and word->bytes
-    movd   [dstq], mm3                     ; store
-
-    ; go to next line
-    add      dstq, dststrideq
-    add      srcq, srcstrideq
-    dec   heightd                          ; next row
-    jg .nextrow
-    RET
-
-; 4x4 block, H-only 6-tap filter
-INIT_MMX mmxext
-cglobal put_vp8_epel4_h6, 6, 6 + npicregs, 0, dst, dststride, src, srcstride, height, mx, picreg
-    lea       mxd, [mxq*3]
-%if PIC
-    lea   picregq, [sixtap_filter_hw_m]
-%endif
-    movq      mm4, [sixtap_filter_hw+mxq*8-48] ; set up 4tap filter in words
-    movq      mm5, [sixtap_filter_hw+mxq*8-32]
-    movq      mm6, [sixtap_filter_hw+mxq*8-16]
-    movq      mm7, [pw_64]
-    pxor      mm3, mm3
-
-.nextrow:
-    movq      mm1, [srcq-2]                ; (ABCDEFGH) load 8 horizontal pixels
-
-    ; first set of 2 pixels
-    movq      mm2, mm1                     ; byte ABCD..
-    punpcklbw mm1, mm3                     ; byte->word ABCD
-    pshufw    mm0, mm2, 0x9                ; byte CDEF..
-    punpckhbw mm2, mm3                     ; byte->word EFGH
-    punpcklbw mm0, mm3                     ; byte->word CDEF
-    pshufw    mm1, mm1, 0x94               ; word ABBC
-    pshufw    mm2, mm2, 0x94               ; word EFFG
-    pmaddwd   mm1, mm4                     ; multiply 2px with F0/F1
-    pshufw    mm3, mm0, 0x94               ; word CDDE
-    movq      mm0, mm3                     ; backup for second set of pixels
-    pmaddwd   mm3, mm5                     ; multiply 2px with F2/F3
-    paddd     mm1, mm3                     ; add to 1st 2px cache
-    movq      mm3, mm2                     ; backup for second set of pixels
-    pmaddwd   mm2, mm6                     ; multiply 2px with F4/F5
-    paddd     mm1, mm2                     ; finish 1st 2px
-
-    ; second set of 2 pixels, use backup of above
-    movd      mm2, [srcq+3]                ; byte FGHI (prevent overreads)
-    pmaddwd   mm0, mm4                     ; multiply 1st backed up 2px with F0/F1
-    pmaddwd   mm3, mm5                     ; multiply 2nd backed up 2px with F2/F3
-    paddd     mm0, mm3                     ; add to 2nd 2px cache
-    pxor      mm3, mm3
-    punpcklbw mm2, mm3                     ; byte->word FGHI
-    pshufw    mm2, mm2, 0xE9               ; word GHHI
-    pmaddwd   mm2, mm6                     ; multiply 2px with F4/F5
-    paddd     mm0, mm2                     ; finish 2nd 2px
-
-    ; merge two sets of 2 pixels into one set of 4, round/clip/store
-    packssdw  mm1, mm0                     ; merge dword->word (4px)
-    paddsw    mm1, mm7                     ; rounding
-    psraw     mm1, 7
-    packuswb  mm1, mm3                     ; clip and word->bytes
-    movd   [dstq], mm1                     ; store
-
-    ; go to next line
-    add      dstq, dststrideq
-    add      srcq, srcstrideq
-    dec   heightd                          ; next row
-    jg .nextrow
-    RET
-
 INIT_XMM sse2
 cglobal put_vp8_epel8_h4, 6, 6 + npicregs, 10, dst, dststride, src, srcstride, height, mx, picreg
     shl      mxd, 5
@@ -539,9 +410,9 @@ cglobal put_vp8_epel8_h6, 6, 6 + npicregs, 14, dst, dststride, src, srcstride, h
     jg .nextrow
     RET
 
-%macro FILTER_V 1
+INIT_XMM sse2
 ; 4x4 block, V-only 4-tap filter
-cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
+cglobal put_vp8_epel8_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     shl      myd, 5
 %if PIC
     lea  picregq, [fourtap_filter_v_m]
@@ -594,7 +465,7 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
 
 
 ; 4x4 block, V-only 6-tap filter
-cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
+cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     shl      myd, 4
     lea      myq, [myq*3]
 %if PIC
@@ -656,12 +527,6 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     dec  heightd                           ; next row
     jg .nextrow
     RET
-%endmacro
-
-INIT_MMX mmxext
-FILTER_V 4
-INIT_XMM sse2
-FILTER_V 8
 
 %macro FILTER_BILINEAR 1
 %if cpuflag(ssse3)
@@ -722,16 +587,9 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 7, dst, dststride, src, srcstride, height, p
     psraw     m2, 2
     pavgw     m0, m6
     pavgw     m2, m6
-%if mmsize == 8
-    packuswb  m0, m0
-    packuswb  m2, m2
-    movh   [dstq+dststrideq*0], m0
-    movh   [dstq+dststrideq*1], m2
-%else
     packuswb  m0, m2
     movh   [dstq+dststrideq*0], m0
     movhps [dstq+dststrideq*1], m0
-%endif
 %endif ; cpuflag(ssse3)
 
     lea     dstq, [dstq+dststrideq*2]
@@ -799,16 +657,9 @@ cglobal put_vp8_bilinear%1_h, 6, 6 + npicregs, 7, dst, dststride, src, srcstride
     psraw     m2, 2
     pavgw     m0, m6
     pavgw     m2, m6
-%if mmsize == 8
-    packuswb  m0, m0
-    packuswb  m2, m2
-    movh   [dstq+dststrideq*0], m0
-    movh   [dstq+dststrideq*1], m2
-%else
     packuswb  m0, m2
     movh   [dstq+dststrideq*0], m0
     movhps [dstq+dststrideq*1], m0
-%endif
 %endif ; cpuflag(ssse3)
 
     lea     dstq, [dstq+dststrideq*2]
@@ -818,8 +669,6 @@ cglobal put_vp8_bilinear%1_h, 6, 6 + npicregs, 7, dst, dststride, src, srcstride
     RET
 %endmacro
 
-INIT_MMX mmxext
-FILTER_BILINEAR 4
 INIT_XMM sse2
 FILTER_BILINEAR 8
 INIT_MMX ssse3
diff --git a/libavcodec/x86/vp8dsp_init.c b/libavcodec/x86/vp8dsp_init.c
index e37afab775..00733a2564 100644
--- a/libavcodec/x86/vp8dsp_init.c
+++ b/libavcodec/x86/vp8dsp_init.c
@@ -29,19 +29,6 @@
 /*
  * MC functions
  */
-void ff_put_vp8_epel4_h4_mmxext(uint8_t *dst, ptrdiff_t dststride,
-                                const uint8_t *src, ptrdiff_t srcstride,
-                                int height, int mx, int my);
-void ff_put_vp8_epel4_h6_mmxext(uint8_t *dst, ptrdiff_t dststride,
-                                const uint8_t *src, ptrdiff_t srcstride,
-                                int height, int mx, int my);
-void ff_put_vp8_epel4_v4_mmxext(uint8_t *dst, ptrdiff_t dststride,
-                                const uint8_t *src, ptrdiff_t srcstride,
-                                int height, int mx, int my);
-void ff_put_vp8_epel4_v6_mmxext(uint8_t *dst, ptrdiff_t dststride,
-                                const uint8_t *src, ptrdiff_t srcstride,
-                                int height, int mx, int my);
-
 void ff_put_vp8_epel8_h4_sse2  (uint8_t *dst, ptrdiff_t dststride,
                                 const uint8_t *src, ptrdiff_t srcstride,
                                 int height, int mx, int my);
@@ -80,9 +67,6 @@ void ff_put_vp8_epel8_v6_ssse3 (uint8_t *dst, ptrdiff_t dststride,
                                 const uint8_t *src, ptrdiff_t srcstride,
                                 int height, int mx, int my);
 
-void ff_put_vp8_bilinear4_h_mmxext(uint8_t *dst, ptrdiff_t dststride,
-                                   const uint8_t *src, ptrdiff_t srcstride,
-                                   int height, int mx, int my);
 void ff_put_vp8_bilinear8_h_sse2  (uint8_t *dst, ptrdiff_t dststride,
                                    const uint8_t *src, ptrdiff_t srcstride,
                                    int height, int mx, int my);
@@ -93,9 +77,6 @@ void ff_put_vp8_bilinear8_h_ssse3 (uint8_t *dst, ptrdiff_t dststride,
                                    const uint8_t *src, ptrdiff_t srcstride,
                                    int height, int mx, int my);
 
-void ff_put_vp8_bilinear4_v_mmxext(uint8_t *dst, ptrdiff_t dststride,
-                                   const uint8_t *src, ptrdiff_t srcstride,
-                                   int height, int mx, int my);
 void ff_put_vp8_bilinear8_v_sse2  (uint8_t *dst, ptrdiff_t dststride,
                                    const uint8_t *src, ptrdiff_t srcstride,
                                    int height, int mx, int my);
@@ -159,14 +140,6 @@ static void ff_put_vp8_epel ## SIZE ## _h ## TAPNUMX ## v ## TAPNUMY ## _ ## OPT
         dst, dststride, tmpptr, SIZE,      height,               mx, my); \
 }
 
-#define HVTAPMMX(x, y) \
-HVTAP(mmxext, 8, x, y,  4,  8)
-
-HVTAPMMX(4, 4)
-HVTAPMMX(4, 6)
-HVTAPMMX(6, 4)
-HVTAPMMX(6, 6)
-
 #define HVTAPSSE2(x, y, w) \
 HVTAP(sse2,  16, x, y, w, 16) \
 HVTAP(ssse3, 16, x, y, w, 16)
@@ -194,7 +167,6 @@ static void ff_put_vp8_bilinear ## SIZE ## _hv_ ## OPT( \
         dst, dststride, tmp, SIZE,      height,     mx, my); \
 }
 
-HVBILIN(mmxext,  8,  4,  8)
 HVBILIN(sse2,  8,  8, 16)
 HVBILIN(sse2,  8, 16, 16)
 HVBILIN(ssse3, 8,  4,  8)
@@ -285,13 +257,6 @@ av_cold void ff_vp78dsp_init_x86(VP8DSPContext *c)
         c->put_vp8_bilinear_pixels_tab[1][0][0] = ff_put_vp8_pixels8_mmx;
     }
 
-    /* note that 4-tap width=16 functions are missing because w=16
-     * is only used for luma, and luma is always a copy or sixtap. */
-    if (EXTERNAL_MMXEXT(cpu_flags)) {
-        VP8_MC_FUNC(2, 4, mmxext);
-        VP8_BILINEAR_MC_FUNC(2, 4, mmxext);
-    }
-
     if (EXTERNAL_SSE(cpu_flags)) {
         c->put_vp8_epel_pixels_tab[0][0][0]     =
         c->put_vp8_bilinear_pixels_tab[0][0][0] = ff_put_vp8_pixels16_sse;
@@ -304,6 +269,8 @@ av_cold void ff_vp78dsp_init_x86(VP8DSPContext *c)
         VP8_BILINEAR_MC_FUNC(1, 8, sse2);
     }
 
+    /* note that 4-tap width=16 functions are missing because w=16
+     * is only used for luma, and luma is always a copy or sixtap. */
     if (EXTERNAL_SSSE3(cpu_flags)) {
         VP8_LUMA_MC_FUNC(0, 16, ssse3);
         VP8_MC_FUNC(1, 8, ssse3);
-- 
2.52.0


From 96aedd1e119e6063f17684237c1a6ba6d3c73b84 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 11:25:26 +0100
Subject: [PATCH 200/304] avcodec/x86/vp8dsp: Don't use MMX registers in
 put_vp8_pixels8

Use GPRs on x64 and xmm registers otherwise (using GPRs reduces codesize).
This avoids clobbering the floating point state and therefore no longer
breaks the ABI.
No change in benchmarks here.
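
For readers less familiar with the assembly, the x64 path of the new
put_vp8_pixels8 can be modelled in C roughly as below. This is only a sketch
under assumptions: the name put_pixels8_sketch is hypothetical, the unused
mx/my arguments are dropped, and memcpy stands in for the 64-bit GPR loads
and stores. It is not the code the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the x64 path: copy an 8-pixel-wide block,
 * two rows per iteration, through 64-bit integer temporaries. */
static void put_pixels8_sketch(uint8_t *dst, ptrdiff_t dststride,
                               const uint8_t *src, ptrdiff_t srcstride,
                               int height)
{
    /* The asm loop consumes two rows per pass ("sub heightd, 2"). */
    assert(height > 0 && height % 2 == 0);

    do {
        uint64_t row0, row1;

        /* memcpy models the unaligned 8-byte loads into r5q/r6q. */
        memcpy(&row0, src,             8);
        memcpy(&row1, src + srcstride, 8);
        src += 2 * srcstride;

        memcpy(dst,             &row0, 8);
        memcpy(dst + dststride, &row1, 8);
        dst += 2 * dststride;

        height -= 2;
    } while (height > 0);
}
```

Since only integer registers are involved, no MMX state is touched, which is
the property the commit message relies on.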

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm    | 20 ++++++++++++++------
 libavcodec/x86/vp8dsp_init.c |  9 +++------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 7b836351e4..7dee979e20 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -676,14 +676,22 @@ FILTER_BILINEAR 4
 INIT_XMM ssse3
 FILTER_BILINEAR 8
 
-INIT_MMX mmx
-cglobal put_vp8_pixels8, 5, 5, 0, dst, dststride, src, srcstride, height
+INIT_XMM sse2
+cglobal put_vp8_pixels8, 5, 5+2*ARCH_X86_64, 2, dst, dststride, src, srcstride, height
 .nextrow:
-    movq    mm0, [srcq+srcstrideq*0]
-    movq    mm1, [srcq+srcstrideq*1]
+%if ARCH_X86_64
+    mov     r5q, [srcq+srcstrideq*0]
+    mov     r6q, [srcq+srcstrideq*1]
     lea    srcq, [srcq+srcstrideq*2]
-    movq [dstq+dststrideq*0], mm0
-    movq [dstq+dststrideq*1], mm1
+    mov [dstq+dststrideq*0], r5q
+    mov [dstq+dststrideq*1], r6q
+%else
+    movq     m0, [srcq+srcstrideq*0]
+    movq     m1, [srcq+srcstrideq*1]
+    lea    srcq, [srcq+srcstrideq*2]
+    movq [dstq+dststrideq*0], m0
+    movq [dstq+dststrideq*1], m1
+%endif
     lea    dstq, [dstq+dststrideq*2]
     sub heightd, 2
     jg .nextrow
diff --git a/libavcodec/x86/vp8dsp_init.c b/libavcodec/x86/vp8dsp_init.c
index 00733a2564..40aa52c7f0 100644
--- a/libavcodec/x86/vp8dsp_init.c
+++ b/libavcodec/x86/vp8dsp_init.c
@@ -88,7 +88,7 @@ void ff_put_vp8_bilinear8_v_ssse3 (uint8_t *dst, ptrdiff_t dststride,
                                    int height, int mx, int my);
 
 
-void ff_put_vp8_pixels8_mmx (uint8_t *dst, ptrdiff_t dststride,
+void ff_put_vp8_pixels8_sse2(uint8_t *dst, ptrdiff_t dststride,
                              const uint8_t *src, ptrdiff_t srcstride,
                              int height, int mx, int my);
 void ff_put_vp8_pixels16_sse(uint8_t *dst, ptrdiff_t dststride,
@@ -252,17 +252,14 @@ av_cold void ff_vp78dsp_init_x86(VP8DSPContext *c)
 {
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_MMX(cpu_flags)) {
-        c->put_vp8_epel_pixels_tab[1][0][0]     =
-        c->put_vp8_bilinear_pixels_tab[1][0][0] = ff_put_vp8_pixels8_mmx;
-    }
-
     if (EXTERNAL_SSE(cpu_flags)) {
         c->put_vp8_epel_pixels_tab[0][0][0]     =
         c->put_vp8_bilinear_pixels_tab[0][0][0] = ff_put_vp8_pixels16_sse;
     }
 
     if (EXTERNAL_SSE2_SLOW(cpu_flags)) {
+        c->put_vp8_epel_pixels_tab[1][0][0]     =
+        c->put_vp8_bilinear_pixels_tab[1][0][0] = ff_put_vp8_pixels8_sse2;
         VP8_LUMA_MC_FUNC(0, 16, sse2);
         VP8_MC_FUNC(1, 8, sse2);
         VP8_BILINEAR_MC_FUNC(0, 16, sse2);
-- 
2.52.0


From b1e3ef1a5b7d7cbf30d3415951959b189956f628 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 12:53:12 +0100
Subject: [PATCH 201/304] avcodec/x86/vp8dsp: Directly use negated stride

There is a register available. No change in benchmarks here.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 44 +++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 7dee979e20..6b5ca9f309 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -219,11 +219,11 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     mova      m7, [pw_256]
 
     ; read 3 lines
-    sub     srcq, srcstrideq
-    movh      m0, [srcq]
-    movh      m1, [srcq+  srcstrideq]
-    movh      m2, [srcq+2*srcstrideq]
-    add     srcq, srcstrideq
+    mov  picregq, srcstrideq
+    neg  picregq
+    movh      m0, [srcq+picregq]
+    movh      m1, [srcq]
+    movh      m2, [srcq+srcstrideq]
 
 .nextrow:
     movh      m3, [srcq+2*srcstrideq]      ; read new row
@@ -255,18 +255,17 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     lea      myq, [sixtap_filter_hb+myq*8]
 
     ; read 5 lines
-    sub     srcq, srcstrideq
-    sub     srcq, srcstrideq
-    movh      m0, [srcq]
-    movh      m1, [srcq+srcstrideq]
-    movh      m2, [srcq+srcstrideq*2]
+    mov  picregq, srcstrideq
+    neg  picregq
+    movh      m0, [srcq+2*picregq]
+    movh      m1, [srcq+picregq]
+    movh      m2, [srcq]
+    movh      m3, [srcq+srcstrideq]
+    movh      m4, [srcq+2*srcstrideq]
     lea     srcq, [srcq+srcstrideq*2]
-    add     srcq, srcstrideq
-    movh      m3, [srcq]
-    movh      m4, [srcq+srcstrideq]
 
 .nextrow:
-    movh      m5, [srcq+2*srcstrideq]      ; read new row
+    movh      m5, [srcq+srcstrideq]      ; read new row
     mova      m6, m0
     punpcklbw m6, m5
     mova      m0, m1
@@ -475,15 +474,14 @@ cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     pxor      m7, m7
 
     ; read 5 lines
-    sub     srcq, srcstrideq
-    sub     srcq, srcstrideq
-    movh      m0, [srcq]
-    movh      m1, [srcq+srcstrideq]
-    movh      m2, [srcq+srcstrideq*2]
+    mov  picregq, srcstrideq
+    neg  picregq
+    movh      m0, [srcq+2*picregq]
+    movh      m1, [srcq+picregq]
+    movh      m2, [srcq]
+    movh      m3, [srcq+srcstrideq]
+    movh      m4, [srcq+2*srcstrideq]
     lea     srcq, [srcq+srcstrideq*2]
-    add     srcq, srcstrideq
-    movh      m3, [srcq]
-    movh      m4, [srcq+srcstrideq]
     punpcklbw m0, m7
     punpcklbw m1, m7
     punpcklbw m2, m7
@@ -499,7 +497,7 @@ cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     paddsw    m6, m5
 
     ; then calculate positive taps
-    movh      m5, [srcq+2*srcstrideq]      ; read new row
+    movh      m5, [srcq+srcstrideq]      ; read new row
     punpcklbw m5, m7
     pmullw    m0, [myq+0]
     paddsw    m6, m0
-- 
2.52.0


From e812daf3354d6298f63e4da72fc831f162d022b7 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 13:15:07 +0100
Subject: [PATCH 202/304] avcodec/x86/vp8dsp: Increment src pointer earlier

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 6b5ca9f309..0d37012e9d 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -166,6 +166,7 @@ cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, h
     pmaddubsw m0, m5
     pmaddubsw m1, m6
     pmaddubsw m2, m7
+    add     srcq, srcstrideq
     paddsw    m0, m1
     paddsw    m0, m2
     pmulhrsw  m0, [pw_256]
@@ -174,7 +175,6 @@ cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, h
 
     ; go to next line
     add     dstq, dststrideq
-    add     srcq, srcstrideq
     dec  heightd            ; next row
     jg .nextrow
     RET
@@ -197,6 +197,7 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, h
     pshufb    m1, m4
     pmaddubsw m0, m5
     pmaddubsw m1, m6
+    add     srcq, srcstrideq
     paddsw    m0, m1
     pmulhrsw  m0, m2
     packuswb  m0, m0
@@ -204,7 +205,6 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, h
 
     ; go to next line
     add     dstq, dststrideq
-    add     srcq, srcstrideq
     dec  heightd            ; next row
     jg .nextrow
     RET
@@ -234,6 +234,7 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     punpcklbw m2, m3
     pmaddubsw m4, m5
     pmaddubsw m2, m6
+    add      srcq, srcstrideq
     paddsw    m4, m2
     mova      m2, m3
     pmulhrsw  m4, m7
@@ -242,7 +243,6 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
 
     ; go to next line
     add      dstq, dststrideq
-    add      srcq, srcstrideq
     dec   heightd                          ; next row
     jg .nextrow
     RET
@@ -275,6 +275,7 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     pmaddubsw m6, [myq-48]
     pmaddubsw m1, [myq-32]
     pmaddubsw m7, [myq-16]
+    add      srcq, srcstrideq
     paddsw    m6, m1
     paddsw    m6, m7
     mova      m1, m2
@@ -287,7 +288,6 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
 
     ; go to next line
     add      dstq, dststrideq
-    add      srcq, srcstrideq
     dec   heightd                          ; next row
     jg .nextrow
     RET
@@ -331,6 +331,7 @@ cglobal put_vp8_epel8_h4, 6, 6 + npicregs, 10, dst, dststride, src, srcstride, h
     pmullw    m2, [mxq+32]
     pmullw    m3, [mxq+48]
 %endif
+    add     srcq, srcstrideq
     paddsw    m0, m1
     paddsw    m2, m3
     paddsw    m0, m2
@@ -341,7 +342,6 @@ cglobal put_vp8_epel8_h4, 6, 6 + npicregs, 10, dst, dststride, src, srcstride, h
 
     ; go to next line
     add     dstq, dststrideq
-    add     srcq, srcstrideq
     dec  heightd            ; next row
     jg .nextrow
     RET
@@ -392,6 +392,7 @@ cglobal put_vp8_epel8_h6, 6, 6 + npicregs, 14, dst, dststride, src, srcstride, h
     pmullw    m4, [mxq+64]
     pmullw    m5, [mxq+80]
 %endif
+    add     srcq, srcstrideq
     paddsw    m1, m4
     paddsw    m0, m5
     paddsw    m1, m2
@@ -404,7 +405,6 @@ cglobal put_vp8_epel8_h6, 6, 6 + npicregs, 14, dst, dststride, src, srcstride, h
 
     ; go to next line
     add     dstq, dststrideq
-    add     srcq, srcstrideq
     dec  heightd            ; next row
     jg .nextrow
     RET
@@ -446,6 +446,7 @@ cglobal put_vp8_epel8_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     paddsw    m4, m1
     mova      m1, m2
     pmullw    m2, [myq+32]
+    add     srcq, srcstrideq
     paddsw    m4, m2
     mova      m2, m3
 
@@ -457,7 +458,6 @@ cglobal put_vp8_epel8_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picre
 
     ; go to next line
     add     dstq, dststrideq
-    add     srcq, srcstrideq
     dec  heightd                           ; next row
     jg .nextrow
     RET
@@ -507,6 +507,7 @@ cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     paddsw    m6, m2
     mova      m2, m3
     pmullw    m3, [myq+48]
+    add     srcq, srcstrideq
     paddsw    m6, m3
     mova      m3, m4
     mova      m4, m5
@@ -521,7 +522,6 @@ cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picre
 
     ; go to next line
     add     dstq, dststrideq
-    add     srcq, srcstrideq
     dec  heightd                           ; next row
     jg .nextrow
     RET
@@ -543,6 +543,7 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 5, dst, dststride, src, srcstride, height, p
     punpcklbw m1, m2
     pmaddubsw m0, m3
     pmaddubsw m1, m3
+    lea     srcq, [srcq+srcstrideq*2]
     psraw     m0, 2
     psraw     m1, 2
     pavgw     m0, m4
@@ -579,6 +580,7 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 7, dst, dststride, src, srcstride, height, p
     pmullw    m1, m5
     pmullw    m2, m4
     pmullw    m3, m5
+    lea     srcq, [srcq+srcstrideq*2]
     paddsw    m0, m1
     paddsw    m2, m3
     psraw     m0, 2
@@ -591,7 +593,6 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 7, dst, dststride, src, srcstride, height, p
 %endif ; cpuflag(ssse3)
 
     lea     dstq, [dstq+dststrideq*2]
-    lea     srcq, [srcq+srcstrideq*2]
     sub  heightd, 2
     jg .nextrow
     RET
@@ -612,6 +613,7 @@ cglobal put_vp8_bilinear%1_h, 6, 6 + npicregs, 5, dst, dststride, src, srcstride
     pshufb    m1, m2
     pmaddubsw m0, m3
     pmaddubsw m1, m3
+    lea     srcq, [srcq+srcstrideq*2]
     psraw     m0, 2
     psraw     m1, 2
     pavgw     m0, m4
@@ -649,6 +651,7 @@ cglobal put_vp8_bilinear%1_h, 6, 6 + npicregs, 7, dst, dststride, src, srcstride
     pmullw    m1, m5
     pmullw    m2, m4
     pmullw    m3, m5
+    lea     srcq, [srcq+srcstrideq*2]
     paddsw    m0, m1
     paddsw    m2, m3
     psraw     m0, 2
@@ -661,7 +664,6 @@ cglobal put_vp8_bilinear%1_h, 6, 6 + npicregs, 7, dst, dststride, src, srcstride
 %endif ; cpuflag(ssse3)
 
     lea     dstq, [dstq+dststrideq*2]
-    lea     srcq, [srcq+srcstrideq*2]
     sub  heightd, 2
     jg .nextrow
     RET
-- 
2.52.0


From 2c7bb090bd97cca78294818f1f8709338cc63a29 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 13:27:35 +0100
Subject: [PATCH 203/304] avcodec/x86/vp8dsp: Avoid reload

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 0d37012e9d..e971da68ac 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -535,8 +535,8 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 5, dst, dststride, src, srcstride, height, p
 %endif
     pxor      m4, m4
     mova      m3, [bilinear_filter_vb+myq-16]
-.nextrow:
     movh      m0, [srcq+srcstrideq*0]
+.nextrow:
     movh      m1, [srcq+srcstrideq*1]
     movh      m2, [srcq+srcstrideq*2]
     punpcklbw m0, m1
@@ -558,6 +558,7 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 5, dst, dststride, src, srcstride, height, p
     movh   [dstq+dststrideq*0], m0
     movhps [dstq+dststrideq*1], m0
 %endif
+    mova      m0, m2
 %else ; cpuflag(ssse3)
 cglobal put_vp8_bilinear%1_v, 7, 7, 7, dst, dststride, src, srcstride, height, picreg, my
     shl      myd, 4
-- 
2.52.0


From 1b078c82c90c56ccb95bf6881496cfd287f41a4e Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 15:39:48 +0100
Subject: [PATCH 204/304] avcodec/x86/vp8dsp_init: Remove unused macro

Forgotten in 6a551f14050674fb685920eb1b0640810cacccf9.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp_init.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/libavcodec/x86/vp8dsp_init.c b/libavcodec/x86/vp8dsp_init.c
index 40aa52c7f0..828b038cdf 100644
--- a/libavcodec/x86/vp8dsp_init.c
+++ b/libavcodec/x86/vp8dsp_init.c
@@ -105,16 +105,6 @@ static void ff_put_vp8_ ## FILTERTYPE ## 16_ ## TAPTYPE ## _ ## OPT( \
     ff_put_vp8_ ## FILTERTYPE ## 8_ ## TAPTYPE ## _ ## OPT( \
         dst + 8, dststride, src + 8, srcstride, height, mx, my); \
 }
-#define TAP_W8(OPT, FILTERTYPE, TAPTYPE) \
-static void ff_put_vp8_ ## FILTERTYPE ## 8_ ## TAPTYPE ## _ ## OPT( \
-    uint8_t *dst,  ptrdiff_t dststride, uint8_t *src, \
-    ptrdiff_t srcstride, int height, int mx, int my) \
-{ \
-    ff_put_vp8_ ## FILTERTYPE ## 4_ ## TAPTYPE ## _ ## OPT( \
-        dst,     dststride, src,     srcstride, height, mx, my); \
-    ff_put_vp8_ ## FILTERTYPE ## 4_ ## TAPTYPE ## _ ## OPT( \
-        dst + 4, dststride, src + 4, srcstride, height, mx, my); \
-}
 
 TAP_W16(sse2,  epel, h6)
 TAP_W16(sse2,  epel, v6)
-- 
2.52.0


From bfe88118541a687a661b9b768cec81d70a3d42b3 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 20:25:26 +0100
Subject: [PATCH 205/304] avcodec/x86/vp8dsp: Avoid unpacking multiple times

Always pair row i with row i+2 for the vertical four-tap filter
and row i+3 for the vertical six-tap filter (instead of pairing
the first with the sixth, the second with the third and the fourth
with the fifth). This allows each row to be unpacked only once
instead of (at most) three times.
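
The six-tap case can be illustrated with a scalar C model. It is a sketch
under assumptions, not the SIMD code from the patch: RowPair, sixtap_ref and
sixtap_v_paired are hypothetical names, a single pixel column is processed,
the output is written contiguously, and the saturation of pmaddubsw/paddsw is
not modelled. The coefficients are the first set of sixtap_filter_b_m.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One six-tap coefficient set (2, -11, 108, 36, -8, 1), as in the tables. */
static const int taps[6] = { 2, -11, 108, 36, -8, 1 };

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference: plain six-tap vertical filter; reads rows -2..+3 around src. */
static uint8_t sixtap_ref(const uint8_t *src, ptrdiff_t stride)
{
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += taps[k] * src[(k - 2) * stride];
    return clip_u8((sum + 64) >> 7);
}

/* Models one interleaved (punpcklbw) result: row i paired with row i+3. */
typedef struct { uint8_t a, b; } RowPair;

/* Paired variant: each output row builds exactly one new pair and reuses
 * the two pairs built for the previous output rows. */
static void sixtap_v_paired(uint8_t *dst, const uint8_t *src,
                            ptrdiff_t stride, int height)
{
    assert(height > 0);

    RowPair p0 = { src[-2 * stride], src[1 * stride] }; /* rows -2, +1 */
    RowPair p1 = { src[-1 * stride], src[2 * stride] }; /* rows -1, +2 */

    for (int y = 0; y < height; y++) {
        RowPair p2 = { src[0], src[3 * stride] };       /* rows  0, +3 */

        int sum = taps[0] * p0.a + taps[3] * p0.b
                + taps[1] * p1.a + taps[4] * p1.b
                + taps[2] * p2.a + taps[5] * p2.b;
        dst[y] = clip_u8((sum + 64) >> 7);

        /* The pairs slide down by one row; nothing is re-interleaved. */
        p0 = p1;
        p1 = p2;
        src += stride;
    }
}
```

With the old pairing (first with sixth, second with third, fourth with fifth)
none of the pairs of one output row coincides with a pair of the next output
row, so this kind of reuse is not possible.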

Old benchmarks:
vp8_put_epel4_v4_c:                                     98.4 ( 1.00x)
vp8_put_epel4_v4_ssse3:                                 28.6 ( 3.44x)
vp8_put_epel4_v6_c:                                    131.6 ( 1.00x)
vp8_put_epel4_v6_ssse3:                                 38.5 ( 3.42x)
vp8_put_epel8_v4_c:                                    362.5 ( 1.00x)
vp8_put_epel8_v4_sse2:                                  63.8 ( 5.68x)
vp8_put_epel8_v4_ssse3:                                 44.4 ( 8.16x)
vp8_put_epel8_v6_c:                                    538.3 ( 1.00x)
vp8_put_epel8_v6_sse2:                                  86.5 ( 6.22x)
vp8_put_epel8_v6_ssse3:                                 57.0 ( 9.44x)
vp8_put_epel16_v6_c:                                  1044.6 ( 1.00x)
vp8_put_epel16_v6_sse2:                                158.0 ( 6.61x)
vp8_put_epel16_v6_ssse3:                               106.7 ( 9.79x)

New benchmarks:
vp8_put_epel4_v4_c:                                    100.0 ( 1.00x)
vp8_put_epel4_v4_ssse3:                                 28.4 ( 3.52x)
vp8_put_epel4_v6_c:                                    131.7 ( 1.00x)
vp8_put_epel4_v6_ssse3:                                 34.3 ( 3.84x)
vp8_put_epel8_v4_c:                                    364.4 ( 1.00x)
vp8_put_epel8_v4_sse2:                                  63.7 ( 5.72x)
vp8_put_epel8_v4_ssse3:                                 43.3 ( 8.42x)
vp8_put_epel8_v6_c:                                    550.2 ( 1.00x)
vp8_put_epel8_v6_sse2:                                  86.4 ( 6.37x)
vp8_put_epel8_v6_ssse3:                                 52.9 (10.40x)
vp8_put_epel16_v6_c:                                  1052.5 ( 1.00x)
vp8_put_epel16_v6_sse2:                                158.3 ( 6.65x)
vp8_put_epel16_v6_ssse3:                                98.9 (10.64x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 68 +++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index e971da68ac..7cb729a443 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -33,6 +33,15 @@ fourtap_filter_hb_m: times 8 db  -6, 123
                      times 8 db  -1,  12
                      times 8 db 123,  -6
 
+fourtap_filter_b_m:  times 8 db  -6,  12
+                     times 8 db 123,  -1
+                     times 8 db  -9,  50
+                     times 8 db  93,  -6
+                     times 8 db  -6,  93
+                     times 8 db  50,  -9
+                     times 8 db  -1, 123
+                     times 8 db  12,  -6
+
 sixtap_filter_hb_m:  times 8 db   2,   1
                      times 8 db -11, 108
                      times 8 db  36,  -8
@@ -43,6 +52,16 @@ sixtap_filter_hb_m:  times 8 db   2,   1
                      times 8 db  -8,  36
                      times 8 db 108, -11
 
+sixtap_filter_b_m:   times 8 db   2,  36
+                     times 8 db -11,  -8
+                     times 8 db 108,   1
+                     times 8 db   3,  77
+                     times 8 db -16, -16
+                     times 8 db  77,   3
+                     times 8 db   1, 108
+                     times 8 db  -8, -11
+                     times 8 db  36,   2
+
 fourtap_filter_v_m:  times 8 dw  -6
                      times 8 dw 123
                      times 8 dw  12
@@ -97,7 +116,9 @@ bilinear_filter_vb_m: times 8 db 7, 1
 
 %if PIC
 %define fourtap_filter_hb  picregq
+%define fourtap_filter_b   picregq
 %define sixtap_filter_hb   picregq
+%define sixtap_filter_b    picregq
 %define fourtap_filter_v   picregq
 %define sixtap_filter_v    picregq
 %define bilinear_filter_vw picregq
@@ -105,7 +126,9 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %define npicregs 1
 %else
 %define fourtap_filter_hb  fourtap_filter_hb_m
+%define fourtap_filter_b   fourtap_filter_b_m
 %define sixtap_filter_hb   sixtap_filter_hb_m
+%define sixtap_filter_b    sixtap_filter_b_m
 %define fourtap_filter_v   fourtap_filter_v_m
 %define sixtap_filter_v    sixtap_filter_v_m
 %define bilinear_filter_vw bilinear_filter_vw_m
@@ -212,10 +235,10 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, h
 cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     shl      myd, 4
 %if PIC
-    lea  picregq, [fourtap_filter_hb_m]
+    lea  picregq, [fourtap_filter_b_m]
 %endif
-    mova      m5, [fourtap_filter_hb+myq-16]
-    mova      m6, [fourtap_filter_hb+myq]
+    mova      m5, [fourtap_filter_b+myq-16]
+    mova      m6, [fourtap_filter_b+myq]
     mova      m7, [pw_256]
 
     ; read 3 lines
@@ -224,21 +247,20 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     movh      m0, [srcq+picregq]
     movh      m1, [srcq]
     movh      m2, [srcq+srcstrideq]
+    punpcklbw m0, m2
 
 .nextrow:
     movh      m3, [srcq+2*srcstrideq]      ; read new row
-    mova      m4, m0
+    pmaddubsw m0, m5
+    punpcklbw m1, m3
+    pmaddubsw m4, m1, m6
+    add     srcq, srcstrideq
+    paddsw    m4, m0
     mova      m0, m1
-    punpcklbw m4, m1
-    mova      m1, m2
-    punpcklbw m2, m3
-    pmaddubsw m4, m5
-    pmaddubsw m2, m6
-    add      srcq, srcstrideq
-    paddsw    m4, m2
-    mova      m2, m3
     pmulhrsw  m4, m7
+    mova      m1, m2
     packuswb  m4, m4
+    mova      m2, m3
     movh  [dstq], m4
 
     ; go to next line
@@ -250,9 +272,9 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
 cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     lea      myd, [myq*3]
 %if PIC
-    lea  picregq, [sixtap_filter_hb_m]
+    lea  picregq, [sixtap_filter_b_m]
 %endif
-    lea      myq, [sixtap_filter_hb+myq*8]
+    lea      myq, [sixtap_filter_b+myq*8]
 
     ; read 5 lines
     mov  picregq, srcstrideq
@@ -263,20 +285,18 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     movh      m3, [srcq+srcstrideq]
     movh      m4, [srcq+2*srcstrideq]
     lea     srcq, [srcq+srcstrideq*2]
+    punpcklbw m0, m3
+    punpcklbw m1, m4
 
 .nextrow:
     movh      m5, [srcq+srcstrideq]      ; read new row
-    mova      m6, m0
-    punpcklbw m6, m5
+    pmaddubsw m0, [myq-48]
+    punpcklbw m2, m5
+    pmaddubsw m6, m1, [myq-32]
+    pmaddubsw m7, m2, [myq-16]
+    add     srcq, srcstrideq
+    paddw     m6, m0
     mova      m0, m1
-    punpcklbw m1, m2
-    mova      m7, m3
-    punpcklbw m7, m4
-    pmaddubsw m6, [myq-48]
-    pmaddubsw m1, [myq-32]
-    pmaddubsw m7, [myq-16]
-    add      srcq, srcstrideq
-    paddsw    m6, m1
     paddsw    m6, m7
     mova      m1, m2
     mova      m2, m3
-- 
2.52.0


From 05dac08c76b136fed97759c320d3391e88ffde14 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 23 Nov 2025 23:29:24 +0100
Subject: [PATCH 206/304] avcodec/x86/vp8dsp: Don't use MMX registers in
 ff_put_vp8_epel4_v6_ssse3

Switching to xmm registers allows processing two rows in parallel,
leading to speedups. It is also ABI compliant (no more missing emms).

Old benchmarks:
vp8_put_epel4_v6_c:                                    132.8 ( 1.00x)
vp8_put_epel4_v6_ssse3:                                 34.3 ( 3.87x)

New benchmarks:
vp8_put_epel4_v6_c:                                    131.5 ( 1.00x)
vp8_put_epel4_v6_ssse3:                                 27.1 ( 4.86x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 48 +++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 7cb729a443..4778944ac7 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -162,6 +162,12 @@ SECTION .text
 ;-------------------------------------------------------------------------------
 
 %macro FILTER_SSSE3 1
+%if %1 == 4
+%define MOV movd
+%else
+%define MOV movq
+%endif
+
 cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, height, mx, picreg
     lea      mxd, [mxq*3]
     mova      m3, [filter_h6_shuf2]
@@ -269,6 +275,7 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     jg .nextrow
     RET
 
+INIT_XMM ssse3
 cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     lea      myd, [myq*3]
 %if PIC
@@ -279,14 +286,44 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     ; read 5 lines
     mov  picregq, srcstrideq
     neg  picregq
-    movh      m0, [srcq+2*picregq]
-    movh      m1, [srcq+picregq]
-    movh      m2, [srcq]
-    movh      m3, [srcq+srcstrideq]
-    movh      m4, [srcq+2*srcstrideq]
+    MOV       m0, [srcq+2*picregq]
+    MOV       m1, [srcq+picregq]
+    MOV       m2, [srcq]
+    MOV       m3, [srcq+srcstrideq]
+    MOV       m4, [srcq+2*srcstrideq]
     lea     srcq, [srcq+srcstrideq*2]
     punpcklbw m0, m3
     punpcklbw m1, m4
+%if %1 == 4
+    punpcklqdq m0, m1
+
+.next2rows:
+    movd       m5, [srcq+srcstrideq]
+    movd       m6, [srcq+2*srcstrideq]
+    pmaddubsw  m0, [myq-48]
+    punpcklbw  m2, m5
+    punpcklqdq m1, m2
+    pmaddubsw  m1, [myq-32]
+    punpcklbw  m3, m6
+    punpcklqdq m2, m3
+    paddw      m0, m1
+    pmaddubsw  m1, m2, [myq-16]
+    lea      srcq, [srcq+2*srcstrideq]
+    paddsw     m1, m0
+    mova       m0, m2
+    pmulhrsw   m1, [pw_256]
+    mova       m2, m4
+    packuswb   m1, m1
+    movd   [dstq], m1
+    mova       m4, m6
+    psrldq     m1, 4
+    movd [dstq+dststrideq], m1
+    lea      dstq, [dstq+2*dststrideq]
+    mova       m1, m3
+    mova       m3, m5
+    sub   heightd, 2
+    jg .next2rows
+%else
 
 .nextrow:
     movh      m5, [srcq+srcstrideq]      ; read new row
@@ -310,6 +347,7 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     add      dstq, dststrideq
     dec   heightd                          ; next row
     jg .nextrow
+%endif
     RET
 %endmacro
 
-- 
2.52.0


From 23a13b2badc3eccdd174bcf23bd7a52edf75b5eb Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 09:16:26 +0100
Subject: [PATCH 207/304] avcodec/x86/vp8dsp: Don't use MMX registers in
 ff_put_vp8_epel4_v4_ssse3

Switching to xmm registers allows processing two rows in parallel,
leading to speedups. It is also ABI compliant (no more missing emms).

Old benchmarks:
vp8_put_epel4_v4_c:                                     96.8 ( 1.00x)
vp8_put_epel4_v4_ssse3:                                 28.2 ( 3.43x)

New benchmarks:
vp8_put_epel4_v4_c:                                     95.1 ( 1.00x)
vp8_put_epel4_v4_ssse3:                                 22.8 ( 4.17x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 4778944ac7..fd60feaf1f 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -238,6 +238,7 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, h
     jg .nextrow
     RET
 
+INIT_XMM ssse3
 cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     shl      myd, 4
 %if PIC
@@ -250,13 +251,38 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     ; read 3 lines
     mov  picregq, srcstrideq
     neg  picregq
-    movh      m0, [srcq+picregq]
-    movh      m1, [srcq]
-    movh      m2, [srcq+srcstrideq]
+    MOV       m0, [srcq+picregq]
+    MOV       m1, [srcq]
+    MOV       m2, [srcq+srcstrideq]
+    lea     srcq, [srcq+2*srcstrideq]
     punpcklbw m0, m2
 
+%if %1 == 4
+.next2rows:
+    movd       m3, [srcq]
+    movd       m4, [srcq+srcstrideq]
+    punpcklbw  m1, m3
+    punpcklqdq m0, m1
+    punpcklbw  m2, m4
+    pmaddubsw  m0, m5
+    punpcklqdq m1, m2
+    pmaddubsw  m1, m6
+    lea     srcq, [srcq+2*srcstrideq]
+    paddsw     m1, m0
+    pmulhrsw   m1, m7
+    mova       m0, m2
+    packuswb   m1, m1
+    movd   [dstq], m1
+    mova       m2, m4
+    psrldq     m1, 4
+    movd [dstq+dststrideq], m1
+    mova       m1, m3
+    lea      dstq, [dstq+2*dststrideq]
+    sub   heightd, 2
+    jg .next2rows
+%else
 .nextrow:
-    movh      m3, [srcq+2*srcstrideq]      ; read new row
+    movh      m3, [srcq]      ; read new row
     pmaddubsw m0, m5
     punpcklbw m1, m3
     pmaddubsw m4, m1, m6
@@ -273,9 +299,9 @@ cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     add      dstq, dststrideq
     dec   heightd                          ; next row
     jg .nextrow
+%endif
     RET
 
-INIT_XMM ssse3
 cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     lea      myd, [myq*3]
 %if PIC
-- 
2.52.0


From cac02b4f5673b198a8e41b31ead86124d641c355 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 13:29:42 +0100
Subject: [PATCH 208/304] avcodec/x86/vp8dsp: Don't use MMX registers in
 ff_put_vp8_epel4_h4_ssse3

Doubling the register width allows using only one pshufb and one pmaddubsw.

Old benchmarks:
vp8_put_epel4_h4_c:                                     82.8 ( 1.00x)
vp8_put_epel4_h4_ssse3:                                 13.9 ( 5.96x)

New benchmarks:
vp8_put_epel4_h4_c:                                     82.7 ( 1.00x)
vp8_put_epel4_h4_ssse3:                                 11.7 ( 7.08x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index fd60feaf1f..6c365898ce 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -24,6 +24,15 @@
 
 SECTION_RODATA
 
+fourtap_filter4_b_m: times 4 db  -6, 123
+                     times 4 db  12,  -1
+                     times 4 db  -9,  93
+                     times 4 db  50,  -6
+                     times 4 db  -6,  50
+                     times 4 db  93,  -9
+                     times 4 db  -1,  12
+                     times 4 db 123,  -6
+
 fourtap_filter_hb_m: times 8 db  -6, 123
                      times 8 db  12,  -1
                      times 8 db  -9,  93
@@ -117,6 +126,7 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %if PIC
 %define fourtap_filter_hb  picregq
 %define fourtap_filter_b   picregq
+%define fourtap_filter4_b  picregq
 %define sixtap_filter_hb   picregq
 %define sixtap_filter_b    picregq
 %define fourtap_filter_v   picregq
@@ -127,6 +137,7 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %else
 %define fourtap_filter_hb  fourtap_filter_hb_m
 %define fourtap_filter_b   fourtap_filter_b_m
+%define fourtap_filter4_b  fourtap_filter4_b_m
 %define sixtap_filter_hb   sixtap_filter_hb_m
 %define sixtap_filter_b    sixtap_filter_b_m
 %define fourtap_filter_v   fourtap_filter_v_m
@@ -136,6 +147,7 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %define npicregs 0
 %endif
 
+filter4_h4_shuf: db 0, 1, 1, 2, 2, 3, 3, 4, 2, 3, 3,  4, 4,  5, 5,  6
 filter_h2_shuf:  db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5,  6, 6,  7,  7,  8
 filter_h4_shuf:  db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7,  8, 8,  9,  9, 10
 
@@ -208,9 +220,11 @@ cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, h
     jg .nextrow
     RET
 
-cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, height, mx, picreg
-    shl      mxd, 4
+INIT_XMM ssse3
+cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 6+!!(%1 == 8), dst, dststride, src, srcstride, height, mx, picreg
     mova      m2, [pw_256]
+%if %1 == 8
+    shl      mxd, 4
     mova      m3, [filter_h2_shuf]
     mova      m4, [filter_h4_shuf]
 %if PIC
@@ -218,19 +232,34 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, h
 %endif
     mova      m5, [fourtap_filter_hb+mxq-16] ; set up 4tap filter in bytes
     mova      m6, [fourtap_filter_hb+mxq]
+%else
+    shl      mxd, 3
+    mova      m3, [filter4_h4_shuf]
+%if PIC
+    lea  picregq, [fourtap_filter4_b_m]
+%endif
+    mova      m5, [fourtap_filter4_b+mxq-8]
+%endif
 
 .nextrow:
+%if %1 == 4
+    movq      m0, [srcq-1]
+    pshufb    m0, m3
+    pmaddubsw m0, m5
+    movhlps   m1, m0
+%else
     movu      m0, [srcq-1]
     mova      m1, m0
     pshufb    m0, m3
     pshufb    m1, m4
     pmaddubsw m0, m5
     pmaddubsw m1, m6
+%endif
     add     srcq, srcstrideq
     paddsw    m0, m1
     pmulhrsw  m0, m2
     packuswb  m0, m0
-    movh  [dstq], m0        ; store
+    MOV   [dstq], m0        ; store
 
     ; go to next line
     add     dstq, dststrideq
@@ -238,7 +267,6 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 7, dst, dststride, src, srcstride, h
     jg .nextrow
     RET
 
-INIT_XMM ssse3
 cglobal put_vp8_epel%1_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picreg, my
     shl      myd, 4
 %if PIC
-- 
2.52.0
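
The single pshufb/pmaddubsw trick in the epel4_h4 patch above can be modelled in scalar C. This is only a sketch for illustration, not code from the patch: the function names are hypothetical, the shuffle indices and coefficients are copied from filter4_h4_shuf and fourtap_filter4_b_m, and the caller is assumed to guarantee that src[-1..6] is readable.

```c
#include <stdint.h>

/* 4-tap coefficients (c0, c1, c2, c3) as laid out in fourtap_filter4_b_m. */
static const int8_t fourtap[4][4] = {
    { -6, 123,  12, -1 },
    { -9,  93,  50, -6 },
    { -6,  50,  93, -9 },
    { -1,  12, 123, -6 },
};

/* Byte indices from filter4_h4_shuf, relative to src - 1. */
static const uint8_t shuf[16] = { 0, 1, 1, 2, 2, 3, 3, 4,
                                  2, 3, 3, 4, 4, 5, 5, 6 };

static int sat16(int v)   { return v > 32767 ? 32767 : v < -32768 ? -32768 : v; }
static int clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* Scalar model of one row of the 4-wide horizontal 4-tap filter. */
static void epel4_h4_row_model(uint8_t dst[4], const uint8_t *src, const int8_t c[4])
{
    uint8_t bytes[16];
    int words[8];

    /* movq + pshufb: (s[x-1], s[x]) pairs go to the low half,
     * (s[x+1], s[x+2]) pairs to the high half. */
    for (int i = 0; i < 16; i++)
        bytes[i] = src[shuf[i] - 1];

    /* pmaddubsw: unsigned pixel times signed coefficient, adjacent
     * products summed with signed saturation. */
    for (int i = 0; i < 8; i++) {
        const int8_t *k = i < 4 ? c : c + 2;
        words[i] = sat16(bytes[2 * i] * k[0] + bytes[2 * i + 1] * k[1]);
    }

    for (int x = 0; x < 4; x++) {
        /* movhlps + paddsw: fold the high half onto the low half. */
        int sum = sat16(words[x] + words[x + 4]);
        /* pmulhrsw by 256 equals (sum + 64) >> 7 (arithmetic shift assumed);
         * packuswb clips to 0..255. */
        dst[x] = (uint8_t)clip_u8((sum + 64) >> 7);
    }
}

/* Straightforward reference filter for comparison. */
static void epel4_h4_row_ref(uint8_t dst[4], const uint8_t *src, const int8_t c[4])
{
    for (int x = 0; x < 4; x++) {
        int sum = c[0] * src[x - 1] + c[1] * src[x] +
                  c[2] * src[x + 1] + c[3] * src[x + 2];
        dst[x] = (uint8_t)clip_u8((sum + 64) >> 7);
    }
}

/* Returns 1 if model and reference agree for every filter on a test row. */
static int epel4_h4_models_agree(void)
{
    static const uint8_t row[8] = { 0, 255, 255, 0, 17, 200, 3, 128 };

    for (int f = 0; f < 4; f++) {
        uint8_t a[4], b[4];
        epel4_h4_row_model(a, row + 1, fourtap[f]);
        epel4_h4_row_ref(b, row + 1, fourtap[f]);
        for (int x = 0; x < 4; x++)
            if (a[x] != b[x])
                return 0;
    }
    return 1;
}
```

The test row deliberately contains 0, 255, 255, 0 so that the saturating add is exercised. In that case both paths clip to 255, which is why the saturation does not change the result.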


From 04b77783c042955d6cac27512449e5208a20d31e Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 16:11:10 +0100
Subject: [PATCH 209/304] avcodec/x86/vp8dsp: Don't use MMX registers in
 ff_put_vp8_epel4_h6_ssse3

Doubling the register width makes it possible to avoid one pshufb and one
pmaddubsw.

Old benchmarks:
vp8_put_epel4_h6_c:                                    115.9 ( 1.00x)
vp8_put_epel4_h6_ssse3:                                 20.2 ( 5.74x)
vp8_put_epel4_h6v4_c:                                  276.3 ( 1.00x)
vp8_put_epel4_h6v4_ssse3:                               58.6 ( 4.71x)
vp8_put_epel4_h6v6_c:                                  363.6 ( 1.00x)
vp8_put_epel4_h6v6_ssse3:                               62.5 ( 5.82x)

New benchmarks:
vp8_put_epel4_h6_c:                                    116.4 ( 1.00x)
vp8_put_epel4_h6_ssse3:                                 16.0 ( 7.29x)
vp8_put_epel4_h6v4_c:                                  280.9 ( 1.00x)
vp8_put_epel4_h6v4_ssse3:                               44.3 ( 6.33x)
vp8_put_epel4_h6v6_c:                                  365.6 ( 1.00x)
vp8_put_epel4_h6v6_ssse3:                               53.1 ( 6.89x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 50 +++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 6c365898ce..2a66e51da6 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -33,6 +33,16 @@ fourtap_filter4_b_m: times 4 db  -6, 123
                      times 4 db  -1,  12
                      times 4 db 123,  -6
 
+sixtap_filter4_hb_m: times 8 db   2, -11
+                     times 4 db 108,  -8
+                     times 4 db  36,   1
+                     times 8 db   3, -16
+                     times 4 db  77, -16
+                     times 4 db  77,   3
+                     times 8 db   1,  -8
+                     times 4 db  36, -11
+                     times 4 db 108,   2
+
 fourtap_filter_hb_m: times 8 db  -6, 123
                      times 8 db  12,  -1
                      times 8 db  -9,  93
@@ -129,6 +139,7 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %define fourtap_filter4_b  picregq
 %define sixtap_filter_hb   picregq
 %define sixtap_filter_b    picregq
+%define sixtap_filter4_hb  picregq
 %define fourtap_filter_v   picregq
 %define sixtap_filter_v    picregq
 %define bilinear_filter_vw picregq
@@ -140,6 +151,7 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %define fourtap_filter4_b  fourtap_filter4_b_m
 %define sixtap_filter_hb   sixtap_filter_hb_m
 %define sixtap_filter_b    sixtap_filter_b_m
+%define sixtap_filter4_hb  sixtap_filter4_hb_m
 %define fourtap_filter_v   fourtap_filter_v_m
 %define sixtap_filter_v    sixtap_filter_v_m
 %define bilinear_filter_vw bilinear_filter_vw_m
@@ -148,6 +160,7 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %endif
 
 filter4_h4_shuf: db 0, 1, 1, 2, 2, 3, 3, 4, 2, 3, 3,  4, 4,  5, 5,  6
+filter4_h6_shuf: db 1, 3, 2, 4, 3, 5, 4, 6, 2, 4, 3,  5, 4,  6, 5,  7
 filter_h2_shuf:  db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5,  6, 6,  7,  7,  8
 filter_h4_shuf:  db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7,  8, 8,  9,  9, 10
 
@@ -180,7 +193,16 @@ SECTION .text
 %define MOV movq
 %endif
 
-cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, height, mx, picreg
+cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 6+2*(%1==8), dst, dststride, src, srcstride, height, mx, picreg
+%if %1 == 4
+    mova      m3, [filter4_h6_shuf]
+%if PIC
+    lea  picregq, [sixtap_filter4_hb_m]
+%endif
+    shl      mxd, 4
+    mova      m4, [sixtap_filter4_hb+mxq-32]
+    mova      m5, [sixtap_filter4_hb+mxq-16]
+%else
     lea      mxd, [mxq*3]
     mova      m3, [filter_h6_shuf2]
     mova      m4, [filter_h6_shuf3]
@@ -190,29 +212,35 @@ cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, h
     mova      m5, [sixtap_filter_hb+mxq*8-48] ; set up 6tap filter in bytes
     mova      m6, [sixtap_filter_hb+mxq*8-32]
     mova      m7, [sixtap_filter_hb+mxq*8-16]
+%endif
 
 .nextrow:
+%if %1 == 4
+    ; we need nine bytes, so two loads
+    movq      m1, [srcq-1]
+    movq      m0, [srcq-2]
+    punpcklbw m0, m1
+    pshufb    m1, m3
+    pmaddubsw m1, m5
+    pmaddubsw m0, m4
+    movhlps   m2, m1
+%else
     movu      m0, [srcq-2]
     mova      m1, m0
     mova      m2, m0
-%if mmsize == 8
-; For epel4, we need 9 bytes, but only 8 get loaded; to compensate, do the
-; shuffle with a memory operand
-    punpcklbw m0, [srcq+3]
-%else
     pshufb    m0, [filter_h6_shuf1]
-%endif
     pshufb    m1, m3
     pshufb    m2, m4
     pmaddubsw m0, m5
     pmaddubsw m1, m6
     pmaddubsw m2, m7
+%endif
     add     srcq, srcstrideq
-    paddsw    m0, m1
+    paddw     m0, m1
     paddsw    m0, m2
     pmulhrsw  m0, [pw_256]
     packuswb  m0, m0
-    movh  [dstq], m0        ; store
+    MOV   [dstq], m0        ; store
 
     ; go to next line
     add     dstq, dststrideq
@@ -220,7 +248,6 @@ cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 8, dst, dststride, src, srcstride, h
     jg .nextrow
     RET
 
-INIT_XMM ssse3
 cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 6+!!(%1 == 8), dst, dststride, src, srcstride, height, mx, picreg
     mova      m2, [pw_256]
 %if %1 == 8
@@ -405,9 +432,8 @@ cglobal put_vp8_epel%1_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picr
     RET
 %endmacro
 
-INIT_MMX ssse3
-FILTER_SSSE3 4
 INIT_XMM ssse3
+FILTER_SSSE3 4
 FILTER_SSSE3 8
 
 INIT_XMM sse2
-- 
2.52.0


From 6d4eb58af140f34ae8be01bb425ed32eb9d409ef Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 20:32:58 +0100
Subject: [PATCH 210/304] avcodec/x86/vp8dsp: Reduce number of coefficient
 tables

By changing the permutations used in the epel8_h{4,6} case
we can simply reuse the coefficient tables from the vertical epel
filters.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 54 ++++++++++++---------------------------
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 2a66e51da6..340f6cc818 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -43,15 +43,6 @@ sixtap_filter4_hb_m: times 8 db   2, -11
                      times 4 db  36, -11
                      times 4 db 108,   2
 
-fourtap_filter_hb_m: times 8 db  -6, 123
-                     times 8 db  12,  -1
-                     times 8 db  -9,  93
-                     times 8 db  50,  -6
-                     times 8 db  -6,  50
-                     times 8 db  93,  -9
-                     times 8 db  -1,  12
-                     times 8 db 123,  -6
-
 fourtap_filter_b_m:  times 8 db  -6,  12
                      times 8 db 123,  -1
                      times 8 db  -9,  50
@@ -61,16 +52,6 @@ fourtap_filter_b_m:  times 8 db  -6,  12
                      times 8 db  -1, 123
                      times 8 db  12,  -6
 
-sixtap_filter_hb_m:  times 8 db   2,   1
-                     times 8 db -11, 108
-                     times 8 db  36,  -8
-                     times 8 db   3,   3
-                     times 8 db -16,  77
-                     times 8 db  77, -16
-                     times 8 db   1,   2
-                     times 8 db  -8,  36
-                     times 8 db 108, -11
-
 sixtap_filter_b_m:   times 8 db   2,  36
                      times 8 db -11,  -8
                      times 8 db 108,   1
@@ -134,10 +115,8 @@ bilinear_filter_vb_m: times 8 db 7, 1
                       times 8 db 1, 7
 
 %if PIC
-%define fourtap_filter_hb  picregq
 %define fourtap_filter_b   picregq
 %define fourtap_filter4_b  picregq
-%define sixtap_filter_hb   picregq
 %define sixtap_filter_b    picregq
 %define sixtap_filter4_hb  picregq
 %define fourtap_filter_v   picregq
@@ -146,10 +125,8 @@ bilinear_filter_vb_m: times 8 db 7, 1
 %define bilinear_filter_vb picregq
 %define npicregs 1
 %else
-%define fourtap_filter_hb  fourtap_filter_hb_m
 %define fourtap_filter_b   fourtap_filter_b_m
 %define fourtap_filter4_b  fourtap_filter4_b_m
-%define sixtap_filter_hb   sixtap_filter_hb_m
 %define sixtap_filter_b    sixtap_filter_b_m
 %define sixtap_filter4_hb  sixtap_filter4_hb_m
 %define fourtap_filter_v   fourtap_filter_v_m
@@ -161,12 +138,15 @@ bilinear_filter_vb_m: times 8 db 7, 1
 
 filter4_h4_shuf: db 0, 1, 1, 2, 2, 3, 3, 4, 2, 3, 3,  4, 4,  5, 5,  6
 filter4_h6_shuf: db 1, 3, 2, 4, 3, 5, 4, 6, 2, 4, 3,  5, 4,  6, 5,  7
-filter_h2_shuf:  db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5,  6, 6,  7,  7,  8
-filter_h4_shuf:  db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7,  8, 8,  9,  9, 10
 
-filter_h6_shuf1: db 0, 5, 1, 6, 2, 7, 3, 8, 4, 9, 5, 10, 6, 11,  7, 12
-filter_h6_shuf2: db 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,  7, 7,  8,  8,  9
-filter_h6_shuf3: db 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,  9, 9, 10, 10, 11
+filter_h4_shuf1: db 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5,  7, 6,  8, 7,  9
+filter_h4_shuf2: db 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6,  8, 7,  9, 8, 10
+
+filter_h6_shuf1: db 0, 3, 1, 4, 2, 5, 3, 6, 4, 7, 5,  8, 6,  9, 7, 10
+filter_h6_shuf2: db 1, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6,  9, 7, 10, 8, 11
+filter_h6_shuf3: db 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 11, 9, 12
+
+filter_h2_shuf:  db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5,  6, 6,  7, 7,  8
 
 pw_20091: times 4 dw 20091
 pw_17734: times 4 dw 17734
@@ -207,11 +187,11 @@ cglobal put_vp8_epel%1_h6, 6, 6 + npicregs, 6+2*(%1==8), dst, dststride, src, sr
     mova      m3, [filter_h6_shuf2]
     mova      m4, [filter_h6_shuf3]
 %if PIC
-    lea  picregq, [sixtap_filter_hb_m]
+    lea  picregq, [sixtap_filter_b_m]
 %endif
-    mova      m5, [sixtap_filter_hb+mxq*8-48] ; set up 6tap filter in bytes
-    mova      m6, [sixtap_filter_hb+mxq*8-32]
-    mova      m7, [sixtap_filter_hb+mxq*8-16]
+    mova      m5, [sixtap_filter_b+mxq*8-48] ; set up 6tap filter in bytes
+    mova      m6, [sixtap_filter_b+mxq*8-32]
+    mova      m7, [sixtap_filter_b+mxq*8-16]
 %endif
 
 .nextrow:
@@ -252,13 +232,13 @@ cglobal put_vp8_epel%1_h4, 6, 6 + npicregs, 6+!!(%1 == 8), dst, dststride, src,
     mova      m2, [pw_256]
 %if %1 == 8
     shl      mxd, 4
-    mova      m3, [filter_h2_shuf]
-    mova      m4, [filter_h4_shuf]
+    mova      m3, [filter_h4_shuf1]
+    mova      m4, [filter_h4_shuf2]
 %if PIC
-    lea  picregq, [fourtap_filter_hb_m]
+    lea  picregq, [fourtap_filter_b_m]
 %endif
-    mova      m5, [fourtap_filter_hb+mxq-16] ; set up 4tap filter in bytes
-    mova      m6, [fourtap_filter_hb+mxq]
+    mova      m5, [fourtap_filter_b+mxq-16] ; set up 4tap filter in bytes
+    mova      m6, [fourtap_filter_b+mxq]
 %else
     shl      mxd, 3
     mova      m3, [filter4_h4_shuf]
-- 
2.52.0


From 61ef56bad1cef1eff1a75cfa0de99e25379c4609 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 22:36:45 +0100
Subject: [PATCH 211/304] avcodec/x86/vp8dsp: Don't use saturated addition when
 unnecessary

For the epel functions, there can be no overflow as long as the sum
contains only one of the two large central coefficients; for bilinear
functions, there can be no overflow whatsoever.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp8dsp.asm | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/libavcodec/x86/vp8dsp.asm b/libavcodec/x86/vp8dsp.asm
index 340f6cc818..22356f687b 100644
--- a/libavcodec/x86/vp8dsp.asm
+++ b/libavcodec/x86/vp8dsp.asm
@@ -450,10 +450,10 @@ cglobal put_vp8_epel8_h4, 6, 6 + npicregs, 10, dst, dststride, src, srcstride, h
     pmullw    m3, [mxq+48]
 %endif
     add     srcq, srcstrideq
-    paddsw    m0, m1
-    paddsw    m2, m3
+    paddw     m0, m1
+    paddw     m2, m3
+    paddw     m0, m4
     paddsw    m0, m2
-    paddsw    m0, m4
     psraw     m0, 7
     packuswb  m0, m7
     movh  [dstq], m0        ; store
@@ -511,12 +511,12 @@ cglobal put_vp8_epel8_h6, 6, 6 + npicregs, 14, dst, dststride, src, srcstride, h
     pmullw    m5, [mxq+80]
 %endif
     add     srcq, srcstrideq
-    paddsw    m1, m4
-    paddsw    m0, m5
-    paddsw    m1, m2
-    paddsw    m0, m3
+    paddw     m1, m4
+    paddw     m0, m5
+    paddw     m1, m2
+    paddw     m0, m3
+    paddw     m1, m6
     paddsw    m0, m1
-    paddsw    m0, m6
     psraw     m0, 7
     packuswb  m0, m7
     movh  [dstq], m0        ; store
@@ -556,20 +556,20 @@ cglobal put_vp8_epel8_v4, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     mova      m3, m4
     pmullw    m0, [myq+0]
     pmullw    m4, m5
-    paddsw    m4, m0
+    paddw     m4, m0
 
     ; then calculate positive taps
     mova      m0, m1
     pmullw    m1, [myq+16]
-    paddsw    m4, m1
+    paddw     m4, m1
     mova      m1, m2
     pmullw    m2, [myq+32]
+    paddw     m4, m6
     add     srcq, srcstrideq
     paddsw    m4, m2
     mova      m2, m3
 
     ; round/clip/store
-    paddsw    m4, m6
     psraw     m4, 7
     packuswb  m4, m7
     movh  [dstq], m4
@@ -612,17 +612,18 @@ cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     pmullw    m5, [myq+16]
     mova      m6, m4
     pmullw    m6, [myq+64]
-    paddsw    m6, m5
+    paddw     m6, m5
 
     ; then calculate positive taps
     movh      m5, [srcq+srcstrideq]      ; read new row
     punpcklbw m5, m7
     pmullw    m0, [myq+0]
-    paddsw    m6, m0
+    paddw     m6, [pw_64]
+    paddw     m6, m0
     mova      m0, m1
     mova      m1, m2
     pmullw    m2, [myq+32]
-    paddsw    m6, m2
+    paddw     m6, m2
     mova      m2, m3
     pmullw    m3, [myq+48]
     add     srcq, srcstrideq
@@ -633,7 +634,6 @@ cglobal put_vp8_epel8_v6, 7, 7, 8, dst, dststride, src, srcstride, height, picre
     paddsw    m6, m5
 
     ; round/clip/store
-    paddsw    m6, [pw_64]
     psraw     m6, 7
     packuswb  m6, m7
     movh  [dstq], m6
@@ -700,8 +700,8 @@ cglobal put_vp8_bilinear%1_v, 7, 7, 7, dst, dststride, src, srcstride, height, p
     pmullw    m2, m4
     pmullw    m3, m5
     lea     srcq, [srcq+srcstrideq*2]
-    paddsw    m0, m1
-    paddsw    m2, m3
+    paddw     m0, m1
+    paddw     m2, m3
     psraw     m0, 2
     psraw     m2, 2
     pavgw     m0, m6
@@ -771,8 +771,8 @@ cglobal put_vp8_bilinear%1_h, 6, 6 + npicregs, 7, dst, dststride, src, srcstride
     pmullw    m2, m4
     pmullw    m3, m5
     lea     srcq, [srcq+srcstrideq*2]
-    paddsw    m0, m1
-    paddsw    m2, m3
+    paddw     m0, m1
+    paddw     m2, m3
     psraw     m0, 2
     psraw     m2, 2
     pavgw     m0, m6
-- 
2.52.0
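
The overflow argument in the commit message above can be checked with a few lines of C. This is a sketch under stated assumptions, not part of the patch: pixels are taken to be 8-bit (0..255), the rounding constant is 64, the coefficients are the signed taps that appear in the vp8dsp.asm tables (four-tap filters written with zero outer taps, row order immaterial here), and the function name is hypothetical.

```c
#include <stdint.h>

/* Signed VP8 subpel taps; the two large central taps are columns 2 and 3. */
static const int vp8_taps[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/*
 * Returns 1 if, for every filter, the worst-case sum of all taps except
 * column `skip` (plus the rounding constant 64) fits in a signed 16-bit
 * word. Pass a negative `skip` to keep all six taps.
 */
static int sum_fits_int16(int skip)
{
    for (int f = 0; f < 7; f++) {
        int hi = 64; /* all positive taps hit 255, negative taps hit 0 */
        int lo = 0;  /* all negative taps hit 255, positive taps hit 0 */

        for (int t = 0; t < 6; t++) {
            if (t == skip)
                continue;
            if (vp8_taps[f][t] > 0)
                hi += 255 * vp8_taps[f][t];
            else
                lo += 255 * vp8_taps[f][t];
        }
        if (hi > INT16_MAX || lo < INT16_MIN)
            return 0;
    }
    return 1;
}
```

With either central column left out, the largest partial sum is 123 * 255 + 64 = 31429, which fits in 16 bits, so paddw is safe for those additions. With both central taps present the bound is (123 + 12) * 255 = 34425, so the last addition still needs paddsw.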


From 3607826f583f4d5b1c5f96e4c3be0aec4ae96bdf Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 2 Dec 2025 19:49:17 +0100
Subject: [PATCH 212/304] avcodec/riscv/vp8dsp_rvv: Remove unused functions

Only the sixtap functions are used for size 16.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/riscv/vp8dsp_init.c | 5 -----
 libavcodec/riscv/vp8dsp_rvv.S  | 9 ++++++++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 3e35c72198..fecf6ef9b0 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -90,27 +90,22 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
             c->put_vp8_epel_pixels_tab[0][0][2] = ff_put_vp8_epel16_h6_rvv;
             c->put_vp8_epel_pixels_tab[1][0][2] = ff_put_vp8_epel8_h6_rvv;
             c->put_vp8_epel_pixels_tab[2][0][2] = ff_put_vp8_epel4_h6_rvv;
-            c->put_vp8_epel_pixels_tab[0][0][1] = ff_put_vp8_epel16_h4_rvv;
             c->put_vp8_epel_pixels_tab[1][0][1] = ff_put_vp8_epel8_h4_rvv;
             c->put_vp8_epel_pixels_tab[2][0][1] = ff_put_vp8_epel4_h4_rvv;
 
             c->put_vp8_epel_pixels_tab[0][2][0] = ff_put_vp8_epel16_v6_rvv;
             c->put_vp8_epel_pixels_tab[1][2][0] = ff_put_vp8_epel8_v6_rvv;
             c->put_vp8_epel_pixels_tab[2][2][0] = ff_put_vp8_epel4_v6_rvv;
-            c->put_vp8_epel_pixels_tab[0][1][0] = ff_put_vp8_epel16_v4_rvv;
             c->put_vp8_epel_pixels_tab[1][1][0] = ff_put_vp8_epel8_v4_rvv;
             c->put_vp8_epel_pixels_tab[2][1][0] = ff_put_vp8_epel4_v4_rvv;
 #if __riscv_xlen <= 64
             c->put_vp8_epel_pixels_tab[0][2][2] = ff_put_vp8_epel16_h6v6_rvv;
             c->put_vp8_epel_pixels_tab[1][2][2] = ff_put_vp8_epel8_h6v6_rvv;
             c->put_vp8_epel_pixels_tab[2][2][2] = ff_put_vp8_epel4_h6v6_rvv;
-            c->put_vp8_epel_pixels_tab[0][2][1] = ff_put_vp8_epel16_h4v6_rvv;
             c->put_vp8_epel_pixels_tab[1][2][1] = ff_put_vp8_epel8_h4v6_rvv;
             c->put_vp8_epel_pixels_tab[2][2][1] = ff_put_vp8_epel4_h4v6_rvv;
-            c->put_vp8_epel_pixels_tab[0][1][1] = ff_put_vp8_epel16_h4v4_rvv;
             c->put_vp8_epel_pixels_tab[1][1][1] = ff_put_vp8_epel8_h4v4_rvv;
             c->put_vp8_epel_pixels_tab[2][1][1] = ff_put_vp8_epel4_h4v4_rvv;
-            c->put_vp8_epel_pixels_tab[0][1][2] = ff_put_vp8_epel16_h6v4_rvv;
             c->put_vp8_epel_pixels_tab[1][1][2] = ff_put_vp8_epel8_h6v4_rvv;
             c->put_vp8_epel_pixels_tab[2][1][2] = ff_put_vp8_epel4_h6v4_rvv;
 #endif
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 2ee7029c60..ed08f72cdc 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -537,7 +537,14 @@ func ff_put_vp8_epel\len\()_h\hsize\()v\vsize\()_rvv, zve32x, zba
 endfunc
 .endm
 
-.irp len,16,8,4
+# Only the sixtap versions are used for epel16.
+epel 16 6 h
+epel 16 6 v
+#if __riscv_xlen <= 64
+epel_hv 16 6 6
+#endif
+
+.irp len,8,4
 epel \len 6 h
 epel \len 4 h
 epel \len 6 v
-- 
2.52.0


From b8ee9222a181f2cadd996c44ad584854ad19663d Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 24 Nov 2025 23:13:16 +0100
Subject: [PATCH 213/304] avcodec/vp8dsp: Don't compile unused functions

The width 16 epel functions never use four taps in any direction*,
so don't build said functions. Saves 4352B of .text and 89B of
.text.unlikely here.

*: mx and my in vp8_mc_luma() are always even.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/vp8dsp.c     | 11 +++++------
 tests/checkasm/vp8dsp.c |  3 ++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vp8dsp.c b/libavcodec/vp8dsp.c
index 5543303adb..eabe3edb27 100644
--- a/libavcodec/vp8dsp.c
+++ b/libavcodec/vp8dsp.c
@@ -558,26 +558,21 @@ put_vp8_epel ## SIZE ## _h ## HTAPS ## v ## VTAPS ## _c(uint8_t *dst,         \
     }                                                                         \
 }
 
-VP8_EPEL_H(16, 4)
 VP8_EPEL_H(8,  4)
 VP8_EPEL_H(4,  4)
 VP8_EPEL_H(16, 6)
 VP8_EPEL_H(8,  6)
 VP8_EPEL_H(4,  6)
-VP8_EPEL_V(16, 4)
 VP8_EPEL_V(8,  4)
 VP8_EPEL_V(4,  4)
 VP8_EPEL_V(16, 6)
 VP8_EPEL_V(8,  6)
 VP8_EPEL_V(4,  6)
 
-VP8_EPEL_HV(16, 4, 4)
 VP8_EPEL_HV(8,  4, 4)
 VP8_EPEL_HV(4,  4, 4)
-VP8_EPEL_HV(16, 4, 6)
 VP8_EPEL_HV(8,  4, 6)
 VP8_EPEL_HV(4,  4, 6)
-VP8_EPEL_HV(16, 6, 4)
 VP8_EPEL_HV(8,  6, 4)
 VP8_EPEL_HV(4,  6, 4)
 VP8_EPEL_HV(16, 6, 6)
@@ -667,7 +662,11 @@ VP8_BILINEAR(4)
 
 av_cold void ff_vp78dsp_init(VP8DSPContext *dsp)
 {
-    VP78_MC_FUNC(0, 16);
+    dsp->put_vp8_epel_pixels_tab[0][0][0] = put_vp8_pixels16_c;
+    dsp->put_vp8_epel_pixels_tab[0][0][2] = put_vp8_epel16_h6_c;
+    dsp->put_vp8_epel_pixels_tab[0][2][0] = put_vp8_epel16_v6_c;
+    dsp->put_vp8_epel_pixels_tab[0][2][2] = put_vp8_epel16_h6v6_c;
+
     VP78_MC_FUNC(1, 8);
     VP78_MC_FUNC(2, 4);
 
diff --git a/tests/checkasm/vp8dsp.c b/tests/checkasm/vp8dsp.c
index a12c295a2a..4d6704d5a9 100644
--- a/tests/checkasm/vp8dsp.c
+++ b/tests/checkasm/vp8dsp.c
@@ -510,7 +510,8 @@ static void checkasm_check_vp78dsp(VP8DSPContext *d, bool is_vp7)
 
 void checkasm_check_vp8dsp(void)
 {
-    VP8DSPContext d;
+    // Needs to be zeroed because not all size 16 epel functions exist.
+    VP8DSPContext d = { 0 };
 
     ff_vp78dsp_init(&d);
     check_mc(&d);
-- 
2.52.0


From 8123bb42d188865d32e34f9223771c2798f1233d Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 4 Dec 2025 16:47:00 +0100
Subject: [PATCH 214/304] ffv1enc_vulkan: fix encoding with large contexts

When RGB_LINECACHE != 2, top2 is not the current line.
---
 libavcodec/vulkan/ffv1_common.comp | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libavcodec/vulkan/ffv1_common.comp b/libavcodec/vulkan/ffv1_common.comp
index 3d40592739..5f654e2b29 100644
--- a/libavcodec/vulkan/ffv1_common.comp
+++ b/libavcodec/vulkan/ffv1_common.comp
@@ -124,8 +124,12 @@ ivec2 get_pred(readonly uimage2D pred, ivec2 sp, ivec2 off,
         }
         base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
 
+#if RGB_LINECACHE == 2
         /* top-2 became current upon swap */
         TYPE top2 = TYPE(imageLoad(pred, sp + LADDR(off))[comp]);
+#else
+        TYPE top2 = TYPE(imageLoad(pred, sp + LADDR(off + ivec2(0, -2)))[comp]);
+#endif
         base += quant_table[quant_table_idx][4][(top2 - top[1]) & MAX_QUANT_TABLE_MASK];
     }
 
-- 
2.52.0


From 9dbad1be9b3e18c28de5fcd9314b61556fb21fd7 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 4 Dec 2025 15:53:27 +0100
Subject: [PATCH 215/304] avformat/movenc: Fix leak of IAMFContext on error

Forgotten in 5b87869c09cece1583e74b6f796aa825a4765631.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavformat/movenc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index afbb1151af..8d8acd2aff 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -7867,8 +7867,11 @@ static int mov_init_iamf_track(AVFormatContext *s)
         default:
             av_assert0(0);
         }
-        if (ret < 0)
+        if (ret < 0) {
+            ff_iamf_uninit_context(iamf);
+            av_free(iamf);
             return ret;
+        }
     }
 
     track = &mov->tracks[first_iamf_idx];
-- 
2.52.0


From ec3aa7a1765426178ded0201e48d87b3fffb6875 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Mon, 1 Dec 2025 11:07:43 -0300
Subject: [PATCH 216/304] avfilter/f_sidedata: also handle global side data in
 filter links

Should fix issue #21071

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavfilter/f_sidedata.c | 47 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/libavfilter/f_sidedata.c b/libavfilter/f_sidedata.c
index b88a8cbd02..161fd2b9a2 100644
--- a/libavfilter/f_sidedata.c
+++ b/libavfilter/f_sidedata.c
@@ -27,10 +27,8 @@
 #include "libavutil/internal.h"
 #include "libavutil/frame.h"
 #include "libavutil/opt.h"
-#include "audio.h"
 #include "avfilter.h"
 #include "filters.h"
-#include "video.h"
 
 enum SideDataMode {
     SIDEDATA_SELECT,
@@ -96,6 +94,31 @@ static av_cold int init(AVFilterContext *ctx)
     return 0;
 }
 
+static int config_props(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    SideDataContext *s = ctx->priv;
+    const AVFrameSideData *sd = NULL;
+
+    if (s->type != -1)
+       sd = av_frame_side_data_get(outlink->side_data, outlink->nb_side_data, s->type);
+
+    switch (s->mode) {
+    case SIDEDATA_SELECT:
+        break;
+    case SIDEDATA_DELETE:
+        if (s->type == -1)
+            av_frame_side_data_free(&outlink->side_data, &outlink->nb_side_data);
+        else if (sd)
+            av_frame_side_data_remove(&outlink->side_data, &outlink->nb_side_data, s->type);
+        break;
+    default:
+        av_assert0(0);
+    };
+
+    return 0;
+}
+
 static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
 {
     AVFilterContext *ctx = inlink->dst;
@@ -143,6 +166,14 @@ static const AVFilterPad ainputs[] = {
     },
 };
 
+static const AVFilterPad aoutputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_AUDIO,
+        .config_props = config_props,
+    },
+};
+
 const FFFilter ff_af_asidedata = {
     .p.name        = "asidedata",
     .p.description = NULL_IF_CONFIG_SMALL("Manipulate audio frame side data."),
@@ -152,7 +183,7 @@ const FFFilter ff_af_asidedata = {
     .priv_size     = sizeof(SideDataContext),
     .init          = init,
     FILTER_INPUTS(ainputs),
-    FILTER_OUTPUTS(ff_audio_default_filterpad),
+    FILTER_OUTPUTS(aoutputs),
 };
 #endif /* CONFIG_ASIDEDATA_FILTER */
 
@@ -169,6 +200,14 @@ static const AVFilterPad inputs[] = {
     },
 };
 
+static const AVFilterPad outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_props,
+    },
+};
+
 const FFFilter ff_vf_sidedata = {
     .p.name        = "sidedata",
     .p.description = NULL_IF_CONFIG_SMALL("Manipulate video frame side data."),
@@ -178,6 +217,6 @@ const FFFilter ff_vf_sidedata = {
     .priv_size   = sizeof(SideDataContext),
     .init        = init,
     FILTER_INPUTS(inputs),
-    FILTER_OUTPUTS(ff_video_default_filterpad),
+    FILTER_OUTPUTS(outputs),
 };
 #endif /* CONFIG_SIDEDATA_FILTER */
-- 
2.52.0


From 0df5d5721d80aced6876ce3436e34b80705a63ac Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Fri, 5 Dec 2025 00:40:47 +0100
Subject: [PATCH 217/304] swscale/tests/swscale: Fix typo

Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libswscale/tests/swscale.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libswscale/tests/swscale.c b/libswscale/tests/swscale.c
index 8a85bef94e..373f031363 100644
--- a/libswscale/tests/swscale.c
+++ b/libswscale/tests/swscale.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2024      Nikles Haas
+ * Copyright (C) 2024      Niklas Haas
  * Copyright (C) 2003-2011 Michael Niedermayer <michaelni@gmx.at>
  *
  * This file is part of FFmpeg.
-- 
2.52.0


From 430cfe1364a85204379b6b7a0cb0f5113c31cae7 Mon Sep 17 00:00:00 2001
From: Arpad Panyik <Arpad.Panyik@arm.com>
Date: Wed, 26 Nov 2025 16:35:11 +0000
Subject: [PATCH 218/304] swscale: Refactor XYZ+RGB state and add function
 hooks

Prepare for xyz12Torgb48 architecture-specific optimizations in
subsequent patches by:
 - Grouping XYZ+RGB gamma LUTs and 3x3 matrices into SwsColorXform
   (ctx->xyz2rgb and ctx->rgb2xyz), replacing scattered fields.
 - Dropping the unused last matrix column, which keeps the SwsInternal
   size the same or makes it smaller.
 - Renaming ff_xyz12Torgb48 and ff_rgb48Toxyz12 to the static
   xyz12Torgb48_c and rgb48Toxyz12_c and routing calls via the new
   per-context function pointers (ctx->xyz12Torgb48 and
   ctx->rgb48Toxyz12) in graph.c and swscale.c.
 - Adding ff_sws_init_xyzdsp and invoking it in swscale init paths
   (normal and unscaled).
 - Making fill_xyztables public as ff_sws_fill_xyztables to ease its
   setup later in checkasm.

These modifications do not introduce any functional changes.

Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
---
 libswscale/graph.c            |  6 ++-
 libswscale/swscale.c          | 92 +++++++++++++++++++----------------
 libswscale/swscale_internal.h | 32 +++++++-----
 libswscale/swscale_unscaled.c |  2 +
 libswscale/utils.c            | 37 +++++++-------
 5 files changed, 95 insertions(+), 74 deletions(-)

diff --git a/libswscale/graph.c b/libswscale/graph.c
index 0a79b17f89..cf44710c2d 100644
--- a/libswscale/graph.c
+++ b/libswscale/graph.c
@@ -142,7 +142,8 @@ static void run_rgb0(const SwsImg *out, const SwsImg *in, int y, int h,
 static void run_xyz2rgb(const SwsImg *out, const SwsImg *in, int y, int h,
                         const SwsPass *pass)
 {
-    ff_xyz12Torgb48(pass->priv, out->data[0] + y * out->linesize[0], out->linesize[0],
+    const SwsInternal *c = pass->priv;
+    c->xyz12Torgb48(c, out->data[0] + y * out->linesize[0], out->linesize[0],
                     in->data[0] + y * in->linesize[0], in->linesize[0],
                     pass->width, h);
 }
@@ -150,7 +151,8 @@ static void run_xyz2rgb(const SwsImg *out, const SwsImg *in, int y, int h,
 static void run_rgb2xyz(const SwsImg *out, const SwsImg *in, int y, int h,
                         const SwsPass *pass)
 {
-    ff_rgb48Toxyz12(pass->priv, out->data[0] + y * out->linesize[0], out->linesize[0],
+    const SwsInternal *c = pass->priv;
+    c->rgb48Toxyz12(c, out->data[0] + y * out->linesize[0], out->linesize[0],
                     in->data[0] + y * in->linesize[0], in->linesize[0],
                     pass->width, h);
 }
diff --git a/libswscale/swscale.c b/libswscale/swscale.c
index f4c7eccac4..95a61a4183 100644
--- a/libswscale/swscale.c
+++ b/libswscale/swscale.c
@@ -660,6 +660,8 @@ static av_cold void sws_init_swscale(SwsInternal *c)
 {
     enum AVPixelFormat srcFormat = c->opts.src_format;
 
+    ff_sws_init_xyzdsp(c);
+
     ff_sws_init_output_funcs(c, &c->yuv2plane1, &c->yuv2planeX,
                              &c->yuv2nv12cX, &c->yuv2packed1,
                              &c->yuv2packed2, &c->yuv2packedX, &c->yuv2anyX);
@@ -737,8 +739,8 @@ static int check_image_pointers(const uint8_t * const data[4], enum AVPixelForma
     return 1;
 }
 
-void ff_xyz12Torgb48(const SwsInternal *c, uint8_t *dst, int dst_stride,
-                     const uint8_t *src, int src_stride, int w, int h)
+static void xyz12Torgb48_c(const SwsInternal *c, uint8_t *dst, int dst_stride,
+                           const uint8_t *src, int src_stride, int w, int h)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->opts.src_format);
 
@@ -759,20 +761,20 @@ void ff_xyz12Torgb48(const SwsInternal *c, uint8_t *dst, int dst_stride,
                 z = AV_RL16(src16 + xp + 2);
             }
 
-            x = c->xyzgamma[x >> 4];
-            y = c->xyzgamma[y >> 4];
-            z = c->xyzgamma[z >> 4];
+            x = c->xyz2rgb.gamma.in[x >> 4];
+            y = c->xyz2rgb.gamma.in[y >> 4];
+            z = c->xyz2rgb.gamma.in[z >> 4];
 
             // convert from XYZlinear to sRGBlinear
-            r = c->xyz2rgb_matrix[0][0] * x +
-                c->xyz2rgb_matrix[0][1] * y +
-                c->xyz2rgb_matrix[0][2] * z >> 12;
-            g = c->xyz2rgb_matrix[1][0] * x +
-                c->xyz2rgb_matrix[1][1] * y +
-                c->xyz2rgb_matrix[1][2] * z >> 12;
-            b = c->xyz2rgb_matrix[2][0] * x +
-                c->xyz2rgb_matrix[2][1] * y +
-                c->xyz2rgb_matrix[2][2] * z >> 12;
+            r = c->xyz2rgb.mat[0][0] * x +
+                c->xyz2rgb.mat[0][1] * y +
+                c->xyz2rgb.mat[0][2] * z >> 12;
+            g = c->xyz2rgb.mat[1][0] * x +
+                c->xyz2rgb.mat[1][1] * y +
+                c->xyz2rgb.mat[1][2] * z >> 12;
+            b = c->xyz2rgb.mat[2][0] * x +
+                c->xyz2rgb.mat[2][1] * y +
+                c->xyz2rgb.mat[2][2] * z >> 12;
 
             // limit values to 16-bit depth
             r = av_clip_uint16(r);
@@ -781,13 +783,13 @@ void ff_xyz12Torgb48(const SwsInternal *c, uint8_t *dst, int dst_stride,
 
             // convert from sRGBlinear to RGB and scale from 12bit to 16bit
             if (desc->flags & AV_PIX_FMT_FLAG_BE) {
-                AV_WB16(dst16 + xp + 0, c->rgbgamma[r] << 4);
-                AV_WB16(dst16 + xp + 1, c->rgbgamma[g] << 4);
-                AV_WB16(dst16 + xp + 2, c->rgbgamma[b] << 4);
+                AV_WB16(dst16 + xp + 0, c->xyz2rgb.gamma.out[r] << 4);
+                AV_WB16(dst16 + xp + 1, c->xyz2rgb.gamma.out[g] << 4);
+                AV_WB16(dst16 + xp + 2, c->xyz2rgb.gamma.out[b] << 4);
             } else {
-                AV_WL16(dst16 + xp + 0, c->rgbgamma[r] << 4);
-                AV_WL16(dst16 + xp + 1, c->rgbgamma[g] << 4);
-                AV_WL16(dst16 + xp + 2, c->rgbgamma[b] << 4);
+                AV_WL16(dst16 + xp + 0, c->xyz2rgb.gamma.out[r] << 4);
+                AV_WL16(dst16 + xp + 1, c->xyz2rgb.gamma.out[g] << 4);
+                AV_WL16(dst16 + xp + 2, c->xyz2rgb.gamma.out[b] << 4);
             }
         }
 
@@ -796,8 +798,8 @@ void ff_xyz12Torgb48(const SwsInternal *c, uint8_t *dst, int dst_stride,
     }
 }
 
-void ff_rgb48Toxyz12(const SwsInternal *c, uint8_t *dst, int dst_stride,
-                     const uint8_t *src, int src_stride, int w, int h)
+static void rgb48Toxyz12_c(const SwsInternal *c, uint8_t *dst, int dst_stride,
+                           const uint8_t *src, int src_stride, int w, int h)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->opts.dst_format);
 
@@ -818,20 +820,20 @@ void ff_rgb48Toxyz12(const SwsInternal *c, uint8_t *dst, int dst_stride,
                 b = AV_RL16(src16 + xp + 2);
             }
 
-            r = c->rgbgammainv[r>>4];
-            g = c->rgbgammainv[g>>4];
-            b = c->rgbgammainv[b>>4];
+            r = c->rgb2xyz.gamma.in[r >> 4];
+            g = c->rgb2xyz.gamma.in[g >> 4];
+            b = c->rgb2xyz.gamma.in[b >> 4];
 
             // convert from sRGBlinear to XYZlinear
-            x = c->rgb2xyz_matrix[0][0] * r +
-                c->rgb2xyz_matrix[0][1] * g +
-                c->rgb2xyz_matrix[0][2] * b >> 12;
-            y = c->rgb2xyz_matrix[1][0] * r +
-                c->rgb2xyz_matrix[1][1] * g +
-                c->rgb2xyz_matrix[1][2] * b >> 12;
-            z = c->rgb2xyz_matrix[2][0] * r +
-                c->rgb2xyz_matrix[2][1] * g +
-                c->rgb2xyz_matrix[2][2] * b >> 12;
+            x = c->rgb2xyz.mat[0][0] * r +
+                c->rgb2xyz.mat[0][1] * g +
+                c->rgb2xyz.mat[0][2] * b >> 12;
+            y = c->rgb2xyz.mat[1][0] * r +
+                c->rgb2xyz.mat[1][1] * g +
+                c->rgb2xyz.mat[1][2] * b >> 12;
+            z = c->rgb2xyz.mat[2][0] * r +
+                c->rgb2xyz.mat[2][1] * g +
+                c->rgb2xyz.mat[2][2] * b >> 12;
 
             // limit values to 16-bit depth
             x = av_clip_uint16(x);
@@ -840,13 +842,13 @@ void ff_rgb48Toxyz12(const SwsInternal *c, uint8_t *dst, int dst_stride,
 
             // convert from XYZlinear to X'Y'Z' and scale from 12bit to 16bit
             if (desc->flags & AV_PIX_FMT_FLAG_BE) {
-                AV_WB16(dst16 + xp + 0, c->xyzgammainv[x] << 4);
-                AV_WB16(dst16 + xp + 1, c->xyzgammainv[y] << 4);
-                AV_WB16(dst16 + xp + 2, c->xyzgammainv[z] << 4);
+                AV_WB16(dst16 + xp + 0, c->rgb2xyz.gamma.out[x] << 4);
+                AV_WB16(dst16 + xp + 1, c->rgb2xyz.gamma.out[y] << 4);
+                AV_WB16(dst16 + xp + 2, c->rgb2xyz.gamma.out[z] << 4);
             } else {
-                AV_WL16(dst16 + xp + 0, c->xyzgammainv[x] << 4);
-                AV_WL16(dst16 + xp + 1, c->xyzgammainv[y] << 4);
-                AV_WL16(dst16 + xp + 2, c->xyzgammainv[z] << 4);
+                AV_WL16(dst16 + xp + 0, c->rgb2xyz.gamma.out[x] << 4);
+                AV_WL16(dst16 + xp + 1, c->rgb2xyz.gamma.out[y] << 4);
+                AV_WL16(dst16 + xp + 2, c->rgb2xyz.gamma.out[z] << 4);
             }
         }
 
@@ -855,6 +857,12 @@ void ff_rgb48Toxyz12(const SwsInternal *c, uint8_t *dst, int dst_stride,
     }
 }
 
+av_cold void ff_sws_init_xyzdsp(SwsInternal *c)
+{
+    c->xyz12Torgb48 = xyz12Torgb48_c;
+    c->rgb48Toxyz12 = rgb48Toxyz12_c;
+}
+
 void ff_update_palette(SwsInternal *c, const uint32_t *pal)
 {
     for (int i = 0; i < 256; i++) {
@@ -1110,7 +1118,7 @@ static int scale_internal(SwsContext *sws,
         base = srcStride[0] < 0 ? c->xyz_scratch - srcStride[0] * (srcSliceH-1) :
                                   c->xyz_scratch;
 
-        ff_xyz12Torgb48(c, base, srcStride[0], src2[0], srcStride[0], sws->src_w, srcSliceH);
+        c->xyz12Torgb48(c, base, srcStride[0], src2[0], srcStride[0], sws->src_w, srcSliceH);
         src2[0] = base;
     }
 
@@ -1182,7 +1190,7 @@ static int scale_internal(SwsContext *sws,
         }
 
         /* replace on the same data */
-        ff_rgb48Toxyz12(c, dst, dstStride2[0], dst, dstStride2[0], sws->dst_w, ret);
+        c->rgb48Toxyz12(c, dst, dstStride2[0], dst, dstStride2[0], sws->dst_w, ret);
     }
 
     /* reset slice direction at end of frame */
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 5dd65a8d71..02e625a10e 100644
--- a/libswscale/swscale_internal.h
+++ b/libswscale/swscale_internal.h
@@ -93,6 +93,19 @@ typedef int (*SwsFunc)(SwsInternal *c, const uint8_t *const src[],
                        const int srcStride[], int srcSliceY, int srcSliceH,
                        uint8_t *const dst[], const int dstStride[]);
 
+typedef void (*SwsColorFunc)(const SwsInternal *c, uint8_t *dst, int dst_stride,
+                             const uint8_t *src, int src_stride, int w, int h);
+
+typedef struct SwsLuts {
+    uint16_t *in;
+    uint16_t *out;
+} SwsLuts;
+
+typedef struct SwsColorXform {
+    SwsLuts gamma;
+    int16_t mat[3][3];
+} SwsColorXform;
+
 /**
  * Write one line of horizontally scaled data to planar output
  * without any additional vertical scaling (or point-scaling).
@@ -547,12 +560,10 @@ struct SwsInternal {
 /* pre defined color-spaces gamma */
 #define XYZ_GAMMA (2.6)
 #define RGB_GAMMA (2.2)
-    uint16_t *xyzgamma;
-    uint16_t *rgbgamma;
-    uint16_t *xyzgammainv;
-    uint16_t *rgbgammainv;
-    int16_t xyz2rgb_matrix[3][4];
-    int16_t rgb2xyz_matrix[3][4];
+    SwsColorFunc  xyz12Torgb48;
+    SwsColorFunc  rgb48Toxyz12;
+    SwsColorXform xyz2rgb;
+    SwsColorXform rgb2xyz;
 
     /* function pointers for swscale() */
     yuv2planar1_fn yuv2plane1;
@@ -720,6 +731,9 @@ av_cold void ff_sws_init_range_convert_loongarch(SwsInternal *c);
 av_cold void ff_sws_init_range_convert_riscv(SwsInternal *c);
 av_cold void ff_sws_init_range_convert_x86(SwsInternal *c);
 
+av_cold void ff_sws_init_xyzdsp(SwsInternal *c);
+av_cold int ff_sws_fill_xyztables(SwsInternal *c);
+
 SwsFunc ff_yuv2rgb_init_x86(SwsInternal *c);
 SwsFunc ff_yuv2rgb_init_ppc(SwsInternal *c);
 SwsFunc ff_yuv2rgb_init_loongarch(SwsInternal *c);
@@ -1043,12 +1057,6 @@ void ff_copyPlane(const uint8_t *src, int srcStride,
                   int srcSliceY, int srcSliceH, int width,
                   uint8_t *dst, int dstStride);
 
-void ff_xyz12Torgb48(const SwsInternal *c, uint8_t *dst, int dst_stride,
-                     const uint8_t *src, int src_stride, int w, int h);
-
-void ff_rgb48Toxyz12(const SwsInternal *c, uint8_t *dst, int dst_stride,
-                     const uint8_t *src, int src_stride, int w, int h);
-
 static inline void fillPlane16(uint8_t *plane, int stride, int width, int height, int y,
                                int alpha, int bits, const int big_endian)
 {
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index 2c791e89fe..c9feb84c8e 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -2685,6 +2685,8 @@ void ff_get_unscaled_swscale(SwsInternal *c)
         }
     }
 
+    ff_sws_init_xyzdsp(c);
+
 #if ARCH_PPC
     ff_get_unscaled_swscale_ppc(c);
 #elif ARCH_ARM
diff --git a/libswscale/utils.c b/libswscale/utils.c
index a13d8df7e8..8f8789b24d 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -719,36 +719,37 @@ static av_cold void init_xyz_tables(void)
     }
 }
 
-static int fill_xyztables(SwsInternal *c)
+av_cold int ff_sws_fill_xyztables(SwsInternal *c)
 {
-    static const int16_t xyz2rgb_matrix[3][4] = {
+    static const int16_t xyz2rgb_matrix[3][3] = {
         {13270, -6295, -2041},
         {-3969,  7682,   170},
         {  228,  -835,  4329} };
-    static const int16_t rgb2xyz_matrix[3][4] = {
+    static const int16_t rgb2xyz_matrix[3][3] = {
         {1689, 1464,  739},
         { 871, 2929,  296},
         {  79,  488, 3891} };
 
-    if (c->xyzgamma)
+    if (c->xyz2rgb.gamma.in)
         return 0;
 
-    memcpy(c->xyz2rgb_matrix, xyz2rgb_matrix, sizeof(c->xyz2rgb_matrix));
-    memcpy(c->rgb2xyz_matrix, rgb2xyz_matrix, sizeof(c->rgb2xyz_matrix));
+    memcpy(c->xyz2rgb.mat, xyz2rgb_matrix, sizeof(c->xyz2rgb.mat));
+    memcpy(c->rgb2xyz.mat, rgb2xyz_matrix, sizeof(c->rgb2xyz.mat));
 
 #if CONFIG_SMALL
-    c->xyzgamma = av_malloc(sizeof(uint16_t) * 2 * (4096 + 65536));
-    if (!c->xyzgamma)
+    c->xyz2rgb.gamma.in = av_malloc(sizeof(uint16_t) * 2 * (4096 + 65536));
+    if (!c->xyz2rgb.gamma.in)
         return AVERROR(ENOMEM);
-    c->rgbgammainv = c->xyzgamma + 4096;
-    c->rgbgamma = c->rgbgammainv + 4096;
-    c->xyzgammainv = c->rgbgamma + 65536;
-    init_xyz_tables(c->xyzgamma, c->xyzgammainv, c->rgbgamma, c->rgbgammainv);
+    c->rgb2xyz.gamma.in  = c->xyz2rgb.gamma.in  + 4096;
+    c->xyz2rgb.gamma.out = c->rgb2xyz.gamma.in  + 4096;
+    c->rgb2xyz.gamma.out = c->xyz2rgb.gamma.out + 65536;
+    init_xyz_tables(c->xyz2rgb.gamma.in,  c->rgb2xyz.gamma.out,
+                    c->xyz2rgb.gamma.out, c->rgb2xyz.gamma.in);
 #else
-    c->xyzgamma = xyzgamma_tab;
-    c->rgbgamma = rgbgamma_tab;
-    c->xyzgammainv = xyzgammainv_tab;
-    c->rgbgammainv = rgbgammainv_tab;
+    c->xyz2rgb.gamma.in  = xyzgamma_tab;
+    c->xyz2rgb.gamma.out = rgbgamma_tab;
+    c->rgb2xyz.gamma.in  = rgbgammainv_tab;
+    c->rgb2xyz.gamma.out = xyzgammainv_tab;
 
     static AVOnce xyz_init_static_once = AV_ONCE_INIT;
     ff_thread_once(&xyz_init_static_once, init_xyz_tables);
@@ -822,7 +823,7 @@ static int handle_formats(SwsContext *sws)
     c->srcXYZ    |= handle_xyz(&sws->src_format);
     c->dstXYZ    |= handle_xyz(&sws->dst_format);
     if (c->srcXYZ || c->dstXYZ)
-        return fill_xyztables(c);
+        return ff_sws_fill_xyztables(c);
     else
         return 0;
 }
@@ -2312,7 +2313,7 @@ void sws_freeContext(SwsContext *sws)
     av_freep(&c->gamma);
     av_freep(&c->inv_gamma);
 #if CONFIG_SMALL
-    av_freep(&c->xyzgamma);
+    av_freep(&c->xyz2rgb.gamma.in);
 #endif
 
     av_freep(&c->rgb0_scratch);
-- 
2.52.0


From 34416d92049300e9f7ff2dfc29ecb6b9d386c91f Mon Sep 17 00:00:00 2001
From: Arpad Panyik <Arpad.Panyik@arm.com>
Date: Wed, 26 Nov 2025 16:36:13 +0000
Subject: [PATCH 219/304] checkasm: Add xyz12Torgb48le test

Add checkasm coverage for the XYZ12LE to RGB48LE path via the
ctx->xyz12Torgb48 hook. Integrate the test into the build and runner,
exercise a variety of widths and heights, compare against the C
reference, and benchmark when the width is a multiple of 4 and the
height is NUM_LINES.

This improves test coverage for the new function pointer in preparation
for architecture-specific implementations in subsequent commits.

Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
---
 tests/checkasm/Makefile     |   1 +
 tests/checkasm/checkasm.c   |   1 +
 tests/checkasm/checkasm.h   |   1 +
 tests/checkasm/sw_xyz2rgb.c | 108 ++++++++++++++++++++++++++++++++++++
 tests/fate/checkasm.mak     |   1 +
 5 files changed, 112 insertions(+)
 create mode 100644 tests/checkasm/sw_xyz2rgb.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index b9c8adb21f..1c34619249 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -82,6 +82,7 @@ SWSCALEOBJS                             += sw_gbrp.o            \
                                            sw_range_convert.o   \
                                            sw_rgb.o             \
                                            sw_scale.o           \
+                                           sw_xyz2rgb.o         \
                                            sw_yuv2rgb.o         \
                                            sw_yuv2yuv.o
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index a899967937..7edc8e4e6e 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -330,6 +330,7 @@ static const struct {
     { "sw_range_convert", checkasm_check_sw_range_convert },
     { "sw_rgb", checkasm_check_sw_rgb },
     { "sw_scale", checkasm_check_sw_scale },
+    { "sw_xyz2rgb", checkasm_check_sw_xyz2rgb },
     { "sw_yuv2rgb", checkasm_check_sw_yuv2rgb },
     { "sw_yuv2yuv", checkasm_check_sw_yuv2yuv },
     { "sw_ops", checkasm_check_sw_ops },
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index ec075c4763..9f4fb8b283 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -139,6 +139,7 @@ void checkasm_check_sw_gbrp(void);
 void checkasm_check_sw_range_convert(void);
 void checkasm_check_sw_rgb(void);
 void checkasm_check_sw_scale(void);
+void checkasm_check_sw_xyz2rgb(void);
 void checkasm_check_sw_yuv2rgb(void);
 void checkasm_check_sw_yuv2yuv(void);
 void checkasm_check_sw_ops(void);
diff --git a/tests/checkasm/sw_xyz2rgb.c b/tests/checkasm/sw_xyz2rgb.c
new file mode 100644
index 0000000000..afffb6c8da
--- /dev/null
+++ b/tests/checkasm/sw_xyz2rgb.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (c) 2025 Arpad Panyik <Arpad.Panyik@arm.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/pixfmt.h"
+
+#include "libswscale/swscale.h"
+#include "libswscale/swscale_internal.h"
+
+#include "checkasm.h"
+
+#define NUM_LINES 4
+#define MAX_LINE_SIZE 1920
+
+#define randomize_buffers(buf, size)      \
+    do {                                  \
+        for (int j = 0; j < size; j += 2) \
+            AV_WN32(buf + j, rnd());      \
+    } while (0)
+
+static void check_xyz12Torgb48le(void)
+{
+    static const int input_sizes[] = {1, 2, 3, 4, 5, 6, 7, 8, 16, 17, 21, 31,
+                                      32, 64, 128, 256, 512, 1024,
+                                      MAX_LINE_SIZE};
+
+    const int src_stride = 3 * sizeof(uint16_t) * MAX_LINE_SIZE;
+    const int dst_stride = src_stride;
+
+    const int src_pix_fmt = AV_PIX_FMT_XYZ12LE;
+    const int dst_pix_fmt = AV_PIX_FMT_RGB48LE;
+
+    const AVPixFmtDescriptor *src_desc = av_pix_fmt_desc_get(src_pix_fmt);
+    const AVPixFmtDescriptor *dst_desc = av_pix_fmt_desc_get(dst_pix_fmt);
+
+    LOCAL_ALIGNED_8(uint16_t, src,     [3 * MAX_LINE_SIZE * NUM_LINES]);
+    LOCAL_ALIGNED_8(uint16_t, dst_ref, [3 * MAX_LINE_SIZE * NUM_LINES]);
+    LOCAL_ALIGNED_8(uint16_t, dst_new, [3 * MAX_LINE_SIZE * NUM_LINES]);
+
+    declare_func(void, const SwsContext *, uint8_t *, int, const uint8_t *,
+                 int, int, int);
+
+    SwsInternal c;
+    memset(&c, 0, sizeof(c));
+    c.opts.src_format = src_pix_fmt;
+    ff_sws_init_xyzdsp(&c);
+    ff_sws_fill_xyztables(&c);
+
+    randomize_buffers(src, 3 * MAX_LINE_SIZE * NUM_LINES);
+
+    for (int height = 1; height <= NUM_LINES; height++) {
+        for (int isi = 0; isi < FF_ARRAY_ELEMS(input_sizes); isi++) {
+            int width = input_sizes[isi];
+
+            if (check_func(c.xyz12Torgb48, "%s_%s_%dx%d", src_desc->name,
+                           dst_desc->name, width, height)) {
+                memset(dst_ref, 0xFE,
+                       3 * sizeof(uint16_t) * MAX_LINE_SIZE * NUM_LINES);
+                memset(dst_new, 0xFE,
+                       3 * sizeof(uint16_t) * MAX_LINE_SIZE * NUM_LINES);
+
+                call_ref((const SwsContext*)&c, (uint8_t *)dst_ref, dst_stride,
+                         (const uint8_t *)src, src_stride, width, height);
+                call_new((const SwsContext*)&c, (uint8_t *)dst_new, dst_stride,
+                         (const uint8_t *)src, src_stride, width, height);
+
+                checkasm_check(uint16_t, dst_ref, dst_stride, dst_new,
+                               dst_stride, width, height, "dst_rgb");
+
+                if (!(width & 3) && height == NUM_LINES) {
+                    bench_new((const SwsContext*)&c, (uint8_t *)dst_new,
+                              dst_stride, (const uint8_t *)src, src_stride,
+                              width, height);
+                }
+            }
+        }
+    }
+}
+
+#undef NUM_LINES
+#undef MAX_LINE_SIZE
+
+void checkasm_check_sw_xyz2rgb(void)
+{
+    check_xyz12Torgb48le();
+    report("xyz12Torgb48le");
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 48edd17bf2..f26e534591 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -55,6 +55,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp                                 \
                 fate-checkasm-sw_range_convert                          \
                 fate-checkasm-sw_rgb                                    \
                 fate-checkasm-sw_scale                                  \
+                fate-checkasm-sw_xyz2rgb                                \
                 fate-checkasm-sw_yuv2rgb                                \
                 fate-checkasm-sw_yuv2yuv                                \
                 fate-checkasm-takdsp                                    \
-- 
2.52.0


From e9a0be009a64a779efe9db375d6e3e81e87c25a0 Mon Sep 17 00:00:00 2001
From: Arpad Panyik <Arpad.Panyik@arm.com>
Date: Wed, 26 Nov 2025 16:38:16 +0000
Subject: [PATCH 220/304] swscale: Add AArch64 Neon path for xyz12Torgb48 LE

Add an optimized Neon code path for the little-endian case of the
xyz12Torgb48 function. The innermost loop processes the data in blocks
of 4x2 pixels (four pixels wide, two rows high), using software gathers
for the gamma table lookups, with the matrix multiplication and clipping
done by Neon.
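
For orientation, the per-pixel computation that the Neon path
vectorizes can be sketched in scalar C as below. This is only a sketch
inferred from the comments in the assembly (gamma.in lookup on the top
12 bits, 3x3 fixed-point matrix with a 12-bit shift, clip to 12 bits,
gamma.out lookup, shift left by 4). The type XyzSketch and the helpers
rd16le, wr16le, xyz12_to_rgb48le_pixel and sketch_demo are hypothetical
names for illustration, not FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the color transform context; not FFmpeg's type. */
typedef struct {
    const uint16_t *gamma_in;  /* 4096 entries, indexed by a 12-bit sample */
    const uint16_t *gamma_out; /* 4096 entries, indexed by a clipped 12-bit value */
    int16_t mat[3][3];         /* fixed point, 12 fractional bits (4096 == 1.0) */
} XyzSketch;

/* Read a 16-bit little-endian value byte by byte (host-endian independent). */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit little-endian value byte by byte. */
static void wr16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Convert one XYZ pixel: 3 x 16-bit LE words, 12 significant bits in the
 * top of each word, to one RGB48LE pixel with the same bit placement. */
static void xyz12_to_rgb48le_pixel(const XyzSketch *c, uint8_t *dst,
                                   const uint8_t *src)
{
    int32_t lin[3];
    int ch;

    assert(c && dst && src);

    /* Gather step: one table lookup per component. */
    for (ch = 0; ch < 3; ch++)
        lin[ch] = c->gamma_in[rd16le(src + 2 * ch) >> 4];

    /* Matrix multiply, drop the 12 fractional bits, clip to 0..4095. */
    for (ch = 0; ch < 3; ch++) {
        int64_t acc = (int64_t)c->mat[ch][0] * lin[0] +
                      (int64_t)c->mat[ch][1] * lin[1] +
                      (int64_t)c->mat[ch][2] * lin[2];
        int64_t v = acc < 0 ? 0 : acc >> 12;
        if (v > 4095)
            v = 4095;
        wr16le(dst + 2 * ch, (uint16_t)(c->gamma_out[v] << 4));
    }
}

/* Self-check with identity tables and an identity matrix: returns 1 if
 * the output pixel equals the input pixel, 0 otherwise. */
static int sketch_demo(void)
{
    static uint16_t identity[4096];
    const uint8_t src[6] = { 0x30, 0x12, 0x60, 0x45, 0xf0, 0xff };
    uint8_t dst[6] = { 0 };
    XyzSketch c = { identity, identity,
                    { { 4096, 0, 0 }, { 0, 4096, 0 }, { 0, 0, 4096 } } };
    int i;

    for (i = 0; i < 4096; i++)
        identity[i] = (uint16_t)i;

    xyz12_to_rgb48le_pixel(&c, dst, src);

    for (i = 0; i < 6; i++)
        if (dst[i] != src[i])
            return 0;
    return 1;
}
```

The assembly handles eight such pixels per iteration (four from each of
two rows), doing the table lookups with scalar loads and the multiply,
shift and clip in vector registers.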

Relative speedup in micro benchmarks after this patch on some
Cortex and Neoverse CPU cores:

 xyz12le_rgb48le    X1      X3      X4    X925      V2
 16x4_neon:       2.55x   4.34x   3.84x   3.31x   3.22x
 32x4_neon:       2.39x   3.63x   3.22x   3.35x   3.29x
 64x4_neon:       2.37x   3.31x   2.91x   3.33x   3.27x
 128x4_neon:      2.34x   3.28x   2.91x   3.35x   3.24x
 256x4_neon:      2.30x   3.17x   2.91x   3.32x   3.10x
 512x4_neon:      2.26x   3.10x   2.91x   3.30x   3.07x
 1024x4_neon:     2.26x   3.07x   2.96x   3.30x   3.05x
 1920x4_neon:     2.26x   3.06x   2.93x   3.28x   3.04x

 xyz12le_rgb48le   A76     A78    A715    A720    A725
 16x4_neon:       2.33x   2.28x   2.53x   3.33x   3.19x
 32x4_neon:       2.35x   2.18x   2.45x   3.23x   3.24x
 64x4_neon:       2.35x   2.16x   2.42x   3.15x   3.21x
 128x4_neon:      2.35x   2.13x   2.39x   3.00x   3.09x
 256x4_neon:      2.36x   2.12x   2.35x   2.85x   2.99x
 512x4_neon:      2.35x   2.14x   2.35x   2.78x   2.95x
 1024x4_neon:     2.31x   2.09x   2.33x   2.80x   2.91x
 1920x4_neon:     2.30x   2.07x   2.32x   2.81x   2.94x

 xyz12le_rgb48le   A55    A510    A520
 16x4_neon:       2.09x   1.92x   2.36x
 32x4_neon:       2.05x   1.89x   2.38x
 64x4_neon:       2.02x   1.77x   2.35x
 128x4_neon:      1.96x   1.74x   2.25x
 256x4_neon:      1.90x   1.72x   2.19x
 512x4_neon:      1.83x   1.75x   2.16x
 1024x4_neon:     1.83x   1.62x   2.15x
 1920x4_neon:     1.82x   1.60x   2.15x

Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
---
 libswscale/aarch64/Makefile       |   1 +
 libswscale/aarch64/asm-offsets.h  |  36 ++
 libswscale/aarch64/swscale.c      |  40 ++
 libswscale/aarch64/xyz2rgb_neon.S | 702 ++++++++++++++++++++++++++++++
 libswscale/swscale.c              |   4 +
 libswscale/swscale_internal.h     |   2 +
 6 files changed, 785 insertions(+)
 create mode 100644 libswscale/aarch64/asm-offsets.h
 create mode 100644 libswscale/aarch64/xyz2rgb_neon.S

diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
index 1de8c9c0d6..1c82e34e28 100644
--- a/libswscale/aarch64/Makefile
+++ b/libswscale/aarch64/Makefile
@@ -8,4 +8,5 @@ NEON-OBJS   += aarch64/hscale.o                 \
                aarch64/range_convert_neon.o     \
                aarch64/rgb2rgb_neon.o           \
                aarch64/swscale_unscaled_neon.o  \
+               aarch64/xyz2rgb_neon.o           \
                aarch64/yuv2rgb_neon.o           \
diff --git a/libswscale/aarch64/asm-offsets.h b/libswscale/aarch64/asm-offsets.h
new file mode 100644
index 0000000000..110389c965
--- /dev/null
+++ b/libswscale/aarch64/asm-offsets.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (c) 2025 Arpad Panyik <Arpad.Panyik@arm.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef SWSCALE_AARCH64_ASM_OFFSETS_H
+#define SWSCALE_AARCH64_ASM_OFFSETS_H
+
+/* SwsLuts */
+#define SL_IN  0x00
+#define SL_OUT 0x08
+
+/* SwsColorXform */
+#define SCX_GAMMA     0x00
+#define SCX_MAT       0x10
+#define SCX_GAMMA_IN  (SCX_GAMMA + SL_IN)
+#define SCX_GAMMA_OUT (SCX_GAMMA + SL_OUT)
+#define SCX_MAT_00    SCX_MAT
+#define SCX_MAT_22    (SCX_MAT + 8 * 2)
+
+#endif /* SWSCALE_AARCH64_ASM_OFFSETS_H */
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 55fff03a5a..4f86364f4b 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -21,6 +21,35 @@
 #include "libswscale/swscale.h"
 #include "libswscale/swscale_internal.h"
 #include "libavutil/aarch64/cpu.h"
+#include "asm-offsets.h"
+
+#define SIZEOF_MEMBER(type, member) \
+    sizeof(((type*)0)->member)
+
+static_assert(offsetof(SwsLuts, in)  == SL_IN,  "struct layout mismatch");
+static_assert(offsetof(SwsLuts, out) == SL_OUT, "struct layout mismatch");
+
+static_assert(offsetof(SwsColorXform, gamma) == SCX_GAMMA,
+              "struct layout mismatch");
+static_assert(offsetof(SwsColorXform, mat) == SCX_MAT,
+              "struct layout mismatch");
+
+static_assert(offsetof(SwsColorXform, mat) +
+              2 * SIZEOF_MEMBER(SwsColorXform, mat[0]) +
+              2 * SIZEOF_MEMBER(SwsColorXform, mat[0][0]) == SCX_MAT_22,
+              "struct layout mismatch");
+
+void ff_xyz12Torgb48le_neon_asm(const SwsColorXform *c, uint8_t *dst,
+                                int dst_stride, const uint8_t *src,
+                                int src_stride, int w, int h);
+
+static void xyz12Torgb48le_neon(const SwsInternal *c, uint8_t *dst,
+                                int dst_stride, const uint8_t *src,
+                                int src_stride, int w, int h)
+{
+    ff_xyz12Torgb48le_neon_asm(&c->xyz2rgb, dst, dst_stride, src, src_stride,
+                               w, h);
+}
 
 void ff_hscale16to15_4_neon_asm(int shift, int16_t *_dst, int dstW,
                       const uint8_t *_src, const int16_t *filter,
@@ -307,6 +336,17 @@ av_cold void ff_sws_init_range_convert_aarch64(SwsInternal *c)
     }
 }
 
+av_cold void ff_sws_init_xyzdsp_aarch64(SwsInternal *c)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (have_neon(cpu_flags)) {
+        if (!isBE(c->opts.src_format)) {
+            c->xyz12Torgb48 = xyz12Torgb48le_neon;
+        }
+    }
+}
+
 av_cold void ff_sws_init_swscale_aarch64(SwsInternal *c)
 {
     int cpu_flags = av_get_cpu_flags();
diff --git a/libswscale/aarch64/xyz2rgb_neon.S b/libswscale/aarch64/xyz2rgb_neon.S
new file mode 100644
index 0000000000..4b2135085a
--- /dev/null
+++ b/libswscale/aarch64/xyz2rgb_neon.S
@@ -0,0 +1,702 @@
+/*
+ * Copyright (c) 2025 Arpad Panyik <Arpad.Panyik@arm.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+#include "asm-offsets.h"
+
+#define JUMP_ALIGN 2
+#define LOOP_ALIGN 2
+
+function ff_xyz12Torgb48le_neon_asm, export=1
+// x0  const SwsColorXform *c
+// x1  uint8_t *dst
+// w2  int dst_stride
+// x3  const uint8_t *src
+// w4  int src_stride
+// w5  int w
+// w6  int h
+
+        ldp             x7,  x8, [x0, #(SCX_GAMMA_IN)]  // gamma.in, gamma.out
+        ldr             q6,  [x0, #(SCX_MAT_00)]        // mat[0][0]..[2][1]
+        ldr             h7,  [x0, #(SCX_MAT_22)]        // mat[2][2]; > 0
+
+        add             w9,  w5,  w5, lsl #1        // w * 3
+        add             x17, x3,  w4, sxtw          // sr2 = src + src_stride
+        add             x16, x1,  w2, sxtw          // ds2 = dst + dst_stride
+        sub             w4,  w4,  w9                // src_stride - w * 3
+        sub             w2,  w2,  w9                // dst_stride - w * 3
+        abs             v6.8h,  v6.8h               // abs(mat[0][0]..[2][1])
+        sbfiz           x4,  x4,  #1, #32           // src_stride * 2 - w * 6
+        sbfiz           x2,  x2,  #1, #32           // dst_stride * 2 - w * 6
+
+        subs            w6,  w6,  #2
+        b.lt            6f                          // h < 2
+
+        stp             x19, x20, [sp, #-64]!
+        stp             x21, x22, [sp, #16]
+        stp             x23, x24, [sp, #32]
+        str             x25, [sp, #48]
+
+        .align LOOP_ALIGN
+1:      // yp loop for 2x4 pixels
+        subs            w0,  w5,  #4
+        b.lt            3f                          // w < 4
+
+        .align LOOP_ALIGN
+2:      // xp loop for 2x4 pixels: XYZ0[0..3], XYZ1[0..3]
+        ldp             x9,  x10, [x3]              // x9  = X0[0] Y0[0] Z0[0] X0[1], x10 = Y0[1] Z0[1] X0[2] Y0[2]
+        ldr             x11, [x3, #16]              // x11 = Z0[2] X0[3] Y0[3] Z0[3]
+        add             x3,  x3,  #24
+        ubfx            x12, x9,  #4,  #12          // X0[0] >> 4
+        lsr             x13, x9,  #52               // X0[1] >> 4
+        ubfx            x14, x10, #36, #12          // X0[2] >> 4
+        ubfx            x15, x11, #20, #12          // X0[3] >> 4
+
+        ldp             x19, x20, [x17]             // x19 = X1[0] Y1[0] Z1[0] X1[1], x20 = Y1[1] Z1[1] X1[2] Y1[2]
+        ldr             x21, [x17, #16]             // x21 = Z1[2] X1[3] Y1[3] Z1[3]
+        add             x17, x17, #24
+        ubfx            x22, x19, #4, #12           // X1[0] >> 4
+        lsr             x23, x19, #52               // X1[1] >> 4
+        ubfx            x24, x20, #36, #12          // X1[2] >> 4
+        ubfx            x25, x21, #20, #12          // X1[3] >> 4
+
+        ldr             h0,  [x7, x12, lsl #1]      // gamma.in[X0[0] >> 4]
+        ubfx            x12, x9,  #20, #12          // Y0[0] >> 4
+        ldr             h16, [x7, x13, lsl #1]      // gamma.in[X0[1] >> 4]
+        ubfx            x13, x10, #4, #12           // Y0[1] >> 4
+        ldr             h17, [x7, x14, lsl #1]      // gamma.in[X0[2] >> 4]
+        lsr             x14, x10, #52               // Y0[2] >> 4
+        ldr             h18, [x7, x15, lsl #1]      // gamma.in[X0[3] >> 4]
+        ubfx            x15, x11, #36, #12          // Y0[3] >> 4
+
+        ldr             h20, [x7, x22, lsl #1]      // gamma.in[X1[0] >> 4]
+        ubfx            x22, x19, #20, #12          // Y1[0] >> 4
+        ldr             h26, [x7, x23, lsl #1]      // gamma.in[X1[1] >> 4]
+        ubfx            x23, x20, #4,  #12          // Y1[1] >> 4
+        ldr             h27, [x7, x24, lsl #1]      // gamma.in[X1[2] >> 4]
+        lsr             x24, x20, #52               // Y1[2] >> 4
+        ldr             h28, [x7, x25, lsl #1]      // gamma.in[X1[3] >> 4]
+        ubfx            x25, x21, #36, #12          // Y1[3] >> 4
+
+        mov             v0.h[1],  v16.h[0]          // v0.4h  = gamma.in[X0[0..1] >> 4]
+        mov             v17.h[1], v18.h[0]          // v17.4h = gamma.in[X0[2..3] >> 4]
+        mov             v0.s[1],  v17.s[0]          // v0.4h  = gamma.in[X0[0..3] >> 4]
+        ldr             h1,  [x7, x12, lsl #1]      // gamma.in[Y0[0] >> 4]
+        umull           v3.4s, v0.4h, v6.h[0]       // R0[0..3] = gamma.in[X0[0..3] >> 4] * mat[0][0]
+        umull           v5.4s, v0.4h, v6.h[6]       // B0[0..3] = gamma.in[X0[0..3] >> 4] * mat[2][0]
+        ubfx            x12, x9,  #36, #12          // Z0[0] >> 4
+        ldr             h16, [x7, x13, lsl #1]      // gamma.in[Y0[1] >> 4]
+
+        mov             v20.h[1], v26.h[0]          // v20.4h = gamma.in[X1[0..1] >> 4]
+        mov             v27.h[1], v28.h[0]          // v27.4h = gamma.in[X1[2..3] >> 4]
+        mov             v20.s[1], v27.s[0]          // v20.4h = gamma.in[X1[0..3] >> 4]
+        ldr             h21, [x7, x22, lsl #1]      // gamma.in[Y1[0] >> 4]
+        umull           v23.4s, v20.4h, v6.h[0]     // R1[0..3] = gamma.in[X1[0..3] >> 4] * mat[0][0]
+        umull           v25.4s, v20.4h, v6.h[6]     // B1[0..3] = gamma.in[X1[0..3] >> 4] * mat[2][0]
+        ubfx            x22, x19, #36, #12          // Z1[0] >> 4
+        ldr             h26, [x7, x23, lsl #1]      // gamma.in[Y1[1] >> 4]
+
+        ubfx            x13, x10, #20, #12          // Z0[1] >> 4
+        ldr             h17, [x7, x14, lsl #1]      // gamma.in[Y0[2] >> 4]
+        ubfx            x14, x11, #4,  #12          // Z0[2] >> 4
+        ldr             h18, [x7, x15, lsl #1]      // gamma.in[Y0[3] >> 4]
+        lsr             x15, x11, #52               // Z0[3] >> 4
+        mov             v1.h[1],  v16.h[0]          // v1.4h  = gamma.in[Y0[0..1] >> 4]
+        mov             v17.h[1], v18.h[0]          // v17.4h = gamma.in[Y0[2..3] >> 4]
+        mov             v1.s[1],  v17.s[0]          // v1.4h  = gamma.in[Y0[0..3] >> 4]
+
+        ubfx            x23, x20, #20, #12          // Z1[1] >> 4
+        ldr             h27, [x7, x24, lsl #1]      // gamma.in[Y1[2] >> 4]
+        ubfx            x24, x21, #4,  #12          // Z1[2] >> 4
+        ldr             h28, [x7, x25, lsl #1]      // gamma.in[Y1[3] >> 4]
+        umull           v4.4s,  v1.4h,  v6.h[4]     // G0[0..3]  = gamma.in[Y0[0..3] >> 4] * mat[1][1]
+        umlsl           v3.4s,  v1.4h,  v6.h[1]     // R0[0..3] -= gamma.in[Y0[0..3] >> 4] * mat[0][1]
+
+        lsr             x25, x21, #52               // Z1[3] >> 4
+        mov             v21.h[1], v26.h[0]          // v21.4h = gamma.in[Y1[0..1] >> 4]
+        mov             v27.h[1], v28.h[0]          // v27.4h = gamma.in[Y1[2..3] >> 4]
+        mov             v21.s[1], v27.s[0]          // v21.4h = gamma.in[Y1[0..3] >> 4]
+        umlsl           v4.4s,  v0.4h,  v6.h[3]     // G0[0..3] -= gamma.in[X0[0..3] >> 4] * mat[1][0]
+        umlsl           v5.4s,  v1.4h,  v6.h[7]     // B0[0..3] -= gamma.in[Y0[0..3] >> 4] * mat[2][1]
+
+        ldr             h2,  [x7, x12, lsl #1]      // gamma.in[Z0[0] >> 4]
+        ldr             h16, [x7, x13, lsl #1]      // gamma.in[Z0[1] >> 4]
+        ldr             h17, [x7, x14, lsl #1]      // gamma.in[Z0[2] >> 4]
+        ldr             h18, [x7, x15, lsl #1]      // gamma.in[Z0[3] >> 4]
+        umull           v24.4s, v21.4h, v6.h[4]     // G1[0..3]  = gamma.in[Y1[0..3] >> 4] * mat[1][1]
+        umlsl           v23.4s, v21.4h, v6.h[1]     // R1[0..3] -= gamma.in[Y1[0..3] >> 4] * mat[0][1]
+
+        mov             v2.h[1],  v16.h[0]          // v2.4h  = gamma.in[Z0[0..1] >> 4]
+        mov             v17.h[1], v18.h[0]          // v17.4h = gamma.in[Z0[2..3] >> 4]
+        mov             v2.s[1],  v17.s[0]          // v2.4h  = gamma.in[Z0[0..3] >> 4]
+        umlsl           v24.4s, v20.4h, v6.h[3]     // G1[0..3] -= gamma.in[X1[0..3] >> 4] * mat[1][0]
+        umlsl           v25.4s, v21.4h, v6.h[7]     // B1[0..3] -= gamma.in[Y1[0..3] >> 4] * mat[2][1]
+
+        ldr             h22, [x7, x22, lsl #1]      // gamma.in[Z1[0] >> 4]
+        ldr             h26, [x7, x23, lsl #1]      // gamma.in[Z1[1] >> 4]
+        ldr             h27, [x7, x24, lsl #1]      // gamma.in[Z1[2] >> 4]
+        ldr             h28, [x7, x25, lsl #1]      // gamma.in[Z1[3] >> 4]
+        mov             v22.h[1], v26.h[0]          // v22.4h = gamma.in[Z1[0..1] >> 4]
+        mov             v27.h[1], v28.h[0]          // v27.4h = gamma.in[Z1[2..3] >> 4]
+        mov             v22.s[1], v27.s[0]          // v22.4h = gamma.in[Z1[0..3] >> 4]
+
+        umlsl           v3.4s,  v2.4h,  v6.h[2]     // R0[0..3] -= gamma.in[Z0[0..3] >> 4] * mat[0][2]
+        sqshrun         v3.4h,  v3.4s,  #12         // clip(R0[0..3] >> 12)
+        umlal           v4.4s,  v2.4h,  v6.h[5]     // G0[0..3] += gamma.in[Z0[0..3] >> 4] * mat[1][2]
+        sqshrun         v4.4h,  v4.4s,  #12         // clip(G0[0..3] >> 12)
+        umov            w9,  v3.h[0]                // clip(R0[0] >> 12)
+        umov            w10, v4.h[1]                // clip(G0[1] >> 12)
+        umlal           v5.4s,  v2.4h,  v7.h[0]     // B0[0..3] += gamma.in[Z0[0..3] >> 4] * mat[2][2]
+        sqshrun         v5.4h,  v5.4s,  #12         // clip(B0[0..3] >> 12)
+
+        umlsl           v23.4s, v22.4h, v6.h[2]     // R1[0..3] -= gamma.in[Z1[0..3] >> 4] * mat[0][2]
+        sqshrun         v23.4h, v23.4s, #12         // clip(R1[0..3] >> 12)
+        umlal           v24.4s, v22.4h, v6.h[5]     // G1[0..3] += gamma.in[Z1[0..3] >> 4] * mat[1][2]
+        sqshrun         v24.4h, v24.4s, #12         // clip(G1[0..3] >> 12)
+        umov            w19, v23.h[0]               // clip(R1[0] >> 12)
+        umov            w20, v24.h[1]               // clip(G1[1] >> 12)
+        umlal           v25.4s, v22.4h, v7.h[0]     // B1[0..3] += gamma.in[Z1[0..3] >> 4] * mat[2][2]
+        sqshrun         v25.4h, v25.4s, #12         // clip(B1[0..3] >> 12)
+
+        umov            w11, v5.h[2]                // clip(B0[2] >> 12)
+        umov            w12, v4.h[0]                // clip(G0[0] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R0[0] = gamma.out[clip(R0[0] >> 12)]
+        lsl             x9,  x9,  #4                // R0[0] << 4
+        umov            w13, v5.h[1]                // clip(B0[1] >> 12)
+        ldrh            w10, [x8, x10, lsl #1]      // G0[1] = gamma.out[clip(G0[1] >> 12)]
+        lsl             x10, x10, #4                // G0[1] << 4
+
+        umov            w21, v25.h[2]               // clip(B1[2] >> 12)
+        umov            w22, v24.h[0]               // clip(G1[0] >> 12)
+        ldrh            w19, [x8, x19, lsl #1]      // R1[0] = gamma.out[clip(R1[0] >> 12)]
+        lsl             x19, x19, #4                // R1[0] << 4
+        umov            w23, v25.h[1]               // clip(B1[1] >> 12)
+        ldrh            w20, [x8, x20, lsl #1]      // G1[1] = gamma.out[clip(G1[1] >> 12)]
+        lsl             x20, x20, #4                // G1[1] << 4
+
+        umov            w14, v3.h[3]                // clip(R0[3] >> 12)
+        ldrh            w11, [x8, x11, lsl #1]      // B0[2] = gamma.out[clip(B0[2] >> 12)]
+        lsl             x11, x11, #4                // B0[2] << 4
+        umov            w15, v5.h[0]                // clip(B0[0] >> 12)
+        ldrh            w12, [x8, x12, lsl #1]      // G0[0] = gamma.out[clip(G0[0] >> 12)]
+        orr             x9,  x9,  x12, lsl #20      // R0[0] << 4, G0[0] << 4
+        umov            w12, v3.h[2]                // clip(R0[2] >> 12)
+        ldrh            w13, [x8, x13, lsl #1]      // B0[1] = gamma.out[clip(B0[1] >> 12)]
+
+        umov            w24, v23.h[3]               // clip(R1[3] >> 12)
+        ldrh            w21, [x8, x21, lsl #1]      // B1[2] = gamma.out[clip(B1[2] >> 12)]
+        lsl             x21, x21, #4                // B1[2] << 4
+        umov            w25, v25.h[0]               // clip(B1[0] >> 12)
+        ldrh            w22, [x8, x22, lsl #1]      // G1[0] = gamma.out[clip(G1[0] >> 12)]
+        orr             x19, x19, x22, lsl #20      // R1[0] << 4, G1[0] << 4
+        umov            w22, v23.h[2]               // clip(R1[2] >> 12)
+        ldrh            w23, [x8, x23, lsl #1]      // B1[1] = gamma.out[clip(B1[1] >> 12)]
+
+        orr             x10, x10, x13, lsl #20      // G0[1] << 4, B0[1] << 4
+        umov            w13, v4.h[3]                // clip(G0[3] >> 12)
+        ldrh            w14, [x8, x14, lsl #1]      // R0[3] = gamma.out[clip(R0[3] >> 12)]
+        orr             x11, x11, x14, lsl #20      // B0[2] << 4, R0[3] << 4
+        umov            w14, v3.h[1]                // clip(R0[1] >> 12)
+        ldrh            w15, [x8, x15, lsl #1]      // B0[0] = gamma.out[clip(B0[0] >> 12)]
+        orr             x9,  x9,  x15, lsl #36      // R0[0] << 4, G0[0] << 4, B0[0] << 4
+        umov            w15, v4.h[2]                // clip(G0[2] >> 12)
+
+        orr             x20, x20, x23, lsl #20      // G1[1] << 4, B1[1] << 4
+        umov            w23, v24.h[3]               // clip(G1[3] >> 12)
+        ldrh            w24, [x8, x24, lsl #1]      // R1[3] = gamma.out[clip(R1[3] >> 12)]
+        orr             x21, x21, x24, lsl #20      // B1[2] << 4, R1[3] << 4
+        umov            w24, v23.h[1]               // clip(R1[1] >> 12)
+        ldrh            w25, [x8, x25, lsl #1]      // B1[0] = gamma.out[clip(B1[0] >> 12)]
+        orr             x19, x19, x25, lsl #36      // R1[0] << 4, G1[0] << 4, B1[0] << 4
+        umov            w25, v24.h[2]               // clip(G1[2] >> 12)
+
+        ldrh            w12, [x8, x12, lsl #1]      // R0[2] = gamma.out[clip(R0[2] >> 12)]
+        orr             x10, x10, x12, lsl #36      // G0[1] << 4, B0[1] << 4, R0[2] << 4
+        umov            w12, v5.h[3]                // clip(B0[3] >> 12)
+        ldrh            w13, [x8, x13, lsl #1]      // G0[3] = gamma.out[clip(G0[3] >> 12)]
+        orr             x11, x11, x13, lsl #36      // B0[2] << 4, R0[3] << 4, G0[3] << 4
+        ldrh            w14, [x8, x14, lsl #1]      // R0[1] = gamma.out[clip(R0[1] >> 12)]
+        orr             x9,  x9,  x14, lsl #52      // x9  = R0[0] << 4, G0[0] << 4, B0[0] << 4, R0[1] << 4
+        ldrh            w15, [x8, x15, lsl #1]      // G0[2] = gamma.out[clip(G0[2] >> 12)]
+        orr             x10, x10, x15, lsl #52      // x10 = G0[1] << 4, B0[1] << 4, R0[2] << 4, G0[2] << 4
+        ldrh            w12, [x8, x12, lsl #1]      // B0[3] = gamma.out[clip(B0[3] >> 12)]
+        orr             x11, x11, x12, lsl #52      // x11 = B0[2] << 4, R0[3] << 4, G0[3] << 4, B0[3] << 4
+        stp             x9,  x10, [x1]
+        str             x11, [x1, #16]
+
+        ldrh            w22, [x8, x22, lsl #1]      // R1[2] = gamma.out[clip(R1[2] >> 12)]
+        orr             x20, x20, x22, lsl #36      // G1[1] << 4, B1[1] << 4, R1[2] << 4
+        umov            w22, v25.h[3]               // clip(B1[3] >> 12)
+        ldrh            w23, [x8, x23, lsl #1]      // G1[3] = gamma.out[clip(G1[3] >> 12)]
+        orr             x21, x21, x23, lsl #36      // B1[2] << 4, R1[3] << 4, G1[3] << 4
+        ldrh            w24, [x8, x24, lsl #1]      // R1[1] = gamma.out[clip(R1[1] >> 12)]
+        orr             x19, x19, x24, lsl #52      // x19 = R1[0] << 4, G1[0] << 4, B1[0] << 4, R1[1] << 4
+        ldrh            w25, [x8, x25, lsl #1]      // G1[2] = gamma.out[clip(G1[2] >> 12)]
+        orr             x20, x20, x25, lsl #52      // x20 = G1[1] << 4, B1[1] << 4, R1[2] << 4, G1[2] << 4
+        ldrh            w22, [x8, x22, lsl #1]      // B1[3] = gamma.out[clip(B1[3] >> 12)]
+        orr             x21, x21, x22, lsl #52      // x21 = B1[2] << 4, R1[3] << 4, G1[3] << 4, B1[3] << 4
+        stp             x19, x20, [x16]
+        str             x21, [x16, #16]
+
+        add             x1,  x1,  #24
+        add             x16, x16, #24
+
+        subs            w0,  w0,  #4
+        b.ge            2b
+
+        .align JUMP_ALIGN
+3:
+        tst             w5,  #3
+        b.eq            5f                          // no residual pixels; (w & 3) == 0
+
+        ldr             w10, [x3]                   // w10 = X0[0] Y0[0]
+        ldrh            w11, [x3, #4]               // w11 = Z0[0]
+        add             x3,  x3,  #6
+        ldr             w20, [x17]                  // w20 = X1[0] Y1[0]
+        ldrh            w21, [x17, #4]              // w21 = Z1[0]
+        add             x17, x17, #6
+        ubfx            w9,  w10, #4,  #12          // X0[0] >> 4
+        ubfx            w10, w10, #20, #12          // Y0[0] >> 4
+        lsr             w11, w11, #4                // Z0[0] >> 4
+        ldr             h0,  [x7, x9,  lsl #1]      // v0.4h = gamma.in[X0[0] >> 4]
+        ldr             h1,  [x7, x10, lsl #1]      // v1.4h = gamma.in[Y0[0] >> 4]
+        ldr             h2,  [x7, x11, lsl #1]      // v2.4h = gamma.in[Z0[0] >> 4]
+        ubfx            w19, w20, #4,  #12          // X1[0] >> 4
+        ubfx            w20, w20, #20, #12          // Y1[0] >> 4
+        lsr             w21, w21, #4                // Z1[0] >> 4
+        ldr             h20, [x7, x19, lsl #1]      // v20.4h = gamma.in[X1[0] >> 4]
+        ldr             h21, [x7, x20, lsl #1]      // v21.4h = gamma.in[Y1[0] >> 4]
+        ldr             h22, [x7, x21, lsl #1]      // v22.4h = gamma.in[Z1[0] >> 4]
+
+        cmp             w0,  #-2
+        b.lt            4f                          // (w & 3) == 1
+
+        ldr             w10, [x3]                   // w10 = X0[1] Y0[1]
+        ldrh            w11, [x3, #4]               // w11 = Z0[1]
+        add             x3,  x3,  #6
+        ldr             w20, [x17]                  // w20 = X1[1] Y1[1]
+        ldrh            w21, [x17, #4]              // w21 = Z1[1]
+        add             x17, x17,  #6
+        ubfx            w9,  w10, #4,  #12          // X0[1] >> 4
+        ubfx            w10, w10, #20, #12          // Y0[1] >> 4
+        lsr             w11, w11, #4                // Z0[1] >> 4
+        ldr             h16, [x7, x9,  lsl #1]      // gamma.in[X0[1] >> 4]
+        ldr             h17, [x7, x10, lsl #1]      // gamma.in[Y0[1] >> 4]
+        ldr             h18, [x7, x11, lsl #1]      // gamma.in[Z0[1] >> 4]
+        ubfx            w19, w20, #4,  #12          // X1[1] >> 4
+        ubfx            w20, w20, #20, #12          // Y1[1] >> 4
+        lsr             w21, w21, #4                // Z1[1] >> 4
+        ldr             h23, [x7, x19, lsl #1]      // gamma.in[X1[1] >> 4]
+        ldr             h24, [x7, x20, lsl #1]      // gamma.in[Y1[1] >> 4]
+        ldr             h25, [x7, x21, lsl #1]      // gamma.in[Z1[1] >> 4]
+        mov             v0.h[1],  v16.h[0]          // v0.4h = gamma.in[X0[0..1] >> 4]
+        mov             v1.h[1],  v17.h[0]          // v1.4h = gamma.in[Y0[0..1] >> 4]
+        mov             v2.h[1],  v18.h[0]          // v2.4h = gamma.in[Z0[0..1] >> 4]
+        mov             v20.h[1], v23.h[0]          // v20.4h = gamma.in[X1[0..1] >> 4]
+        mov             v21.h[1], v24.h[0]          // v21.4h = gamma.in[Y1[0..1] >> 4]
+        mov             v22.h[1], v25.h[0]          // v22.4h = gamma.in[Z1[0..1] >> 4]
+
+        b.le            4f                          // (w & 3) == 2
+
+        ldr             w10, [x3]                   // w10 = X0[2] Y0[2]
+        ldrh            w11, [x3, #4]               // w11 = Z0[2]
+        add             x3,  x3,  #6
+        ldr             w20, [x17]                  // w20 = X1[2] Y1[2]
+        ldrh            w21, [x17, #4]              // w21 = Z1[2]
+        add             x17, x17, #6
+        ubfx            w9,  w10, #4,  #12          // X0[2] >> 4
+        ubfx            w10, w10, #20, #12          // Y0[2] >> 4
+        lsr             w11, w11, #4                // Z0[2] >> 4
+        ldr             h16, [x7, x9,  lsl #1]      // gamma.in[X0[2] >> 4]
+        ldr             h17, [x7, x10, lsl #1]      // gamma.in[Y0[2] >> 4]
+        ldr             h18, [x7, x11, lsl #1]      // gamma.in[Z0[2] >> 4]
+        ubfx            w19, w20, #4,  #12          // X1[2] >> 4
+        ubfx            w20, w20, #20, #12          // Y1[2] >> 4
+        lsr             w21, w21, #4                // Z1[2] >> 4
+        ldr             h23, [x7, x19, lsl #1]      // gamma.in[X1[2] >> 4]
+        ldr             h24, [x7, x20, lsl #1]      // gamma.in[Y1[2] >> 4]
+        ldr             h25, [x7, x21, lsl #1]      // gamma.in[Z1[2] >> 4]
+        mov             v0.h[2],  v16.h[0]          // v0.4h = gamma.in[X0[0..2] >> 4]
+        mov             v1.h[2],  v17.h[0]          // v1.4h = gamma.in[Y0[0..2] >> 4]
+        mov             v2.h[2],  v18.h[0]          // v2.4h = gamma.in[Z0[0..2] >> 4]
+        mov             v20.h[2], v23.h[0]          // v20.4h = gamma.in[X1[0..2] >> 4]
+        mov             v21.h[2], v24.h[0]          // v21.4h = gamma.in[Y1[0..2] >> 4]
+        mov             v22.h[2], v25.h[0]          // v22.4h = gamma.in[Z1[0..2] >> 4]
+
+        .align JUMP_ALIGN
+4:
+        umull           v3.4s,  v0.4h,  v6.h[0]     // R0[0..2] = gamma.in[X0[0..2] >> 4] * mat[0][0]
+        umull           v5.4s,  v0.4h,  v6.h[6]     // B0[0..2] = gamma.in[X0[0..2] >> 4] * mat[2][0]
+
+        umull           v23.4s, v20.4h, v6.h[0]     // R1[0..2] = gamma.in[X1[0..2] >> 4] * mat[0][0]
+        umull           v25.4s, v20.4h, v6.h[6]     // B1[0..2] = gamma.in[X1[0..2] >> 4] * mat[2][0]
+
+        umull           v4.4s,  v1.4h,  v6.h[4]     // G0[0..2]  = gamma.in[Y0[0..2] >> 4] * mat[1][1]
+        umlsl           v3.4s,  v1.4h,  v6.h[1]     // R0[0..2] -= gamma.in[Y0[0..2] >> 4] * mat[0][1]
+        umlsl           v4.4s,  v0.4h,  v6.h[3]     // G0[0..2] -= gamma.in[X0[0..2] >> 4] * mat[1][0]
+        umlsl           v5.4s,  v1.4h,  v6.h[7]     // B0[0..2] -= gamma.in[Y0[0..2] >> 4] * mat[2][1]
+
+        umull           v24.4s, v21.4h, v6.h[4]     // G1[0..2]  = gamma.in[Y1[0..2] >> 4] * mat[1][1]
+        umlsl           v23.4s, v21.4h, v6.h[1]     // R1[0..2] -= gamma.in[Y1[0..2] >> 4] * mat[0][1]
+        umlsl           v24.4s, v20.4h, v6.h[3]     // G1[0..2] -= gamma.in[X1[0..2] >> 4] * mat[1][0]
+        umlsl           v25.4s, v21.4h, v6.h[7]     // B1[0..2] -= gamma.in[Y1[0..2] >> 4] * mat[2][1]
+
+        umlsl           v3.4s,  v2.4h,  v6.h[2]     // R0[0..2] -= gamma.in[Z0[0..2] >> 4] * mat[0][2]
+        sqshrun         v3.4h,  v3.4s,  #12         // clip(R0[0..2] >> 12)
+        umlal           v4.4s,  v2.4h,  v6.h[5]     // G0[0..2] += gamma.in[Z0[0..2] >> 4] * mat[1][2]
+        sqshrun         v4.4h,  v4.4s,  #12         // clip(G0[0..2] >> 12)
+        umlal           v5.4s,  v2.4h,  v7.h[0]     // B0[0..2] += gamma.in[Z0[0..2] >> 4] * mat[2][2]
+        sqshrun         v5.4h,  v5.4s,  #12         // clip(B0[0..2] >> 12)
+
+        umlsl           v23.4s, v22.4h, v6.h[2]     // R1[0..2] -= gamma.in[Z1[0..2] >> 4] * mat[0][2]
+        sqshrun         v23.4h, v23.4s, #12         // clip(R1[0..2] >> 12)
+        umlal           v24.4s, v22.4h, v6.h[5]     // G1[0..2] += gamma.in[Z1[0..2] >> 4] * mat[1][2]
+        sqshrun         v24.4h, v24.4s, #12         // clip(G1[0..2] >> 12)
+        umlal           v25.4s, v22.4h, v7.h[0]     // B1[0..2] += gamma.in[Z1[0..2] >> 4] * mat[2][2]
+        sqshrun         v25.4h, v25.4s, #12         // clip(B1[0..2] >> 12)
+
+        umov            w9,  v3.h[0]                // clip(R0[0] >> 12)
+        umov            w10, v4.h[0]                // clip(G0[0] >> 12)
+        umov            w11, v5.h[0]                // clip(B0[0] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R0[0] = gamma.out[clip(R0[0] >> 12)]
+        ldrh            w10, [x8, x10, lsl #1]      // G0[0] = gamma.out[clip(G0[0] >> 12)]
+        ldrh            w11, [x8, x11, lsl #1]      // B0[0] = gamma.out[clip(B0[0] >> 12)]
+        umov            w19, v23.h[0]               // clip(R1[0] >> 12)
+        umov            w20, v24.h[0]               // clip(G1[0] >> 12)
+        umov            w21, v25.h[0]               // clip(B1[0] >> 12)
+        ldrh            w19, [x8, x19, lsl #1]      // R1[0] = gamma.out[clip(R1[0] >> 12)]
+        ldrh            w20, [x8, x20, lsl #1]      // G1[0] = gamma.out[clip(G1[0] >> 12)]
+        ldrh            w21, [x8, x21, lsl #1]      // B1[0] = gamma.out[clip(B1[0] >> 12)]
+        lsl             w9,  w9,  #4                // w9  = R0[0] << 4
+        lsl             w10, w10, #4                // w10 = G0[0] << 4
+        lsl             w11, w11, #4                // w11 = B0[0] << 4
+        strh            w9,  [x1]
+        strh            w10, [x1, #2]
+        strh            w11, [x1, #4]
+        lsl             w19, w19, #4                // w19 = R1[0] << 4
+        lsl             w20, w20, #4                // w20 = G1[0] << 4
+        lsl             w21, w21, #4                // w21 = B1[0] << 4
+        strh            w19, [x16]
+        strh            w20, [x16, #2]
+        strh            w21, [x16, #4]
+        add             x1,  x1,  #6
+        add             x16, x16, #6
+
+        cmp             w0,  #-2
+        b.lt            5f                          // (w & 3) == 1
+
+        umov            w9,  v3.h[1]                // clip(R0[1] >> 12)
+        umov            w10, v4.h[1]                // clip(G0[1] >> 12)
+        umov            w11, v5.h[1]                // clip(B0[1] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R0[1] = gamma.out[clip(R0[1] >> 12)]
+        ldrh            w10, [x8, x10, lsl #1]      // G0[1] = gamma.out[clip(G0[1] >> 12)]
+        ldrh            w11, [x8, x11, lsl #1]      // B0[1] = gamma.out[clip(B0[1] >> 12)]
+        umov            w19, v23.h[1]               // clip(R1[1] >> 12)
+        umov            w20, v24.h[1]               // clip(G1[1] >> 12)
+        umov            w21, v25.h[1]               // clip(B1[1] >> 12)
+        ldrh            w19, [x8, x19, lsl #1]      // R1[1] = gamma.out[clip(R1[1] >> 12)]
+        ldrh            w20, [x8, x20, lsl #1]      // G1[1] = gamma.out[clip(G1[1] >> 12)]
+        ldrh            w21, [x8, x21, lsl #1]      // B1[1] = gamma.out[clip(B1[1] >> 12)]
+        lsl             w9,  w9,  #4                // w9  = R0[1] << 4
+        lsl             w10, w10, #4                // w10 = G0[1] << 4
+        lsl             w11, w11, #4                // w11 = B0[1] << 4
+        strh            w9,  [x1]
+        strh            w10, [x1, #2]
+        strh            w11, [x1, #4]
+        lsl             w19, w19, #4                // w19 = R1[1] << 4
+        lsl             w20, w20, #4                // w20 = G1[1] << 4
+        lsl             w21, w21, #4                // w21 = B1[1] << 4
+        strh            w19, [x16]
+        strh            w20, [x16, #2]
+        strh            w21, [x16, #4]
+        add             x1,  x1,  #6
+        add             x16, x16, #6
+
+        b.le            5f                          // (w & 3) == 2
+
+        umov            w9,  v3.h[2]                // clip(R0[2] >> 12)
+        umov            w10, v4.h[2]                // clip(G0[2] >> 12)
+        umov            w11, v5.h[2]                // clip(B0[2] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R0[2] = gamma.out[clip(R0[2] >> 12)]
+        ldrh            w10, [x8, x10, lsl #1]      // G0[2] = gamma.out[clip(G0[2] >> 12)]
+        ldrh            w11, [x8, x11, lsl #1]      // B0[2] = gamma.out[clip(B0[2] >> 12)]
+        umov            w19, v23.h[2]               // clip(R1[2] >> 12)
+        umov            w20, v24.h[2]               // clip(G1[2] >> 12)
+        umov            w21, v25.h[2]               // clip(B1[2] >> 12)
+        ldrh            w19, [x8, x19, lsl #1]      // R1[2] = gamma.out[clip(R1[2] >> 12)]
+        ldrh            w20, [x8, x20, lsl #1]      // G1[2] = gamma.out[clip(G1[2] >> 12)]
+        ldrh            w21, [x8, x21, lsl #1]      // B1[2] = gamma.out[clip(B1[2] >> 12)]
+        lsl             w9,  w9,  #4                // w9  = R0[2] << 4
+        lsl             w10, w10, #4                // w10 = G0[2] << 4
+        lsl             w11, w11, #4                // w11 = B0[2] << 4
+        strh            w9,  [x1]
+        strh            w10, [x1, #2]
+        strh            w11, [x1, #4]
+        lsl             w19, w19, #4                // w19 = R1[2] << 4
+        lsl             w20, w20, #4                // w20 = G1[2] << 4
+        lsl             w21, w21, #4                // w21 = B1[2] << 4
+        strh            w19, [x16]
+        strh            w20, [x16, #2]
+        strh            w21, [x16, #4]
+        add             x1,  x1,  #6
+        add             x16, x16, #6
+
+        .align JUMP_ALIGN
+5:
+        add             x3,  x3,  x4
+        add             x17, x17, x4
+        add             x1,  x1,  x2
+        add             x16, x16, x2
+
+        subs            w6,  w6,  #2
+        b.ge            1b
+
+        ldp             x21, x22, [sp, #16]
+        ldp             x23, x24, [sp, #32]
+        ldr             x25, [sp, #48]
+        ldp             x19, x20, [sp], #64
+
+        .align JUMP_ALIGN
+6:
+        tbz             w6,  #0,  10f               // even number of lines; (h & 1) == 0
+
+        subs            w0,  w5,  #4
+        b.lt            8f                          // w < 4
+
+        .align LOOP_ALIGN
+7:      // loop for last odd line by 4 pixels: XYZ[0..3]
+        ldp             x9,  x10, [x3]              // x9  = X[0] Y[0] Z[0] X[1], x10 = Y[1] Z[1] X[2] Y[2]
+        ldr             x11, [x3, #16]              // x11 = Z[2] X[3] Y[3] Z[3]
+        add             x3,  x3,  #24
+
+        ubfx            x12, x9,  #4,  #12          // X[0] >> 4
+        lsr             x13, x9,  #52               // X[1] >> 4
+        ubfx            x14, x10, #36, #12          // X[2] >> 4
+        ubfx            x15, x11, #20, #12          // X[3] >> 4
+
+        ldr             h0,  [x7, x12, lsl #1]      // gamma.in[X[0] >> 4]
+        ubfx            x12, x9,  #20, #12          // Y[0] >> 4
+        ldr             h16, [x7, x13, lsl #1]      // gamma.in[X[1] >> 4]
+        ubfx            x13, x10, #4,  #12          // Y[1] >> 4
+        ldr             h17, [x7, x14, lsl #1]      // gamma.in[X[2] >> 4]
+        lsr             x14, x10, #52               // Y[2] >> 4
+        ldr             h18, [x7, x15, lsl #1]      // gamma.in[X[3] >> 4]
+        ubfx            x15, x11, #36, #12          // Y[3] >> 4
+        mov             v0.h[1],  v16.h[0]          // v0.4h  = gamma.in[X[0..1] >> 4]
+        mov             v17.h[1], v18.h[0]          // v17.4h = gamma.in[X[2..3] >> 4]
+        mov             v0.s[1],  v17.s[0]          // v0.4h  = gamma.in[X[0..3] >> 4]
+
+        umull           v3.4s,  v0.4h,  v6.h[0]     // R[0..3] = gamma.in[X[0..3] >> 4] * mat[0][0]
+        umull           v5.4s,  v0.4h,  v6.h[6]     // B[0..3] = gamma.in[X[0..3] >> 4] * mat[2][0]
+
+        ldr             h1,  [x7, x12, lsl #1]      // gamma.in[Y[0] >> 4]
+        ubfx            x12, x9,  #36, #12          // Z[0] >> 4
+        ldr             h16, [x7, x13, lsl #1]      // gamma.in[Y[1] >> 4]
+        ubfx            x13, x10, #20, #12          // Z[1] >> 4
+        ldr             h17, [x7, x14, lsl #1]      // gamma.in[Y[2] >> 4]
+        ubfx            x14, x11, #4,  #12          // Z[2] >> 4
+        ldr             h18, [x7, x15, lsl #1]      // gamma.in[Y[3] >> 4]
+        lsr             x15, x11, #52               // Z[3] >> 4
+        mov             v1.h[1],  v16.h[0]          // v1.4h  = gamma.in[Y[0..1] >> 4]
+        mov             v17.h[1], v18.h[0]          // v17.4h = gamma.in[Y[2..3] >> 4]
+        mov             v1.s[1],  v17.s[0]          // v1.4h  = gamma.in[Y[0..3] >> 4]
+
+        umull           v4.4s,  v1.4h,  v6.h[4]     // G[0..3]  = gamma.in[Y[0..3] >> 4] * mat[1][1]
+        umlsl           v3.4s,  v1.4h,  v6.h[1]     // R[0..3] -= gamma.in[Y[0..3] >> 4] * mat[0][1]
+        umlsl           v4.4s,  v0.4h,  v6.h[3]     // G[0..3] -= gamma.in[X[0..3] >> 4] * mat[1][0]
+        umlsl           v5.4s,  v1.4h,  v6.h[7]     // B[0..3] -= gamma.in[Y[0..3] >> 4] * mat[2][1]
+
+        ldr             h2,  [x7, x12, lsl #1]      // gamma.in[Z[0] >> 4]
+        ldr             h16, [x7, x13, lsl #1]      // gamma.in[Z[1] >> 4]
+        ldr             h17, [x7, x14, lsl #1]      // gamma.in[Z[2] >> 4]
+        ldr             h18, [x7, x15, lsl #1]      // gamma.in[Z[3] >> 4]
+        mov             v2.h[1],  v16.h[0]          // v2.4h  = gamma.in[Z[0..1] >> 4]
+        mov             v17.h[1], v18.h[0]          // v17.4h = gamma.in[Z[2..3] >> 4]
+        mov             v2.s[1],  v17.s[0]          // v2.4h  = gamma.in[Z[0..3] >> 4]
+
+        umlsl           v3.4s,  v2.4h,  v6.h[2]     // R[0..3] -= gamma.in[Z[0..3] >> 4] * mat[0][2]
+        sqshrun         v3.4h,  v3.4s,  #12         // clip(R[0..3] >> 12)
+        umlal           v4.4s,  v2.4h,  v6.h[5]     // G[0..3] += gamma.in[Z[0..3] >> 4] * mat[1][2]
+        sqshrun         v4.4h,  v4.4s,  #12         // clip(G[0..3] >> 12)
+        umlal           v5.4s,  v2.4h,  v7.h[0]     // B[0..3] += gamma.in[Z[0..3] >> 4] * mat[2][2]
+        sqshrun         v5.4h,  v5.4s,  #12         // clip(B[0..3] >> 12)
+
+        umov            w9,  v3.h[0]                // clip(R[0] >> 12)
+        umov            w10, v4.h[1]                // clip(G[1] >> 12)
+        umov            w11, v5.h[2]                // clip(B[2] >> 12)
+
+        umov            w12, v4.h[0]                // clip(G[0] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R[0] = gamma.out[clip(R[0] >> 12)]
+        lsl             x9,  x9,  #4                // R[0] << 4
+        umov            w13, v5.h[1]                // clip(B[1] >> 12)
+        ldrh            w10, [x8, x10, lsl #1]      // G[1] = gamma.out[clip(G[1] >> 12)]
+        lsl             x10, x10, #4                // G[1] << 4
+        umov            w14, v3.h[3]                // clip(R[3] >> 12)
+        ldrh            w11, [x8, x11, lsl #1]      // B[2] = gamma.out[clip(B[2] >> 12)]
+        lsl             x11, x11, #4                // B[2] << 4
+
+        umov            w15, v5.h[0]                // clip(B[0] >> 12)
+        ldrh            w12, [x8, x12, lsl #1]      // G[0] = gamma.out[clip(G[0] >> 12)]
+        orr             x9,  x9,  x12, lsl #20      // R[0] << 4, G[0] << 4
+        umov            w12, v3.h[2]                // clip(R[2] >> 12)
+        ldrh            w13, [x8, x13, lsl #1]      // B[1] = gamma.out[clip(B[1] >> 12)]
+        orr             x10, x10, x13, lsl #20      // G[1] << 4, B[1] << 4
+        umov            w13, v4.h[3]                // clip(G[3] >> 12)
+        ldrh            w14, [x8, x14, lsl #1]      // R[3] = gamma.out[clip(R[3] >> 12)]
+        orr             x11, x11, x14, lsl #20      // B[2] << 4, R[3] << 4
+
+        umov            w14, v3.h[1]                // clip(R[1] >> 12)
+        ldrh            w15, [x8, x15, lsl #1]      // B[0] = gamma.out[clip(B[0] >> 12)]
+        orr             x9,  x9,  x15, lsl #36      // R[0] << 4, G[0] << 4, B[0] << 4
+        umov            w15, v4.h[2]                // clip(G[2] >> 12)
+        ldrh            w12, [x8, x12, lsl #1]      // R[2] = gamma.out[clip(R[2] >> 12)]
+        orr             x10, x10, x12, lsl #36      // G[1] << 4, B[1] << 4, R[2] << 4
+        umov            w12, v5.h[3]                // clip(B[3] >> 12)
+        ldrh            w13, [x8, x13, lsl #1]      // G[3] = gamma.out[clip(G[3] >> 12)]
+        orr             x11, x11, x13, lsl #36      // B[2] << 4, R[3] << 4, G[3] << 4
+
+        ldrh            w14, [x8, x14, lsl #1]      // R[1] = gamma.out[clip(R[1] >> 12)]
+        orr             x9,  x9,  x14, lsl #52      // x9  = R[0] << 4, G[0] << 4, B[0] << 4, R[1] << 4
+        ldrh            w15, [x8, x15, lsl #1]      // G[2] = gamma.out[clip(G[2] >> 12)]
+        orr             x10, x10, x15, lsl #52      // x10 = G[1] << 4, B[1] << 4, R[2] << 4, G[2] << 4
+        ldrh            w12, [x8, x12, lsl #1]      // B[3] = gamma.out[clip(B[3] >> 12)]
+        orr             x11, x11, x12, lsl #52      // x11 = B[2] << 4, R[3] << 4, G[3] << 4, B[3] << 4
+
+        stp             x9,  x10, [x1]
+        str             x11, [x1, #16]
+        add             x1,  x1,  #24
+
+        subs            w0,  w0,  #4
+        b.ge            7b
+
+        .align JUMP_ALIGN
+8:
+        tst             w5,  #3
+        b.eq            10f                         // no residual pixels; (w & 3) == 0
+
+        ldr             w10, [x3]                   // w10 = X[0] Y[0]
+        ldrh            w11, [x3, #4]               // w11 = Z[0]
+        add             x3,  x3,  #6
+        ubfx            w9,  w10, #4,  #12          // X[0] >> 4
+        ubfx            w10, w10, #20, #12          // Y[0] >> 4
+        lsr             w11, w11, #4                // Z[0] >> 4
+        ldr             h0,  [x7, x9,  lsl #1]      // v0.4h = gamma.in[X[0] >> 4]
+        ldr             h1,  [x7, x10, lsl #1]      // v1.4h = gamma.in[Y[0] >> 4]
+        ldr             h2,  [x7, x11, lsl #1]      // v2.4h = gamma.in[Z[0] >> 4]
+
+        cmp             w0,  #-2
+        b.lt            9f                          // (w & 3) == 1
+
+        ldr             w10, [x3]                   // w10 = X[1] Y[1]
+        ldrh            w11, [x3, #4]               // w11 = Z[1]
+        add             x3,  x3,  #6
+        ubfx            w9,  w10, #4, #12           // X[1] >> 4
+        ubfx            w10, w10, #20, #12          // Y[1] >> 4
+        lsr             w11, w11, #4                // Z[1] >> 4
+        ldr             h16, [x7, x9,  lsl #1]      // gamma.in[X[1] >> 4]
+        ldr             h17, [x7, x10, lsl #1]      // gamma.in[Y[1] >> 4]
+        ldr             h18, [x7, x11, lsl #1]      // gamma.in[Z[1] >> 4]
+        mov             v0.h[1], v16.h[0]           // v0.4h = gamma.in[X[0..1] >> 4]
+        mov             v1.h[1], v17.h[0]           // v1.4h = gamma.in[Y[0..1] >> 4]
+        mov             v2.h[1], v18.h[0]           // v2.4h = gamma.in[Z[0..1] >> 4]
+
+        b.le            9f                          // (w & 3) == 2
+
+        ldr             w10, [x3]                   // w10 = X[2] Y[2]
+        ldrh            w11, [x3, #4]               // w11 = Z[2]
+        add             x3,  x3,  #6
+        ubfx            w9,  w10, #4, #12           // X[2] >> 4
+        ubfx            w10, w10, #20, #12          // Y[2] >> 4
+        lsr             w11, w11, #4                // Z[2] >> 4
+        ldr             h16, [x7, x9,  lsl #1]      // gamma.in[X[2] >> 4]
+        ldr             h17, [x7, x10, lsl #1]      // gamma.in[Y[2] >> 4]
+        ldr             h18, [x7, x11, lsl #1]      // gamma.in[Z[2] >> 4]
+        mov             v0.h[2], v16.h[0]           // v0.4h = gamma.in[X[0..2] >> 4]
+        mov             v1.h[2], v17.h[0]           // v1.4h = gamma.in[Y[0..2] >> 4]
+        mov             v2.h[2], v18.h[0]           // v2.4h = gamma.in[Z[0..2] >> 4]
+
+        .align JUMP_ALIGN
+9:
+        umull           v3.4s,  v0.4h,  v6.h[0]     // R[0..2] = gamma.in[X[0..2] >> 4] * mat[0][0]
+        umull           v5.4s,  v0.4h,  v6.h[6]     // B[0..2] = gamma.in[X[0..2] >> 4] * mat[2][0]
+
+        umull           v4.4s,  v1.4h,  v6.h[4]     // G[0..2]  = gamma.in[Y[0..2] >> 4] * mat[1][1]
+        umlsl           v3.4s,  v1.4h,  v6.h[1]     // R[0..2] -= gamma.in[Y[0..2] >> 4] * mat[0][1]
+        umlsl           v4.4s,  v0.4h,  v6.h[3]     // G[0..2] -= gamma.in[X[0..2] >> 4] * mat[1][0]
+        umlsl           v5.4s,  v1.4h,  v6.h[7]     // B[0..2] -= gamma.in[Y[0..2] >> 4] * mat[2][1]
+
+        umlsl           v3.4s,  v2.4h,  v6.h[2]     // R[0..2] -= gamma.in[Z[0..2] >> 4] * mat[0][2]
+        sqshrun         v3.4h,  v3.4s,  #12         // clip(R[0..2] >> 12)
+        umlal           v4.4s,  v2.4h,  v6.h[5]     // G[0..2] += gamma.in[Z[0..2] >> 4] * mat[1][2]
+        sqshrun         v4.4h,  v4.4s,  #12         // clip(G[0..2] >> 12)
+        umlal           v5.4s,  v2.4h,  v7.h[0]     // B[0..2] += gamma.in[Z[0..2] >> 4] * mat[2][2]
+        sqshrun         v5.4h,  v5.4s,  #12         // clip(B[0..2] >> 12)
+
+        umov            w9,  v3.h[0]                // clip(R[0] >> 12)
+        umov            w10, v4.h[0]                // clip(G[0] >> 12)
+        umov            w11, v5.h[0]                // clip(B[0] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R[0] = gamma.out[clip(R[0] >> 12)]
+        ldrh            w10, [x8, x10, lsl #1]      // G[0] = gamma.out[clip(G[0] >> 12)]
+        ldrh            w11, [x8, x11, lsl #1]      // B[0] = gamma.out[clip(B[0] >> 12)]
+        lsl             w9,  w9,  #4                // w9  = R[0] << 4
+        lsl             w10, w10, #4                // w10 = G[0] << 4
+        lsl             w11, w11, #4                // w11 = B[0] << 4
+        strh            w9,  [x1]
+        strh            w10, [x1, #2]
+        strh            w11, [x1, #4]
+        add             x1,  x1,  #6
+
+        cmp             w0,  #-2
+        b.lt            10f                         // (w & 3) == 1
+
+        umov            w9,  v3.h[1]                // clip(R[1] >> 12)
+        umov            w10, v4.h[1]                // clip(G[1] >> 12)
+        umov            w11, v5.h[1]                // clip(B[1] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R[1] = gamma.out[clip(R[1] >> 12)]
+        ldrh            w10, [x8, x10, lsl #1]      // G[1] = gamma.out[clip(G[1] >> 12)]
+        ldrh            w11, [x8, x11, lsl #1]      // B[1] = gamma.out[clip(B[1] >> 12)]
+        lsl             w9,  w9,  #4                // w9  = R[1] << 4
+        lsl             w10, w10, #4                // w10 = G[1] << 4
+        lsl             w11, w11, #4                // w11 = B[1] << 4
+        strh            w9,  [x1]
+        strh            w10, [x1, #2]
+        strh            w11, [x1, #4]
+        add             x1,  x1,  #6
+
+        b.le            10f                         // (w & 3) == 2
+
+        umov            w9,  v3.h[2]                // clip(R[2] >> 12)
+        umov            w10, v4.h[2]                // clip(G[2] >> 12)
+        umov            w11, v5.h[2]                // clip(B[2] >> 12)
+        ldrh            w9,  [x8, x9,  lsl #1]      // R[2] = gamma.out[clip(R[2] >> 12)]
+        ldrh            w10, [x8, x10, lsl #1]      // G[2] = gamma.out[clip(G[2] >> 12)]
+        ldrh            w11, [x8, x11, lsl #1]      // B[2] = gamma.out[clip(B[2] >> 12)]
+        lsl             w9,  w9,  #4                // w9  = R[2] << 4
+        lsl             w10, w10, #4                // w10 = G[2] << 4
+        lsl             w11, w11, #4                // w11 = B[2] << 4
+        strh            w9,  [x1]
+        strh            w10, [x1, #2]
+        strh            w11, [x1, #4]
+        add             x1,  x1,  #6
+
+        .align JUMP_ALIGN
+10:
+        ret
+endfunc
diff --git a/libswscale/swscale.c b/libswscale/swscale.c
index 95a61a4183..96df4ed3f4 100644
--- a/libswscale/swscale.c
+++ b/libswscale/swscale.c
@@ -861,6 +861,10 @@ av_cold void ff_sws_init_xyzdsp(SwsInternal *c)
 {
     c->xyz12Torgb48 = xyz12Torgb48_c;
     c->rgb48Toxyz12 = rgb48Toxyz12_c;
+
+#if ARCH_AARCH64
+    ff_sws_init_xyzdsp_aarch64(c);
+#endif
 }
 
 void ff_update_palette(SwsInternal *c, const uint32_t *pal)
diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 02e625a10e..5c58272664 100644
--- a/libswscale/swscale_internal.h
+++ b/libswscale/swscale_internal.h
@@ -732,6 +732,8 @@ av_cold void ff_sws_init_range_convert_riscv(SwsInternal *c);
 av_cold void ff_sws_init_range_convert_x86(SwsInternal *c);
 
 av_cold void ff_sws_init_xyzdsp(SwsInternal *c);
+av_cold void ff_sws_init_xyzdsp_aarch64(SwsInternal *c);
+
 av_cold int ff_sws_fill_xyztables(SwsInternal *c);
 
 SwsFunc ff_yuv2rgb_init_x86(SwsInternal *c);
-- 
2.52.0


From 295a9343f92bc423aa76f2379382eb8e41b4b3ba Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Thu, 4 Dec 2025 18:42:02 +0100
Subject: [PATCH 221/304] avcodec/tableprint_vlc: Unbreak hardcoded tables

Forgotten in d8ffec5bf9a2803f55cc0822a97b7815f24bee83.
Fixes issue #21102.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/tableprint_vlc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/tableprint_vlc.h b/libavcodec/tableprint_vlc.h
index ab33e90090..e7c6764573 100644
--- a/libavcodec/tableprint_vlc.h
+++ b/libavcodec/tableprint_vlc.h
@@ -28,7 +28,7 @@
 #define ff_dlog(a, ...) while(0)
 #define ff_tlog(a, ...) while(0)
 #define AVUTIL_MEM_H
-#define av_malloc(s) NULL
+#define av_mallocz(s) NULL
 #define av_malloc_array(a, b) NULL
 #define av_realloc_f(p, o, n) NULL
 #define av_free(p) while(0)
-- 
2.52.0


From bc30807a6892fb59ca042906efd1159fd1a51ae0 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <zhilizhao@tencent.com>
Date: Fri, 14 Nov 2025 23:46:04 +0800
Subject: [PATCH 222/304] avcodec/h264_slice: don't force ff_get_format
 unconditionally after flush

h->context_initialized is zero after a flush, which triggers an
unconditional call to ff_get_format. ff_get_format can be heavy because
of ff_hwaccel_uninit and hwaccel_init; for example, it takes 20 ms on
macOS with videotoolbox. ff_get_format should not be called if
nothing changed. ff_get_format is guaranteed to be called the
first time and whenever the video information changes, via
(must_reinit || needs_reinit).

Fix #20760.
---
 libavcodec/h264_slice.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index d91f04fa67..69f70c90bb 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -1150,7 +1150,7 @@ static int h264_init_ps(H264Context *h, const H264SliceContext *sl, int first_sl
         if (flush_changes)
             ff_h264_flush_change(h);
 
-        if ((ret = get_pixel_format(h, 1)) < 0)
+        if ((ret = get_pixel_format(h, must_reinit || needs_reinit)) < 0)
             return ret;
         h->avctx->pix_fmt = ret;
 
-- 
2.52.0


From 3ab51e592ecc4daf7329e9e6a44749591ef41c07 Mon Sep 17 00:00:00 2001
From: Araz Iusubov <Primeadvice@gmail.com>
Date: Fri, 5 Dec 2025 16:32:46 +0100
Subject: [PATCH 223/304] avcodec/amf: fix hw_device_ctx handling

---
 libavcodec/amfdec.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libavcodec/amfdec.c b/libavcodec/amfdec.c
index 1a2eb9392c..e1a5b71ee6 100644
--- a/libavcodec/amfdec.c
+++ b/libavcodec/amfdec.c
@@ -274,15 +274,16 @@ static int amf_decode_init(AVCodecContext *avctx)
     if (!ctx->in_pkt)
         return AVERROR(ENOMEM);
 
-    if  (avctx->hw_device_ctx && !avctx->hw_frames_ctx) {
+    if  (avctx->hw_device_ctx) {
         AVHWDeviceContext   *hwdev_ctx;
         hwdev_ctx = (AVHWDeviceContext*)avctx->hw_device_ctx->data;
         if (hwdev_ctx->type == AV_HWDEVICE_TYPE_AMF)
         {
             ctx->device_ctx_ref = av_buffer_ref(avctx->hw_device_ctx);
-            avctx->hw_frames_ctx = av_hwframe_ctx_alloc(avctx->hw_device_ctx);
-
-            AMF_GOTO_FAIL_IF_FALSE(avctx, !!avctx->hw_frames_ctx, AVERROR(ENOMEM), "av_hwframe_ctx_alloc failed\n");
+            if (!avctx->hw_frames_ctx) {
+                avctx->hw_frames_ctx = av_hwframe_ctx_alloc(avctx->hw_device_ctx);
+                AMF_GOTO_FAIL_IF_FALSE(avctx, !!avctx->hw_frames_ctx, AVERROR(ENOMEM), "av_hwframe_ctx_alloc failed\n");
+            }
         } else {
             ret = av_hwdevice_ctx_create_derived(&ctx->device_ctx_ref, AV_HWDEVICE_TYPE_AMF, avctx->hw_device_ctx, 0);
             AMF_GOTO_FAIL_IF_FALSE(avctx, ret == 0, ret, "Failed to create derived AMF device context: %s\n", av_err2str(ret));
-- 
2.52.0


From 1f991d2a19a0bb260d747a46724b924e7a73d310 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 19:41:36 +0100
Subject: [PATCH 224/304] avcodec: allow bypassing frame threading with an
 optional flag

Normally, this function tries to make sure all threads are saturated with
work to do before returning any frames; and will continue requesting packets
until that is the case.

However, this significantly increases initial decoding latency when only
requesting a single frame (e.g. to configure the filter graph), and also
wastes a lot of memory in the event that the user does not intend
to decode more frames until later.

By introducing a new `flags` parameter and a new flag
`AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS` to go along with it, we can allow
users to temporarily bypass this logic.
---
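Note (illustration only, not part of the patch): the effect of the flag on
the submit/collect loop in ff_thread_receive_frame() can be sketched with a
toy model. Everything named sketch_* or SKETCH_* below is hypothetical and
exists only for this example. The model assumes an unlimited supply of
packets, no draining, and that every submitted packet eventually yields one
frame; it only counts how many packets are submitted before the first
result is collected.

```c
#include <assert.h>

/* Hypothetical stand-in for AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS. */
#define SKETCH_FLAG_SYNCHRONOUS (1u << 0)

/*
 * Toy model of the loop: returns how many packets are handed to worker
 * threads before the caller starts waiting for the first finished frame.
 * Assumes packets are always available and the decoder is not draining.
 */
static int sketch_packets_before_first_frame(int thread_count, unsigned flags)
{
    int next_decoding = 0; /* worker that receives the next packet      */
    int next_finished = 0; /* worker whose result is collected next     */
    int submitted     = 0;

    assert(thread_count > 0);

    for (;;) {
        /* With the flag set, stop feeding workers as soon as at least
         * one packet is in flight and go straight to collecting. */
        if (next_decoding != next_finished &&
            (flags & SKETCH_FLAG_SYNCHRONOUS))
            break;

        /* Submit one packet to the next worker. */
        submitted++;
        next_decoding = (next_decoding + 1) % thread_count;

        /* Default behaviour: keep submitting until every worker has
         * something to do, i.e. until the index wraps around. */
        if (next_decoding != next_finished)
            continue;
        break;
    }
    return submitted;
}
```

Under these assumptions, a context with 4 threads submits 4 packets before
the first frame without the flag and 1 packet with it; with a single thread
the flag makes no difference.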
 doc/APIchanges                |  4 ++++
 libavcodec/avcodec.c          | 10 ++++++++--
 libavcodec/avcodec.h          | 14 ++++++++++++++
 libavcodec/avcodec_internal.h |  8 +++++---
 libavcodec/decode.c           | 11 ++++++-----
 libavcodec/pthread_frame.c    |  7 ++++++-
 libavcodec/version.h          |  2 +-
 7 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index 239137f1f2..f308fd40a2 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -2,6 +2,10 @@ The last version increases of all libraries were on 2025-03-28
 
 API changes, most recent first:
 
+2025-12-xx - xxxxxxxxxx - lavc 62.22.100 - avcodec.h
+  Add avcodec_receive_frame2().
+  Add AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS.
+
 2025-09-xx - xxxxxxxxxx - lavfi 11.10.100 - buffersrc.h
   Add av_buffersrc_get_status().
 
diff --git a/libavcodec/avcodec.c b/libavcodec/avcodec.c
index dacdfba8da..8b4b305fcd 100644
--- a/libavcodec/avcodec.c
+++ b/libavcodec/avcodec.c
@@ -705,7 +705,8 @@ int avcodec_is_open(AVCodecContext *s)
     return !!s->internal;
 }
 
-int attribute_align_arg avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame)
+int attribute_align_arg avcodec_receive_frame2(AVCodecContext *avctx,
+                                               AVFrame *frame, unsigned flags)
 {
     av_frame_unref(frame);
 
@@ -713,10 +714,15 @@ int attribute_align_arg avcodec_receive_frame(AVCodecContext *avctx, AVFrame *fr
         return AVERROR(EINVAL);
 
     if (ff_codec_is_decoder(avctx->codec))
-        return ff_decode_receive_frame(avctx, frame);
+        return ff_decode_receive_frame(avctx, frame, flags);
     return ff_encode_receive_frame(avctx, frame);
 }
 
+int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame)
+{
+    return avcodec_receive_frame2(avctx, frame, 0);
+}
+
 #define WRAP_CONFIG(allowed_type, field, var, field_type, sentinel_check)   \
     do {                                                                    \
         if (codec->type != (allowed_type))                                  \
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 91bd7cb7ef..92c0207cf6 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -415,6 +415,14 @@ typedef struct RcOverride{
  */
 #define AV_GET_ENCODE_BUFFER_FLAG_REF (1 << 0)
 
+/**
+ * The decoder will bypass frame threading and return the next frame as soon as
+ * possible. Note that this may deliver frames earlier than the advertised
+ * `AVCodecContext.delay`. No effect when frame threading is disabled, or on
+ * encoding.
+ */
+#define AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS (1 << 0)
+
 /**
  * main external API structure.
  * New fields can be added to the end with minor version bumps.
@@ -2360,6 +2368,7 @@ int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
  *              frame (depending on the decoder type) allocated by the
  *              codec. Note that the function will always call
  *              av_frame_unref(frame) before doing anything else.
+ * @param flags Combination of AV_CODEC_RECEIVE_FRAME_FLAG_* flags.
  *
  * @retval 0                success, a frame was returned
  * @retval AVERROR(EAGAIN)  output is not available in this state - user must
@@ -2370,6 +2379,11 @@ int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
  *                          @ref AV_CODEC_FLAG_RECON_FRAME flag enabled
  * @retval "other negative error code" legitimate decoding errors
  */
+int avcodec_receive_frame2(AVCodecContext *avctx, AVFrame *frame, unsigned flags);
+
+/**
+ * Alias for `avcodec_receive_frame2(avctx, frame, 0)`.
+ */
 int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);
 
 /**
diff --git a/libavcodec/avcodec_internal.h b/libavcodec/avcodec_internal.h
index 184d7b526c..06645e91a0 100644
--- a/libavcodec/avcodec_internal.h
+++ b/libavcodec/avcodec_internal.h
@@ -45,7 +45,8 @@ extern const SideDataMap ff_sd_global_map[];
 /**
  * avcodec_receive_frame() implementation for decoders.
  */
-int ff_decode_receive_frame(struct AVCodecContext *avctx, struct AVFrame *frame);
+int ff_decode_receive_frame(struct AVCodecContext *avctx, struct AVFrame *frame,
+                            unsigned flags);
 
 /**
  * avcodec_receive_frame() implementation for encoders.
@@ -91,9 +92,10 @@ void ff_thread_flush(struct AVCodecContext *avctx);
  * Submit available packets for decoding to worker threads, return a
  * decoded frame if available. Returns AVERROR(EAGAIN) if none is available.
  *
- * Parameters are the same as FFCodec.receive_frame.
+ * Parameters are the same as FFCodec.receive_frame, plus flags.
  */
-int ff_thread_receive_frame(struct AVCodecContext *avctx, AVFrame *frame);
+int ff_thread_receive_frame(struct AVCodecContext *avctx, AVFrame *frame,
+                            unsigned flags);
 
 /**
  * Do the actual decoding and obtain a decoded frame from the decoder, if
diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index f0646901c8..89a55c1cd8 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -645,14 +645,15 @@ int ff_decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
     return ret;
 }
 
-static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
+static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame,
+                                         unsigned flags)
 {
     AVCodecInternal *avci = avctx->internal;
     DecodeContext     *dc = decode_ctx(avci);
     int ret, ok;
 
     if (avctx->active_thread_type & FF_THREAD_FRAME)
-        ret = ff_thread_receive_frame(avctx, frame);
+        ret = ff_thread_receive_frame(avctx, frame, flags);
     else
         ret = ff_decode_receive_frame_internal(avctx, frame);
 
@@ -730,7 +731,7 @@ int attribute_align_arg avcodec_send_packet(AVCodecContext *avctx, const AVPacke
         dc->draining_started = 1;
 
     if (!avci->buffer_frame->buf[0] && !dc->draining_started) {
-        ret = decode_receive_frame_internal(avctx, avci->buffer_frame);
+        ret = decode_receive_frame_internal(avctx, avci->buffer_frame, 0);
         if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
             return ret;
     }
@@ -792,7 +793,7 @@ fail:
     return AVERROR_BUG;
 }
 
-int ff_decode_receive_frame(AVCodecContext *avctx, AVFrame *frame)
+int ff_decode_receive_frame(AVCodecContext *avctx, AVFrame *frame, unsigned flags)
 {
     AVCodecInternal *avci = avctx->internal;
     int ret;
@@ -800,7 +801,7 @@ int ff_decode_receive_frame(AVCodecContext *avctx, AVFrame *frame)
     if (avci->buffer_frame->buf[0]) {
         av_frame_move_ref(frame, avci->buffer_frame);
     } else {
-        ret = decode_receive_frame_internal(avctx, frame);
+        ret = decode_receive_frame_internal(avctx, frame, flags);
         if (ret < 0)
             return ret;
     }
diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index ff015c00b3..2115d4204d 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -559,7 +559,7 @@ static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
     return 0;
 }
 
-int ff_thread_receive_frame(AVCodecContext *avctx, AVFrame *frame)
+int ff_thread_receive_frame(AVCodecContext *avctx, AVFrame *frame, unsigned flags)
 {
     FrameThreadContext *fctx = avctx->internal->thread_ctx;
     int ret = 0;
@@ -572,6 +572,10 @@ int ff_thread_receive_frame(AVCodecContext *avctx, AVFrame *frame)
     while (!fctx->df.nb_f && !fctx->result) {
         PerThreadContext *p;
 
+        if (fctx->next_decoding != fctx->next_finished &&
+            (flags & AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS))
+            goto wait_for_result;
+
         /* get a packet to be submitted to the next thread */
         av_packet_unref(fctx->next_pkt);
         ret = ff_decode_get_packet(avctx, fctx->next_pkt);
@@ -588,6 +592,7 @@ int ff_thread_receive_frame(AVCodecContext *avctx, AVFrame *frame)
             !avctx->internal->draining)
             continue;
 
+    wait_for_result:
         p                   = &fctx->threads[fctx->next_finished];
         fctx->next_finished = (fctx->next_finished + 1) % avctx->thread_count;
 
diff --git a/libavcodec/version.h b/libavcodec/version.h
index da6f3a84ac..9411511e04 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -29,7 +29,7 @@
 
 #include "version_major.h"
 
-#define LIBAVCODEC_VERSION_MINOR  21
+#define LIBAVCODEC_VERSION_MINOR  22
 #define LIBAVCODEC_VERSION_MICRO 100
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
-- 
2.52.0


From 8e514c9ffb747a7a506d5dffa5df41ea44e680b8 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 22 Sep 2025 15:38:06 +0200
Subject: [PATCH 225/304] fftools/ffmpeg_dec: decode the first frame
 synchronously

In an ideal world, a filtergraph like `-i A -i B ... -filter_complex concat`
would only keep resources in memory related to the file currently being output
by the concat filter. So ideally, we'd open and fully decode A, then open and
fully decode B, and so on.

Practically, however, fftools wants to get one frame from each input file in
order to initialize the filter graph (buffersrc parameters). So what happens
currently is that fftools will request a single frame from each input A, B, etc.
that is plugged into the filtergraph.

When using frame threading, the design of the decoder (ff_thread_receive_frame)
is that it will not output any frames until we have received enough packets to
saturate all threads. This, however, forces the decoder to buffer at least as
many frames for each input file as we have threads, before outputting anything.

By decoding the first frame synchronously, we avoid this issue and allow
configuring the filter graph more quickly and without wasting excess resources
on frames that will not (yet) be used.
---
 fftools/ffmpeg_dec.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
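
The buffering described in the commit message can be illustrated with a toy model. This is only a sketch and not FFmpeg code: `packets_before_first_frame` is a hypothetical helper, and it assumes a simplified decoder that submits one packet per worker and waits only once every worker is busy, unless the synchronous flag asks it to wait as soon as anything is in flight.

```c
#include <assert.h>

/* Toy model (hypothetical, not FFmpeg code) of a frame-threaded decoder.
 * Returns how many packets are submitted before the caller blocks on the
 * first result. */
static int packets_before_first_frame(int threads, int synchronous)
{
    int in_flight = 0;

    assert(threads >= 1);
    for (;;) {
        /* Synchronous mode: wait as soon as one packet is being decoded.
         * Default mode: wait only once every worker has a packet. */
        if (in_flight > 0 && (synchronous || in_flight == threads))
            return in_flight;
        in_flight++; /* hand one more packet to the next idle worker */
    }
}
```

Under these assumptions, `packets_before_first_frame(8, 0)` yields 8 while `packets_before_first_frame(8, 1)` yields 1, which mirrors why requesting the first frame synchronously avoids buffering one frame per thread for every input.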

diff --git a/fftools/ffmpeg_dec.c b/fftools/ffmpeg_dec.c
index 2f25265997..a76d4b1ae8 100644
--- a/fftools/ffmpeg_dec.c
+++ b/fftools/ffmpeg_dec.c
@@ -744,11 +744,14 @@ static int packet_decode(DecoderPriv *dp, AVPacket *pkt, AVFrame *frame)
     while (1) {
         FrameData *fd;
         unsigned outputs_mask = 1;
+        unsigned flags = 0;
+        if (!dp->dec.frames_decoded)
+            flags |= AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS;
 
         av_frame_unref(frame);
 
         update_benchmark(NULL);
-        ret = avcodec_receive_frame(dec, frame);
+        ret = avcodec_receive_frame2(dec, frame, flags);
         update_benchmark("decode_%s %s", type_desc, dp->parent_name);
 
         if (ret == AVERROR(EAGAIN)) {
-- 
2.52.0


From 322b1fdf6f59e111b0cb0663474a1ef048b2b233 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Fri, 5 Dec 2025 17:57:21 +0100
Subject: [PATCH 226/304] checkasm/sw_xyz2rgb: fix function type
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 tests/checkasm/sw_xyz2rgb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/checkasm/sw_xyz2rgb.c b/tests/checkasm/sw_xyz2rgb.c
index afffb6c8da..17c5fcce7b 100644
--- a/tests/checkasm/sw_xyz2rgb.c
+++ b/tests/checkasm/sw_xyz2rgb.c
@@ -58,7 +58,7 @@ static void check_xyz12Torgb48le(void)
     LOCAL_ALIGNED_8(uint16_t, dst_ref, [3 * MAX_LINE_SIZE * NUM_LINES]);
     LOCAL_ALIGNED_8(uint16_t, dst_new, [3 * MAX_LINE_SIZE * NUM_LINES]);
 
-    declare_func(void, const SwsContext *, uint8_t *, int, const uint8_t *,
+    declare_func(void, const SwsInternal *, uint8_t *, int, const uint8_t *,
                  int, int, int);
 
     SwsInternal c;
@@ -80,16 +80,16 @@ static void check_xyz12Torgb48le(void)
                 memset(dst_new, 0xFE,
                        3 * sizeof(uint16_t) * MAX_LINE_SIZE * NUM_LINES);
 
-                call_ref((const SwsContext*)&c, (uint8_t *)dst_ref, dst_stride,
+                call_ref(&c, (uint8_t *)dst_ref, dst_stride,
                          (const uint8_t *)src, src_stride, width, height);
-                call_new((const SwsContext*)&c, (uint8_t *)dst_new, dst_stride,
+                call_new(&c, (uint8_t *)dst_new, dst_stride,
                          (const uint8_t *)src, src_stride, width, height);
 
                 checkasm_check(uint16_t, dst_ref, dst_stride, dst_new,
                                dst_stride, width, height, "dst_rgb");
 
                 if (!(width & 3) && height == NUM_LINES) {
-                    bench_new((const SwsContext*)&c, (uint8_t *)dst_new,
+                    bench_new(&c, (uint8_t *)dst_new,
                               dst_stride, (const uint8_t *)src, src_stride,
                               width, height);
                 }
-- 
2.52.0


From 1bb4ec13b08b1a225068b2cdad6519eed77e2853 Mon Sep 17 00:00:00 2001
From: Michael Niedermayer <michael@niedermayer.cc>
Date: Sun, 7 Dec 2025 11:57:13 +0100
Subject: [PATCH 227/304] avcodec/decode: Fix build due to
 ff_thread_receive_frame()

Regression since: 5e56937b742b9110d38ce6fa4c7bdf2d5c355df4

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/decode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 89a55c1cd8..ca1adfa386 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -218,7 +218,7 @@ fail:
 
 #if !HAVE_THREADS
 #define ff_thread_get_packet(avctx, pkt) (AVERROR_BUG)
-#define ff_thread_receive_frame(avctx, frame) (AVERROR_BUG)
+#define ff_thread_receive_frame(avctx, frame, flags) (AVERROR_BUG)
 #endif
 
 static int decode_get_packet(AVCodecContext *avctx, AVPacket *pkt)
-- 
2.52.0


From 14051ec3f48b897d9ce8bd9762151431b92783b5 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Sun, 7 Dec 2025 12:41:27 -0300
Subject: [PATCH 228/304] avcodec: rename avcodec_receive_frame2 to
 avcodec_receive_frame_flags

It's a name that communicates its functionality in a better way.
Since the function was introduced very recently, we can safely rename it.

Signed-off-by: James Almer <jamrial@gmail.com>
---
 doc/APIchanges       | 4 ++--
 fftools/ffmpeg_dec.c | 2 +-
 libavcodec/avcodec.c | 4 ++--
 libavcodec/avcodec.h | 4 ++--
 libavcodec/version.h | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)
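
Because the rename only bumps the micro version, callers that need to detect the new name may want to compare against 62.22.101, the version given in the APIchanges entry of this patch. The following is a minimal sketch under that assumption: `SKETCH_VERSION_INT` is a local copy of the packing used by `AV_VERSION_INT` so that the example compiles without FFmpeg headers, and `has_receive_frame_flags` is a hypothetical helper.

```c
#include <assert.h>

/* Local copy of the (major << 16 | minor << 8 | micro) packing, reproduced
 * here only so the sketch builds without libavutil/version.h. */
#define SKETCH_VERSION_INT(a, b, c) ((a) << 16 | (b) << 8 | (c))

/* Hypothetical helper: returns 1 when a libavcodec version is at least
 * 62.22.101, the version this patch assigns to avcodec_receive_frame_flags(). */
static int has_receive_frame_flags(int major, int minor, int micro)
{
    /* The packing only orders versions correctly for 8-bit minor/micro. */
    assert(minor >= 0 && minor < 256 && micro >= 0 && micro < 256);
    return SKETCH_VERSION_INT(major, minor, micro) >=
           SKETCH_VERSION_INT(62, 22, 101);
}
```

In real code the same comparison would more likely be a preprocessor check of `LIBAVCODEC_VERSION_INT` against `AV_VERSION_INT(62, 22, 101)`; the version number in a merged tree may differ from the one in this patch.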

diff --git a/doc/APIchanges b/doc/APIchanges
index f308fd40a2..744c52ff29 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -2,8 +2,8 @@ The last version increases of all libraries were on 2025-03-28
 
 API changes, most recent first:
 
-2025-12-xx - xxxxxxxxxx - lavc 62.22.100 - avcodec.h
-  Add avcodec_receive_frame2().
+2025-12-xx - xxxxxxxxxx - lavc 62.22.101 - avcodec.h
+  Add avcodec_receive_frame_flags().
   Add AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS.
 
 2025-09-xx - xxxxxxxxxx - lavfi 11.10.100 - buffersrc.h
diff --git a/fftools/ffmpeg_dec.c b/fftools/ffmpeg_dec.c
index a76d4b1ae8..5020684a28 100644
--- a/fftools/ffmpeg_dec.c
+++ b/fftools/ffmpeg_dec.c
@@ -751,7 +751,7 @@ static int packet_decode(DecoderPriv *dp, AVPacket *pkt, AVFrame *frame)
         av_frame_unref(frame);
 
         update_benchmark(NULL);
-        ret = avcodec_receive_frame2(dec, frame, flags);
+        ret = avcodec_receive_frame_flags(dec, frame, flags);
         update_benchmark("decode_%s %s", type_desc, dp->parent_name);
 
         if (ret == AVERROR(EAGAIN)) {
diff --git a/libavcodec/avcodec.c b/libavcodec/avcodec.c
index 8b4b305fcd..de4e083db1 100644
--- a/libavcodec/avcodec.c
+++ b/libavcodec/avcodec.c
@@ -705,7 +705,7 @@ int avcodec_is_open(AVCodecContext *s)
     return !!s->internal;
 }
 
-int attribute_align_arg avcodec_receive_frame2(AVCodecContext *avctx,
+int attribute_align_arg avcodec_receive_frame_flags(AVCodecContext *avctx,
                                                AVFrame *frame, unsigned flags)
 {
     av_frame_unref(frame);
@@ -720,7 +720,7 @@ int attribute_align_arg avcodec_receive_frame2(AVCodecContext *avctx,
 
 int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame)
 {
-    return avcodec_receive_frame2(avctx, frame, 0);
+    return avcodec_receive_frame_flags(avctx, frame, 0);
 }
 
 #define WRAP_CONFIG(allowed_type, field, var, field_type, sentinel_check)   \
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 92c0207cf6..1a8f77af82 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -2379,10 +2379,10 @@ int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
  *                          @ref AV_CODEC_FLAG_RECON_FRAME flag enabled
  * @retval "other negative error code" legitimate decoding errors
  */
-int avcodec_receive_frame2(AVCodecContext *avctx, AVFrame *frame, unsigned flags);
+int avcodec_receive_frame_flags(AVCodecContext *avctx, AVFrame *frame, unsigned flags);
 
 /**
- * Alias for `avcodec_receive_frame2(avctx, frame, 0)`.
+ * Alias for `avcodec_receive_frame_flags(avctx, frame, 0)`.
  */
 int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);
 
diff --git a/libavcodec/version.h b/libavcodec/version.h
index 9411511e04..9f55381cf1 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -30,7 +30,7 @@
 #include "version_major.h"
 
 #define LIBAVCODEC_VERSION_MINOR  22
-#define LIBAVCODEC_VERSION_MICRO 100
+#define LIBAVCODEC_VERSION_MICRO 101
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
                                                LIBAVCODEC_VERSION_MINOR, \
-- 
2.52.0


From 86485e7c0f5269401dbe022e9769765877eb090c Mon Sep 17 00:00:00 2001
From: Marvin Scholz <epirat07@gmail.com>
Date: Wed, 19 Nov 2025 01:41:22 +0100
Subject: [PATCH 229/304] .forgejo/labeler: consistently quote strings

---
 .forgejo/labeler/labeler.yml | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/.forgejo/labeler/labeler.yml b/.forgejo/labeler/labeler.yml
index 9c4721cf29..212dd12a9a 100644
--- a/.forgejo/labeler/labeler.yml
+++ b/.forgejo/labeler/labeler.yml
@@ -1,31 +1,31 @@
 avcodec:
   - changed-files:
-    - any-glob-to-any-file: libavcodec/**
+    - any-glob-to-any-file: 'libavcodec/**'
 
 avdevice:
   - changed-files:
-    - any-glob-to-any-file: libavdevice/**
+    - any-glob-to-any-file: 'libavdevice/**'
 
 avfilter:
   - changed-files:
-    - any-glob-to-any-file: libavfilter/**
+    - any-glob-to-any-file: 'libavfilter/**'
 
 avformat:
   - changed-files:
-    - any-glob-to-any-file: libavformat/**
+    - any-glob-to-any-file: 'libavformat/**'
 
 avutil:
   - changed-files:
-    - any-glob-to-any-file: libavutil/**
+    - any-glob-to-any-file: 'libavutil/**'
 
 swresample:
   - changed-files:
-    - any-glob-to-any-file: libswresample/**
+    - any-glob-to-any-file: 'libswresample/**'
 
 swscale:
   - changed-files:
-    - any-glob-to-any-file: libswscale/**
+    - any-glob-to-any-file: 'libswscale/**'
 
 CLI:
   - changed-files:
-    - any-glob-to-any-file: fftools/**
+    - any-glob-to-any-file: 'fftools/**'
-- 
2.52.0


From c526251a8e416248b44696d38c2b91edebd8e5b4 Mon Sep 17 00:00:00 2001
From: averne <averne381@gmail.com>
Date: Sat, 6 Dec 2025 19:45:18 +0100
Subject: [PATCH 230/304] vulkan/prores: use vkCmdClearColorImage

The VK spec forbids using clear commands on YUV images,
so we need to allocate separate per-plane images.
This removes the need for a separate reset shader.
---
 libavcodec/vulkan/Makefile          |   1 -
 libavcodec/vulkan/prores_reset.comp |  38 ----------
 libavcodec/vulkan_decode.c          |  28 +++++--
 libavcodec/vulkan_prores.c          | 109 ++++++++++++----------------
 4 files changed, 69 insertions(+), 107 deletions(-)
 delete mode 100644 libavcodec/vulkan/prores_reset.comp

diff --git a/libavcodec/vulkan/Makefile b/libavcodec/vulkan/Makefile
index 26e8e147c2..9d1349e0e3 100644
--- a/libavcodec/vulkan/Makefile
+++ b/libavcodec/vulkan/Makefile
@@ -19,7 +19,6 @@ OBJS-$(CONFIG_PRORES_RAW_VULKAN_HWACCEL) += vulkan/common.o \
                                             vulkan/prores_raw_idct.o
 
 OBJS-$(CONFIG_PRORES_VULKAN_HWACCEL) += vulkan/common.o \
-                                        vulkan/prores_reset.o \
                                         vulkan/prores_vld.o \
                                         vulkan/prores_idct.o
 
diff --git a/libavcodec/vulkan/prores_reset.comp b/libavcodec/vulkan/prores_reset.comp
deleted file mode 100644
index 51cbc6b3d9..0000000000
--- a/libavcodec/vulkan/prores_reset.comp
+++ /dev/null
@@ -1,38 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-void main(void)
-{
-    uvec3 gid = gl_GlobalInvocationID;
-#ifndef INTERLACED
-    ivec2 pos = ivec2(gid);
-#else
-    ivec2 pos = ivec2(gid.x, (gid.y << 1) + bottom_field);
-#endif
-
-    /* Clear luma plane */
-    imageStore(dst[0], pos, uvec4(0));
-
-    /* Clear chroma plane */
-    if (gid.x < mb_width << (4 - log2_chroma_w)) {
-        imageStore(dst[1], pos, uvec4(0));
-        imageStore(dst[2], pos, uvec4(0));
-    }
-
-    /* Alpha plane doesn't need a clear because it is not sparsely encoded */
-}
diff --git a/libavcodec/vulkan_decode.c b/libavcodec/vulkan_decode.c
index d6f6ec8c3b..654149851d 100644
--- a/libavcodec/vulkan_decode.c
+++ b/libavcodec/vulkan_decode.c
@@ -1088,7 +1088,6 @@ static void free_profile_data(AVHWFramesContext *hwfc)
 
 int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
 {
-    VkFormat vkfmt = VK_FORMAT_UNDEFINED;
     int err, dedicated_dpb;
     AVHWFramesContext *frames_ctx = (AVHWFramesContext*)hw_frames_ctx->data;
     AVVulkanFramesContext *hwfc = frames_ctx->hwctx;
@@ -1107,7 +1106,8 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
             return AVERROR(ENOMEM);
 
         err = vulkan_decode_get_profile(avctx, hw_frames_ctx,
-                                        &frames_ctx->sw_format, &vkfmt,
+                                        &frames_ctx->sw_format,
+                                        &hwfc->format[0],
                                         prof, &dedicated_dpb);
         if (err < 0) {
             av_free(prof);
@@ -1119,6 +1119,7 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
 
         hwfc->create_pnext = &prof->profile_list;
     } else {
+        hwfc->format[0] = VK_FORMAT_UNDEFINED;
         switch (frames_ctx->sw_format) {
         case AV_PIX_FMT_GBRAP16:
             /* This should be more efficient for downloading and using */
@@ -1141,6 +1142,20 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
             /* mpv has issues with bgr0 mapping, so just remap it */
             frames_ctx->sw_format = AV_PIX_FMT_RGB0;
             break;
+        case AV_PIX_FMT_YUVA422P10: /* ProRes needs to clear the input image, which is not possible on YUV formats */
+        case AV_PIX_FMT_YUVA444P10:
+        case AV_PIX_FMT_YUVA422P12:
+        case AV_PIX_FMT_YUVA444P12:
+            hwfc->format[3] = VK_FORMAT_R16_UNORM;
+            /* fallthrough */
+        case AV_PIX_FMT_YUV422P10:
+        case AV_PIX_FMT_YUV444P10:
+        case AV_PIX_FMT_YUV422P12:
+        case AV_PIX_FMT_YUV444P12:
+            hwfc->format[0] = VK_FORMAT_R16_UNORM;
+            hwfc->format[1] = VK_FORMAT_R16_UNORM;
+            hwfc->format[2] = VK_FORMAT_R16_UNORM;
+            break;
         default:
             break;
         }
@@ -1151,11 +1166,10 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
     frames_ctx->height = FFALIGN(avctx->coded_height, 1 << pdesc->log2_chroma_h);
     frames_ctx->format = AV_PIX_FMT_VULKAN;
 
-    hwfc->format[0]    = vkfmt;
-    hwfc->tiling       = VK_IMAGE_TILING_OPTIMAL;
-    hwfc->usage        = VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
-                         VK_IMAGE_USAGE_STORAGE_BIT      |
-                         VK_IMAGE_USAGE_SAMPLED_BIT;
+    hwfc->tiling = VK_IMAGE_TILING_OPTIMAL;
+    hwfc->usage  = VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
+                   VK_IMAGE_USAGE_STORAGE_BIT      |
+                   VK_IMAGE_USAGE_SAMPLED_BIT;
 
     if (prof) {
         FFVulkanDecodeShared *ctx;
diff --git a/libavcodec/vulkan_prores.c b/libavcodec/vulkan_prores.c
index 90b8610817..12e66876f2 100644
--- a/libavcodec/vulkan_prores.c
+++ b/libavcodec/vulkan_prores.c
@@ -24,7 +24,6 @@
 #include "libavutil/vulkan_spirv.h"
 
 extern const char *ff_source_common_comp;
-extern const char *ff_source_prores_reset_comp;
 extern const char *ff_source_prores_vld_comp;
 extern const char *ff_source_prores_idct_comp;
 
@@ -46,7 +45,6 @@ typedef struct ProresVulkanDecodePicture {
 } ProresVulkanDecodePicture;
 
 typedef struct ProresVulkanDecodeContext {
-    FFVulkanShader reset;
     FFVulkanShader vld;
     FFVulkanShader idct;
 
@@ -157,12 +155,14 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     ProresVulkanDecodeContext *pv = ctx->sd_ctx;
     ProresVulkanDecodePicture *pp = pr->hwaccel_picture_private;
     FFVulkanDecodePicture     *vp = &pp->vp;
+    AVFrame                    *f = pr->frame;
+    AVVkFrame                *vkf = (AVVkFrame *)f->data[0];
 
     ProresVkParameters pd;
     FFVkBuffer *slice_data, *metadata;
     VkImageMemoryBarrier2 img_bar[AV_NUM_DATA_POINTERS];
     VkBufferMemoryBarrier2 buf_bar[2];
-    int nb_img_bar = 0, nb_buf_bar = 0, err;
+    int nb_img_bar = 0, nb_buf_bar = 0, nb_imgs, i, err;
     const AVPixFmtDescriptor *pix_desc;
 
     if (!pp->slice_num)
@@ -199,12 +199,11 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     RET(ff_vk_exec_start(&ctx->s, exec));
 
     /* Prepare deps */
-    RET(ff_vk_exec_add_dep_frame(&ctx->s, exec, pr->frame,
+    RET(ff_vk_exec_add_dep_frame(&ctx->s, exec, f,
                                  VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
                                  VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT));
 
-    RET(ff_vk_exec_mirror_sem_value(&ctx->s, exec, &vp->sem, &vp->sem_value,
-                                    pr->frame));
+    RET(ff_vk_exec_mirror_sem_value(&ctx->s, exec, &vp->sem, &vp->sem_value, f));
 
     /* Transfer ownership to the exec context */
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &vp->slices_buf, 1, 0));
@@ -212,11 +211,47 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &pp->metadata_buf, 1, 0));
     pp->metadata_buf = NULL;
 
-    /* Input barrier */
-    ff_vk_frame_barrier(&ctx->s, exec, pr->frame, img_bar, &nb_img_bar,
-                        VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+    vkf->layout[0] = VK_IMAGE_LAYOUT_UNDEFINED;
+    vkf->access[0] = VK_ACCESS_2_NONE;
+
+    nb_imgs = ff_vk_count_images(vkf);
+
+    if (pr->first_field) {
+        /* Input barrier */
+        ff_vk_frame_barrier(&ctx->s, exec, f, img_bar, &nb_img_bar,
+                            VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                            VK_PIPELINE_STAGE_2_CLEAR_BIT,
+                            VK_ACCESS_2_TRANSFER_WRITE_BIT,
+                            VK_IMAGE_LAYOUT_GENERAL,
+                            VK_QUEUE_FAMILY_IGNORED);
+
+        vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
+            .sType                    = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
+            .pBufferMemoryBarriers    = buf_bar,
+            .bufferMemoryBarrierCount = nb_buf_bar,
+            .pImageMemoryBarriers     = img_bar,
+            .imageMemoryBarrierCount  = nb_img_bar,
+        });
+        nb_img_bar = nb_buf_bar = 0;
+
+        /* Clear the input image since the vld shader does sparse writes, except for alpha */
+        for (i = 0; i < FFMIN(nb_imgs, 3); ++i) {
+            vk->CmdClearColorImage(exec->buf, vkf->img[i],
+                                   VK_IMAGE_LAYOUT_GENERAL,
+                                   &((VkClearColorValue) { 0 }),
+                                   1, &((VkImageSubresourceRange) {
+                                       .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+                                       .levelCount = 1,
+                                       .layerCount = 1,
+                                   }));
+        }
+    }
+
+    /* Input barrier, or synchronization between clear and vld shader */
+    ff_vk_frame_barrier(&ctx->s, exec, f, img_bar, &nb_img_bar,
+                        pr->first_field ? VK_PIPELINE_STAGE_2_CLEAR_BIT : VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
                         VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-                        VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT,
+                        VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
                         VK_IMAGE_LAYOUT_GENERAL,
                         VK_QUEUE_FAMILY_IGNORED);
 
@@ -244,37 +279,6 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
     });
     nb_img_bar = nb_buf_bar = 0;
 
-    /* Reset */
-    ff_vk_shader_update_img_array(&ctx->s, exec, &pv->reset,
-                                  pr->frame, vp->view.out,
-                                  0, 0,
-                                  VK_IMAGE_LAYOUT_GENERAL,
-                                  VK_NULL_HANDLE);
-
-    ff_vk_exec_bind_shader(&ctx->s, exec, &pv->reset);
-    ff_vk_shader_update_push_const(&ctx->s, exec, &pv->reset,
-                                   VK_SHADER_STAGE_COMPUTE_BIT,
-                                   0, sizeof(pd), &pd);
-
-    vk->CmdDispatch(exec->buf, pr->mb_width << 1, pr->mb_height << 1, 1);
-
-    /* Input frame barrier after reset */
-    ff_vk_frame_barrier(&ctx->s, exec, pr->frame, img_bar, &nb_img_bar,
-                        VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-                        VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-                        VK_ACCESS_SHADER_WRITE_BIT,
-                        VK_IMAGE_LAYOUT_GENERAL,
-                        VK_QUEUE_FAMILY_IGNORED);
-
-    vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
-        .sType                    = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
-        .pBufferMemoryBarriers    = buf_bar,
-        .bufferMemoryBarrierCount = nb_buf_bar,
-        .pImageMemoryBarriers     = img_bar,
-        .imageMemoryBarrierCount  = nb_img_bar,
-    });
-    nb_img_bar = nb_buf_bar = 0;
-
     /* Entropy decode */
     ff_vk_shader_update_desc_buffer(&ctx->s, exec, &pv->vld,
                                     0, 0, 0,
@@ -287,7 +291,7 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
                                     pp->mb_params_sz,
                                     VK_FORMAT_UNDEFINED);
     ff_vk_shader_update_img_array(&ctx->s, exec, &pv->vld,
-                                  pr->frame, vp->view.out,
+                                  f, vp->view.out,
                                   0, 2,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
@@ -301,7 +305,7 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
                     3 + !!pr->alpha_info);
 
     /* Synchronize vld and idct shaders */
-    ff_vk_frame_barrier(&ctx->s, exec, pr->frame, img_bar, &nb_img_bar,
+    ff_vk_frame_barrier(&ctx->s, exec, f, img_bar, &nb_img_bar,
                         VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
                         VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT,
@@ -339,7 +343,7 @@ static int vk_prores_end_frame(AVCodecContext *avctx)
                                     pp->mb_params_sz,
                                     VK_FORMAT_UNDEFINED);
     ff_vk_shader_update_img_array(&ctx->s, exec, &pv->idct,
-                                  pr->frame, vp->view.out,
+                                  f, vp->view.out,
                                   0, 1,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
@@ -434,7 +438,6 @@ static void vk_decode_prores_uninit(FFVulkanDecodeShared *ctx)
 {
     ProresVulkanDecodeContext *pv = ctx->sd_ctx;
 
-    ff_vk_shader_free(&ctx->s, &pv->reset);
     ff_vk_shader_free(&ctx->s, &pv->vld);
     ff_vk_shader_free(&ctx->s, &pv->idct);
 
@@ -478,22 +481,6 @@ static int vk_decode_prores_init(AVCodecContext *avctx)
 
     ctx->sd_ctx_free = vk_decode_prores_uninit;
 
-    desc_set = (FFVulkanDescriptorSetBinding []) {
-        {
-            .name       = "dst",
-            .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
-            .dimensions = 2,
-            .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
-            FF_VK_REP_NATIVE),
-            .mem_quali  = "writeonly",
-            .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
-            .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
-        },
-    };
-    RET(init_shader(avctx, &ctx->s, &ctx->exec_pool, spv, &pv->reset,
-                    "prores_dec_reset", "main", desc_set, 1,
-                    ff_source_prores_reset_comp, 0x080801, pr->frame_type != 0));
-
     desc_set = (FFVulkanDescriptorSetBinding []) {
         {
             .name        = "slice_offsets_buf",
-- 
2.52.0


From d7973567adf6d23ead1f8c7158720282ec416f0d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Denis-Courmont?= <remi@remlab.net>
Date: Sat, 8 Jun 2024 23:08:21 +0300
Subject: [PATCH 231/304] lavc/mpv_unquantize: R-V V H.263 DCT unquantize

SpacemiT X60:
dct_unquantize_h263_inter_c:                           417.8 ( 1.00x)
dct_unquantize_h263_inter_rvv_i32:                      66.0 ( 6.33x)
dct_unquantize_h263_intra_c:                           140.2 ( 1.00x)
dct_unquantize_h263_intra_rvv_i32:                      67.7 ( 2.07x)

Note that the C benchmarks are not stable, depending heavily on the
number of coefficients picked by the RNG. The R-V V benchmarks are
however very stable and generally better than C's.
---
 libavcodec/mpegvideo_unquantize.c |  2 +
 libavcodec/mpegvideo_unquantize.h |  1 +
 libavcodec/riscv/Makefile         |  2 +
 libavcodec/riscv/mpegvideo_init.c | 62 +++++++++++++++++++++++++++++++
 libavcodec/riscv/mpegvideo_rvv.S  | 53 ++++++++++++++++++++++++++
 5 files changed, 120 insertions(+)
 create mode 100644 libavcodec/riscv/mpegvideo_init.c
 create mode 100644 libavcodec/riscv/mpegvideo_rvv.S
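
For readers less familiar with RVV, the per-coefficient update in the vector loop of this patch can be read as the scalar sketch below. It is an illustration derived from the assembly, not the C reference in FFmpeg: `h263_unquantize_sketch` is a hypothetical name, and the caller is assumed to pass the first coefficient to process and the number of coefficients (the intra entry point skips the DC coefficient, the inter one covers indices 0 through n).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the vector loop: every non-zero coefficient is scaled by
 * 2 * qscale and pushed away from zero by a rounding offset. With aic set
 * (advanced intra coding), the offset is zero. */
static void h263_unquantize_sketch(int16_t *coeffs, int count, int qscale, int aic)
{
    int qmul = qscale << 1;
    int qadd = aic ? 0 : ((qscale - 1) | 1);

    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        int level = coeffs[i];

        if (level == 0)
            continue; /* zero coefficients stay zero (masked out in RVV) */
        coeffs[i] = (int16_t)(level * qmul + (level < 0 ? -qadd : qadd));
    }
}
```

For example, with `qscale = 5` and `aic = 0` the multiplier is 10 and the offset is 5, so the coefficients 3, -2, 0 become 35, -25, 0.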

diff --git a/libavcodec/mpegvideo_unquantize.c b/libavcodec/mpegvideo_unquantize.c
index 9297c80b47..054f5c0187 100644
--- a/libavcodec/mpegvideo_unquantize.c
+++ b/libavcodec/mpegvideo_unquantize.c
@@ -280,6 +280,8 @@ av_cold void ff_mpv_unquantize_init(MPVUnquantDSPContext *s,
     ff_mpv_unquantize_init_arm(s, bitexact);
 #elif ARCH_PPC
     ff_mpv_unquantize_init_ppc(s, bitexact);
+#elif ARCH_RISCV
+    ff_mpv_unquantize_init_riscv(s, bitexact);
 #elif ARCH_X86
     ff_mpv_unquantize_init_x86(s, bitexact);
 #elif ARCH_MIPS
diff --git a/libavcodec/mpegvideo_unquantize.h b/libavcodec/mpegvideo_unquantize.h
index 1a43f467c6..50319d7ad3 100644
--- a/libavcodec/mpegvideo_unquantize.h
+++ b/libavcodec/mpegvideo_unquantize.h
@@ -55,6 +55,7 @@ void ff_mpv_unquantize_init(MPVUnquantDSPContext *s,
 void ff_mpv_unquantize_init_arm (MPVUnquantDSPContext *s, int bitexact);
 void ff_mpv_unquantize_init_neon(MPVUnquantDSPContext *s, int bitexact);
 void ff_mpv_unquantize_init_ppc (MPVUnquantDSPContext *s, int bitexact);
+void ff_mpv_unquantize_init_riscv(MPVUnquantDSPContext *s, int bitexact);
 void ff_mpv_unquantize_init_x86 (MPVUnquantDSPContext *s, int bitexact);
 void ff_mpv_unquantize_init_mips(MPVUnquantDSPContext *s, int bitexact,
                                  int q_scale_type);
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 736f873fe8..8c1f2f5f6e 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -51,6 +51,8 @@ OBJS-$(CONFIG_LPC) += riscv/lpc_init.o
 RVV-OBJS-$(CONFIG_LPC) += riscv/lpc_rvv.o
 OBJS-$(CONFIG_ME_CMP) += riscv/me_cmp_init.o
 RVV-OBJS-$(CONFIG_ME_CMP) += riscv/me_cmp_rvv.o
+OBJS-$(CONFIG_MPEGVIDEO) += riscv/mpegvideo_init.o
+RVV-OBJS-$(CONFIG_MPEGVIDEO) += riscv/mpegvideo_rvv.o
 OBJS-$(CONFIG_MPEGVIDEOENCDSP) += riscv/mpegvideoencdsp_init.o
 RVV-OBJS-$(CONFIG_MPEGVIDEOENCDSP) += riscv/mpegvideoencdsp_rvv.o
 OBJS-$(CONFIG_OPUS_DECODER) += riscv/opusdsp_init.o
diff --git a/libavcodec/riscv/mpegvideo_init.c b/libavcodec/riscv/mpegvideo_init.c
new file mode 100644
index 0000000000..418b91f437
--- /dev/null
+++ b/libavcodec/riscv/mpegvideo_init.c
@@ -0,0 +1,62 @@
+/*
+ * Copyright © 2022 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavcodec/mpegvideo.h"
+#include "libavcodec/mpegvideo_unquantize.h"
+
+void ff_h263_dct_unquantize_intra_rvv(const MPVContext *s, int16_t *block,
+                                      ptrdiff_t len, int qscale, int aic);
+void ff_h263_dct_unquantize_inter_rvv(const MPVContext *s, int16_t *block,
+                                      ptrdiff_t len, int qscale);
+
+static void dct_unquantize_h263_intra_rvv(const MPVContext *s,
+                                          int16_t *block, int n, int qscale)
+{
+    if (!s->h263_aic)
+        block[0] *= (n < 4) ? s->y_dc_scale : s->c_dc_scale;
+
+    n = s->ac_pred ? 63
+                   : s->intra_scantable.raster_end[s->block_last_index[n]];
+    ff_h263_dct_unquantize_intra_rvv(s, block, n, qscale, s->h263_aic);
+}
+
+static void dct_unquantize_h263_inter_rvv(const MPVContext *s,
+                                          int16_t *block, int n, int qscale)
+{
+    n = s->inter_scantable.raster_end[s->block_last_index[n]];
+    ff_h263_dct_unquantize_inter_rvv(s, block, n, qscale);
+}
+
+av_cold
+void ff_mpv_unquantize_init_riscv(MPVUnquantDSPContext *c, int bitexact)
+{
+#if HAVE_RVV
+    int flags = av_get_cpu_flags();
+
+    if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB)) {
+        c->dct_unquantize_h263_intra = dct_unquantize_h263_intra_rvv;
+        c->dct_unquantize_h263_inter = dct_unquantize_h263_inter_rvv;
+    }
+#endif
+}
diff --git a/libavcodec/riscv/mpegvideo_rvv.S b/libavcodec/riscv/mpegvideo_rvv.S
new file mode 100644
index 0000000000..5312413db4
--- /dev/null
+++ b/libavcodec/riscv/mpegvideo_rvv.S
@@ -0,0 +1,53 @@
+/*
+ * Copyright © 2024 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+func ff_h263_dct_unquantize_intra_rvv
+        lpad    0
+        addi    a1, a1, 2
+        beqz    a4, 1f
+        slli    a3, a3, 1
+        mv      a4, zero
+        j       2f
+endfunc
+
+func ff_h263_dct_unquantize_inter_rvv, zve32x, zba
+        lpad    0
+        addi    a2, a2, 1
+1:
+        addi    a4, a3, -1
+        slli    a3, a3, 1
+        ori     a4, a4, 1
+2:
+        vsetvli t0, a2, e16, m8, ta, mu
+        vle16.v v8, (a1)
+        sub     a2, a2, t0
+        vmv.v.x v16, a4
+        vmslt.vi    v0, v8, 0
+        vneg.v  v16, v16, v0.t
+        vmsne.vi    v0, v8, 0
+        vmadd.vx    v8, a3, v16, v0.t
+        vse16.v v8, (a1)
+        sh1add  a1, t0, a1
+        bnez    a2, 2b
+
+        ret
+endfunc
-- 
2.52.0
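
For readers who do not read RVV assembly, the loop at label 2 in mpegvideo_rvv.S can be modelled in scalar C roughly as follows. This is a minimal sketch reconstructed from the instruction sequence, not FFmpeg code: the function name `h263_unquantize_scalar` and its `count` parameter are hypothetical. The intra entry point would call it on `block + 1` with `len` coefficients, the inter entry point on `block` with `len + 1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the vector loop; not part of the patch. */
static void h263_unquantize_scalar(int16_t *coeffs, size_t count,
                                   int qscale, int aic)
{
    /* Both entry points use qmul = 2 * qscale. */
    const int qmul = qscale << 1;
    /* With AIC the additive term is zero, otherwise (qscale - 1) | 1. */
    const int qadd = aic ? 0 : ((qscale - 1) | 1);

    assert(coeffs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        int level = coeffs[i];

        /* Zero coefficients are left untouched (the vmsne.vi mask). */
        if (level == 0)
            continue;

        /* qadd takes the sign of the coefficient (vmslt.vi + vneg.v). */
        level = level * qmul + (level < 0 ? -qadd : qadd);

        /* The vector code works on 16-bit lanes; this cast assumes the
         * result fits, or that the compiler wraps like the hardware. */
        coeffs[i] = (int16_t)level;
    }
}
```

With `qscale = 5` and AIC disabled, `qmul` is 10 and `qadd` is 5, so a coefficient of 3 becomes 35 and a coefficient of -2 becomes -25.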


From 88b0dd455533c76c6e2b637e18e3a663ab4919f5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Denis-Courmont?= <remi@remlab.net>
Date: Sat, 29 Nov 2025 17:46:55 +0200
Subject: [PATCH 232/304] lavc/h264idct: R-V V 8-bit h264_luma_dc_dequant_idct

This does not improve performance on current hardware due to the poor
performance of segmented accesses. Performance should be slightly better
on expensive or near-future hardware that I don't have; however, it is
still limited by two other factors:
- There are only 4 elements.
- The final stores are necessarily indexed and hit multiple cache lines,
  thus as slow as scalar.
---
 libavcodec/riscv/Makefile               |  2 +-
 libavcodec/riscv/h264dsp_init.c         |  7 +-
 libavcodec/riscv/h264idct_dequant_rvv.S | 86 +++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/riscv/h264idct_dequant_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 8c1f2f5f6e..38021f873f 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -32,7 +32,7 @@ OBJS-$(CONFIG_H264CHROMA) += riscv/h264_chroma_init_riscv.o
 RVV-OBJS-$(CONFIG_H264CHROMA) += riscv/h264_mc_chroma.o
 OBJS-$(CONFIG_H264DSP) += riscv/h264dsp_init.o
 RVV-OBJS-$(CONFIG_H264DSP) += riscv/h264addpx_rvv.o riscv/h264dsp_rvv.o \
-                              riscv/h264idct_rvv.o
+                              riscv/h264idct_rvv.o riscv/h264idct_dequant_rvv.o
 OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_init.o
 RVV-OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_rvv.o
 OBJS-$(CONFIG_HEVC_DECODER) += riscv/hevcdsp_init.o
diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index f214486bbe..7ab8d38698 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -80,7 +80,8 @@ void ff_h264_idct4_add8_##depth##_rvv(uint8_t **d, const int *soffset, \
                                       const uint8_t nnzc[5 * 8]); \
 void ff_h264_idct4_add8_422_##depth##_rvv(uint8_t **d, const int *soffset, \
                                           int16_t *s, int stride, \
-                                          const uint8_t nnzc[5 * 8]);
+                                          const uint8_t nnzc[5 * 8]); \
+void ff_h264_luma_dc_dequant_idct_##depth##_rvv(int16_t *d, int16_t *s, int q);
 
 IDCT_DEPTH(8)
 IDCT_DEPTH(9)
@@ -174,6 +175,10 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
                     dsp->h264_idct_add8   = ff_h264_idct4_add8_422_8_rvv;
 #  endif
             }
+
+            dsp->h264_luma_dc_dequant_idct =
+                ff_h264_luma_dc_dequant_idct_8_rvv;
+
             if (flags & AV_CPU_FLAG_RVV_I64) {
                 dsp->h264_add_pixels8_clear = ff_h264_add_pixels8_8_rvv;
                 if (flags & AV_CPU_FLAG_RVB)
diff --git a/libavcodec/riscv/h264idct_dequant_rvv.S b/libavcodec/riscv/h264idct_dequant_rvv.S
new file mode 100644
index 0000000000..73a68a28ab
--- /dev/null
+++ b/libavcodec/riscv/h264idct_dequant_rvv.S
@@ -0,0 +1,86 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright © 2025 Rémi Denis-Courmont.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *    this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ *    this list of conditions and the following disclaimer in the documentation
+ *    and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "libavutil/riscv/asm.S"
+
+const offsets_8, 1
+        .short 0, 64, 256, 320
+endconst
+
+func ff_h264_luma_dc_dequant_idct_8_rvv, zve32x
+        lpad   0
+        csrwi  vxrm, 0
+        vsetivli    zero, 4, e16, mf2, ta, ma
+        vlseg4e16.v v8, (a1)
+        vwadd.vv    v16, v8, v9     # z0
+        addi    t1, sp, 4 * 4 * -3
+        vwadd.vv    v19, v10, v11   # z3
+        addi    t2, sp, 4 * 4 * -2
+        vwsub.vv    v17, v8, v9     # z1
+        addi    t3, sp, 4 * 4 * -1
+        vwsub.vv    v18, v10, v11   # z2
+        vsetvli zero, zero, e32, m1, ta, ma
+        vadd.vv v8, v16, v19
+        addi    sp, sp, 4 * 4 * -4
+        vsub.vv v9, v16, v19
+        vsub.vv v10, v17, v18
+        vadd.vv v11, v17, v18
+        vsseg4e32.v v8, (sp)
+        vle32.v v8, (sp)
+        vle32.v v9, (t1)
+        vle32.v v10, (t2)
+        vle32.v v11, (t3)
+        vadd.vv v16, v8, v10    # z0
+        addi    sp, sp, 4 * 4 * 4
+        vadd.vv v19, v9, v11    # z3
+        lla     t0, offsets_8
+        vsub.vv v17, v8, v10    # z1
+        vsub.vv v18, v9, v11    # z2
+        vadd.vv v8, v16, v19
+        vadd.vv v9, v17, v18
+        vsub.vv v10, v17, v18
+        vsub.vv v11, v16, v19
+        vle16.v v24, (t0)
+        vmul.vx v8, v8, a2
+        vmul.vx v9, v9, a2
+        vmul.vx v10, v10, a2
+        vmul.vx v11, v11, a2
+        vsetvli zero, zero, e16, mf2, ta, ma
+        vnclip.wi   v16, v8, 8
+        addi    t1, a0, 2 * 16 * 1
+        vnclip.wi   v17, v9, 8
+        addi    t2, a0, 2 * 16 * 4
+        vnclip.wi   v18, v10, 8
+        addi    t3, a0, 2 * 16 * 5
+        vnclip.wi   v19, v11, 8
+        vsuxei16.v  v16, (a0), v24
+        vsuxei16.v  v17, (t1), v24
+        vsuxei16.v  v18, (t2), v24
+        vsuxei16.v  v19, (t3), v24
+        ret
+endfunc
-- 
2.52.0


From ee4c811d19e0481cb8128ea4f251d8085f74fd9b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Denis-Courmont?= <remi@remlab.net>
Date: Sat, 29 Nov 2025 22:51:01 +0200
Subject: [PATCH 233/304] lavc/h264idct: R-V V 9-bit h264_luma_dc_dequant_idct

Note that, like the C reference, the same function can be used for
larger bit depths.
---
 libavcodec/riscv/h264dsp_init.c         |  5 ++-
 libavcodec/riscv/h264idct_dequant_rvv.S | 55 +++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index 7ab8d38698..06cb3c59de 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -189,8 +189,11 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
 
 #define IDCT_DEPTH(depth) \
         if (bit_depth == depth) { \
-            if (zvl128b) \
+            if (zvl128b) { \
                 dsp->h264_idct_add = ff_h264_idct_add_##depth##_rvv; \
+                dsp->h264_luma_dc_dequant_idct = \
+                    ff_h264_luma_dc_dequant_idct_9_rvv; \
+            } \
             if (flags & AV_CPU_FLAG_RVB) \
                 dsp->h264_idct8_add = ff_h264_idct8_add_##depth##_rvv; \
             if (zvl128b && (flags & AV_CPU_FLAG_RVB)) { \
diff --git a/libavcodec/riscv/h264idct_dequant_rvv.S b/libavcodec/riscv/h264idct_dequant_rvv.S
index 73a68a28ab..bc49ca6ad4 100644
--- a/libavcodec/riscv/h264idct_dequant_rvv.S
+++ b/libavcodec/riscv/h264idct_dequant_rvv.S
@@ -84,3 +84,58 @@ func ff_h264_luma_dc_dequant_idct_8_rvv, zve32x
         vsuxei16.v  v19, (t3), v24
         ret
 endfunc
+
+const offsets_9, 1
+        .short 0, 128, 512, 640
+endconst
+
+func ff_h264_luma_dc_dequant_idct_9_rvv, zve32x
+        lpad   0
+        csrwi  vxrm, 0
+        vsetivli    zero, 4, e32, m1, ta, ma
+        vlseg4e32.v v8, (a1)
+        vadd.vv v16, v8, v9     # z0
+        addi    t1, sp, 4 * 4 * -3
+        vadd.vv v19, v10, v11   # z3
+        addi    t2, sp, 4 * 4 * -2
+        vsub.vv v17, v8, v9     # z1
+        addi    t3, sp, 4 * 4 * -1
+        vsub.vv v18, v10, v11   # z2
+        vadd.vv v8, v16, v19
+        addi    sp, sp, 4 * 4 * -4
+        vsub.vv v9, v16, v19
+        vsub.vv v10, v17, v18
+        vadd.vv v11, v17, v18
+        vsseg4e32.v v8, (sp)
+        vle32.v v8, (sp)
+        vle32.v v9, (t1)
+        vle32.v v10, (t2)
+        vle32.v v11, (t3)
+        vadd.vv v16, v8, v10    # z0
+        addi    sp, sp, 4 * 4 * 4
+        vadd.vv v19, v9, v11    # z3
+        lla     t0, offsets_9
+        vsub.vv v17, v8, v10    # z1
+        vsub.vv v18, v9, v11    # z2
+        vadd.vv v8, v16, v19
+        vadd.vv v9, v17, v18
+        vsub.vv v10, v17, v18
+        vsub.vv v11, v16, v19
+        vle16.v v24, (t0)
+        vmul.vx v8, v8, a2
+        vmul.vx v9, v9, a2
+        vmul.vx v10, v10, a2
+        vmul.vx v11, v11, a2
+        vssra.vi    v16, v8, 8
+        addi    t1, a0, 4 * 16 * 1
+        vssra.vi    v17, v9, 8
+        addi    t2, a0, 4 * 16 * 4
+        vssra.vi    v18, v10, 8
+        addi    t3, a0, 4 * 16 * 5
+        vssra.vi    v19, v11, 8
+        vsuxei16.v  v16, (a0), v24
+        vsuxei16.v  v17, (t1), v24
+        vsuxei16.v  v18, (t2), v24
+        vsuxei16.v  v19, (t3), v24
+        ret
+endfunc
-- 
2.52.0


From 72ad4d359e4660a183102d1d296ef76da42bdaee Mon Sep 17 00:00:00 2001
From: caifan3 <caifan3@xiaomi.com>
Date: Thu, 4 Dec 2025 22:15:15 +0800
Subject: [PATCH 234/304] avformat/unix: add pkt_size option

Add a "pkt_size" AVOption to the Unix domain socket protocol. This option
allows the user to specify the maximum packet size for packet-oriented
sockets (SOCK_DGRAM and SOCK_SEQPACKET).

Signed-off-by: caifan3 <caifan3@xiaomi.com>
---
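The rule the new option follows can be sketched in isolation. This is a minimal model, assuming a POSIX <sys/socket.h>; the helper name `effective_max_packet_size` is hypothetical and only mirrors the condition added to unix_open() in the diff below.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper mirroring the condition added to unix_open():
 * pkt_size only takes effect for packet-oriented socket types and
 * only when it is greater than zero. Otherwise the current value of
 * max_packet_size is kept. */
static int effective_max_packet_size(int sock_type, int pkt_size,
                                     int current_max)
{
    assert(pkt_size >= 0); /* the AVOption range is 0..INT_MAX */

    if ((sock_type == SOCK_DGRAM || sock_type == SOCK_SEQPACKET) &&
        pkt_size > 0)
        return pkt_size;

    return current_max;
}
```

A stream socket therefore ignores the option, and the default of 0 leaves the existing behaviour unchanged for all socket types.
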
 doc/protocols.texi | 4 ++++
 libavformat/unix.c | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/doc/protocols.texi b/doc/protocols.texi
index b74383122a..dee2b845ac 100644
--- a/doc/protocols.texi
+++ b/doc/protocols.texi
@@ -2271,6 +2271,10 @@ The following parameters can be set via command line options
 Timeout in ms.
 @item listen
 Create the Unix socket in listening mode.
+@item pkt_size
+Maximum packet size for packet-oriented sockets (SOCK_DGRAM and
+SOCK_SEQPACKET). If greater than zero, this value is used as
+@code{max_packet_size}. Ignored for SOCK_STREAM. Default is @code{0}.
 @end table
 
 @section zmq
diff --git a/libavformat/unix.c b/libavformat/unix.c
index fed215a691..3a0caf96ed 100644
--- a/libavformat/unix.c
+++ b/libavformat/unix.c
@@ -39,6 +39,7 @@ typedef struct UnixContext {
     int listen;
     int type;
     int fd;
+    int pkt_size;
 } UnixContext;
 
 #define OFFSET(x) offsetof(UnixContext, x)
@@ -50,6 +51,7 @@ static const AVOption unix_options[] = {
     { "stream",    "Stream (reliable stream-oriented)",     0,               AV_OPT_TYPE_CONST, { .i64 = SOCK_STREAM },    INT_MIN, INT_MAX, ED, .unit = "type" },
     { "datagram",  "Datagram (unreliable packet-oriented)", 0,               AV_OPT_TYPE_CONST, { .i64 = SOCK_DGRAM },     INT_MIN, INT_MAX, ED, .unit = "type" },
     { "seqpacket", "Seqpacket (reliable packet-oriented",   0,               AV_OPT_TYPE_CONST, { .i64 = SOCK_SEQPACKET }, INT_MIN, INT_MAX, ED, .unit = "type" },
+    { "pkt_size",  "Maximum packet size",                   OFFSET(pkt_size), AV_OPT_TYPE_INT,  { .i64 = 0 },              0, INT_MAX, ED },
     { NULL }
 };
 
@@ -91,6 +93,9 @@ static int unix_open(URLContext *h, const char *filename, int flags)
     s->fd = fd;
     h->is_streamed = 1;
 
+    if ((s->type == SOCK_DGRAM || s->type == SOCK_SEQPACKET) && s->pkt_size > 0)
+        h->max_packet_size = s->pkt_size;
+
     return 0;
 
 fail:
-- 
2.52.0


From 09fab3e35f6a6c5bbd8731d910bb724777c1e609 Mon Sep 17 00:00:00 2001
From: Marton Balint <cus@passwd.hu>
Date: Thu, 27 Nov 2025 23:57:20 +0100
Subject: [PATCH 235/304] avfilter/af_amerge: fix possible crash with custom
 layouts

The check for whether a native layout can be created from the sources was
incomplete and caused a crash with custom layouts if a layout contained the
same native channel multiple times, as in this example command line:

ffmpeg -lavfi "sine[a0];sine,pan=FL+FL[a1];[a0][a1]amerge[aout]" -map "[aout]" -t 1 -f framecrc -

Signed-off-by: Marton Balint <cus@passwd.hu>
---
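The new check can be illustrated without any FFmpeg types. This is a minimal sketch: the names `needs_fallback_layout` and `popcount64` and the flat `ids` array (the channel IDs of all inputs, concatenated) are assumptions for illustration, while the mask and popcount comparison follow the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count; stands in for av_popcount64(). */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical model of the check: returns 1 when the merged inputs
 * cannot form a native layout, i.e. when a channel ID repeats or is
 * not representable in a 64-bit mask. */
static int needs_fallback_layout(const int *ids, int nb_ch)
{
    uint64_t outmask = 0;

    assert(nb_ch >= 0);

    for (int i = 0; i < nb_ch; i++)
        if (ids[i] >= 0 && ids[i] < 64)
            outmask |= 1ULL << ids[i];

    /* Every channel must have contributed its own distinct bit. */
    return popcount64(outmask) != nb_ch;
}
```

In the command line above, the first input is mono (FC, ID 2) and the second is FL+FL (IDs 0 and 0), so the mask has two bits set for three channels and the fallback path is taken.
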
 libavfilter/af_amerge.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/libavfilter/af_amerge.c b/libavfilter/af_amerge.c
index bb82128a84..41137d4b8b 100644
--- a/libavfilter/af_amerge.c
+++ b/libavfilter/af_amerge.c
@@ -77,7 +77,7 @@ static int query_formats(AVFilterContext *ctx)
     AVChannelLayout *inlayout[SWR_CH_MAX] = { NULL }, outlayout = { 0 };
     uint64_t outmask = 0;
     AVFilterChannelLayouts *layouts;
-    int i, ret, overlap = 0, nb_ch = 0;
+    int i, ret, nb_ch = 0;
 
     for (i = 0; i < s->nb_inputs; i++) {
         if (!ctx->inputs[i]->incfg.channel_layouts ||
@@ -92,15 +92,11 @@ static int query_formats(AVFilterContext *ctx)
             av_channel_layout_describe(inlayout[i], buf, sizeof(buf));
             av_log(ctx, AV_LOG_INFO, "Using \"%s\" for input %d\n", buf, i + 1);
         }
-        s->in[i].nb_ch = FF_LAYOUT2COUNT(inlayout[i]);
-        if (s->in[i].nb_ch) {
-            overlap++;
-        } else {
-            s->in[i].nb_ch = inlayout[i]->nb_channels;
-            if (av_channel_layout_subset(inlayout[i], outmask))
-                overlap++;
-            outmask |= inlayout[i]->order == AV_CHANNEL_ORDER_NATIVE ?
-                       inlayout[i]->u.mask : 0;
+        s->in[i].nb_ch = inlayout[i]->nb_channels;
+        for (int j = 0; j < s->in[i].nb_ch; j++) {
+            enum AVChannel id = av_channel_layout_channel_from_index(inlayout[i], j);
+            if (id >= 0 && id < 64)
+                outmask |= (1ULL << id);
         }
         nb_ch += s->in[i].nb_ch;
     }
@@ -108,7 +104,7 @@ static int query_formats(AVFilterContext *ctx)
         av_log(ctx, AV_LOG_ERROR, "Too many channels (max %d)\n", SWR_CH_MAX);
         return AVERROR(EINVAL);
     }
-    if (overlap) {
+    if (av_popcount64(outmask) != nb_ch) {
         av_log(ctx, AV_LOG_WARNING,
                "Input channel layouts overlap: "
                "output layout will be determined by the number of distinct input channels\n");
-- 
2.52.0


From 173421820297b45ac59b65e289d7376d29b8f1f5 Mon Sep 17 00:00:00 2001
From: Marton Balint <cus@passwd.hu>
Date: Fri, 28 Nov 2025 21:01:28 +0100
Subject: [PATCH 236/304] avfilter/af_amerge: rework routing calculation

No change in functionality.

Signed-off-by: Marton Balint <cus@passwd.hu>
---
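The reworked calculation can be followed on a standalone model. This is a sketch only: `compute_routes`, `MAX_CH`, and the flat `ids` array (the channel IDs of all inputs, concatenated) are hypothetical names, and the two passes follow the diff below. `route[in]` holds the output index of merged input channel `in`.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CH 64 /* stands in for SWR_CH_MAX in this sketch */

/* Hypothetical model of the routing passes; returns the native mask. */
static uint64_t compute_routes(const int *ids, int nb_ch, int *route)
{
    int native_layout_routes[64] = { 0 };
    uint64_t outmask = 0;
    int distinct = 0;

    assert(nb_ch >= 0 && nb_ch <= MAX_CH);

    /* Pass 1: identity routing, plus a record of which merged input
     * index carries each native channel ID. */
    for (int ch_idx = 0; ch_idx < nb_ch; ch_idx++) {
        int id = ids[ch_idx];
        if (id >= 0 && id < 64) {
            outmask |= 1ULL << id;
            native_layout_routes[id] = ch_idx;
        }
        route[ch_idx] = ch_idx;
    }

    for (uint64_t m = outmask; m; m &= m - 1)
        distinct++;

    /* Pass 2: only when all channels are distinct native channels,
     * reorder them into ascending channel-ID order. */
    if (distinct == nb_ch) {
        int out_idx = 0;
        for (int c = 0; c < 64; c++)
            if (outmask & (1ULL << c))
                route[native_layout_routes[c]] = out_idx++;
    }

    return outmask;
}
```

For a 2.1 input (FL, FR, LFE, IDs 0, 1, 3) merged with FC+BL+BR (IDs 2, 4, 5), the routes come out as 0, 1, 3, 2, 4, 5: the LFE channel moves to output index 3 and FC to index 2, which is the 5.1 order.
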
 libavfilter/af_amerge.c | 43 ++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/libavfilter/af_amerge.c b/libavfilter/af_amerge.c
index 41137d4b8b..92afb4691f 100644
--- a/libavfilter/af_amerge.c
+++ b/libavfilter/af_amerge.c
@@ -63,6 +63,8 @@ static av_cold void uninit(AVFilterContext *ctx)
     av_freep(&s->in);
 }
 
+#define INLAYOUT(ctx, i) (&(ctx)->inputs[i]->incfg.channel_layouts->channel_layouts[0])
+
 static int query_formats(AVFilterContext *ctx)
 {
     static const enum AVSampleFormat packed_sample_fmts[] = {
@@ -74,10 +76,11 @@ static int query_formats(AVFilterContext *ctx)
         AV_SAMPLE_FMT_NONE
     };
     AMergeContext *s = ctx->priv;
-    AVChannelLayout *inlayout[SWR_CH_MAX] = { NULL }, outlayout = { 0 };
+    AVChannelLayout outlayout = { 0 };
     uint64_t outmask = 0;
     AVFilterChannelLayouts *layouts;
     int i, ret, nb_ch = 0;
+    int native_layout_routes[SWR_CH_MAX] = { 0 };
 
     for (i = 0; i < s->nb_inputs; i++) {
         if (!ctx->inputs[i]->incfg.channel_layouts ||
@@ -86,51 +89,47 @@ static int query_formats(AVFilterContext *ctx)
                    "No channel layout for input %d\n", i + 1);
             return AVERROR(EAGAIN);
         }
-        inlayout[i] = &ctx->inputs[i]->incfg.channel_layouts->channel_layouts[0];
         if (ctx->inputs[i]->incfg.channel_layouts->nb_channel_layouts > 1) {
             char buf[256];
-            av_channel_layout_describe(inlayout[i], buf, sizeof(buf));
+            av_channel_layout_describe(INLAYOUT(ctx, i), buf, sizeof(buf));
             av_log(ctx, AV_LOG_INFO, "Using \"%s\" for input %d\n", buf, i + 1);
         }
-        s->in[i].nb_ch = inlayout[i]->nb_channels;
-        for (int j = 0; j < s->in[i].nb_ch; j++) {
-            enum AVChannel id = av_channel_layout_channel_from_index(inlayout[i], j);
-            if (id >= 0 && id < 64)
-                outmask |= (1ULL << id);
-        }
+        s->in[i].nb_ch = INLAYOUT(ctx, i)->nb_channels;
         nb_ch += s->in[i].nb_ch;
     }
     if (nb_ch > SWR_CH_MAX) {
         av_log(ctx, AV_LOG_ERROR, "Too many channels (max %d)\n", SWR_CH_MAX);
         return AVERROR(EINVAL);
     }
+    for (int i = 0, ch_idx = 0; i < s->nb_inputs; i++) {
+        for (int j = 0; j < s->in[i].nb_ch; j++) {
+            enum AVChannel id = av_channel_layout_channel_from_index(INLAYOUT(ctx, i), j);
+            if (id >= 0 && id < 64) {
+                outmask |= (1ULL << id);
+                native_layout_routes[id] = ch_idx;
+            }
+            s->route[ch_idx] = ch_idx;
+            ch_idx++;
+        }
+    }
     if (av_popcount64(outmask) != nb_ch) {
         av_log(ctx, AV_LOG_WARNING,
                "Input channel layouts overlap: "
                "output layout will be determined by the number of distinct input channels\n");
-        for (i = 0; i < nb_ch; i++)
-            s->route[i] = i;
         av_channel_layout_default(&outlayout, nb_ch);
         if (!KNOWN(&outlayout) && nb_ch)
             av_channel_layout_from_mask(&outlayout, 0xFFFFFFFFFFFFFFFFULL >> (64 - nb_ch));
     } else {
-        int *route[SWR_CH_MAX];
-        int c, out_ch_number = 0;
-
+        for (int c = 0, ch_idx = 0; c < 64; c++)
+            if ((1ULL << c) & outmask)
+                s->route[native_layout_routes[c]] = ch_idx++;
         av_channel_layout_from_mask(&outlayout, outmask);
-        route[0] = s->route;
-        for (i = 1; i < s->nb_inputs; i++)
-            route[i] = route[i - 1] + s->in[i - 1].nb_ch;
-        for (c = 0; c < 64; c++)
-            for (i = 0; i < s->nb_inputs; i++)
-                if (av_channel_layout_index_from_channel(inlayout[i], c) >= 0)
-                    *(route[i]++) = out_ch_number++;
     }
     if ((ret = ff_set_common_formats_from_list(ctx, packed_sample_fmts)) < 0)
         return ret;
     for (i = 0; i < s->nb_inputs; i++) {
         layouts = NULL;
-        if ((ret = ff_add_channel_layout(&layouts, inlayout[i])) < 0)
+        if ((ret = ff_add_channel_layout(&layouts, INLAYOUT(ctx, i))) < 0)
             return ret;
         if ((ret = ff_channel_layouts_ref(layouts, &ctx->inputs[i]->outcfg.channel_layouts)) < 0)
             return ret;
-- 
2.52.0


From b016912f52a07fb3cb2de92aaadf396818dc819d Mon Sep 17 00:00:00 2001
From: Marton Balint <cus@passwd.hu>
Date: Fri, 28 Nov 2025 23:59:18 +0100
Subject: [PATCH 237/304] avfilter/af_amerge: add layout_mode option to control
 output channel layout

Signed-off-by: Marton Balint <cus@passwd.hu>
---
 doc/filters.texi        | 49 ++++++++++++++++++++++++++++---------
 libavfilter/af_amerge.c | 53 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index 168ea0d2da..464a602f8d 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -2436,6 +2436,11 @@ Only used if option named @var{start} is set to @code{-1}.
 
 Merge two or more audio streams into a single multi-channel stream.
 
+All inputs must have the same sample rate, and format.
+
+If inputs do not have the same duration, the output will stop with the
+shortest.
+
 The filter accepts the following options:
 
 @table @option
@@ -2443,15 +2448,25 @@ The filter accepts the following options:
 @item inputs
 Set the number of inputs. Default is 2.
 
-@end table
+@item layout_mode
 
-If the channel layouts of the inputs are disjoint, and therefore compatible,
-the channel layout of the output will be set accordingly and the channels
-will be reordered as necessary. If the channel layouts of the inputs are not
-disjoint, the output will have all the channels of the first input then all
-the channels of the second input, in that order, and the channel layout of
-the output will be the default value corresponding to the total number of
-channels.
+This option controls how the output channel layout is determined and if the
+audio channels are reordered during merge.
+
+@table @option
+
+@item legacy
+
+This mode matches the historical behavior of the filter, so it is the default.
+
+If the channel layouts of the inputs are known and disjoint, and therefore
+compatible, the channel layout of the output will be set accordingly and the
+channels will be reordered as necessary. If the channel layouts of the inputs
+are not disjoint, some of them are unknown, or they are using special channel
+layouts, such as ambisonics, the output will have all the channels of the first
+input then all the channels of the second input, in that order, and the channel
+layout of the output will be the default value corresponding to the total
+number of channels.
 
 For example, if the first input is in 2.1 (FL+FR+LF) and the second input
 is FC+BL+BR, then the output will be in 5.1, with the channels in the
@@ -2462,10 +2477,22 @@ On the other hand, if both input are in stereo, the output channels will be
 in the default order: a1, a2, b1, b2, and the channel layout will be
 arbitrarily set to 4.0, which may or may not be the expected value.
 
-All inputs must have the same sample rate, and format.
+@item reset
+This mode ignores the input channel layouts and does no channel reordering.
+The output will have all the channels of the first input, then all the channels
+of the second input, in that order, and so on.
 
-If inputs do not have the same duration, the output will stop with the
-shortest.
+The output channel layout will only specify the total channel count.
+
+@item normal
+This mode keeps channel name and designation information from the input
+channels and does no channel reordering. The output will have all the channels
+of the first input, then all the channels of the second input, in that order,
+and so on.
+
+@end table
+
+@end table
 
 @subsection Examples
 
diff --git a/libavfilter/af_amerge.c b/libavfilter/af_amerge.c
index 92afb4691f..e300cf9c7b 100644
--- a/libavfilter/af_amerge.c
+++ b/libavfilter/af_amerge.c
@@ -23,6 +23,7 @@
  * Audio merging filter
  */
 
+#include "libavutil/avassert.h"
 #include "libavutil/avstring.h"
 #include "libavutil/bprint.h"
 #include "libavutil/channel_layout.h"
@@ -43,14 +44,27 @@ typedef struct AMergeContext {
     struct amerge_input {
         int nb_ch;         /**< number of channels for the input */
     } *in;
+    int layout_mode;       /**< the method for determining the output channel layout */
 } AMergeContext;
 
 #define OFFSET(x) offsetof(AMergeContext, x)
 #define FLAGS AV_OPT_FLAG_AUDIO_PARAM|AV_OPT_FLAG_FILTERING_PARAM
 
+enum LayoutModes {
+    LM_LEGACY,
+    LM_RESET,
+    LM_NORMAL,
+    NB_LAYOUTMODES
+};
+
 static const AVOption amerge_options[] = {
     { "inputs", "specify the number of inputs", OFFSET(nb_inputs),
       AV_OPT_TYPE_INT, { .i64 = 2 }, 1, SWR_CH_MAX, FLAGS },
+    { "layout_mode",   "method used to determine the output channel layout", OFFSET(layout_mode),
+      AV_OPT_TYPE_INT, { .i64 = LM_LEGACY }, 0, NB_LAYOUTMODES - 1, FLAGS, .unit = "layout_mode"},
+        { "legacy",   NULL, 0, AV_OPT_TYPE_CONST, {.i64 = LM_LEGACY   }, 0, 0, FLAGS, .unit = "layout_mode" },
+        { "reset",    NULL, 0, AV_OPT_TYPE_CONST, {.i64 = LM_RESET    }, 0, 0, FLAGS, .unit = "layout_mode" },
+        { "normal",   NULL, 0, AV_OPT_TYPE_CONST, {.i64 = LM_NORMAL   }, 0, 0, FLAGS, .unit = "layout_mode" },
     { NULL }
 };
 
@@ -101,9 +115,16 @@ static int query_formats(AVFilterContext *ctx)
         av_log(ctx, AV_LOG_ERROR, "Too many channels (max %d)\n", SWR_CH_MAX);
         return AVERROR(EINVAL);
     }
+    ret = av_channel_layout_custom_init(&outlayout, nb_ch);
+    if (ret < 0)
+        return ret;
     for (int i = 0, ch_idx = 0; i < s->nb_inputs; i++) {
         for (int j = 0; j < s->in[i].nb_ch; j++) {
             enum AVChannel id = av_channel_layout_channel_from_index(INLAYOUT(ctx, i), j);
+            if (INLAYOUT(ctx, i)->order == AV_CHANNEL_ORDER_CUSTOM)
+                outlayout.u.map[ch_idx] = INLAYOUT(ctx, i)->u.map[j];
+            else
+                outlayout.u.map[ch_idx].id = (id == AV_CHAN_NONE ? AV_CHAN_UNKNOWN : id);
             if (id >= 0 && id < 64) {
                 outmask |= (1ULL << id);
                 native_layout_routes[id] = ch_idx;
@@ -112,6 +133,9 @@ static int query_formats(AVFilterContext *ctx)
             ch_idx++;
         }
     }
+    switch (s->layout_mode) {
+    case LM_LEGACY:
+    av_channel_layout_uninit(&outlayout);
     if (av_popcount64(outmask) != nb_ch) {
         av_log(ctx, AV_LOG_WARNING,
                "Input channel layouts overlap: "
@@ -125,22 +149,39 @@ static int query_formats(AVFilterContext *ctx)
                 s->route[native_layout_routes[c]] = ch_idx++;
         av_channel_layout_from_mask(&outlayout, outmask);
     }
+    break;
+    case LM_RESET:
+        av_channel_layout_uninit(&outlayout);
+        outlayout.order = AV_CHANNEL_ORDER_UNSPEC;
+        outlayout.nb_channels = nb_ch;
+        break;
+    case LM_NORMAL:
+        ret = av_channel_layout_retype(&outlayout, 0, AV_CHANNEL_LAYOUT_RETYPE_FLAG_CANONICAL);
+        if (ret < 0)
+            goto out;
+        break;
+    default:
+        av_unreachable("Invalid layout_mode");
+    }
     if ((ret = ff_set_common_formats_from_list(ctx, packed_sample_fmts)) < 0)
-        return ret;
+        goto out;
     for (i = 0; i < s->nb_inputs; i++) {
         layouts = NULL;
         if ((ret = ff_add_channel_layout(&layouts, INLAYOUT(ctx, i))) < 0)
-            return ret;
+            goto out;
         if ((ret = ff_channel_layouts_ref(layouts, &ctx->inputs[i]->outcfg.channel_layouts)) < 0)
-            return ret;
+            goto out;
     }
     layouts = NULL;
     if ((ret = ff_add_channel_layout(&layouts, &outlayout)) < 0)
-        return ret;
+        goto out;
     if ((ret = ff_channel_layouts_ref(layouts, &ctx->outputs[0]->incfg.channel_layouts)) < 0)
-        return ret;
+        goto out;
 
-    return ff_set_common_all_samplerates(ctx);
+    ret = ff_set_common_all_samplerates(ctx);
+out:
+    av_channel_layout_uninit(&outlayout);
+    return ret;
 }
 
 static int config_output(AVFilterLink *outlink)
-- 
2.52.0


From 139ee6f6ca18109a53eeb2ac9cf1097aa31326d0 Mon Sep 17 00:00:00 2001
From: Marton Balint <cus@passwd.hu>
Date: Sat, 29 Nov 2025 00:10:50 +0100
Subject: [PATCH 238/304] avfilter/af_amerge: fix indentation

Signed-off-by: Marton Balint <cus@passwd.hu>
---
 libavfilter/af_amerge.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/libavfilter/af_amerge.c b/libavfilter/af_amerge.c
index e300cf9c7b..13decdadc5 100644
--- a/libavfilter/af_amerge.c
+++ b/libavfilter/af_amerge.c
@@ -135,21 +135,21 @@ static int query_formats(AVFilterContext *ctx)
     }
     switch (s->layout_mode) {
     case LM_LEGACY:
-    av_channel_layout_uninit(&outlayout);
-    if (av_popcount64(outmask) != nb_ch) {
-        av_log(ctx, AV_LOG_WARNING,
-               "Input channel layouts overlap: "
-               "output layout will be determined by the number of distinct input channels\n");
-        av_channel_layout_default(&outlayout, nb_ch);
-        if (!KNOWN(&outlayout) && nb_ch)
-            av_channel_layout_from_mask(&outlayout, 0xFFFFFFFFFFFFFFFFULL >> (64 - nb_ch));
-    } else {
-        for (int c = 0, ch_idx = 0; c < 64; c++)
-            if ((1ULL << c) & outmask)
-                s->route[native_layout_routes[c]] = ch_idx++;
-        av_channel_layout_from_mask(&outlayout, outmask);
-    }
-    break;
+        av_channel_layout_uninit(&outlayout);
+        if (av_popcount64(outmask) != nb_ch) {
+            av_log(ctx, AV_LOG_WARNING,
+                   "Input channel layouts overlap: "
+                   "output layout will be determined by the number of distinct input channels\n");
+            av_channel_layout_default(&outlayout, nb_ch);
+            if (!KNOWN(&outlayout) && nb_ch)
+                av_channel_layout_from_mask(&outlayout, 0xFFFFFFFFFFFFFFFFULL >> (64 - nb_ch));
+        } else {
+            for (int c = 0, ch_idx = 0; c < 64; c++)
+                if ((1ULL << c) & outmask)
+                    s->route[native_layout_routes[c]] = ch_idx++;
+            av_channel_layout_from_mask(&outlayout, outmask);
+        }
+        break;
     case LM_RESET:
         av_channel_layout_uninit(&outlayout);
         outlayout.order = AV_CHANNEL_ORDER_UNSPEC;
-- 
2.52.0


From ffa558a62d8306a7ca31100f68630b37b2da2d0a Mon Sep 17 00:00:00 2001
From: Marton Balint <cus@passwd.hu>
Date: Sun, 30 Nov 2025 23:00:15 +0100
Subject: [PATCH 239/304] fate/filter-audio: add amerge layout_mode test

Signed-off-by: Marton Balint <cus@passwd.hu>
---
 tests/fate/filter-audio.mak       |  6 +++
 tests/ref/fate/filter-amerge-mode | 65 +++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 tests/ref/fate/filter-amerge-mode

diff --git a/tests/fate/filter-audio.mak b/tests/fate/filter-audio.mak
index b244f82bb7..526645a634 100644
--- a/tests/fate/filter-audio.mak
+++ b/tests/fate/filter-audio.mak
@@ -75,6 +75,12 @@ fate-filter-amerge: tests/data/asynth-44100-1.wav
 fate-filter-amerge: SRC = $(TARGET_PATH)/tests/data/asynth-44100-1.wav
 fate-filter-amerge: CMD = framecrc -i $(SRC) -i $(SRC) -filter_complex "[0:a][1:a]amerge=inputs=2[aout]" -map "[aout]"
 
+FATE_AFILTER-$(call FILTERDEMDECENCMUX, AMERGE, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-amerge-mode
+fate-filter-amerge-mode: tests/data/asynth-44100-1.wav
+fate-filter-amerge-mode: SRC = $(TARGET_PATH)/tests/data/asynth-44100-1.wav
+fate-filter-amerge-mode: CMD = framecrc -channel_layout FL -i $(SRC) -ss 0.1 -channel_layout FR -i $(SRC) -ss 0.2 -i $(SRC) -ss 0.3 -i $(SRC) -ss 0.4 -i $(SRC) -ss 0.5 -i $(SRC) \
+                               -filter_complex "[1:a][0:a]amerge[tmp1];[2:a][3:a]amerge=layout_mode=reset[tmp2];[tmp1][tmp2][4:a][5:a]amerge=inputs=4:layout_mode=normal[aout]" -map "[aout]"
+
 FATE_AFILTER-$(call FILTERDEMDECENCMUX, APAD, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-apad
 fate-filter-apad: tests/data/asynth-44100-2.wav
 fate-filter-apad: SRC = $(TARGET_PATH)/tests/data/asynth-44100-2.wav
diff --git a/tests/ref/fate/filter-amerge-mode b/tests/ref/fate/filter-amerge-mode
new file mode 100644
index 0000000000..b066323f13
--- /dev/null
+++ b/tests/ref/fate/filter-amerge-mode
@@ -0,0 +1,65 @@
+#tb 0: 1/44100
+#media_type 0: audio
+#codec_id 0: pcm_s16le
+#sample_rate 0: 44100
+#channel_layout_name 0: 6 channels (FL+FR+UNK+UNK+FC+FC)
+0,          0,          0,     4096,    49152, 0xd5b59e0b
+0,       4096,       4096,     4096,    49152, 0x5d99b2bc
+0,       8192,       8192,     4096,    49152, 0xafa8901f
+0,      12288,      12288,     4096,    49152, 0x97cd98b3
+0,      16384,      16384,     4096,    49152, 0x767fa951
+0,      20480,      20480,     4096,    49152, 0x09bc8763
+0,      24576,      24576,     4096,    49152, 0xd50e90ae
+0,      28672,      28672,     4096,    49152, 0xc7ce978d
+0,      32768,      32768,     4096,    49152, 0xb90ac520
+0,      36864,      36864,     4096,    49152, 0x32a2ac52
+0,      40960,      40960,     4096,    49152, 0x71ec85c8
+0,      45056,      45056,     4096,    49152, 0x4401b98a
+0,      49152,      49152,     4096,    49152, 0x972cb3b7
+0,      53248,      53248,     4096,    49152, 0x2c37f62d
+0,      57344,      57344,     4096,    49152, 0xee612003
+0,      61440,      61440,     4096,    49152, 0x4f46e987
+0,      65536,      65536,     4096,    49152, 0xb39484ca
+0,      69632,      69632,     4096,    49152, 0xc9042028
+0,      73728,      73728,     4096,    49152, 0xb196a39a
+0,      77824,      77824,     4096,    49152, 0xe4627739
+0,      81920,      81920,     4096,    49152, 0x3107d993
+0,      86016,      86016,     4096,    49152, 0x88606597
+0,      90112,      90112,     4096,    49152, 0xa3df9656
+0,      94208,      94208,     4096,    49152, 0x49442705
+0,      98304,      98304,     4096,    49152, 0x800256b2
+0,     102400,     102400,     4096,    49152, 0x1cb9af12
+0,     106496,     106496,     4096,    49152, 0xbe2d3e59
+0,     110592,     110592,     4096,    49152, 0x73e17139
+0,     114688,     114688,     4096,    49152, 0xc91a7787
+0,     118784,     118784,     4096,    49152, 0x4edf8c55
+0,     122880,     122880,     4096,    49152, 0x70057319
+0,     126976,     126976,     4096,    49152, 0x8a629a55
+0,     131072,     131072,     4096,    49152, 0xc9786b28
+0,     135168,     135168,     4096,    49152, 0x2efd7e7c
+0,     139264,     139264,     4096,    49152, 0x28877cd0
+0,     143360,     143360,     4096,    49152, 0xfd64967e
+0,     147456,     147456,     4096,    49152, 0x0caa8be5
+0,     151552,     151552,     4096,    49152, 0x097dc3c2
+0,     155648,     155648,     4096,    49152, 0xde78524d
+0,     159744,     159744,     4096,    49152, 0xbddb968b
+0,     163840,     163840,     4096,    49152, 0x146347cd
+0,     167936,     167936,     4096,    49152, 0x21ab8f0d
+0,     172032,     172032,     4096,    49152, 0xd2a0b60e
+0,     176128,     176128,     4096,    49152, 0xc7916e40
+0,     180224,     180224,     4096,    49152, 0xd42f5b66
+0,     184320,     184320,     4096,    49152, 0x2daeda35
+0,     188416,     188416,     4096,    49152, 0xd0220a25
+0,     192512,     192512,     4096,    49152, 0xfb962b0d
+0,     196608,     196608,     4096,    49152, 0xb1c6418c
+0,     200704,     200704,     4096,    49152, 0xc5e35827
+0,     204800,     204800,     4096,    49152, 0xf3cb0c12
+0,     208896,     208896,     4096,    49152, 0xfec64d90
+0,     212992,     212992,     4096,    49152, 0xb8685f78
+0,     217088,     217088,     4096,    49152, 0xe7d1562f
+0,     221184,     221184,     4096,    49152, 0xf453cba9
+0,     225280,     225280,     4096,    49152, 0x28928fce
+0,     229376,     229376,     4096,    49152, 0x64a909d9
+0,     233472,     233472,     4096,    49152, 0x2bf762b1
+0,     237568,     237568,     4096,    49152, 0x085daec8
+0,     241664,     241664,      886,    10632, 0x1522906c
-- 
2.52.0


From d4865d7a913f226ff9fc944e928e1a0acda4a05a Mon Sep 17 00:00:00 2001
From: Oliver Chang <ochang@google.com>
Date: Fri, 5 Dec 2025 02:07:10 +0000
Subject: [PATCH 240/304] avcodec/dpx: Fix heap-buffer-overflow in 16-bit
 decoding

Fixes a heap-buffer-overflow in `libavcodec/dpx.c` triggered by a stale
`unpadded_10bit` flag in the `DPXDecContext`. This flag, set for 10-bit
unpadded frames, persisted across `decode_frame` calls. If a subsequent
frame was 16-bit, the stale flag caused incorrect buffer size
validation, allowing truncated buffers to pass checks designed for
smaller 10-bit packed data. This led to an out-of-bounds read in
`av_image_copy_plane` during 16-bit decoding.

The fix explicitly resets `dpx->unpadded_10bit = 0` at the start of
`decode_frame` to ensure correct validation for each frame.
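
For illustration only, the stale-flag failure mode can be sketched outside
the decoder. Everything below is hypothetical: `SketchDecContext`,
`sketch_row_bytes` and `sketch_decode_frame` are simplified stand-ins and
not the actual `libavcodec/dpx.c` code, and the size check is reduced to a
single row. The `reset_flag` parameter exists only to compare behaviour
with and without the reset.

```c
#include <stddef.h>

/* Hypothetical stand-in for a decoder context that persists across
 * decode calls; this is not the real DPXDecContext. */
typedef struct SketchDecContext {
    int unpadded_10bit; /* per-frame property stored in persistent state */
} SketchDecContext;

/* Bytes required for one row of `samples` samples. */
static size_t sketch_row_bytes(const SketchDecContext *c, int bits, int samples)
{
    if (c->unpadded_10bit)
        return ((size_t)samples * 10 + 7) / 8;  /* tightly packed 10-bit */
    return (size_t)samples * ((bits + 7) / 8);  /* whole bytes per sample */
}

/* Returns 1 if buf_size passes the size check, 0 otherwise.
 * reset_flag = 1 models the fix: clear the flag before re-deriving it. */
static int sketch_decode_frame(SketchDecContext *c, int bits,
                               int unpadded_device, int samples,
                               size_t buf_size, int reset_flag)
{
    if (reset_flag)
        c->unpadded_10bit = 0;
    if (unpadded_device && bits == 10)
        c->unpadded_10bit = 1;
    return buf_size >= sketch_row_bytes(c, bits, samples);
}
```

In this sketch, a 10-bit unpadded frame followed by a 16-bit frame with a
10-byte buffer for 8 samples passes the check when the flag is left stale,
and is rejected (16 bytes are required) once the flag is reset per frame.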

Fixes: https://issues.oss-fuzz.com/issues/464471792
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 464471792/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DPX_DEC_fuzzer-5275522210004992
---
 libavcodec/dpx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavcodec/dpx.c b/libavcodec/dpx.c
index 7355b50f7a..8c075fd538 100644
--- a/libavcodec/dpx.c
+++ b/libavcodec/dpx.c
@@ -612,6 +612,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *p,
     av_dict_set(&p->metadata, "Input Device", input_device, 0);
 
     // Some devices do not pad 10bit samples to whole 32bit words per row
+    dpx->unpadded_10bit = 0;
     if (!memcmp(input_device, "Scanity", 7) ||
         !memcmp(creator, "Lasergraphics Inc.", 18)) {
         if (avctx->bits_per_raw_sample == 10)
-- 
2.52.0


From 01273a9c30b000414ad0461edccb620ffae8d09c Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sun, 7 Dec 2025 19:15:29 +0100
Subject: [PATCH 241/304] forgejo/labeler: automatically flag Vulkan-related
 commits #20118

---
 .forgejo/labeler/labeler.js  | 3 ++-
 .forgejo/labeler/labeler.yml | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/.forgejo/labeler/labeler.js b/.forgejo/labeler/labeler.js
index b0c0898668..173d98080c 100644
--- a/.forgejo/labeler/labeler.js
+++ b/.forgejo/labeler/labeler.js
@@ -11,7 +11,8 @@ module.exports = async ({github, context}) => {
       'avutil': 'avutil',
       'swresample': 'swresample',
       'swscale': 'swscale',
-      'fftools': 'CLI'
+      'fftools': 'CLI',
+      'vulkan': 'vulkan'
     };
 
     async function isOrgMember(username) {
diff --git a/.forgejo/labeler/labeler.yml b/.forgejo/labeler/labeler.yml
index 212dd12a9a..446a675316 100644
--- a/.forgejo/labeler/labeler.yml
+++ b/.forgejo/labeler/labeler.yml
@@ -29,3 +29,7 @@ swscale:
 CLI:
   - changed-files:
     - any-glob-to-any-file: 'fftools/**'
+
+vulkan:
+  - changed-files:
+    - any-glob-to-any-file: '**/*vulkan*'
-- 
2.52.0


From fdeb113e5f9bf9167bb92f676d0aac7e883c3616 Mon Sep 17 00:00:00 2001
From: Araz Iusubov <Primeadvice@gmail.com>
Date: Fri, 5 Dec 2025 16:27:41 +0100
Subject: [PATCH 242/304] avfilter: D3D12 scale video filter support

This filter allows scaling of video frames using Direct3D 12 acceleration.

Example:
    ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 \
           -i input.mp4 -vf scale_d3d12=1920:1280 \
           -c:v hevc_d3d12va -y output_1920x1280.mp4
---
 Changelog                    |   1 +
 configure                    |   2 +
 libavfilter/Makefile         |   1 +
 libavfilter/allfilters.c     |   1 +
 libavfilter/vf_scale_d3d12.c | 774 +++++++++++++++++++++++++++++++++++
 5 files changed, 779 insertions(+)
 create mode 100644 libavfilter/vf_scale_d3d12.c

diff --git a/Changelog b/Changelog
index cda59ebc90..ef72af1e93 100644
--- a/Changelog
+++ b/Changelog
@@ -14,6 +14,7 @@ version <next>:
 - ProRes Vulkan hwaccel
 - DPX Vulkan hwaccel
 - Rockchip H.264/HEVC hardware encoder
+- Add vf_scale_d3d12 filter
 
 
 version 8.0:
diff --git a/configure b/configure
index 04e086c32a..189e973501 100755
--- a/configure
+++ b/configure
@@ -3430,6 +3430,7 @@ ddagrab_filter_deps="d3d11va IDXGIOutput1 DXGI_OUTDUPL_FRAME_INFO"
 gfxcapture_filter_deps="cxx17 threads d3d11va IGraphicsCaptureItemInterop __x_ABI_CWindows_CGraphics_CCapture_CIGraphicsCaptureSession3"
 gfxcapture_filter_extralibs="-lstdc++"
 scale_d3d11_filter_deps="d3d11va"
+scale_d3d12_filter_deps="d3d12va ID3D12VideoProcessor"
 
 amf_deps_any="libdl LoadLibrary"
 nvenc_deps="ffnvcodec"
@@ -6962,6 +6963,7 @@ check_type "windows.h d3d11.h" "ID3D11VideoContext"
 check_type "windows.h d3d12.h" "ID3D12Device"
 check_type "windows.h d3d12video.h" "ID3D12VideoDecoder"
 check_type "windows.h d3d12video.h" "ID3D12VideoEncoder"
+check_type "windows.h d3d12video.h" "ID3D12VideoProcessor"
 test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_VIDEO feature = D3D12_FEATURE_VIDEO_ENCODER_CODEC" && \
 test_code cc "windows.h d3d12video.h" "D3D12_FEATURE_DATA_VIDEO_ENCODER_RESOURCE_REQUIREMENTS req" && enable d3d12_encoder_feature
 test_code cc "windows.h d3d12video.h" "D3D12_VIDEO_ENCODER_CODEC c = D3D12_VIDEO_ENCODER_CODEC_AV1; (void)c;" && enable d3d12va_av1_headers
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 67814c0d77..5f5a199a63 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -469,6 +469,7 @@ OBJS-$(CONFIG_ROTATE_FILTER)                 += vf_rotate.o
 OBJS-$(CONFIG_SAB_FILTER)                    += vf_sab.o
 OBJS-$(CONFIG_SCALE_FILTER)                  += vf_scale.o scale_eval.o framesync.o
 OBJS-$(CONFIG_SCALE_D3D11_FILTER)            += vf_scale_d3d11.o scale_eval.o
+OBJS-$(CONFIG_SCALE_D3D12_FILTER)            += vf_scale_d3d12.o scale_eval.o
 OBJS-$(CONFIG_SCALE_CUDA_FILTER)             += vf_scale_cuda.o scale_eval.o \
                                                 vf_scale_cuda.ptx.o cuda/load_helper.o
 OBJS-$(CONFIG_SCALE_NPP_FILTER)              += vf_scale_npp.o scale_eval.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 0a3e782fe9..9ea07a486d 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -442,6 +442,7 @@ extern const FFFilter ff_vf_vpp_amf;
 extern const FFFilter ff_vf_sr_amf;
 extern const FFFilter ff_vf_scale_cuda;
 extern const FFFilter ff_vf_scale_d3d11;
+extern const FFFilter ff_vf_scale_d3d12;
 extern const FFFilter ff_vf_scale_npp;
 extern const FFFilter ff_vf_scale_qsv;
 extern const FFFilter ff_vf_scale_vaapi;
diff --git a/libavfilter/vf_scale_d3d12.c b/libavfilter/vf_scale_d3d12.c
new file mode 100644
index 0000000000..6950feb32b
--- /dev/null
+++ b/libavfilter/vf_scale_d3d12.c
@@ -0,0 +1,774 @@
+/**
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#define COBJMACROS
+
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "compat/w32dlfcn.h"
+
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_d3d12va.h"
+#include "libavutil/hwcontext_d3d12va_internal.h"
+
+#include "filters.h"
+#include "scale_eval.h"
+#include "video.h"
+
+typedef struct ScaleD3D12Context {
+    const AVClass *classCtx;
+    char *w_expr;
+    char *h_expr;
+    enum AVPixelFormat format;
+
+    /* D3D12 objects */
+    ID3D12Device *device;
+    ID3D12VideoDevice *video_device;
+    ID3D12VideoProcessor *video_processor;
+    ID3D12CommandQueue *command_queue;
+    ID3D12VideoProcessCommandList *command_list;
+    ID3D12CommandAllocator *command_allocator;
+
+    /* Synchronization */
+    ID3D12Fence *fence;
+    UINT64 fence_value;
+    HANDLE fence_event;
+
+    /* Buffer references */
+    AVBufferRef *hw_device_ctx;
+    AVBufferRef *hw_frames_ctx_out;
+
+    /* Dimensions and formats */
+    int width, height;
+    int input_width, input_height;
+    DXGI_FORMAT input_format;
+    DXGI_FORMAT output_format;
+
+    /* Color space and frame rate */
+    DXGI_COLOR_SPACE_TYPE input_colorspace;
+    AVRational input_framerate;
+
+    /* Video processor capabilities */
+    D3D12_FEATURE_DATA_VIDEO_PROCESS_SUPPORT process_support;
+} ScaleD3D12Context;
+
+static av_cold int scale_d3d12_init(AVFilterContext *ctx) {
+    return 0;
+}
+
+static void release_d3d12_resources(ScaleD3D12Context *s) {
+    UINT64 fence_value;
+    HRESULT hr;
+    /* Wait for all GPU operations to complete before releasing resources */
+    if (s->command_queue && s->fence && s->fence_event) {
+        fence_value = s->fence_value - 1;
+        hr = ID3D12CommandQueue_Signal(s->command_queue, s->fence, fence_value);
+        if (SUCCEEDED(hr)) {
+            UINT64 completed = ID3D12Fence_GetCompletedValue(s->fence);
+            if (completed < fence_value) {
+                hr = ID3D12Fence_SetEventOnCompletion(s->fence, fence_value, s->fence_event);
+                if (SUCCEEDED(hr)) {
+                    WaitForSingleObject(s->fence_event, INFINITE);
+                }
+            }
+        }
+    }
+
+    if (s->fence_event) {
+        CloseHandle(s->fence_event);
+        s->fence_event = NULL;
+    }
+
+    if (s->fence) {
+        ID3D12Fence_Release(s->fence);
+        s->fence = NULL;
+    }
+
+    if (s->command_list) {
+        ID3D12VideoProcessCommandList_Release(s->command_list);
+        s->command_list = NULL;
+    }
+
+    if (s->command_allocator) {
+        ID3D12CommandAllocator_Release(s->command_allocator);
+        s->command_allocator = NULL;
+    }
+
+    if (s->video_processor) {
+        ID3D12VideoProcessor_Release(s->video_processor);
+        s->video_processor = NULL;
+    }
+
+    if (s->video_device) {
+        ID3D12VideoDevice_Release(s->video_device);
+        s->video_device = NULL;
+    }
+
+    if (s->command_queue) {
+        ID3D12CommandQueue_Release(s->command_queue);
+        s->command_queue = NULL;
+    }
+}
+
+static DXGI_COLOR_SPACE_TYPE get_dxgi_colorspace(enum AVColorSpace colorspace, enum AVColorTransferCharacteristic trc, int is_10bit)
+{
+    /* Map FFmpeg color space to DXGI color space */
+    if (is_10bit) {
+        /* 10-bit formats (P010) */
+        if (colorspace == AVCOL_SPC_BT2020_NCL || colorspace == AVCOL_SPC_BT2020_CL) {
+            if (trc == AVCOL_TRC_SMPTE2084) {
+                return DXGI_COLOR_SPACE_YCBCR_STUDIO_G2084_LEFT_P2020;      ///< HDR10
+            } else if (trc == AVCOL_TRC_ARIB_STD_B67) {
+                return DXGI_COLOR_SPACE_YCBCR_STUDIO_GHLG_TOPLEFT_P2020;    ///< HLG
+            } else {
+                return DXGI_COLOR_SPACE_YCBCR_STUDIO_G22_LEFT_P2020;
+            }
+        } else {
+            return DXGI_COLOR_SPACE_YCBCR_STUDIO_G22_LEFT_P709;             ///< Rec.709 10-bit
+        }
+    } else {
+        /* 8-bit formats (NV12) */
+        if (colorspace == AVCOL_SPC_BT2020_NCL || colorspace == AVCOL_SPC_BT2020_CL) {
+            return DXGI_COLOR_SPACE_YCBCR_STUDIO_G22_LEFT_P2020;
+        } else if (colorspace == AVCOL_SPC_BT470BG || colorspace == AVCOL_SPC_SMPTE170M) {
+            return DXGI_COLOR_SPACE_YCBCR_STUDIO_G22_LEFT_P601;
+        } else {
+            return DXGI_COLOR_SPACE_YCBCR_STUDIO_G22_LEFT_P709;             ///< Default to Rec.709
+        }
+    }
+}
+
+static AVRational get_input_framerate(AVFilterContext *ctx, AVFilterLink *inlink, AVFrame *in)
+{
+    int64_t duration_scaled;
+    int64_t time_base_den;
+    AVRational framerate = {0, 0};
+
+    if (in->duration > 0 && inlink->time_base.num > 0 && inlink->time_base.den > 0) {
+        /*
+        * Calculate framerate from frame duration and timebase
+        * framerate = 1 / (duration * timebase)
+        */
+        duration_scaled = in->duration * inlink->time_base.num;
+        time_base_den = inlink->time_base.den;
+        framerate.num = time_base_den;
+        framerate.den = duration_scaled;
+        /* Reduce the fraction */
+        av_reduce(&framerate.num, &framerate.den,
+                 framerate.num, framerate.den, INT_MAX);
+    } else if (inlink->time_base.num > 0 && inlink->time_base.den > 0) {
+        /* Estimate from timebase (inverse of timebase is often the framerate) */
+        framerate.num = inlink->time_base.den;
+        framerate.den = inlink->time_base.num;
+    } else {
+        /* Default to 30fps if framerate cannot be determined */
+        framerate.num = 30;
+        framerate.den = 1;
+        av_log(ctx, AV_LOG_WARNING, "Input framerate not determinable, defaulting to 30fps\n");
+    }
+
+    return framerate;
+}
+
+static int scale_d3d12_configure_processor(ScaleD3D12Context *s, AVFilterContext *ctx) {
+    HRESULT hr;
+
+    if (s->output_format == DXGI_FORMAT_UNKNOWN) {
+        av_log(ctx, AV_LOG_ERROR, "Output format not initialized\n");
+        return AVERROR(EINVAL);
+    }
+
+    AVHWDeviceContext *hwctx = (AVHWDeviceContext *)s->hw_device_ctx->data;
+    AVD3D12VADeviceContext *d3d12_hwctx = (AVD3D12VADeviceContext *)hwctx->hwctx;
+    s->device = d3d12_hwctx->device;
+
+    av_log(ctx, AV_LOG_VERBOSE, "Configuring D3D12 video processor: %dx%d -> %dx%d\n",
+           s->input_width, s->input_height, s->width, s->height);
+
+    hr = ID3D12Device_QueryInterface(s->device, &IID_ID3D12VideoDevice, (void **)&s->video_device);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to get D3D12 video device interface: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    D3D12_COMMAND_QUEUE_DESC queue_desc = {
+        .Type = D3D12_COMMAND_LIST_TYPE_VIDEO_PROCESS,
+        .Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL,
+        .Flags = D3D12_COMMAND_QUEUE_FLAG_NONE,
+        .NodeMask = 0
+    };
+
+    hr = ID3D12Device_CreateCommandQueue(s->device, &queue_desc, &IID_ID3D12CommandQueue, (void **)&s->command_queue);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create command queue: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    s->process_support.NodeIndex = 0;
+
+    s->process_support.InputSample.Format.Format     = s->input_format;
+    s->process_support.InputSample.Format.ColorSpace = s->input_colorspace;
+    s->process_support.InputSample.Width             = s->input_width;
+    s->process_support.InputSample.Height            = s->input_height;
+    s->process_support.InputFrameRate.Numerator      = s->input_framerate.num;
+    s->process_support.InputFrameRate.Denominator    = s->input_framerate.den;
+    s->process_support.InputFieldType                = D3D12_VIDEO_FIELD_TYPE_NONE;
+    s->process_support.InputStereoFormat             = D3D12_VIDEO_FRAME_STEREO_FORMAT_NONE;
+
+    s->process_support.OutputFormat.Format           = s->output_format;
+    s->process_support.OutputFormat.ColorSpace       = s->input_colorspace;
+    s->process_support.OutputFrameRate.Numerator     = s->input_framerate.num;
+    s->process_support.OutputFrameRate.Denominator   = s->input_framerate.den;
+    s->process_support.OutputStereoFormat            = D3D12_VIDEO_FRAME_STEREO_FORMAT_NONE;
+
+    hr = ID3D12VideoDevice_CheckFeatureSupport(
+        s->video_device,
+        D3D12_FEATURE_VIDEO_PROCESS_SUPPORT,
+        &s->process_support,
+        sizeof(s->process_support)
+    );
+
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Video process feature not supported: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    if (!(s->process_support.SupportFlags & D3D12_VIDEO_PROCESS_SUPPORT_FLAG_SUPPORTED)) {
+        av_log(ctx, AV_LOG_ERROR, "Video process configuration not supported by hardware\n");
+        return AVERROR_EXTERNAL;
+    }
+
+    D3D12_VIDEO_PROCESS_OUTPUT_STREAM_DESC processor_output_desc = {
+        .Format                         = s->output_format,
+        .ColorSpace                     = s->input_colorspace,
+        .AlphaFillMode                  = D3D12_VIDEO_PROCESS_ALPHA_FILL_MODE_OPAQUE,
+        .AlphaFillModeSourceStreamIndex = 0,
+        .BackgroundColor                = { 0.0f, 0.0f, 0.0f, 1.0f },
+        .FrameRate                      = { s->input_framerate.num, s->input_framerate.den },
+        .EnableStereo                   = FALSE,
+    };
+
+    D3D12_VIDEO_PROCESS_INPUT_STREAM_DESC processor_input_desc = {
+        .Format                 = s->input_format,
+        .ColorSpace             = s->input_colorspace,
+        .SourceAspectRatio      = { s->input_width, s->input_height },
+        .DestinationAspectRatio = { s->width, s->height },
+        .FrameRate              = { s->input_framerate.num, s->input_framerate.den },
+        .StereoFormat           = D3D12_VIDEO_FRAME_STEREO_FORMAT_NONE,
+        .FieldType              = D3D12_VIDEO_FIELD_TYPE_NONE,
+        .DeinterlaceMode        = D3D12_VIDEO_PROCESS_DEINTERLACE_FLAG_NONE,
+        .EnableOrientation      = FALSE,
+        .FilterFlags            = D3D12_VIDEO_PROCESS_FILTER_FLAG_NONE,
+        .SourceSizeRange        = {
+            .MaxWidth  = s->input_width,
+            .MaxHeight = s->input_height,
+            .MinWidth  = s->input_width,
+            .MinHeight = s->input_height
+        },
+        .DestinationSizeRange  = {
+            .MaxWidth  = s->width,
+            .MaxHeight = s->height,
+            .MinWidth  = s->width,
+            .MinHeight = s->height
+        },
+        .EnableAlphaBlending   = FALSE,
+        .LumaKey               = { .Enable = FALSE, .Lower = 0.0f, .Upper = 1.0f },
+        .NumPastFrames         = 0,
+        .NumFutureFrames       = 0,
+        .EnableAutoProcessing  = FALSE,
+    };
+
+    /* If pixel aspect ratio adjustment is not supported, set to 1:1 and warn */
+    if (!(s->process_support.FeatureSupport & D3D12_VIDEO_PROCESS_FEATURE_FLAG_PIXEL_ASPECT_RATIO)) {
+        processor_input_desc.SourceAspectRatio.Numerator        = 1;
+        processor_input_desc.SourceAspectRatio.Denominator      = 1;
+        processor_input_desc.DestinationAspectRatio.Numerator   = 1;
+        processor_input_desc.DestinationAspectRatio.Denominator = 1;
+        av_log(ctx, AV_LOG_WARNING, "Pixel aspect ratio adjustment not supported by hardware\n");
+    }
+
+    hr = ID3D12VideoDevice_CreateVideoProcessor(
+        s->video_device,
+        0,
+        &processor_output_desc,
+        1,
+        &processor_input_desc,
+        &IID_ID3D12VideoProcessor,
+        (void **)&s->video_processor
+    );
+
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create video processor: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    hr = ID3D12Device_CreateCommandAllocator(
+        s->device,
+        D3D12_COMMAND_LIST_TYPE_VIDEO_PROCESS,
+        &IID_ID3D12CommandAllocator,
+        (void **)&s->command_allocator
+    );
+
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create command allocator: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    hr = ID3D12Device_CreateCommandList(
+        s->device,
+        0,
+        D3D12_COMMAND_LIST_TYPE_VIDEO_PROCESS,
+        s->command_allocator,
+        NULL,
+        &IID_ID3D12VideoProcessCommandList,
+        (void **)&s->command_list
+    );
+
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create command list: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    ID3D12VideoProcessCommandList_Close(s->command_list);
+
+    hr = ID3D12Device_CreateFence(s->device, 0, D3D12_FENCE_FLAG_NONE, &IID_ID3D12Fence, (void **)&s->fence);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create fence: HRESULT 0x%lX\n", hr);
+        return AVERROR_EXTERNAL;
+    }
+
+    s->fence_value = 1;
+    s->fence_event = CreateEvent(NULL, FALSE, FALSE, NULL);
+    if (!s->fence_event) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create fence event\n");
+        return AVERROR_EXTERNAL;
+    }
+
+    av_log(ctx, AV_LOG_VERBOSE, "D3D12 video processor successfully configured\n");
+    return 0;
+}
+
+static int scale_d3d12_filter_frame(AVFilterLink *inlink, AVFrame *in)
+{
+    AVFilterContext *ctx  = inlink->dst;
+    ScaleD3D12Context *s  = ctx->priv;
+    AVFilterLink *outlink = ctx->outputs[0];
+    AVFrame          *out = NULL;
+    int ret = 0;
+    HRESULT hr;
+
+    if (!in) {
+        av_log(ctx, AV_LOG_ERROR, "Null input frame\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (!in->hw_frames_ctx) {
+        av_log(ctx, AV_LOG_ERROR, "No hardware frames context in input frame\n");
+        av_frame_free(&in);
+        return AVERROR(EINVAL);
+    }
+
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)in->hw_frames_ctx->data;
+
+    if (!s->hw_device_ctx) {
+        av_log(ctx, AV_LOG_ERROR, "Filter hardware device context is uninitialized\n");
+        av_frame_free(&in);
+        return AVERROR(EINVAL);
+    }
+
+    AVHWDeviceContext *input_device_ctx = (AVHWDeviceContext *)frames_ctx->device_ref->data;
+    AVHWDeviceContext *filter_device_ctx = (AVHWDeviceContext *)s->hw_device_ctx->data;
+
+    if (input_device_ctx->type != filter_device_ctx->type) {
+        av_log(ctx, AV_LOG_ERROR, "Mismatch between input and filter hardware device types\n");
+        av_frame_free(&in);
+        return AVERROR(EINVAL);
+    }
+
+    out = av_frame_alloc();
+    if (!out) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to allocate output frame\n");
+        av_frame_free(&in);
+        return AVERROR(ENOMEM);
+    }
+
+    ret = av_hwframe_get_buffer(s->hw_frames_ctx_out, out, 0);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to get output frame from pool\n");
+        goto fail;
+    }
+
+    if (!s->video_processor) {
+        AVHWFramesContext *input_frames_ctx = (AVHWFramesContext *)in->hw_frames_ctx->data;
+
+        s->input_width = input_frames_ctx->width;
+        s->input_height = input_frames_ctx->height;
+
+        AVD3D12VAFramesContext *input_hwctx = (AVD3D12VAFramesContext *)input_frames_ctx->hwctx;
+        s->input_format = input_hwctx->format;
+
+        if (s->input_format == DXGI_FORMAT_UNKNOWN) {
+            switch (input_frames_ctx->sw_format) {
+                case AV_PIX_FMT_NV12:
+                    s->input_format = DXGI_FORMAT_NV12;
+                    break;
+                case AV_PIX_FMT_P010:
+                    s->input_format = DXGI_FORMAT_P010;
+                    break;
+                default:
+                    av_log(ctx, AV_LOG_ERROR, "Unsupported input format\n");
+                    ret = AVERROR(EINVAL);
+                    goto fail;
+            }
+        }
+
+        int is_10bit = (s->input_format == DXGI_FORMAT_P010);
+        s->input_colorspace = get_dxgi_colorspace(in->colorspace, in->color_trc, is_10bit);
+
+        s->input_framerate = get_input_framerate(ctx, inlink, in);
+
+        av_log(ctx, AV_LOG_VERBOSE, "Input format: %dx%d, DXGI format: %d, colorspace: %d, framerate: %d/%d\n",
+               s->input_width, s->input_height, s->input_format, s->input_colorspace,
+               s->input_framerate.num, s->input_framerate.den);
+
+        ret = scale_d3d12_configure_processor(s, ctx);
+        if (ret < 0) {
+            av_log(ctx, AV_LOG_ERROR, "Failed to configure processor\n");
+            goto fail;
+        }
+    }
+
+    AVD3D12VAFrame *input_frame  = (AVD3D12VAFrame *)in->data[0];
+    AVD3D12VAFrame *output_frame = (AVD3D12VAFrame *)out->data[0];
+
+    if (!input_frame || !output_frame) {
+        av_log(ctx, AV_LOG_ERROR, "Invalid frame pointers\n");
+        ret = AVERROR(EINVAL);
+        goto fail;
+    }
+
+    ID3D12Resource *input_resource  = input_frame->texture;
+    ID3D12Resource *output_resource = output_frame->texture;
+
+    if (!input_resource || !output_resource) {
+        av_log(ctx, AV_LOG_ERROR, "Invalid D3D12 resources in frames\n");
+        ret = AVERROR(EINVAL);
+        goto fail;
+    }
+
+    /* Wait for input frame's fence before accessing it */
+    if (input_frame->sync_ctx.fence && input_frame->sync_ctx.fence_value > 0) {
+        UINT64 completed = ID3D12Fence_GetCompletedValue(input_frame->sync_ctx.fence);
+        if (completed < input_frame->sync_ctx.fence_value) {
+            hr = ID3D12CommandQueue_Wait(s->command_queue, input_frame->sync_ctx.fence, input_frame->sync_ctx.fence_value);
+            if (FAILED(hr)) {
+                av_log(ctx, AV_LOG_ERROR, "Failed to wait for input fence: HRESULT 0x%lX\n", hr);
+                ret = AVERROR_EXTERNAL;
+                goto fail;
+            }
+        }
+    }
+
+    hr = ID3D12CommandAllocator_Reset(s->command_allocator);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to reset command allocator: HRESULT 0x%lX\n", hr);
+        ret = AVERROR_EXTERNAL;
+        goto fail;
+    }
+
+    hr = ID3D12VideoProcessCommandList_Reset(s->command_list, s->command_allocator);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to reset command list: HRESULT 0x%lX\n", hr);
+        ret = AVERROR_EXTERNAL;
+        goto fail;
+    }
+
+    D3D12_RESOURCE_BARRIER barriers[2] = {
+        {
+            .Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION,
+            .Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE,
+            .Transition = {
+                .pResource   = input_resource,
+                .Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
+                .StateBefore = D3D12_RESOURCE_STATE_COMMON,
+                .StateAfter  = D3D12_RESOURCE_STATE_VIDEO_PROCESS_READ
+            }
+        },
+        {
+            .Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION,
+            .Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE,
+            .Transition = {
+                .pResource   = output_resource,
+                .Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
+                .StateBefore = D3D12_RESOURCE_STATE_COMMON,
+                .StateAfter  = D3D12_RESOURCE_STATE_VIDEO_PROCESS_WRITE
+            }
+        }
+    };
+
+    ID3D12VideoProcessCommandList_ResourceBarrier(s->command_list, 2, barriers);
+
+    D3D12_VIDEO_PROCESS_INPUT_STREAM_ARGUMENTS input_args = {0};
+
+    input_args.InputStream[0].pTexture2D = input_resource;
+    input_args.Transform.SourceRectangle.right       = s->input_width;
+    input_args.Transform.SourceRectangle.bottom      = s->input_height;
+    input_args.Transform.DestinationRectangle.right  = s->width;
+    input_args.Transform.DestinationRectangle.bottom = s->height;
+    input_args.Transform.Orientation = D3D12_VIDEO_PROCESS_ORIENTATION_DEFAULT;
+
+    input_args.Flags = D3D12_VIDEO_PROCESS_INPUT_STREAM_FLAG_NONE;
+
+    input_args.RateInfo.OutputIndex = 0;
+    input_args.RateInfo.InputFrameOrField = 0;
+
+    memset(input_args.FilterLevels, 0, sizeof(input_args.FilterLevels));
+
+    input_args.AlphaBlending.Enable = FALSE;
+    input_args.AlphaBlending.Alpha = 1.0f;
+
+    D3D12_VIDEO_PROCESS_OUTPUT_STREAM_ARGUMENTS output_args = {0};
+
+    output_args.OutputStream[0].pTexture2D = output_resource;
+    output_args.TargetRectangle.right  = s->width;
+    output_args.TargetRectangle.bottom = s->height;
+
+    ID3D12VideoProcessCommandList_ProcessFrames(
+        s->command_list,
+        s->video_processor,
+        &output_args,
+        1,
+        &input_args
+    );
+
+    for (int i = 0; i < 2; i++) {
+        FFSWAP(D3D12_RESOURCE_STATES, barriers[i].Transition.StateBefore, barriers[i].Transition.StateAfter);
+    }
+    ID3D12VideoProcessCommandList_ResourceBarrier(s->command_list, 2, barriers);
+
+    hr = ID3D12VideoProcessCommandList_Close(s->command_list);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to close command list: HRESULT 0x%lX\n", hr);
+        ret = AVERROR_EXTERNAL;
+        goto fail;
+    }
+
+    ID3D12CommandList *cmd_lists[] = { (ID3D12CommandList *)s->command_list };
+    ID3D12CommandQueue_ExecuteCommandLists(s->command_queue, 1, cmd_lists);
+
+    hr = ID3D12CommandQueue_Signal(s->command_queue, s->fence, s->fence_value);
+    if (FAILED(hr)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to signal fence: HRESULT 0x%lX\n", hr);
+        ret = AVERROR_EXTERNAL;
+        goto fail;
+    }
+
+    output_frame->sync_ctx.fence = s->fence;
+    output_frame->sync_ctx.fence_value = s->fence_value;
+    ID3D12Fence_AddRef(s->fence);  ///< Increment reference count
+
+    s->fence_value++;
+
+    ret = av_frame_copy_props(out, in);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to copy frame properties\n");
+        goto fail;
+    }
+
+    out->width = s->width;
+    out->height = s->height;
+    out->format = AV_PIX_FMT_D3D12;
+
+    av_frame_free(&in);
+
+    return ff_filter_frame(outlink, out);
+
+fail:
+    av_frame_free(&in);
+    av_frame_free(&out);
+    return ret;
+}
+
+static int scale_d3d12_config_props(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    ScaleD3D12Context *s = ctx->priv;
+    AVFilterLink *inlink = ctx->inputs[0];
+    FilterLink      *inl = ff_filter_link(inlink);
+    FilterLink     *outl = ff_filter_link(outlink);
+    int ret;
+
+    release_d3d12_resources(s);
+
+    av_buffer_unref(&s->hw_frames_ctx_out);
+    av_buffer_unref(&s->hw_device_ctx);
+
+    ret = ff_scale_eval_dimensions(s, s->w_expr, s->h_expr, inlink, outlink, &s->width, &s->height);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to evaluate dimensions\n");
+        return ret;
+    }
+
+    /* Adjust dimensions to meet codec/hardware alignment requirements */
+    ff_scale_adjust_dimensions(inlink, &s->width, &s->height, 0, 1, 1.f);
+
+    outlink->w = s->width;
+    outlink->h = s->height;
+
+    if (!inl->hw_frames_ctx) {
+        av_log(ctx, AV_LOG_ERROR, "No hw_frames_ctx available on input link\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (!s->hw_device_ctx) {
+        AVHWFramesContext *in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data;
+        s->hw_device_ctx = av_buffer_ref(in_frames_ctx->device_ref);
+        if (!s->hw_device_ctx) {
+            av_log(ctx, AV_LOG_ERROR, "Failed to initialize filter hardware device context\n");
+            return AVERROR(ENOMEM);
+        }
+    }
+
+    AVHWDeviceContext *hwctx = (AVHWDeviceContext *)s->hw_device_ctx->data;
+    AVD3D12VADeviceContext *d3d12_hwctx = (AVD3D12VADeviceContext *)hwctx->hwctx;
+
+    s->device = d3d12_hwctx->device;
+
+    if (!s->device) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to get valid D3D12 device\n");
+        return AVERROR(EINVAL);
+    }
+
+    s->hw_frames_ctx_out = av_hwframe_ctx_alloc(s->hw_device_ctx);
+    if (!s->hw_frames_ctx_out)
+        return AVERROR(ENOMEM);
+
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)s->hw_frames_ctx_out->data;
+    AVHWFramesContext *in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data;
+
+    if (s->format == AV_PIX_FMT_NONE) {
+        /* If format is not specified, use the same format as input */
+        frames_ctx->sw_format = in_frames_ctx->sw_format;
+        s->format = in_frames_ctx->sw_format;
+        av_log(ctx, AV_LOG_VERBOSE, "D3D12 scale output format not specified, using input format: %s\n",
+               av_get_pix_fmt_name(s->format));
+    } else {
+        frames_ctx->sw_format = s->format;
+    }
+
+    /* Set output format based on sw_format */
+    switch (frames_ctx->sw_format) {
+        case AV_PIX_FMT_NV12:
+            s->output_format = DXGI_FORMAT_NV12;
+            break;
+        case AV_PIX_FMT_P010:
+            s->output_format = DXGI_FORMAT_P010;
+            break;
+        default:
+            av_log(ctx, AV_LOG_ERROR, "Unsupported output format: %s\n",
+                   av_get_pix_fmt_name(frames_ctx->sw_format));
+            av_buffer_unref(&s->hw_frames_ctx_out);
+            return AVERROR(EINVAL);
+    }
+
+    frames_ctx->width  = s->width;
+    frames_ctx->height = s->height;
+    frames_ctx->format = AV_PIX_FMT_D3D12;
+    frames_ctx->initial_pool_size = 10;
+
+    if (ctx->extra_hw_frames > 0)
+        frames_ctx->initial_pool_size += ctx->extra_hw_frames;
+
+    AVD3D12VAFramesContext *frames_hwctx = frames_ctx->hwctx;
+
+    /*
+    * Set D3D12 resource flags for video processing
+    * ALLOW_RENDER_TARGET is needed for video processor output
+    */
+    frames_hwctx->format = s->output_format;
+    frames_hwctx->resource_flags = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;
+    frames_hwctx->heap_flags = D3D12_HEAP_FLAG_NONE;
+
+    ret = av_hwframe_ctx_init(s->hw_frames_ctx_out);
+    if (ret < 0) {
+        av_buffer_unref(&s->hw_frames_ctx_out);
+        return ret;
+    }
+
+    outl->hw_frames_ctx = av_buffer_ref(s->hw_frames_ctx_out);
+    if (!outl->hw_frames_ctx)
+        return AVERROR(ENOMEM);
+
+    av_log(ctx, AV_LOG_VERBOSE, "D3D12 scale config: %dx%d -> %dx%d\n",
+           inlink->w, inlink->h, outlink->w, outlink->h);
+    return 0;
+}
+
+static av_cold void scale_d3d12_uninit(AVFilterContext *ctx) {
+    ScaleD3D12Context *s = ctx->priv;
+
+    release_d3d12_resources(s);
+
+    av_buffer_unref(&s->hw_frames_ctx_out);
+    av_buffer_unref(&s->hw_device_ctx);
+
+    av_freep(&s->w_expr);
+    av_freep(&s->h_expr);
+}
+
+static const AVFilterPad scale_d3d12_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .filter_frame = scale_d3d12_filter_frame,
+    },
+};
+
+static const AVFilterPad scale_d3d12_outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = scale_d3d12_config_props,
+    },
+};
+
+#define OFFSET(x) offsetof(ScaleD3D12Context, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+
+static const AVOption scale_d3d12_options[] = {
+    { "w",  "Output video width",       OFFSET(w_expr), AV_OPT_TYPE_STRING,    {.str = "iw"}, .flags = FLAGS },
+    { "h", "Output video height",       OFFSET(h_expr), AV_OPT_TYPE_STRING,    {.str = "ih"}, .flags = FLAGS },
+    { "format", "Output video pixel format", OFFSET(format), AV_OPT_TYPE_PIXEL_FMT, { .i64 = AV_PIX_FMT_NONE }, INT_MIN, INT_MAX, .flags=FLAGS },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(scale_d3d12);
+
+const FFFilter ff_vf_scale_d3d12 = {
+    .p.name           = "scale_d3d12",
+    .p.description    = NULL_IF_CONFIG_SMALL("Scale video using Direct3D12"),
+    .priv_size        = sizeof(ScaleD3D12Context),
+    .p.priv_class     = &scale_d3d12_class,
+    .init             = scale_d3d12_init,
+    .uninit           = scale_d3d12_uninit,
+    FILTER_INPUTS(scale_d3d12_inputs),
+    FILTER_OUTPUTS(scale_d3d12_outputs),
+    FILTER_SINGLE_PIXFMT(AV_PIX_FMT_D3D12),
+    .p.flags          = AVFILTER_FLAG_HWDEVICE,
+    .flags_internal   = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
-- 
2.52.0


From 40d08d2173079cdd13eaae6af9507bdad9b22ef7 Mon Sep 17 00:00:00 2001
From: Timo Rothenpieler <timo@rothenpieler.org>
Date: Mon, 8 Dec 2025 14:18:36 +0100
Subject: [PATCH 243/304] avfilter/vf_scale_d3d12: fix integer overflow in
 input framerate calculation

Also removes pointless intermediate variables that caused
the overflow and truncation to happen in the first place.
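
The failure mode can be sketched in isolation. This is a minimal,
self-contained illustration and not the filter's code: low32_of_product(),
gcd64() and reduce64() are hypothetical helpers, positive inputs are assumed,
and unlike av_reduce() the sketch simply reports failure instead of
approximating when the reduced values do not fit in an int.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: what is left of a 64-bit product once it is
 * squeezed into 32 bits, as happens when it is stored in a 32-bit field. */
static uint32_t low32_of_product(int64_t duration, int tb_num)
{
    return (uint32_t)(duration * tb_num);
}

/* Hypothetical helper: greatest common divisor of two positive values. */
static int64_t gcd64(int64_t a, int64_t b)
{
    while (b) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce num/den while both are still 64 bits wide, then narrow.
 * Returns 1 on success, 0 if the reduced values do not fit in an int. */
static int reduce64(int *out_num, int *out_den, int64_t num, int64_t den)
{
    int64_t g = gcd64(num, den);
    if (g > 1) {
        num /= g;
        den /= g;
    }
    if (num > INT_MAX || den > INT_MAX)
        return 0;
    *out_num = (int)num;
    *out_den = (int)den;
    return 1;
}
```

With a duration of 2^32 ticks and a timebase numerator of 1, the narrowed
product keeps only the low 32 bits, which are all zero, whereas reducing in
64 bits first yields a denominator that fits comfortably.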

Fixes #YWH-PGM40646-1
---
 libavfilter/vf_scale_d3d12.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/libavfilter/vf_scale_d3d12.c b/libavfilter/vf_scale_d3d12.c
index 6950feb32b..8066bd0725 100644
--- a/libavfilter/vf_scale_d3d12.c
+++ b/libavfilter/vf_scale_d3d12.c
@@ -155,8 +155,6 @@ static DXGI_COLOR_SPACE_TYPE get_dxgi_colorspace(enum AVColorSpace colorspace, e
 
 static AVRational get_input_framerate(AVFilterContext *ctx, AVFilterLink *inlink, AVFrame *in)
 {
-    int64_t duration_scaled;
-    int64_t time_base_den;
     AVRational framerate = {0, 0};
 
     if (in->duration > 0 && inlink->time_base.num > 0 && inlink->time_base.den > 0) {
@@ -164,13 +162,9 @@ static AVRational get_input_framerate(AVFilterContext *ctx, AVFilterLink *inlink
         * Calculate framerate from frame duration and timebase
         * framerate = 1 / (duration * timebase)
         */
-        duration_scaled = in->duration * inlink->time_base.num;
-        time_base_den = inlink->time_base.den;
-        framerate.num = time_base_den;
-        framerate.den = duration_scaled;
-        /* Reduce the fraction */
         av_reduce(&framerate.num, &framerate.den,
-                 framerate.num, framerate.den, INT_MAX);
+                  inlink->time_base.den, in->duration * inlink->time_base.num,
+                  INT_MAX);
     } else if (inlink->time_base.num > 0 && inlink->time_base.den > 0) {
         /* Estimate from timebase (inverse of timebase is often the framerate) */
         framerate.num = inlink->time_base.den;
-- 
2.52.0


From a55b259af4fb238c18d72983fcaa5b3ba52ac53d Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Thu, 4 Dec 2025 19:32:16 +0100
Subject: [PATCH 244/304] swscale/format: handle YA format swizzles more
 robustly

This code was previously broken, since YAF32BE/LE were not included as
part of the format enumeration. However, since we *always* know the correct
swizzle for YA formats, we can just special-case this by the number of
components instead.
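
As a rough standalone sketch of the idea (not the libswscale code): the
Swizzle struct and pick_swizzle() below are hypothetical stand-ins, and the
component count is passed in directly rather than looked up from a pixel
format descriptor.

```c
/* Hypothetical stand-in for a four-entry swizzle mask. */
typedef struct Swizzle {
    unsigned char x, y, z, w;
} Swizzle;

/* Decide by component count first: every two-component (luma + alpha)
 * format gets the same mask, so no per-format list has to be kept in sync.
 * Anything else falls back to the mask chosen per format by the caller. */
static Swizzle pick_swizzle(int nb_components, Swizzle by_format)
{
    if (nb_components == 2)
        return (Swizzle){ 0, 3, 1, 2 };
    return by_format;
}
```

A newly added YA format is then covered automatically, which is the
robustness this change is after.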
---
 libswscale/format.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libswscale/format.c b/libswscale/format.c
index 2ae6d50523..5ac0418266 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -630,6 +630,10 @@ static SwsPixelType fmt_pixel_type(enum AVPixelFormat fmt)
 
 static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
 {
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
+    if (desc->nb_components == 2) /* YA formats */
+        return (SwsSwizzleOp) {{ .x = 0, 3, 1, 2 }};
+
     switch (fmt) {
     case AV_PIX_FMT_ARGB:
     case AV_PIX_FMT_0RGB:
@@ -663,10 +667,6 @@ static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
     case AV_PIX_FMT_X2BGR10LE:
     case AV_PIX_FMT_X2BGR10BE:
         return (SwsSwizzleOp) {{ .x = 3, 2, 1, 0 }};
-    case AV_PIX_FMT_YA8:
-    case AV_PIX_FMT_YA16BE:
-    case AV_PIX_FMT_YA16LE:
-        return (SwsSwizzleOp) {{ .x = 0, 3, 1, 2 }};
     case AV_PIX_FMT_XV30BE:
     case AV_PIX_FMT_XV30LE:
         return (SwsSwizzleOp) {{ .x = 3, 2, 0, 1 }};
-- 
2.52.0


From 95e127f8ef837f415c28ed74b67f6c395a813506 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 13:26:22 +0100
Subject: [PATCH 245/304] swscale/format: add assertion to prevent nan/inf
 matrix coeffs
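
A minimal sketch of why a zero numerator matters, assuming a plain
numerator/denominator pair: Rational, inv_q() and inv_q_checked() are
hypothetical stand-ins used only for illustration. Inverting a rational
swaps its two fields, so a zero numerator becomes a zero denominator, and
any later conversion to floating point divides by zero.

```c
#include <assert.h>

/* Hypothetical stand-in for a rational number. */
typedef struct Rational {
    int num, den;
} Rational;

/* Inversion swaps numerator and denominator. */
static Rational inv_q(Rational q)
{
    Rational r = { q.den, q.num };
    return r;
}

/* Same operation, but refusing input that would produce a zero
 * denominator, which mirrors the intent of the added assertion. */
static Rational inv_q_checked(Rational q)
{
    assert(q.num != 0);
    return inv_q(q);
}
```

Here inv_q() applied to 0/1 gives 1/0, while inv_q_checked() would abort
on the same input in a build with assertions enabled.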

---
 libswscale/format.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libswscale/format.c b/libswscale/format.c
index 5ac0418266..cbb5cb35fe 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -1216,6 +1216,7 @@ static SwsLinearOp fmt_decode_range(const SwsFormat fmt, bool *incomplete)
 
     /* Invert main diagonal + offset: x = s * y + k  ==>  y = (x - k) / s */
     for (int i = 0; i < 4; i++) {
+        av_assert1(c.m[i][i].num);
         c.m[i][i] = av_inv_q(c.m[i][i]);
         c.m[i][4] = av_mul_q(c.m[i][4], av_neg_q(c.m[i][i]));
     }
-- 
2.52.0


From 98a9b546b2ce4a63bb6967e6f541c0f7c6e046be Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Thu, 4 Dec 2025 19:25:51 +0100
Subject: [PATCH 246/304] swscale/tests: add new test for generated operation
 lists

This is similar to swscale/tests/swscale.c, but significantly cheaper - it
merely prints the generated (optimized) operation list for every format
conversion.

Mostly useful for my own purposes as a regression test when making changes
to the ops optimizer. Note the distinction between this and tests/swscale.c,
the latter of which tests the result of *applying* an operation list for
equality.

There is an argument to be made that the two tests could be merged, but
I think the overlap is too small to justify merging them, given how much
they differ.
---
 libswscale/Makefile        |  1 +
 libswscale/tests/sws_ops.c | 96 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)
 create mode 100644 libswscale/tests/sws_ops.c

diff --git a/libswscale/Makefile b/libswscale/Makefile
index a096ed331e..bde1144897 100644
--- a/libswscale/Makefile
+++ b/libswscale/Makefile
@@ -43,3 +43,4 @@ TESTPROGS = colorspace                                                  \
             floatimg_cmp                                                \
             pixdesc_query                                               \
             swscale                                                     \
+            sws_ops                                                     \
diff --git a/libswscale/tests/sws_ops.c b/libswscale/tests/sws_ops.c
new file mode 100644
index 0000000000..8bb44d634d
--- /dev/null
+++ b/libswscale/tests/sws_ops.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2025 Niklas Haas
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/pixdesc.h"
+#include "libswscale/ops.h"
+#include "libswscale/format.h"
+
+static int run_test(SwsContext *const ctx, AVFrame *frame,
+                    const AVPixFmtDescriptor *const src_desc,
+                    const AVPixFmtDescriptor *const dst_desc)
+{
+    /* Reuse ff_fmt_from_frame() to ensure correctly sanitized metadata */
+    frame->format = av_pix_fmt_desc_get_id(src_desc);
+    SwsFormat src = ff_fmt_from_frame(frame, 0);
+    frame->format = av_pix_fmt_desc_get_id(dst_desc);
+    SwsFormat dst = ff_fmt_from_frame(frame, 0);
+    bool incomplete = ff_infer_colors(&src.color, &dst.color);
+
+    SwsOpList *ops = ff_sws_op_list_alloc();
+    if (!ops)
+        return AVERROR(ENOMEM);
+    ops->src = src;
+    ops->dst = dst;
+
+    if (ff_sws_decode_pixfmt(ops, src.format) < 0)
+        goto fail;
+    if (ff_sws_decode_colors(ctx, SWS_PIXEL_F32, ops, src, &incomplete) < 0)
+        goto fail;
+    if (ff_sws_encode_colors(ctx, SWS_PIXEL_F32, ops, dst, &incomplete) < 0)
+        goto fail;
+    if (ff_sws_encode_pixfmt(ops, dst.format) < 0)
+        goto fail;
+
+    av_log(NULL, AV_LOG_INFO, "%s -> %s:\n",
+           av_get_pix_fmt_name(src.format), av_get_pix_fmt_name(dst.format));
+
+    ff_sws_op_list_optimize(ops);
+    ff_sws_op_list_print(NULL, AV_LOG_INFO, ops);
+
+fail:
+    /* silently skip unsupported formats */
+    ff_sws_op_list_free(&ops);
+    return 0;
+}
+
+static void log_stdout(void *avcl, int level, const char *fmt, va_list vl)
+{
+    if (level != AV_LOG_INFO) {
+        av_log_default_callback(avcl, level, fmt, vl);
+    } else {
+        vfprintf(stdout, fmt, vl);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    int ret = 1;
+
+    SwsContext *ctx = sws_alloc_context();
+    AVFrame *frame = av_frame_alloc();
+    if (!ctx || !frame)
+        goto fail;
+    frame->width = frame->height = 16;
+
+    av_log_set_callback(log_stdout);
+    for (const AVPixFmtDescriptor *src = NULL; (src = av_pix_fmt_desc_next(src));) {
+        for (const AVPixFmtDescriptor *dst = NULL; (dst = av_pix_fmt_desc_next(dst));) {
+            int err = run_test(ctx, frame, src, dst);
+            if (err < 0)
+                goto fail;
+        }
+    }
+
+    ret = 0;
+fail:
+    av_frame_free(&frame);
+    sws_free_context(&ctx);
+    return ret;
+}
-- 
2.52.0


From fa3333184c04da22ce76f10827242093e9ef6790 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 13:53:59 +0100
Subject: [PATCH 247/304] tests/fate: add fate-sws-ops-list test

This uses the test program added in the previous commit to add a
lightweight regression test ensuring the generated SwsOpList stays unchanged.

Only compare the md5sum, to make the reference file significantly smaller
(down from ~10 MB).
---
 tests/fate/libswscale.mak   | 4 ++++
 tests/ref/fate/sws-ops-list | 1 +
 2 files changed, 5 insertions(+)
 create mode 100644 tests/ref/fate/sws-ops-list

diff --git a/tests/fate/libswscale.mak b/tests/fate/libswscale.mak
index 59da506648..8d87c39ebf 100644
--- a/tests/fate/libswscale.mak
+++ b/tests/fate/libswscale.mak
@@ -36,6 +36,10 @@ FATE_LIBSWSCALE-$(CONFIG_UNSTABLE) += fate-sws-unscaled
 fate-sws-unscaled: libswscale/tests/swscale$(EXESUF)
 fate-sws-unscaled: CMD = run libswscale/tests/swscale$(EXESUF) -unscaled 1 -flags 0x100000 -v 16
 
+FATE_LIBSWSCALE-$(CONFIG_UNSTABLE) += fate-sws-ops-list
+fate-sws-ops-list: libswscale/tests/sws_ops$(EXESUF)
+fate-sws-ops-list: CMD = run libswscale/tests/sws_ops$(EXESUF) | do_md5sum | cut -d" " -f1
+
 FATE_LIBSWSCALE += $(FATE_LIBSWSCALE-yes)
 FATE_LIBSWSCALE_SAMPLES += $(FATE_LIBSWSCALE_SAMPLES-yes)
 FATE-$(CONFIG_SWSCALE) += $(FATE_LIBSWSCALE)
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
new file mode 100644
index 0000000000..f77c60fde3
--- /dev/null
+++ b/tests/ref/fate/sws-ops-list
@@ -0,0 +1 @@
+e124847bc6663ca538b784de17bf42f0
-- 
2.52.0


From 9afa4b5f4781fba42d57d28fa12765c1cc4dfd66 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 04:23:04 +0100
Subject: [PATCH 248/304] avcodec/x86/h264_idct: fix version check for NASM 3
 and newer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
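
The two conditions can be compared outside the assembler. This is only an
illustrative sketch in C, with old_check() and new_check() as hypothetical
names for the expressions before and after this change.

```c
/* Old condition: the minor version is tested on its own, so any release
 * with minor < 4 is rejected even when its major version is above 2. */
static int old_check(int major, int minor)
{
    return major >= 2 && minor >= 4;
}

/* New condition: the minor version only matters within major version 2. */
static int new_check(int major, int minor)
{
    return major > 2 || (major == 2 && minor >= 4);
}
```

For version 3.0 the old expression is false and the new one is true; for
2.4 both are true, and for 2.3 both are false.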

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavcodec/x86/h264_idct.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 47e4116f42..6ae8202748 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -695,7 +695,7 @@ cglobal h264_luma_dc_dequant_idct, 3, 4, 7
     RET
 
 %ifdef __NASM_VER__
-%if __NASM_MAJOR__ >= 2 && __NASM_MINOR__ >= 4
+%if __NASM_MAJOR__ > 2 || (__NASM_MAJOR__ == 2 && __NASM_MINOR__ >= 4)
 %unmacro STORE_DIFFx2 8 ; remove macro from x86util.asm but yasm doesn't have this yet
 %endif
 %endif
-- 
2.52.0


From 0eb05e49dee62e1b515785f0cdeb6338fa497bac Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 04:27:29 +0100
Subject: [PATCH 249/304] swscale/x86/yuv2yuvX: don't use deprecated
 hexadecimal prefix
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: warning: $ prefix for hexadecimal is deprecated [-w+number-deprecated-hex]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libswscale/x86/yuv2yuvX.asm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index 369c850674..7137be2e17 100644
--- a/libswscale/x86/yuv2yuvX.asm
+++ b/libswscale/x86/yuv2yuvX.asm
@@ -54,8 +54,8 @@ cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
     jz                   .offset
 
     ; offset != 0 path.
-    psrlq                m5, m3, $18
-    psllq                m3, m3, $28
+    psrlq                m5, m3, 0x18
+    psllq                m3, m3, 0x28
     por                  m3, m3, m5
 
 .offset:
@@ -94,7 +94,7 @@ cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
     paddw                m6, m6, m2
     paddw                m1, m1, m5
 %endif
-    add                  filterSizeq, $10
+    add                  filterSizeq, 0x10
     mov                  srcq, [filterSizeq]
     test                 srcq, srcq
     jnz                  .loop
-- 
2.52.0


From 97a3edd7ffd54b78c965114b96545adf2745dbe3 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 30 Nov 2025 18:08:38 +0100
Subject: [PATCH 250/304] avcodec/vp9mc: Remove MMXEXT functions overridden by
 SSSE3

SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9dsp_init.c                | 20 +++++++++-----------
 libavcodec/x86/vp9dsp_init.h                | 14 ++++++++++----
 libavcodec/x86/vp9dsp_init_16bpp_template.c |  8 ++++----
 libavcodec/x86/vp9mc.asm                    | 20 ++++++--------------
 4 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/libavcodec/x86/vp9dsp_init.c b/libavcodec/x86/vp9dsp_init.c
index c103751351..25a007008b 100644
--- a/libavcodec/x86/vp9dsp_init.c
+++ b/libavcodec/x86/vp9dsp_init.c
@@ -41,7 +41,6 @@ decl_fpel_func(put, 64,   , avx);
 decl_fpel_func(avg, 32, _8, avx2);
 decl_fpel_func(avg, 64, _8, avx2);
 
-decl_mc_funcs(4, mmxext, int16_t, 8, 8);
 decl_mc_funcs(8, sse2, int16_t,  8, 8);
 decl_mc_funcs(4, ssse3, int8_t, 32, 8);
 decl_mc_funcs(8, ssse3, int8_t, 32, 8);
@@ -70,10 +69,11 @@ mc_rep_funcs(64, 32, 32, avx2,  int8_t,  32, 8)
 extern const int8_t ff_filters_ssse3[3][15][4][32];
 extern const int16_t ff_filters_sse2[3][15][8][8];
 
-filters_8tap_2d_fn2(put, 16, 8, 1, mmxext, sse2, sse2)
-filters_8tap_2d_fn2(avg, 16, 8, 1, mmxext, sse2, sse2)
-filters_8tap_2d_fn2(put, 16, 8, 1, ssse3, ssse3, ssse3)
-filters_8tap_2d_fn2(avg, 16, 8, 1, ssse3, ssse3, ssse3)
+filters_8tap_2d_fn2(put, 16, 8, 1, sse2, sse2)
+filters_8tap_2d_fn2(avg, 16, 8, 1, sse2, sse2)
+filters_8tap_2d_fn3(put, 16, 8, 1, ssse3, ssse3)
+filters_8tap_2d_fn3(avg, 16, 8, 1, ssse3, ssse3)
+
 #if ARCH_X86_64 && HAVE_AVX2_EXTERNAL
 filters_8tap_2d_fn(put, 64, 32, 8, 1, avx2, ssse3)
 filters_8tap_2d_fn(put, 32, 32, 8, 1, avx2, ssse3)
@@ -81,10 +81,10 @@ filters_8tap_2d_fn(avg, 64, 32, 8, 1, avx2, ssse3)
 filters_8tap_2d_fn(avg, 32, 32, 8, 1, avx2, ssse3)
 #endif
 
-filters_8tap_1d_fn3(put, 8, mmxext, sse2, sse2)
-filters_8tap_1d_fn3(avg, 8, mmxext, sse2, sse2)
-filters_8tap_1d_fn3(put, 8, ssse3, ssse3, ssse3)
-filters_8tap_1d_fn3(avg, 8, ssse3, ssse3, ssse3)
+filters_8tap_1d_fn3(put, 8, sse2, sse2)
+filters_8tap_1d_fn3(avg, 8, sse2, sse2)
+filters_8tap_1d_fn4(put, 8, ssse3, ssse3)
+filters_8tap_1d_fn4(avg, 8, ssse3, ssse3)
 #if ARCH_X86_64 && HAVE_AVX2_EXTERNAL
 filters_8tap_1d_fn2(put, 64, 8, avx2, ssse3)
 filters_8tap_1d_fn2(put, 32, 8, avx2, ssse3)
@@ -285,8 +285,6 @@ av_cold void ff_vp9dsp_init_x86(VP9DSPContext *dsp, int bpp, int bitexact)
         dsp->loop_filter_8[0][1] = ff_vp9_loop_filter_v_4_8_mmxext;
         dsp->loop_filter_8[1][0] = ff_vp9_loop_filter_h_8_8_mmxext;
         dsp->loop_filter_8[1][1] = ff_vp9_loop_filter_v_8_8_mmxext;
-        init_subpel2(4, 0, 4, put, 8, mmxext);
-        init_subpel2(4, 1, 4, avg, 8, mmxext);
         init_fpel_func(4, 1,  4, avg, _8, mmxext);
         init_fpel_func(3, 1,  8, avg, _8, mmxext);
         dsp->itxfm_add[TX_4X4][DCT_DCT] = ff_vp9_idct_idct_4x4_add_mmxext;
diff --git a/libavcodec/x86/vp9dsp_init.h b/libavcodec/x86/vp9dsp_init.h
index 5690d16970..64747173c8 100644
--- a/libavcodec/x86/vp9dsp_init.h
+++ b/libavcodec/x86/vp9dsp_init.h
@@ -107,12 +107,15 @@ filter_8tap_1d_fn(op, sz, FILTER_8TAP_SMOOTH,  f_opt, smooth,  dir, dvar, bpp, o
 filters_8tap_1d_fn(op, sz, h, mx, bpp, opt, f_opt) \
 filters_8tap_1d_fn(op, sz, v, my, bpp, opt, f_opt)
 
-#define filters_8tap_1d_fn3(op, bpp, opt4, opt8, f_opt) \
+#define filters_8tap_1d_fn3(op, bpp, opt8, f_opt) \
 filters_8tap_1d_fn2(op, 64, bpp, opt8, f_opt) \
 filters_8tap_1d_fn2(op, 32, bpp, opt8, f_opt) \
 filters_8tap_1d_fn2(op, 16, bpp, opt8, f_opt) \
 filters_8tap_1d_fn2(op, 8, bpp, opt8, f_opt) \
-filters_8tap_1d_fn2(op, 4, bpp, opt4, f_opt)
+
+#define filters_8tap_1d_fn4(op, bpp, opt, f_opt) \
+filters_8tap_1d_fn3(op, bpp, opt, f_opt) \
+filters_8tap_1d_fn2(op, 4, bpp, opt, f_opt) \
 
 #define filter_8tap_2d_fn(op, sz, f, f_opt, fname, align, bpp, bytes, opt) \
 static void op##_8tap_##fname##_##sz##hv_##bpp##_##opt(uint8_t *dst, ptrdiff_t dst_stride, \
@@ -133,12 +136,15 @@ filter_8tap_2d_fn(op, sz, FILTER_8TAP_REGULAR, f_opt, regular, align, bpp, bytes
 filter_8tap_2d_fn(op, sz, FILTER_8TAP_SHARP,   f_opt, sharp, align, bpp, bytes, opt) \
 filter_8tap_2d_fn(op, sz, FILTER_8TAP_SMOOTH,  f_opt, smooth, align, bpp, bytes, opt)
 
-#define filters_8tap_2d_fn2(op, align, bpp, bytes, opt4, opt8, f_opt) \
+#define filters_8tap_2d_fn2(op, align, bpp, bytes, opt8, f_opt) \
 filters_8tap_2d_fn(op, 64, align, bpp, bytes, opt8, f_opt) \
 filters_8tap_2d_fn(op, 32, align, bpp, bytes, opt8, f_opt) \
 filters_8tap_2d_fn(op, 16, align, bpp, bytes, opt8, f_opt) \
 filters_8tap_2d_fn(op, 8, align, bpp, bytes, opt8, f_opt) \
-filters_8tap_2d_fn(op, 4, align, bpp, bytes, opt4, f_opt)
+
+#define filters_8tap_2d_fn3(op, align, bpp, bytes, opt, f_opt) \
+filters_8tap_2d_fn2(op, align, bpp, bytes, opt, f_opt) \
+filters_8tap_2d_fn(op, 4, align, bpp, bytes, opt, f_opt)
 
 #define init_fpel_func(idx1, idx2, sz, type, bpp, opt) \
     dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][0][0] = \
diff --git a/libavcodec/x86/vp9dsp_init_16bpp_template.c b/libavcodec/x86/vp9dsp_init_16bpp_template.c
index a6aa03bdc8..54ff8892cf 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp_template.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp_template.c
@@ -40,8 +40,8 @@ mc_rep_funcs(32, 16, 32, avx2, int16_t, 16, BPC)
 mc_rep_funcs(64, 32, 64, avx2, int16_t, 16, BPC)
 #endif
 
-filters_8tap_2d_fn2(put, 16, BPC, 2, sse2, sse2, 16bpp)
-filters_8tap_2d_fn2(avg, 16, BPC, 2, sse2, sse2, 16bpp)
+filters_8tap_2d_fn3(put, 16, BPC, 2, sse2, 16bpp)
+filters_8tap_2d_fn3(avg, 16, BPC, 2, sse2, 16bpp)
 #if HAVE_AVX2_EXTERNAL
 filters_8tap_2d_fn(put, 64, 32, BPC, 2, avx2, 16bpp)
 filters_8tap_2d_fn(avg, 64, 32, BPC, 2, avx2, 16bpp)
@@ -51,8 +51,8 @@ filters_8tap_2d_fn(put, 16, 32, BPC, 2, avx2, 16bpp)
 filters_8tap_2d_fn(avg, 16, 32, BPC, 2, avx2, 16bpp)
 #endif
 
-filters_8tap_1d_fn3(put, BPC, sse2, sse2, 16bpp)
-filters_8tap_1d_fn3(avg, BPC, sse2, sse2, 16bpp)
+filters_8tap_1d_fn4(put, BPC, sse2, 16bpp)
+filters_8tap_1d_fn4(avg, BPC, sse2, 16bpp)
 #if HAVE_AVX2_EXTERNAL
 filters_8tap_1d_fn2(put, 64, BPC, avx2, 16bpp)
 filters_8tap_1d_fn2(avg, 64, BPC, avx2, 16bpp)
diff --git a/libavcodec/x86/vp9mc.asm b/libavcodec/x86/vp9mc.asm
index b9a62e79a8..682c6a6ea0 100644
--- a/libavcodec/x86/vp9mc.asm
+++ b/libavcodec/x86/vp9mc.asm
@@ -205,7 +205,7 @@ cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 15, dst, dstride, src, sstride, h
     pxor        m5, m5
     mova        m6, [pw_64]
     mova        m7, [filteryq+  0]
-%if ARCH_X86_64 && mmsize > 8
+%if ARCH_X86_64
     mova        m8, [filteryq+ 16]
     mova        m9, [filteryq+ 32]
     mova       m10, [filteryq+ 48]
@@ -226,7 +226,7 @@ cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 15, dst, dstride, src, sstride, h
     punpcklbw   m3, m5
     punpcklbw   m4, m5
     pmullw      m0, m7
-%if ARCH_X86_64 && mmsize > 8
+%if ARCH_X86_64
     pmullw      m1, m8
     pmullw      m2, m9
     pmullw      m3, m10
@@ -247,7 +247,7 @@ cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 15, dst, dstride, src, sstride, h
     punpcklbw   m1, m5
     punpcklbw   m3, m5
     punpcklbw   m4, m5
-%if ARCH_X86_64 && mmsize > 8
+%if ARCH_X86_64
     pmullw      m1, m12
     pmullw      m3, m13
     pmullw      m4, m14
@@ -276,10 +276,6 @@ cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 15, dst, dstride, src, sstride, h
     RET
 %endmacro
 
-INIT_MMX mmxext
-filter_sse2_h_fn put
-filter_sse2_h_fn avg
-
 INIT_XMM sse2
 filter_sse2_h_fn put
 filter_sse2_h_fn avg
@@ -421,7 +417,7 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 15, dst, dstride, src, sstride, f
     lea      src4q, [srcq+sstrideq]
     sub       srcq, sstride3q
     mova        m7, [filteryq+  0]
-%if ARCH_X86_64 && mmsize > 8
+%ifdef m8
     mova        m8, [filteryq+ 16]
     mova        m9, [filteryq+ 32]
     mova       m10, [filteryq+ 48]
@@ -446,7 +442,7 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 15, dst, dstride, src, sstride, f
     punpcklbw   m3, m5
     punpcklbw   m4, m5
     pmullw      m0, m7
-%if ARCH_X86_64 && mmsize > 8
+%ifdef m8
     pmullw      m1, m8
     pmullw      m2, m9
     pmullw      m3, m10
@@ -467,7 +463,7 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 15, dst, dstride, src, sstride, f
     punpcklbw   m1, m5
     punpcklbw   m3, m5
     punpcklbw   m4, m5
-%if ARCH_X86_64 && mmsize > 8
+%ifdef m8
     pmullw      m1, m12
     pmullw      m3, m13
     pmullw      m4, m14
@@ -496,10 +492,6 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 15, dst, dstride, src, sstride, f
     RET
 %endmacro
 
-INIT_MMX mmxext
-filter_sse2_v_fn put
-filter_sse2_v_fn avg
-
 INIT_XMM sse2
 filter_sse2_v_fn put
 filter_sse2_v_fn avg
-- 
2.52.0


From a9f7a283fdf62fe3f4d3d4ec9130ea7887b338d8 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 30 Nov 2025 20:26:44 +0100
Subject: [PATCH 251/304] avcodec/vp9intrapred: Remove MMXEXT functions
 overridden by SSSE3

SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9dsp_init.c    |  12 ++--
 libavcodec/x86/vp9intrapred.asm | 122 +++-----------------------------
 2 files changed, 13 insertions(+), 121 deletions(-)

diff --git a/libavcodec/x86/vp9dsp_init.c b/libavcodec/x86/vp9dsp_init.c
index 25a007008b..85332da2b9 100644
--- a/libavcodec/x86/vp9dsp_init.c
+++ b/libavcodec/x86/vp9dsp_init.c
@@ -154,6 +154,8 @@ lpf_funcs(88, 16, avx);
 void ff_vp9_ipred_##type##_##size##x##size##_##opt(uint8_t *dst, ptrdiff_t stride, \
                                                    const uint8_t *l, const uint8_t *a)
 
+ipred_func(4, hd, mmxext);
+ipred_func(4, vl, mmxext);
 ipred_func(8, v, mmx);
 
 #define ipred_dc_funcs(size, opt) \
@@ -161,9 +163,6 @@ ipred_func(size, dc, opt); \
 ipred_func(size, dc_left, opt); \
 ipred_func(size, dc_top, opt)
 
-ipred_dc_funcs(4, mmxext);
-ipred_dc_funcs(8, mmxext);
-
 #define ipred_dir_tm_funcs(size, opt) \
 ipred_func(size, tm, opt); \
 ipred_func(size, dl, opt); \
@@ -173,8 +172,6 @@ ipred_func(size, hu, opt); \
 ipred_func(size, vl, opt); \
 ipred_func(size, vr, opt)
 
-ipred_dir_tm_funcs(4, mmxext);
-
 ipred_func(16, v, sse);
 ipred_func(32, v, sse);
 
@@ -288,9 +285,8 @@ av_cold void ff_vp9dsp_init_x86(VP9DSPContext *dsp, int bpp, int bitexact)
         init_fpel_func(4, 1,  4, avg, _8, mmxext);
         init_fpel_func(3, 1,  8, avg, _8, mmxext);
         dsp->itxfm_add[TX_4X4][DCT_DCT] = ff_vp9_idct_idct_4x4_add_mmxext;
-        init_dc_ipred(4, mmxext);
-        init_dc_ipred(8, mmxext);
-        init_dir_tm_ipred(4, mmxext);
+        dsp->intra_pred[TX_4X4][HOR_DOWN_PRED] = ff_vp9_ipred_hd_4x4_mmxext;
+        dsp->intra_pred[TX_4X4][VERT_LEFT_PRED] = ff_vp9_ipred_vl_4x4_mmxext;
     }
 
     if (EXTERNAL_SSE(cpu_flags)) {
diff --git a/libavcodec/x86/vp9intrapred.asm b/libavcodec/x86/vp9intrapred.asm
index b67addd7e3..22390ca831 100644
--- a/libavcodec/x86/vp9intrapred.asm
+++ b/libavcodec/x86/vp9intrapred.asm
@@ -93,21 +93,14 @@ SECTION .text
 
 ; dc_NxN(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a)
 
-%macro DC_4to8_FUNCS 0
+INIT_MMX ssse3
 cglobal vp9_ipred_dc_4x4, 4, 4, 0, dst, stride, l, a
     movd                    m0, [lq]
     punpckldq               m0, [aq]
     pxor                    m1, m1
     psadbw                  m0, m1
-%if cpuflag(ssse3)
     pmulhrsw                m0, [pw_4096]
     pshufb                  m0, m1
-%else
-    paddw                   m0, [pw_4]
-    psraw                   m0, 3
-    punpcklbw               m0, m0
-    pshufw                  m0, m0, q0000
-%endif
     movd      [dstq+strideq*0], m0
     movd      [dstq+strideq*1], m0
     lea                   dstq, [dstq+strideq*2]
@@ -124,15 +117,8 @@ cglobal vp9_ipred_dc_8x8, 4, 4, 0, dst, stride, l, a
     psadbw                  m0, m2
     psadbw                  m1, m2
     paddw                   m0, m1
-%if cpuflag(ssse3)
     pmulhrsw                m0, [pw_2048]
     pshufb                  m0, m2
-%else
-    paddw                   m0, [pw_8]
-    psraw                   m0, 4
-    punpcklbw               m0, m0
-    pshufw                  m0, m0, q0000
-%endif
     movq      [dstq+strideq*0], m0
     movq      [dstq+strideq*1], m0
     movq      [dstq+strideq*2], m0
@@ -143,12 +129,7 @@ cglobal vp9_ipred_dc_8x8, 4, 4, 0, dst, stride, l, a
     movq      [dstq+strideq*2], m0
     movq      [dstq+stride3q ], m0
     RET
-%endmacro
 
-INIT_MMX mmxext
-DC_4to8_FUNCS
-INIT_MMX ssse3
-DC_4to8_FUNCS
 
 %macro DC_16to32_FUNCS 0
 cglobal vp9_ipred_dc_16x16, 4, 4, 3, dst, stride, l, a
@@ -238,15 +219,8 @@ cglobal vp9_ipred_dc_%1_4x4, 4, 4, 0, dst, stride, l, a
     movd                    m0, [%2q]
     pxor                    m1, m1
     psadbw                  m0, m1
-%if cpuflag(ssse3)
     pmulhrsw                m0, [pw_8192]
     pshufb                  m0, m1
-%else
-    paddw                   m0, [pw_2]
-    psraw                   m0, 2
-    punpcklbw               m0, m0
-    pshufw                  m0, m0, q0000
-%endif
     movd      [dstq+strideq*0], m0
     movd      [dstq+strideq*1], m0
     lea                   dstq, [dstq+strideq*2]
@@ -260,15 +234,8 @@ cglobal vp9_ipred_dc_%1_8x8, 4, 4, 0, dst, stride, l, a
     lea               stride3q, [strideq*3]
     pxor                    m1, m1
     psadbw                  m0, m1
-%if cpuflag(ssse3)
     pmulhrsw                m0, [pw_4096]
     pshufb                  m0, m1
-%else
-    paddw                   m0, [pw_4]
-    psraw                   m0, 3
-    punpcklbw               m0, m0
-    pshufw                  m0, m0, q0000
-%endif
     movq      [dstq+strideq*0], m0
     movq      [dstq+strideq*1], m0
     movq      [dstq+strideq*2], m0
@@ -281,9 +248,6 @@ cglobal vp9_ipred_dc_%1_8x8, 4, 4, 0, dst, stride, l, a
     RET
 %endmacro
 
-INIT_MMX mmxext
-DC_1D_4to8_FUNCS top,  a
-DC_1D_4to8_FUNCS left, l
 INIT_MMX ssse3
 DC_1D_4to8_FUNCS top,  a
 DC_1D_4to8_FUNCS left, l
@@ -548,33 +512,22 @@ H_XMM_FUNCS 4, 8
 INIT_XMM avx
 H_XMM_FUNCS 4, 8
 
-%macro TM_MMX_FUNCS 0
+INIT_MMX ssse3
 cglobal vp9_ipred_tm_4x4, 4, 4, 0, dst, stride, l, a
     pxor                    m1, m1
     movd                    m0, [aq]
     pinsrw                  m2, [aq-1], 0
     punpcklbw               m0, m1
     DEFINE_ARGS dst, stride, l, cnt
-%if cpuflag(ssse3)
     mova                    m3, [pw_m256]
     mova                    m1, [pw_m255]
     pshufb                  m2, m3
-%else
-    punpcklbw               m2, m1
-    pshufw                  m2, m2, q0000
-%endif
     psubw                   m0, m2
     mov                   cntq, 1
 .loop:
     pinsrw                  m2, [lq+cntq*2], 0
-%if cpuflag(ssse3)
     pshufb                  m4, m2, m1
     pshufb                  m2, m3
-%else
-    punpcklbw               m2, m1
-    pshufw                  m4, m2, q1111
-    pshufw                  m2, m2, q0000
-%endif
     paddw                   m4, m0
     paddw                   m2, m0
     packuswb                m4, m4
@@ -585,12 +538,6 @@ cglobal vp9_ipred_tm_4x4, 4, 4, 0, dst, stride, l, a
     dec                   cntq
     jge .loop
     RET
-%endmacro
-
-INIT_MMX mmxext
-TM_MMX_FUNCS
-INIT_MMX ssse3
-TM_MMX_FUNCS
 
 %macro TM_XMM_FUNCS 0
 cglobal vp9_ipred_tm_8x8, 4, 4, 5, dst, stride, l, a
@@ -784,20 +731,11 @@ TM_XMM_FUNCS
     pavgb                  m%1, m%2
 %endmacro
 
-%macro DL_MMX_FUNCS 0
+INIT_MMX ssse3
 cglobal vp9_ipred_dl_4x4, 4, 4, 0, dst, stride, l, a
     movq                    m1, [aq]
-%if cpuflag(ssse3)
     pshufb                  m0, m1, [pb_0to5_2x7]
     pshufb                  m2, m1, [pb_2to6_3x7]
-%else
-    punpckhbw               m3, m1, m1              ; 44556677
-    pand                    m0, m1, [pb_6xm1_2x0]   ; 012345__
-    pand                    m3, [pb_6x0_2xm1]       ; ______77
-    psrlq                   m2, m1, 16              ; 234567__
-    por                     m0, m3                  ; 01234577
-    por                     m2, m3                  ; 23456777
-%endif
     psrlq                   m1, 8
     LOWPASS                  0, 1, 2, 3
 
@@ -810,12 +748,6 @@ cglobal vp9_ipred_dl_4x4, 4, 4, 0, dst, stride, l, a
     movd      [dstq+strideq*0], m0
     movd      [dstq+strideq*2], m1
     RET
-%endmacro
-
-INIT_MMX mmxext
-DL_MMX_FUNCS
-INIT_MMX ssse3
-DL_MMX_FUNCS
 
 %macro DL_XMM_FUNCS 0
 cglobal vp9_ipred_dl_8x8, 4, 4, 4, dst, stride, stride5, a
@@ -964,14 +896,14 @@ DL_XMM_FUNCS
 
 ; dr
 
-%macro DR_MMX_FUNCS 0
+INIT_MMX ssse3
 cglobal vp9_ipred_dr_4x4, 4, 4, 0, dst, stride, l, a
     movd                    m0, [lq]
     punpckldq               m0, [aq-1]
     movd                    m1, [aq+3]
     DEFINE_ARGS dst, stride, stride3
     lea               stride3q, [strideq*3]
-    PALIGNR                 m1, m0, 1, m3
+    palignr                 m1, m0, 1
     psrlq                   m2, m1, 8
     LOWPASS                  0, 1, 2, 3
 
@@ -983,12 +915,6 @@ cglobal vp9_ipred_dr_4x4, 4, 4, 0, dst, stride, l, a
     psrlq                   m0, 8
     movd      [dstq+strideq*0], m0
     RET
-%endmacro
-
-INIT_MMX mmxext
-DR_MMX_FUNCS
-INIT_MMX ssse3
-DR_MMX_FUNCS
 
 %macro DR_XMM_FUNCS 0
 cglobal vp9_ipred_dr_8x8, 4, 4, 4, dst, stride, l, a
@@ -1266,7 +1192,7 @@ VL_XMM_FUNCS
 
 ; vr
 
-%macro VR_MMX_FUNCS 0
+INIT_MMX ssse3
 cglobal vp9_ipred_vr_4x4, 4, 4, 0, dst, stride, l, a
     movq                    m1, [aq-1]
     punpckldq               m2, [lq]
@@ -1274,7 +1200,7 @@ cglobal vp9_ipred_vr_4x4, 4, 4, 0, dst, stride, l, a
     DEFINE_ARGS dst, stride, stride3
     lea               stride3q, [strideq*3]
     pavgb                   m0, m1
-    PALIGNR                 m1, m2, 5, m3
+    palignr                 m1, m2, 5
     psrlq                   m2, m1, 8
     psllq                   m3, m1, 8
     LOWPASS                  2,  1, 3, 4
@@ -1284,7 +1210,6 @@ cglobal vp9_ipred_vr_4x4, 4, 4, 0, dst, stride, l, a
     ; IABC  | m0 contains ABCDxxxx
     ; JEFG  | m2 contains xJIEFGHx
 
-%if cpuflag(ssse3)
     punpckldq               m0, m2
     pshufb                  m2, [pb_13456_3xm1]
     movd      [dstq+strideq*0], m0
@@ -1293,24 +1218,7 @@ cglobal vp9_ipred_vr_4x4, 4, 4, 0, dst, stride, l, a
     psrlq                   m2, 8
     movd      [dstq+strideq*2], m0
     movd      [dstq+strideq*1], m2
-%else
-    psllq                   m1, m2, 40
-    psrlq                   m2, 24
-    movd      [dstq+strideq*0], m0
-    movd      [dstq+strideq*1], m2
-    PALIGNR                 m0, m1, 7, m3
-    psllq                   m1, 8
-    PALIGNR                 m2, m1, 7, m3
-    movd      [dstq+strideq*2], m0
-    movd      [dstq+stride3q ], m2
-%endif
     RET
-%endmacro
-
-INIT_MMX mmxext
-VR_MMX_FUNCS
-INIT_MMX ssse3
-VR_MMX_FUNCS
 
 %macro VR_XMM_FUNCS 1 ; n_xmm_regs for 16x16
 cglobal vp9_ipred_vr_8x8, 4, 4, 5, dst, stride, l, a
@@ -1688,16 +1596,10 @@ HD_XMM_FUNCS
 INIT_XMM avx
 HD_XMM_FUNCS
 
-%macro HU_MMX_FUNCS 0
+INIT_MMX ssse3
 cglobal vp9_ipred_hu_4x4, 3, 3, 0, dst, stride, l
     movd                    m0, [lq]
-%if cpuflag(ssse3)
     pshufb                  m0, [pb_0to2_5x3]
-%else
-    punpcklbw               m1, m0, m0          ; 00112233
-    pshufw                  m1, m1, q3333       ; 33333333
-    punpckldq               m0, m1              ; 01233333
-%endif
     psrlq                   m1, m0, 8
     psrlq                   m2, m1, 8
     LOWPASS                  2,  1, 0, 3
@@ -1705,7 +1607,7 @@ cglobal vp9_ipred_hu_4x4, 3, 3, 0, dst, stride, l
     DEFINE_ARGS dst, stride, stride3
     lea               stride3q, [strideq*3]
     SBUTTERFLY              bw,  1, 2, 0
-    PALIGNR                 m2, m1, 2, m0
+    palignr                 m2, m1, 2
     movd      [dstq+strideq*0], m1
     movd      [dstq+strideq*1], m2
     punpckhdq               m1, m1
@@ -1713,12 +1615,6 @@ cglobal vp9_ipred_hu_4x4, 3, 3, 0, dst, stride, l
     movd      [dstq+strideq*2], m1
     movd      [dstq+stride3q ], m2
     RET
-%endmacro
-
-INIT_MMX mmxext
-HU_MMX_FUNCS
-INIT_MMX ssse3
-HU_MMX_FUNCS
 
 %macro HU_XMM_FUNCS 1 ; n_xmm_regs in hu_32x32
 cglobal vp9_ipred_hu_8x8, 3, 4, 4, dst, stride, l
-- 
2.52.0


From d7ef87018709203dac8a3a17e2e8b11f4a05a948 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 30 Nov 2025 20:49:51 +0100
Subject: [PATCH 252/304] avcodec/vp9itxfm{,_16bpp}: Remove MMXEXT functions
 overridden by SSSE3

SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9dsp_init.c                |  2 --
 libavcodec/x86/vp9dsp_init_16bpp_template.c |  4 ---
 libavcodec/x86/vp9itxfm.asm                 | 30 +--------------------
 libavcodec/x86/vp9itxfm_16bpp.asm           | 18 +------------
 4 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/libavcodec/x86/vp9dsp_init.c b/libavcodec/x86/vp9dsp_init.c
index 85332da2b9..e479fd25ee 100644
--- a/libavcodec/x86/vp9dsp_init.c
+++ b/libavcodec/x86/vp9dsp_init.c
@@ -101,7 +101,6 @@ itxfm_func(iadst, idct,  size, opt); \
 itxfm_func(idct,  iadst, size, opt); \
 itxfm_func(iadst, iadst, size, opt)
 
-itxfm_func(idct,  idct,  4, mmxext);
 itxfm_func(idct,  iadst, 4, sse2);
 itxfm_func(iadst, idct,  4, sse2);
 itxfm_func(iadst, iadst, 4, sse2);
@@ -284,7 +283,6 @@ av_cold void ff_vp9dsp_init_x86(VP9DSPContext *dsp, int bpp, int bitexact)
         dsp->loop_filter_8[1][1] = ff_vp9_loop_filter_v_8_8_mmxext;
         init_fpel_func(4, 1,  4, avg, _8, mmxext);
         init_fpel_func(3, 1,  8, avg, _8, mmxext);
-        dsp->itxfm_add[TX_4X4][DCT_DCT] = ff_vp9_idct_idct_4x4_add_mmxext;
         dsp->intra_pred[TX_4X4][HOR_DOWN_PRED] = ff_vp9_ipred_hd_4x4_mmxext;
         dsp->intra_pred[TX_4X4][VERT_LEFT_PRED] = ff_vp9_ipred_vl_4x4_mmxext;
     }
diff --git a/libavcodec/x86/vp9dsp_init_16bpp_template.c b/libavcodec/x86/vp9dsp_init_16bpp_template.c
index 54ff8892cf..969db94d3c 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp_template.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp_template.c
@@ -123,7 +123,6 @@ decl_ipred_fns(tm, BPC, mmxext, sse2);
 
 decl_itxfm_func(iwht, iwht, 4, BPC, mmxext);
 #if BPC == 10
-decl_itxfm_func(idct,  idct,  4, BPC, mmxext);
 decl_itxfm_funcs(4, BPC, ssse3);
 decl_itxfm_funcs(16, BPC, avx512icl);
 decl_itxfm_func(idct,  idct, 32, BPC, avx512icl);
@@ -184,9 +183,6 @@ av_cold void INIT_FUNC(VP9DSPContext *dsp, int bitexact)
         init_ipred_func(tm, TM_VP8, 4, BPC, mmxext);
         if (!bitexact) {
             init_itx_func_one(4 /* lossless */, iwht, iwht, 4, BPC, mmxext);
-#if BPC == 10
-            init_itx_func(TX_4X4, DCT_DCT, idct, idct, 4, 10, mmxext);
-#endif
         }
     }
 
diff --git a/libavcodec/x86/vp9itxfm.asm b/libavcodec/x86/vp9itxfm.asm
index fe650d519c..bd5966646c 100644
--- a/libavcodec/x86/vp9itxfm.asm
+++ b/libavcodec/x86/vp9itxfm.asm
@@ -223,49 +223,28 @@ cglobal vp9_iwht_iwht_4x4_add, 3, 3, 0, dst, stride, block, eob
     VP9_STORE_2X         2,  3,  6,  7,  4
 %endmacro
 
-%macro IDCT_4x4_FN 1
-INIT_MMX %1
+INIT_MMX ssse3
 cglobal vp9_idct_idct_4x4_add, 4, 4, 0, dst, stride, block, eob
 
-%if cpuflag(ssse3)
     cmp eobd, 4 ; 2x2 or smaller
     jg .idctfull
 
     cmp eobd, 1 ; faster path for when only DC is set
     jne .idct2x2
-%else
-    cmp eobd, 1
-    jg .idctfull
-%endif
 
-%if cpuflag(ssse3)
     movd                m0, [blockq]
     mova                m5, [pw_11585x2]
     pmulhrsw            m0, m5
     pmulhrsw            m0, m5
-%else
-    DEFINE_ARGS dst, stride, block, coef
-    movsx            coefd, word [blockq]
-    imul             coefd, 11585
-    add              coefd, 8192
-    sar              coefd, 14
-    imul             coefd, 11585
-    add              coefd, (8 << 14) + 8192
-    sar              coefd, 14 + 4
-    movd                m0, coefd
-%endif
     pshufw              m0, m0, 0
     pxor                m4, m4
     movh          [blockq], m4
-%if cpuflag(ssse3)
     pmulhrsw            m0, [pw_2048]       ; (x*2048 + (1<<14))>>15 <=> (x+8)>>4
-%endif
     VP9_STORE_2X         0,  0,  6,  7,  4
     lea               dstq, [dstq+2*strideq]
     VP9_STORE_2X         0,  0,  6,  7,  4
     RET
 
-%if cpuflag(ssse3)
 ; faster path for when only top left 2x2 block is set
 .idct2x2:
     movd                m0, [blockq+0]
@@ -285,16 +264,13 @@ cglobal vp9_idct_idct_4x4_add, 4, 4, 0, dst, stride, block, eob
     movh       [blockq+ 8], m4
     VP9_IDCT4_WRITEOUT
     RET
-%endif
 
 .idctfull: ; generic full 4x4 idct/idct
     mova                m0, [blockq+ 0]
     mova                m1, [blockq+ 8]
     mova                m2, [blockq+16]
     mova                m3, [blockq+24]
-%if cpuflag(ssse3)
     mova                m6, [pw_11585x2]
-%endif
     mova                m7, [pd_8192]       ; rounding
     VP9_IDCT4_1D
     TRANSPOSE4x4W  0, 1, 2, 3, 4
@@ -306,10 +282,6 @@ cglobal vp9_idct_idct_4x4_add, 4, 4, 0, dst, stride, block, eob
     mova       [blockq+24], m4
     VP9_IDCT4_WRITEOUT
     RET
-%endmacro
-
-IDCT_4x4_FN mmxext
-IDCT_4x4_FN ssse3
 
 ;-------------------------------------------------------------------------------------------
 ; void vp9_iadst_iadst_4x4_add_<opt>(uint8_t *dst, ptrdiff_t stride, int16_t *block, int eob);
diff --git a/libavcodec/x86/vp9itxfm_16bpp.asm b/libavcodec/x86/vp9itxfm_16bpp.asm
index ebe6222285..161c73f5a1 100644
--- a/libavcodec/x86/vp9itxfm_16bpp.asm
+++ b/libavcodec/x86/vp9itxfm_16bpp.asm
@@ -243,29 +243,21 @@ IWHT4_FN 12, 4095
 ; 4x4 coefficients are 5+depth+sign bits, so for 10bpp, everything still fits
 ; in 15+1 words without additional effort, since the coefficients are 15bpp.
 
-%macro IDCT4_10_FN 0
+INIT_MMX ssse3
 cglobal vp9_idct_idct_4x4_add_10, 4, 4, 8, dst, stride, block, eob
     cmp               eobd, 1
     jg .idctfull
 
     ; dc-only
     pxor                m4, m4
-%if cpuflag(ssse3)
     movd                m0, [blockq]
     movd          [blockq], m4
     mova                m5, [pw_11585x2]
     pmulhrsw            m0, m5
     pmulhrsw            m0, m5
-%else
-    DEFINE_ARGS dst, stride, block, coef
-    DC_ONLY              4, m4
-    movd                m0, coefd
-%endif
     pshufw              m0, m0, 0
     mova                m5, [pw_1023]
-%if cpuflag(ssse3)
     pmulhrsw            m0, [pw_2048]       ; (x*2048 + (1<<14))>>15 <=> (x+8)>>4
-%endif
     VP9_STORE_2X         0,  0,  6,  7,  4,  5
     lea               dstq, [dstq+2*strideq]
     VP9_STORE_2X         0,  0,  6,  7,  4,  5
@@ -281,9 +273,7 @@ cglobal vp9_idct_idct_4x4_add_10, 4, 4, 8, dst, stride, block, eob
     packssdw            m2, [blockq+2*16+8]
     packssdw            m3, [blockq+3*16+8]
 
-%if cpuflag(ssse3)
     mova                m6, [pw_11585x2]
-%endif
     mova                m7, [pd_8192]       ; rounding
     VP9_IDCT4_1D
     TRANSPOSE4x4W  0, 1, 2, 3, 4
@@ -293,12 +283,6 @@ cglobal vp9_idct_idct_4x4_add_10, 4, 4, 8, dst, stride, block, eob
     ZERO_BLOCK      blockq, 16, 4, m4
     VP9_IDCT4_WRITEOUT
     RET
-%endmacro
-
-INIT_MMX mmxext
-IDCT4_10_FN
-INIT_MMX ssse3
-IDCT4_10_FN
 
 %macro IADST4_FN 4
 cglobal vp9_%1_%3_4x4_add_10, 3, 3, 0, dst, stride, block, eob
-- 
2.52.0


From 7082d9f9e38d995622b080922331f3876afe585f Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 30 Nov 2025 23:48:36 +0100
Subject: [PATCH 253/304] tests/checkasm/vp9dsp: Allow to run only a subset of
 tests

Make it possible to run only a subset of the VP9 tests
in addition to all of them (via the vp9dsp test). This
reduces noise and speeds up testing.
FATE continues to use vp9dsp.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/checkasm/checkasm.c |  6 +++++-
 tests/checkasm/checkasm.h |  4 ++++
 tests/checkasm/vp9dsp.c   | 16 ++++++++--------
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7edc8e4e6e..6e066d469a 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -264,7 +264,11 @@ static const struct {
         { "vp8dsp", checkasm_check_vp8dsp },
     #endif
     #if CONFIG_VP9_DECODER
-        { "vp9dsp", checkasm_check_vp9dsp },
+        { "vp9dsp", checkasm_check_vp9dsp }, // all of the below
+        { "vp9_ipred", checkasm_check_vp9_ipred },
+        { "vp9_itxfm", checkasm_check_vp9_itxfm },
+        { "vp9_loopfilter", checkasm_check_vp9_loopfilter },
+        { "vp9_mc", checkasm_check_vp9_mc },
     #endif
     #if CONFIG_VIDEODSP
         { "videodsp", checkasm_check_videodsp },
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 9f4fb8b283..910fc417a7 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -159,6 +159,10 @@ void checkasm_check_vp3dsp(void);
 void checkasm_check_vp6dsp(void);
 void checkasm_check_vp8dsp(void);
 void checkasm_check_vp9dsp(void);
+void checkasm_check_vp9_ipred(void);
+void checkasm_check_vp9_itxfm(void);
+void checkasm_check_vp9_loopfilter(void);
+void checkasm_check_vp9_mc(void);
 void checkasm_check_videodsp(void);
 void checkasm_check_vorbisdsp(void);
 void checkasm_check_vvc_alf(void);
diff --git a/tests/checkasm/vp9dsp.c b/tests/checkasm/vp9dsp.c
index 2a3374541f..d5ff5aa2cd 100644
--- a/tests/checkasm/vp9dsp.c
+++ b/tests/checkasm/vp9dsp.c
@@ -47,7 +47,7 @@ static const uint32_t pixel_mask[3] = { 0xffffffff, 0x03ff03ff, 0x0fff0fff };
         }                                                          \
     } while (0)
 
-static void check_ipred(void)
+void checkasm_check_vp9_ipred(void)
 {
     LOCAL_ALIGNED_32(uint8_t, a_buf, [64 * 2]);
     uint8_t *a = &a_buf[32 * 2];
@@ -308,7 +308,7 @@ static int is_zero(const int16_t *c, int sz)
 
 #define SIZEOF_COEF (2 * ((bit_depth + 7) / 8))
 
-static void check_itxfm(void)
+void checkasm_check_vp9_itxfm(void)
 {
     LOCAL_ALIGNED_64(uint8_t, src, [32 * 32 * 2]);
     LOCAL_ALIGNED_64(uint8_t, dst, [32 * 32 * 2]);
@@ -449,7 +449,7 @@ static void randomize_loopfilter_buffers(int bidx, int lineoff, int str,
         randomize_loopfilter_buffers(bidx, lineoff, str, bit_depth, dir, \
                                      E, F, H, I, buf0, buf1)
 
-static void check_loopfilter(void)
+void checkasm_check_vp9_loopfilter(void)
 {
     LOCAL_ALIGNED_32(uint8_t, base0, [32 + 16 * 16 * 2]);
     LOCAL_ALIGNED_32(uint8_t, base1, [32 + 16 * 16 * 2]);
@@ -556,7 +556,7 @@ static void check_loopfilter(void)
         }                                                 \
     } while (0)
 
-static void check_mc(void)
+void checkasm_check_vp9_mc(void)
 {
     LOCAL_ALIGNED_64(uint8_t, buf, [72 * 72 * 2]);
     LOCAL_ALIGNED_64(uint8_t, dst0, [64 * 64 * 2]);
@@ -626,8 +626,8 @@ static void check_mc(void)
 
 void checkasm_check_vp9dsp(void)
 {
-    check_ipred();
-    check_itxfm();
-    check_loopfilter();
-    check_mc();
+    checkasm_check_vp9_ipred();
+    checkasm_check_vp9_itxfm();
+    checkasm_check_vp9_loopfilter();
+    checkasm_check_vp9_mc();
 }
-- 
2.52.0


From 68b1d1a363a891eeb170b253792207a79eb499c7 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Mon, 1 Dec 2025 00:17:55 +0100
Subject: [PATCH 254/304] avcodec/x86/vp9mc: Avoid reloads, MMX regs in width 4
 vert 8tap func

Four rows of four bytes fit into one xmm register; therefore
one can arrange the rows as follows (A, B, C, ...: first, second,
third, ... row):

xmm0: ABABABAB BCBCBCBC
xmm1: CDCDCDCD DEDEDEDE
xmm2: EFEFEFEF FGFGFGFG
xmm3: GHGHGHGH HIHIHIHI

and use four pmaddubsw to calculate two rows in parallel. The history
fits into four registers, making this possible even on 32bit systems.
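
The layout above can be modelled in scalar C. The following is only a
sketch for the put variant, not the code from this patch: Reg,
pack_rows, madd, filter4_v_model and filter4_v_ref are hypothetical
names, and it assumes that each tap pair is broadcast to every 16-bit
lane of the register it is multiplied with.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one xmm register (16 bytes). */
typedef struct { uint8_t b[16]; } Reg;

static int sat16(int v) { return v > 32767 ? 32767 : v < -32768 ? -32768 : v; }
static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Low half: rows r0/r1 interleaved bytewise; high half: rows r1/r2. */
static Reg pack_rows(const uint8_t *r0, const uint8_t *r1, const uint8_t *r2)
{
    Reg v;
    for (int x = 0; x < 4; x++) {
        v.b[2 * x]         = r0[x];
        v.b[2 * x + 1]     = r1[x];
        v.b[8 + 2 * x]     = r1[x];
        v.b[8 + 2 * x + 1] = r2[x];
    }
    return v;
}

/* One 16-bit lane of pmaddubsw against a broadcast tap pair. */
static int madd(const Reg *v, int lane, const int8_t *pair)
{
    return sat16(v->b[2 * lane] * pair[0] + v->b[2 * lane + 1] * pair[1]);
}

/* Two output rows per iteration; h is assumed to be even. */
static void filter4_v_model(uint8_t *dst, ptrdiff_t dstride,
                            const uint8_t *src, ptrdiff_t sstride,
                            int h, const int8_t taps[8])
{
    src -= 3 * sstride;                                   /* row A */
    Reg ab = pack_rows(src, src + sstride, src + 2 * sstride);
    Reg cd = pack_rows(src + 2 * sstride, src + 3 * sstride, src + 4 * sstride);
    Reg ef = pack_rows(src + 4 * sstride, src + 5 * sstride, src + 6 * sstride);
    src += 6 * sstride;                                   /* row G */
    for (int y = 0; y < h; y += 2) {
        Reg gh = pack_rows(src, src + sstride, src + 2 * sstride);
        for (int lane = 0; lane < 8; lane++) {
            /* lanes 0-3 form output row y, lanes 4-7 output row y+1 */
            int s0 = madd(&ab, lane, taps)     + madd(&ef, lane, taps + 4) + 64;
            int s1 = madd(&cd, lane, taps + 2) + madd(&gh, lane, taps + 6);
            int v  = sat16(s0 + s1);                      /* paddsw */
            dst[(y + lane / 4) * dstride + (lane & 3)] =
                v < 0 ? 0 : clip_u8(v >> 7);              /* psraw, packuswb */
        }
        ab = cd; cd = ef; ef = gh;                        /* rotate the history */
        src += 2 * sstride;
    }
}

/* Straightforward 8-tap reference for comparison. */
static void filter4_v_ref(uint8_t *dst, ptrdiff_t dstride,
                          const uint8_t *src, ptrdiff_t sstride,
                          int h, const int8_t taps[8])
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 4; x++) {
            int sum = 64;
            for (int k = 0; k < 8; k++)
                sum += src[(y + k - 3) * sstride + x] * taps[k];
            dst[y * dstride + x] = sum < 0 ? 0 : clip_u8(sum >> 7);
        }
}
```

Calling both functions on the same input (src pointing at the fourth
row of a buffer with h + 7 rows) is meant to give identical output for
tap sets whose pairwise products do not saturate in pmaddubsw, such as
the regular filter row -1, 3, -10, 122, 18, -6, 2, 0.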

Old benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c:                         105.5 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      16.4 ( 6.44x)
vp9_put_8tap_smooth_4v_8bpp_c:                          99.3 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      15.4 ( 6.44x)

New benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c:                         105.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      11.8 ( 8.90x)
vp9_put_8tap_smooth_4v_8bpp_c:                          99.7 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      10.7 ( 9.30x)

Old benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c:                         138.2 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      28.0 ( 4.93x)
vp9_put_8tap_smooth_4v_8bpp_c:                         123.6 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      28.0 ( 4.41x)

New benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c:                         139.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      20.1 ( 6.92x)
vp9_put_8tap_smooth_4v_8bpp_c:                         124.5 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      19.9 ( 6.26x)

Loading the constants into registers did not turn out to be advantageous
here (not to mention Win64, where this would necessitate saving
and restoring even more registers), probably because there are only two
loop iterations.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9mc.asm | 90 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 81 insertions(+), 9 deletions(-)

diff --git a/libavcodec/x86/vp9mc.asm b/libavcodec/x86/vp9mc.asm
index 682c6a6ea0..85249bb507 100644
--- a/libavcodec/x86/vp9mc.asm
+++ b/libavcodec/x86/vp9mc.asm
@@ -496,12 +496,84 @@ INIT_XMM sse2
 filter_sse2_v_fn put
 filter_sse2_v_fn avg
 
-%macro filter_v_fn 1
-%assign %%px mmsize/2
+%macro filter4_v_fn 1
 %if ARCH_X86_64
-cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 6, 8, 11, dst, dstride, src, sstride, h, filtery, src4, sstride3
+cglobal vp9_%1_8tap_1d_v_4_8, 6, 7, 8, dst, dstride, src, sstride, h, filtery, sstride3
 %else
-cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 11, dst, dstride, src, sstride, filtery, src4, sstride3
+cglobal vp9_%1_8tap_1d_v_4_8, 4, 5, 8, dst, dstride, src, sstride, filtery
+%define hd r4mp
+%define sstride3q filteryq
+%endif
+    lea  sstride3q, [sstrideq*3]
+    sub       srcq, sstride3q
+    movd        m0, [srcq]
+    movd        m1, [srcq+sstrideq]
+    movd        m2, [srcq+sstrideq*2]
+    movd        m3, [srcq+sstride3q]
+    lea       srcq, [srcq+sstrideq*4]
+    movd        m4, [srcq]
+    movd        m5, [srcq+sstrideq]
+    punpcklbw   m0, m1
+    punpcklbw   m1, m2
+    punpcklbw   m2, m3
+    punpcklbw   m3, m4
+    punpcklqdq  m0, m1
+    movd        m1, [srcq+sstrideq*2]
+    add       srcq, sstride3q
+%if ARCH_X86_32
+    mov   filteryq, r5mp
+%endif
+    punpcklqdq  m2, m3
+    punpcklbw   m4, m5
+    punpcklbw   m5, m1
+    punpcklqdq  m4, m5
+.loop:
+    pmaddubsw   m0, [filteryq]
+    movd        m3, [srcq]
+    movd        m5, [srcq+sstrideq]
+    pmaddubsw   m7, m4, [filteryq+64]
+    pmaddubsw   m6, m2, [filteryq+32]
+    punpcklbw   m1, m3
+    punpcklbw   m3, m5
+    punpcklqdq  m1, m3
+    pmaddubsw   m3, m1, [filteryq+96]
+    paddw       m0, [pw_64]
+    lea       srcq, [srcq+2*sstrideq]
+    paddw       m7, m0
+    mova        m0, m2
+    mova        m2, m4
+%ifidn %1, avg
+    movd        m4, [dstq]
+%endif
+    paddw       m6, m3
+%ifidn %1, avg
+    movd        m3, [dstq+dstrideq]
+%endif
+    paddsw      m6, m7
+    psraw       m6, 7
+    packuswb    m6, m6
+    pshuflw     m7, m6, 0xE
+%ifidn %1, avg
+    pavgb       m6, m4
+%endif
+    movd    [dstq], m6
+    mova        m4, m1
+%ifidn %1, avg
+    pavgb       m7, m3
+%endif
+    movd [dstq+dstrideq], m7
+    lea       dstq, [dstq+2*dstrideq]
+    mova        m1, m5
+    sub         hd, 2
+    jg .loop
+    RET
+%endmacro
+
+%macro filter_v_fn 1
+%if ARCH_X86_64
+cglobal vp9_%1_8tap_1d_v_8_8, 6, 8, 11, dst, dstride, src, sstride, h, filtery, src4, sstride3
+%else
+cglobal vp9_%1_8tap_1d_v_8_8, 4, 7, 11, dst, dstride, src, sstride, filtery, src4, sstride3
     mov   filteryq, r5mp
 %define hd r4mp
 %endif
@@ -510,7 +582,7 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 11, dst, dstride, src, sstride, f
     lea      src4q, [srcq+sstrideq]
     sub       srcq, sstride3q
     mova        m7, [filteryq+ 0]
-%if ARCH_X86_64 && mmsize > 8
+%if ARCH_X86_64
     mova        m8, [filteryq+32]
     mova        m9, [filteryq+64]
     mova       m10, [filteryq+96]
@@ -533,7 +605,7 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 11, dst, dstride, src, sstride, f
     punpcklbw   m4, m5
     punpcklbw   m1, m3
     pmaddubsw   m0, m7
-%if ARCH_X86_64 && mmsize > 8
+%if ARCH_X86_64
     pmaddubsw   m2, m8
     pmaddubsw   m4, m9
     pmaddubsw   m1, m10
@@ -560,9 +632,9 @@ cglobal vp9_%1_8tap_1d_v_ %+ %%px %+ _8, 4, 7, 11, dst, dstride, src, sstride, f
     RET
 %endmacro
 
-INIT_MMX ssse3
-filter_v_fn put
-filter_v_fn avg
+INIT_XMM ssse3
+filter4_v_fn put
+filter4_v_fn avg
 
 INIT_XMM ssse3
 filter_v_fn put
-- 
2.52.0


From 65e34ee999578c0b36694f51b05dd324001098ff Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 2 Dec 2025 13:47:32 +0100
Subject: [PATCH 255/304] avcodec/x86/vp9mc: Avoid MMX regs in width 4 hor 8tap
 funcs

Using wider registers (and pshufb) allows halving the number of
pmaddubsw instructions used. It is also ABI compliant (no more missing
emms).
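
A scalar C model of one loop iteration may help when reading the
shuffle masks. It is a sketch, not the code from this patch:
filter4_h_model and filter4_h_ref are hypothetical names, and it
assumes that the two unaligned 16-byte loads from the filter table
yield the tap pairs (t0,t1 | t2,t3) and (t4,t5 | t6,t7), each pair
repeated four times.

```c
#include <stdint.h>

/* Shuffle masks as in filter4_h_perm0 / filter4_h_perm1. */
static const uint8_t perm0[16] = { 0, 1, 1, 2, 2, 3, 3, 4, 2, 3, 3, 4, 4, 5, 5, 6 };
static const uint8_t perm1[16] = { 1, 2, 2, 3, 3, 4, 4, 5, 3, 4, 4, 5, 5, 6, 6, 7 };

static int sat16(int v) { return v > 32767 ? 32767 : v < -32768 ? -32768 : v; }
static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Model of one iteration: 4 output pixels, reading src[-3..7]. */
static void filter4_h_model(uint8_t dst[4], const uint8_t *src,
                            const int8_t taps[8])
{
    const uint8_t *a = src - 3;   /* movq m0, [srcq-3] */
    const uint8_t *b = src;       /* movq m1, [srcq+0] */
    int lo[8], hi[8];             /* 16-bit lanes of m0 and m1 after pmaddubsw */

    for (int lane = 0; lane < 8; lane++) {
        /* lanes 0-3 meet the first tap pair of a register, lanes 4-7 the second */
        int pair = lane / 4;
        lo[lane] = sat16(a[perm0[2 * lane]]     * taps[2 * pair] +
                         a[perm0[2 * lane + 1]] * taps[2 * pair + 1]);
        hi[lane] = sat16(b[perm1[2 * lane]]     * taps[4 + 2 * pair] +
                         b[perm1[2 * lane + 1]] * taps[4 + 2 * pair + 1]);
    }
    for (int x = 0; x < 4; x++) {
        int s0 = lo[x] + hi[x] + 64;      /* taps 0,1,4,5 plus rounding */
        int s1 = lo[x + 4] + hi[x + 4];   /* taps 2,3,6,7 (upper half) */
        int v  = sat16(s0 + s1);          /* paddsw */
        dst[x] = v < 0 ? 0 : clip_u8(v >> 7);
    }
}

/* Straightforward 8-tap reference for comparison. */
static void filter4_h_ref(uint8_t dst[4], const uint8_t *src,
                          const int8_t taps[8])
{
    for (int x = 0; x < 4; x++) {
        int sum = 64;
        for (int k = 0; k < 8; k++)
            sum += src[x + k - 3] * taps[k];
        dst[x] = sum < 0 ? 0 : clip_u8(sum >> 7);
    }
}
```

Each output pixel thus needs only two multiply-add lanes per source
register instead of four separate products, which is where the halved
pmaddubsw count comes from. The model ignores 16-bit wraparound in the
non-saturating additions, which does not occur for tap sets such as
the regular filter row -1, 3, -10, 122, 18, -6, 2, 0.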

Old benchmarks:
vp9_avg_8tap_smooth_4h_8bpp_c:                          97.6 ( 1.00x)
vp9_avg_8tap_smooth_4h_8bpp_ssse3:                      15.0 ( 6.52x)
vp9_avg_8tap_smooth_4hv_8bpp_c:                        342.9 ( 1.00x)
vp9_avg_8tap_smooth_4hv_8bpp_ssse3:                     54.0 ( 6.35x)
vp9_put_8tap_smooth_4h_8bpp_c:                          94.9 ( 1.00x)
vp9_put_8tap_smooth_4h_8bpp_ssse3:                      14.2 ( 6.67x)
vp9_put_8tap_smooth_4hv_8bpp_c:                        325.9 ( 1.00x)
vp9_put_8tap_smooth_4hv_8bpp_ssse3:                     52.5 ( 6.20x)

New benchmarks:
vp9_avg_8tap_smooth_4h_8bpp_c:                          97.6 ( 1.00x)
vp9_avg_8tap_smooth_4h_8bpp_ssse3:                      10.8 ( 9.08x)
vp9_avg_8tap_smooth_4hv_8bpp_c:                        342.4 ( 1.00x)
vp9_avg_8tap_smooth_4hv_8bpp_ssse3:                     38.8 ( 8.82x)
vp9_put_8tap_smooth_4h_8bpp_c:                          94.7 ( 1.00x)
vp9_put_8tap_smooth_4h_8bpp_ssse3:                       9.7 ( 9.75x)
vp9_put_8tap_smooth_4hv_8bpp_c:                        321.7 ( 1.00x)
vp9_put_8tap_smooth_4hv_8bpp_ssse3:                     37.0 ( 8.69x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9mc.asm | 50 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/vp9mc.asm b/libavcodec/x86/vp9mc.asm
index 85249bb507..42f9074c21 100644
--- a/libavcodec/x86/vp9mc.asm
+++ b/libavcodec/x86/vp9mc.asm
@@ -114,6 +114,9 @@ FILTER sse2
 ; int16_t ff_filters_16bpp[3][15][4][16]
 FILTER 16bpp
 
+filter4_h_perm0: db 0, 1, 1, 2, 2, 3, 3, 4, 2, 3, 3, 4, 4, 5, 5, 6
+filter4_h_perm1: db 1, 2, 2, 3, 3, 4, 4, 5, 3, 4, 4, 5, 5, 6, 6, 7
+
 %if HAVE_AVX512ICL_EXTERNAL && ARCH_X86_64
 ALIGN 64
 spel_h_perm16:  db  0,  1,  2,  3,  1,  2,  3,  4,  2,  3,  4,  5,  3,  4,  5,  6
@@ -280,12 +283,51 @@ INIT_XMM sse2
 filter_sse2_h_fn put
 filter_sse2_h_fn avg
 
+%macro filter4_h_fn 2
+cglobal vp9_%1_8tap_1d_h_4_8, 6, 6, %2, dst, dstride, src, sstride, h, filtery
+    mova        m2, [filter4_h_perm0]
+    mova        m3, [filter4_h_perm1]
+    pcmpeqw     m4, m4
+    movu        m5, [filteryq+24]
+    movu        m6, [filteryq+88]
+    psllw       m4, 6   ; pw_m64
+.loop:
+    movq        m0, [srcq-3]
+    movq        m1, [srcq+0]
+    pshufb      m0, m2
+    pshufb      m1, m3
+    pmaddubsw   m0, m5
+    pmaddubsw   m1, m6
+%ifidn %1, avg
+    movd        m7, [dstq]
+%endif
+    add       srcq, sstrideq
+    paddw       m0, m1
+    movhlps     m1, m0
+    psubw       m0, m4
+    paddsw      m0, m1
+    psraw       m0, 7
+    packuswb    m0, m0
+%ifidn %1, avg
+    pavgb       m0, m7
+%endif
+    movd    [dstq], m0
+    add       dstq, dstrideq
+    sub         hd, 1
+    jg .loop
+    RET
+%endmacro
+
+INIT_XMM ssse3
+filter4_h_fn put, 7
+filter4_h_fn avg, 8
+
 %macro filter_h_fn 1
 %assign %%px mmsize/2
 cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 11, dst, dstride, src, sstride, h, filtery
     mova        m6, [pw_256]
     mova        m7, [filteryq+ 0]
-%if ARCH_X86_64 && mmsize > 8
+%ifdef m8
     mova        m8, [filteryq+32]
     mova        m9, [filteryq+64]
     mova       m10, [filteryq+96]
@@ -305,7 +347,7 @@ cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 11, dst, dstride, src, sstride, h
     punpcklbw   m4, m5
     punpcklbw   m1, m3
     pmaddubsw   m0, m7
-%if ARCH_X86_64 && mmsize > 8
+%ifdef m8
     pmaddubsw   m2, m8
     pmaddubsw   m4, m9
     pmaddubsw   m1, m10
@@ -332,10 +374,6 @@ cglobal vp9_%1_8tap_1d_h_ %+ %%px %+ _8, 6, 6, 11, dst, dstride, src, sstride, h
     RET
 %endmacro
 
-INIT_MMX ssse3
-filter_h_fn put
-filter_h_fn avg
-
 INIT_XMM ssse3
 filter_h_fn put
 filter_h_fn avg
-- 
2.52.0


From f8c4e563296fc783fda7f5f6e9fad31958e3b4dc Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 2 Dec 2025 14:27:23 +0100
Subject: [PATCH 256/304] avcodec/x86/vp9mc: Deduplicate coefficient tables

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9mc.asm | 72 ++++++++++++----------------------------
 1 file changed, 22 insertions(+), 50 deletions(-)

diff --git a/libavcodec/x86/vp9mc.asm b/libavcodec/x86/vp9mc.asm
index 42f9074c21..36e83643a9 100644
--- a/libavcodec/x86/vp9mc.asm
+++ b/libavcodec/x86/vp9mc.asm
@@ -53,8 +53,11 @@ times 8 dw %5, %6
 times 8 dw %7, %8
 %endmacro
 
-%macro FILTER 1
-const filters_%1 ; smooth
+%macro FILTER 0-1
+                    ; smooth
+%if %0 > 0
+%1 %+ _smooth:
+%endif
                     F8_TAPS -3, -1,  32,  64,  38,   1, -3,  0
                     F8_TAPS -2, -2,  29,  63,  41,   2, -3,  0
                     F8_TAPS -2, -2,  26,  63,  43,   4, -4,  0
@@ -71,6 +74,9 @@ const filters_%1 ; smooth
                     F8_TAPS  0, -3,   2,  41,  63,  29, -2, -2
                     F8_TAPS  0, -3,   1,  38,  64,  32, -1, -3
                     ; regular
+%if %0 > 0
+%1 %+ _regular:
+%endif
                     F8_TAPS  0,  1,  -5, 126,   8,  -3,  1,  0
                     F8_TAPS -1,  3, -10, 122,  18,  -6,  2,  0
                     F8_TAPS -1,  4, -13, 118,  27,  -9,  3, -1
@@ -87,6 +93,9 @@ const filters_%1 ; smooth
                     F8_TAPS  0,  2,  -6,  18, 122, -10,  3, -1
                     F8_TAPS  0,  1,  -3,   8, 126,  -5,  1,  0
                     ; sharp
+%if %0 > 0
+%1 %+ _sharp:
+%endif
                     F8_TAPS -1,  3,  -7, 127,   8,  -3,  1,  0
                     F8_TAPS -2,  5, -13, 125,  17,  -6,  3, -1
                     F8_TAPS -3,  7, -17, 121,  27, -10,  5, -2
@@ -106,13 +115,16 @@ const filters_%1 ; smooth
 
 %define F8_TAPS F8_SSSE3_TAPS
 ; int8_t ff_filters_ssse3[3][15][4][32]
-FILTER ssse3
+const filters_ssse3
+FILTER
 %define F8_TAPS F8_SSE2_TAPS
 ; int16_t ff_filters_sse2[3][15][8][8]
-FILTER sse2
+const filters_sse2
+FILTER
 %define F8_TAPS F8_16BPP_TAPS
 ; int16_t ff_filters_16bpp[3][15][4][16]
-FILTER 16bpp
+const filters_16bpp
+FILTER
 
 filter4_h_perm0: db 0, 1, 1, 2, 2, 3, 3, 4, 2, 3, 3, 4, 4, 5, 5, 6
 filter4_h_perm1: db 1, 2, 2, 3, 3, 4, 4, 5, 3, 4, 4, 5, 5, 6, 6, 7
@@ -148,51 +160,11 @@ spel_h_shufB:   db  4,  5,  6,  7,  5,  6,  7,  8,  6,  7,  8,  9,  7,  8,  9, 1
 %define spel_h_shufA (spel_h_perm16+ 0)
 %define spel_h_shufC (spel_h_perm16+16)
 
-vp9_spel_filter_regular: db   0,   1,  -5, 126,   8,  -3,   1,   0
-                         db  -1,   3, -10, 122,  18,  -6,   2,   0
-                         db  -1,   4, -13, 118,  27,  -9,   3,  -1
-                         db  -1,   4, -16, 112,  37, -11,   4,  -1
-                         db  -1,   5, -18, 105,  48, -14,   4,  -1
-                         db  -1,   5, -19,  97,  58, -16,   5,  -1
-                         db  -1,   6, -19,  88,  68, -18,   5,  -1
-                         db  -1,   6, -19,  78,  78, -19,   6,  -1
-                         db  -1,   5, -18,  68,  88, -19,   6,  -1
-                         db  -1,   5, -16,  58,  97, -19,   5,  -1
-                         db  -1,   4, -14,  48, 105, -18,   5,  -1
-                         db  -1,   4, -11,  37, 112, -16,   4,  -1
-                         db  -1,   3,  -9,  27, 118, -13,   4,  -1
-                         db   0,   2,  -6,  18, 122, -10,   3,  -1
-                         db   0,   1,  -3,   8, 126,  -5,   1,   0
-vp9_spel_filter_sharp:   db  -1,   3,  -7, 127,   8,  -3,   1,   0
-                         db  -2,   5, -13, 125,  17,  -6,   3,  -1
-                         db  -3,   7, -17, 121,  27, -10,   5,  -2
-                         db  -4,   9, -20, 115,  37, -13,   6,  -2
-                         db  -4,  10, -23, 108,  48, -16,   8,  -3
-                         db  -4,  10, -24, 100,  59, -19,   9,  -3
-                         db  -4,  11, -24,  90,  70, -21,  10,  -4
-                         db  -4,  11, -23,  80,  80, -23,  11,  -4
-                         db  -4,  10, -21,  70,  90, -24,  11,  -4
-                         db  -3,   9, -19,  59, 100, -24,  10,  -4
-                         db  -3,   8, -16,  48, 108, -23,  10,  -4
-                         db  -2,   6, -13,  37, 115, -20,   9,  -4
-                         db  -2,   5, -10,  27, 121, -17,   7,  -3
-                         db  -1,   3,  -6,  17, 125, -13,   5,  -2
-                         db   0,   1,  -3,   8, 127,  -7,   3,  -1
-vp9_spel_filter_smooth:  db  -3,  -1,  32,  64,  38,   1,  -3,   0
-                         db  -2,  -2,  29,  63,  41,   2,  -3,   0
-                         db  -2,  -2,  26,  63,  43,   4,  -4,   0
-                         db  -2,  -3,  24,  62,  46,   5,  -4,   0
-                         db  -2,  -3,  21,  60,  49,   7,  -4,   0
-                         db  -1,  -4,  18,  59,  51,   9,  -4,   0
-                         db  -1,  -4,  16,  57,  53,  12,  -4,  -1
-                         db  -1,  -4,  14,  55,  55,  14,  -4,  -1
-                         db  -1,  -4,  12,  53,  57,  16,  -4,  -1
-                         db   0,  -4,   9,  51,  59,  18,  -4,  -1
-                         db   0,  -4,   7,  49,  60,  21,  -3,  -2
-                         db   0,  -4,   5,  46,  62,  24,  -3,  -2
-                         db   0,  -4,   4,  43,  63,  26,  -2,  -2
-                         db   0,  -3,   2,  41,  63,  29,  -2,  -2
-                         db   0,  -3,   1,  38,  64,  32,  -1,  -3
+%macro F8_AVX512_TAPS 8
+db %1, %2, %3, %4, %5, %6, %7, %8
+%endmacro
+%define F8_TAPS F8_AVX512_TAPS
+FILTER vp9_spel_filter
 
 pb_02461357:    db  0,  2,  4,  6,  1,  3,  5,  7
 pd_64:          dd 64
-- 
2.52.0


From fe9a9c7e00635fa4330f7a021b179d9c48b95f26 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 2 Dec 2025 17:13:31 +0100
Subject: [PATCH 257/304] avcodec/x86/vp9mc: Reindent after the previous commit

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/x86/vp9mc.asm | 96 ++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/libavcodec/x86/vp9mc.asm b/libavcodec/x86/vp9mc.asm
index 36e83643a9..0e6aa627db 100644
--- a/libavcodec/x86/vp9mc.asm
+++ b/libavcodec/x86/vp9mc.asm
@@ -54,63 +54,63 @@ times 8 dw %7, %8
 %endmacro
 
 %macro FILTER 0-1
-                    ; smooth
 %if %0 > 0
 %1 %+ _smooth:
 %endif
-                    F8_TAPS -3, -1,  32,  64,  38,   1, -3,  0
-                    F8_TAPS -2, -2,  29,  63,  41,   2, -3,  0
-                    F8_TAPS -2, -2,  26,  63,  43,   4, -4,  0
-                    F8_TAPS -2, -3,  24,  62,  46,   5, -4,  0
-                    F8_TAPS -2, -3,  21,  60,  49,   7, -4,  0
-                    F8_TAPS -1, -4,  18,  59,  51,   9, -4,  0
-                    F8_TAPS -1, -4,  16,  57,  53,  12, -4, -1
-                    F8_TAPS -1, -4,  14,  55,  55,  14, -4, -1
-                    F8_TAPS -1, -4,  12,  53,  57,  16, -4, -1
-                    F8_TAPS  0, -4,   9,  51,  59,  18, -4, -1
-                    F8_TAPS  0, -4,   7,  49,  60,  21, -3, -2
-                    F8_TAPS  0, -4,   5,  46,  62,  24, -3, -2
-                    F8_TAPS  0, -4,   4,  43,  63,  26, -2, -2
-                    F8_TAPS  0, -3,   2,  41,  63,  29, -2, -2
-                    F8_TAPS  0, -3,   1,  38,  64,  32, -1, -3
-                    ; regular
+    ; smooth
+    F8_TAPS -3, -1,  32,  64,  38,   1, -3,  0
+    F8_TAPS -2, -2,  29,  63,  41,   2, -3,  0
+    F8_TAPS -2, -2,  26,  63,  43,   4, -4,  0
+    F8_TAPS -2, -3,  24,  62,  46,   5, -4,  0
+    F8_TAPS -2, -3,  21,  60,  49,   7, -4,  0
+    F8_TAPS -1, -4,  18,  59,  51,   9, -4,  0
+    F8_TAPS -1, -4,  16,  57,  53,  12, -4, -1
+    F8_TAPS -1, -4,  14,  55,  55,  14, -4, -1
+    F8_TAPS -1, -4,  12,  53,  57,  16, -4, -1
+    F8_TAPS  0, -4,   9,  51,  59,  18, -4, -1
+    F8_TAPS  0, -4,   7,  49,  60,  21, -3, -2
+    F8_TAPS  0, -4,   5,  46,  62,  24, -3, -2
+    F8_TAPS  0, -4,   4,  43,  63,  26, -2, -2
+    F8_TAPS  0, -3,   2,  41,  63,  29, -2, -2
+    F8_TAPS  0, -3,   1,  38,  64,  32, -1, -3
 %if %0 > 0
 %1 %+ _regular:
 %endif
-                    F8_TAPS  0,  1,  -5, 126,   8,  -3,  1,  0
-                    F8_TAPS -1,  3, -10, 122,  18,  -6,  2,  0
-                    F8_TAPS -1,  4, -13, 118,  27,  -9,  3, -1
-                    F8_TAPS -1,  4, -16, 112,  37, -11,  4, -1
-                    F8_TAPS -1,  5, -18, 105,  48, -14,  4, -1
-                    F8_TAPS -1,  5, -19,  97,  58, -16,  5, -1
-                    F8_TAPS -1,  6, -19,  88,  68, -18,  5, -1
-                    F8_TAPS -1,  6, -19,  78,  78, -19,  6, -1
-                    F8_TAPS -1,  5, -18,  68,  88, -19,  6, -1
-                    F8_TAPS -1,  5, -16,  58,  97, -19,  5, -1
-                    F8_TAPS -1,  4, -14,  48, 105, -18,  5, -1
-                    F8_TAPS -1,  4, -11,  37, 112, -16,  4, -1
-                    F8_TAPS -1,  3,  -9,  27, 118, -13,  4, -1
-                    F8_TAPS  0,  2,  -6,  18, 122, -10,  3, -1
-                    F8_TAPS  0,  1,  -3,   8, 126,  -5,  1,  0
-                    ; sharp
+    ; regular
+    F8_TAPS  0,  1,  -5, 126,   8,  -3,  1,  0
+    F8_TAPS -1,  3, -10, 122,  18,  -6,  2,  0
+    F8_TAPS -1,  4, -13, 118,  27,  -9,  3, -1
+    F8_TAPS -1,  4, -16, 112,  37, -11,  4, -1
+    F8_TAPS -1,  5, -18, 105,  48, -14,  4, -1
+    F8_TAPS -1,  5, -19,  97,  58, -16,  5, -1
+    F8_TAPS -1,  6, -19,  88,  68, -18,  5, -1
+    F8_TAPS -1,  6, -19,  78,  78, -19,  6, -1
+    F8_TAPS -1,  5, -18,  68,  88, -19,  6, -1
+    F8_TAPS -1,  5, -16,  58,  97, -19,  5, -1
+    F8_TAPS -1,  4, -14,  48, 105, -18,  5, -1
+    F8_TAPS -1,  4, -11,  37, 112, -16,  4, -1
+    F8_TAPS -1,  3,  -9,  27, 118, -13,  4, -1
+    F8_TAPS  0,  2,  -6,  18, 122, -10,  3, -1
+    F8_TAPS  0,  1,  -3,   8, 126,  -5,  1,  0
 %if %0 > 0
 %1 %+ _sharp:
 %endif
-                    F8_TAPS -1,  3,  -7, 127,   8,  -3,  1,  0
-                    F8_TAPS -2,  5, -13, 125,  17,  -6,  3, -1
-                    F8_TAPS -3,  7, -17, 121,  27, -10,  5, -2
-                    F8_TAPS -4,  9, -20, 115,  37, -13,  6, -2
-                    F8_TAPS -4, 10, -23, 108,  48, -16,  8, -3
-                    F8_TAPS -4, 10, -24, 100,  59, -19,  9, -3
-                    F8_TAPS -4, 11, -24,  90,  70, -21, 10, -4
-                    F8_TAPS -4, 11, -23,  80,  80, -23, 11, -4
-                    F8_TAPS -4, 10, -21,  70,  90, -24, 11, -4
-                    F8_TAPS -3,  9, -19,  59, 100, -24, 10, -4
-                    F8_TAPS -3,  8, -16,  48, 108, -23, 10, -4
-                    F8_TAPS -2,  6, -13,  37, 115, -20,  9, -4
-                    F8_TAPS -2,  5, -10,  27, 121, -17,  7, -3
-                    F8_TAPS -1,  3,  -6,  17, 125, -13,  5, -2
-                    F8_TAPS  0,  1,  -3,   8, 127,  -7,  3, -1
+    ; sharp
+    F8_TAPS -1,  3,  -7, 127,   8,  -3,  1,  0
+    F8_TAPS -2,  5, -13, 125,  17,  -6,  3, -1
+    F8_TAPS -3,  7, -17, 121,  27, -10,  5, -2
+    F8_TAPS -4,  9, -20, 115,  37, -13,  6, -2
+    F8_TAPS -4, 10, -23, 108,  48, -16,  8, -3
+    F8_TAPS -4, 10, -24, 100,  59, -19,  9, -3
+    F8_TAPS -4, 11, -24,  90,  70, -21, 10, -4
+    F8_TAPS -4, 11, -23,  80,  80, -23, 11, -4
+    F8_TAPS -4, 10, -21,  70,  90, -24, 11, -4
+    F8_TAPS -3,  9, -19,  59, 100, -24, 10, -4
+    F8_TAPS -3,  8, -16,  48, 108, -23, 10, -4
+    F8_TAPS -2,  6, -13,  37, 115, -20,  9, -4
+    F8_TAPS -2,  5, -10,  27, 121, -17,  7, -3
+    F8_TAPS -1,  3,  -6,  17, 125, -13,  5, -2
+    F8_TAPS  0,  1,  -3,   8, 127,  -7,  3, -1
 %endmacro
 
 %define F8_TAPS F8_SSSE3_TAPS
-- 
2.52.0


From c4ae29b2f30cd1a643c4b972aa3cb66b9fecfe6d Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 8 Dec 2025 18:38:36 +0100
Subject: [PATCH 258/304] swscale/ops_optimizer: set correct value range for
 subpixel reads

For example, rgb4 only reads values up to 15, not 255.

Setting this correctly eliminates a number of redundant clamps in cases
such as rgb4 -> monow.
---
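Note (illustrative only, not part of the patch): a minimal standalone sketch
of the arithmetic changed below. It assumes that `frac` is the log2 of the
number of sub-elements packed into one stored element, which is what the
`>> op->rw.frac` shift suggests; the value frac = 1 for rgb4 is inferred from
the commit message (8 >> 1 = 4 bits, maximum 15). The helper name
`read_max_value` is hypothetical and does not exist in libswscale.

```c
#include <stdint.h>

/* Hypothetical helper: largest value an integer read can yield when an
 * element of `size` bytes is split into 2^frac sub-elements.
 * Only meaningful for size <= 4, so the shift below stays in range. */
static uint64_t read_max_value(int size, int frac)
{
    int bits = 8 * size >> frac;   /* `*` binds tighter than `>>` */
    return (UINT64_C(1) << bits) - 1;
}
```

Under these assumptions, `read_max_value(1, 1)` gives 15 where the old
formula (no shift) would have given 255, which is the over-wide range that
led to the redundant clamps.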
 libswscale/ops_optimizer.c  | 2 +-
 tests/ref/fate/sws-ops-list | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index b13e7ebdc1..f173e1f78f 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -97,7 +97,7 @@ void ff_sws_op_list_update_comps(SwsOpList *ops)
         case SWS_OP_READ:
             for (int i = 0; i < op->rw.elems; i++) {
                 if (ff_sws_pixel_type_is_int(op->type)) {
-                    int bits = 8 * ff_sws_pixel_type_size(op->type);
+                    int bits = 8 * ff_sws_pixel_type_size(op->type) >> op->rw.frac;
                     if (!op->rw.packed && ops->src.desc) {
                         /* Use legal value range from pixdesc if available;
                          * we don't need to do this for packed formats because
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index f77c60fde3..132f66f4fc 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-e124847bc6663ca538b784de17bf42f0
+eae3a49ac3af42c13ad274883611ac21
-- 
2.52.0


From f033c79df1639a3603ea2d04e88a928a4f791d1a Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Mon, 8 Dec 2025 18:53:33 +0100
Subject: [PATCH 259/304] swscale/ops: clarify SwsOpList.src/dst semantics

It turns out these are not, in fact, purely informative: the optimizer
can take them into account. This should be documented properly.

I tried to think of a way to avoid needing this in the optimizer, but any
way I could think of would require shoving this to SwsReadWriteOp, which I
am particularly unwilling to do.
---
 libswscale/ops.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libswscale/ops.h b/libswscale/ops.h
index dccc00d2f0..14edc77978 100644
--- a/libswscale/ops.h
+++ b/libswscale/ops.h
@@ -209,8 +209,8 @@ typedef struct SwsOpList {
     SwsOp *ops;
     int num_ops;
 
-    /* Purely informative metadata associated with this operation list */
-    SwsFormat src, dst;
+    /* Metadata associated with this operation list */
+    SwsFormat src, dst; /* if set; may inform the optimizer about e.g. value ranges */
 } SwsOpList;
 
 SwsOpList *ff_sws_op_list_alloc(void);
-- 
2.52.0


From 81774fac12819020555aa4834ebb15dc40110b79 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin@martin.st>
Date: Mon, 8 Dec 2025 22:35:33 +0200
Subject: [PATCH 260/304] swscale/tests: Fix fate-sws-ops-list on Windows

Set stdout to binary mode to avoid platform-specific differences
in the output that is hashed.
---
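Note (illustrative only, not part of the patch): in text mode the Windows C
runtime translates "\n" to "\r\n" on output, so the hashed bytes differ from
other platforms. The sketch below wraps the same `_setmode()` call in a small
helper; the name `set_stdout_binary` is hypothetical, and on non-Windows
targets the helper is assumed to be a no-op.

```c
#include <stdio.h>

#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif

/* Hypothetical helper: switch stdout to binary mode where the C runtime
 * distinguishes text and binary streams. Returns 0 on success, -1 on error. */
static int set_stdout_binary(void)
{
#ifdef _WIN32
    /* _setmode() returns the previous mode, or -1 on failure. */
    return _setmode(_fileno(stdout), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;   /* no text/binary distinction to undo */
#endif
}
```

Calling it once, before any output is written, keeps the stream contents
byte-identical across platforms.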
 libswscale/tests/sws_ops.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/libswscale/tests/sws_ops.c b/libswscale/tests/sws_ops.c
index 8bb44d634d..06fb87e3ae 100644
--- a/libswscale/tests/sws_ops.c
+++ b/libswscale/tests/sws_ops.c
@@ -22,6 +22,11 @@
 #include "libswscale/ops.h"
 #include "libswscale/format.h"
 
+#ifdef _WIN32
+#include <io.h>
+#include <fcntl.h>
+#endif
+
 static int run_test(SwsContext *const ctx, AVFrame *frame,
                     const AVPixFmtDescriptor *const src_desc,
                     const AVPixFmtDescriptor *const dst_desc)
@@ -73,6 +78,10 @@ int main(int argc, char **argv)
 {
     int ret = 1;
 
+#ifdef _WIN32
+    _setmode(_fileno(stdout), _O_BINARY);
+#endif
+
     SwsContext *ctx = sws_alloc_context();
     AVFrame *frame = av_frame_alloc();
     if (!ctx || !frame)
-- 
2.52.0


From d6b0751db4404688e1bcfb00d3183a98cfb2b9ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 03:36:05 +0100
Subject: [PATCH 261/304] avfilter/vf_neighbor_opencl: add error condition when
 filter name doesn't match
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This cannot really happen, but to suppress compiler warnings, we can
just return AVERROR_BUG here.

Fixes: warning: variable 'kernel_name' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
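Note (illustrative only, not part of the patch): a reduced sketch of the
pattern being fixed. Without a final `else`, the compiler cannot prove that
the pointer is assigned on every path. The function `pick_kernel`, the -1
error value, and the "erosion_opencl" filter name are assumptions made for
this example; the real code uses AVERROR_BUG and `goto fail`.

```c
#include <string.h>

/* Hypothetical lookup: returns 0 and sets *kernel_name on a match,
 * or -1 when the filter name is not recognised. */
static int pick_kernel(const char *filter_name, const char **kernel_name)
{
    if (!strcmp(filter_name, "erosion_opencl")) {
        *kernel_name = "erosion_global";
    } else if (!strcmp(filter_name, "dilation_opencl")) {
        *kernel_name = "dilation_global";
    } else {
        /* Explicit failure path: *kernel_name is never read unset. */
        return -1;
    }
    return 0;
}
```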
 libavfilter/vf_neighbor_opencl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libavfilter/vf_neighbor_opencl.c b/libavfilter/vf_neighbor_opencl.c
index 0b8d7fc998..eb39a5ec59 100644
--- a/libavfilter/vf_neighbor_opencl.c
+++ b/libavfilter/vf_neighbor_opencl.c
@@ -69,6 +69,9 @@ static int neighbor_opencl_init(AVFilterContext *avctx)
         kernel_name = "erosion_global";
     } else if (!strcmp(avctx->filter->name, "dilation_opencl")){
         kernel_name = "dilation_global";
+    } else {
+        err = AVERROR_BUG;
+        goto fail;
     }
     ctx->kernel = clCreateKernel(ctx->ocf.program, kernel_name, &cle);
     CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create "
-- 
2.52.0


From 402f954e8779c9e9554b1da20b1432927a603e53 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 03:38:56 +0100
Subject: [PATCH 262/304] avfilter/vf_libopencv: make sure there is space for
 null-terminator in shape_str
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: warning: 'sscanf' may overflow; destination buffer in argument 7 has size 32, but the corresponding specifier may require size 33 [-Wfortify-source]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
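Note (illustrative only, not part of the patch): a field width in a scanf
`%[` conversion counts characters stored, excluding the terminating NUL, so
a 32-byte buffer needs a width of at most 31. The helper `parse_shape` below
is hypothetical and only isolates that one conversion.

```c
#include <stdio.h>

/* Hypothetical helper: copy the text before '=' into a 32-byte buffer.
 * Returns the number of fields converted by sscanf(). */
static int parse_shape(const char *buf, char shape_str[32])
{
    shape_str[0] = '\0';
    /* %31[^=] stores at most 31 characters plus the terminating NUL. */
    return sscanf(buf, "%31[^=]", shape_str);
}
```

With an input whose shape name is longer than 31 characters, the conversion
stops after 31 characters instead of writing a 33rd byte past the buffer.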
 libavfilter/vf_libopencv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/vf_libopencv.c b/libavfilter/vf_libopencv.c
index ae8dd49b2c..8a80a7fdab 100644
--- a/libavfilter/vf_libopencv.c
+++ b/libavfilter/vf_libopencv.c
@@ -205,7 +205,7 @@ static int parse_iplconvkernel(IplConvKernel **kernel, char *buf, void *log_ctx)
     int cols = 0, rows = 0, anchor_x = 0, anchor_y = 0, shape = CV_SHAPE_RECT;
     int *values = NULL, ret = 0;
 
-    sscanf(buf, "%dx%d+%dx%d/%32[^=]=%127s", &cols, &rows, &anchor_x, &anchor_y, shape_str, shape_filename);
+    sscanf(buf, "%dx%d+%dx%d/%31[^=]=%127s", &cols, &rows, &anchor_x, &anchor_y, shape_str, shape_filename);
 
     if      (!strcmp(shape_str, "rect"   )) shape = CV_SHAPE_RECT;
     else if (!strcmp(shape_str, "cross"  )) shape = CV_SHAPE_CROSS;
-- 
2.52.0


From 41fd68caf0d74eedd045b87bec562567c9ff8e1f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 03:42:36 +0100
Subject: [PATCH 263/304] avcodec/libaomdec: add explicit enum cast to suppress
 compiler warnings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavcodec/libaomdec.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/libaomdec.c b/libavcodec/libaomdec.c
index cf5986baf4..9e9c4d18c5 100644
--- a/libavcodec/libaomdec.c
+++ b/libavcodec/libaomdec.c
@@ -71,9 +71,9 @@ static int set_pix_fmt(AVCodecContext *avctx, struct aom_image *img)
         AVCOL_RANGE_MPEG, AVCOL_RANGE_JPEG
     };
     avctx->color_range = color_ranges[img->range];
-    avctx->color_primaries = img->cp;
-    avctx->colorspace  = img->mc;
-    avctx->color_trc   = img->tc;
+    avctx->color_primaries = (enum AVColorPrimaries)img->cp;
+    avctx->colorspace  = (enum AVColorSpace)img->mc;
+    avctx->color_trc   = (enum AVColorTransferCharacteristic)img->tc;
 
     switch (img->fmt) {
     case AOM_IMG_FMT_I420:
-- 
2.52.0


From 15e22eed94359c69ae7f1c8eba6bd26388a911b2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 03:48:08 +0100
Subject: [PATCH 264/304] avcodec/libsvtav1: add explicit enum cast to suppress
 compiler warnings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavcodec/libsvtav1.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/libsvtav1.c b/libavcodec/libsvtav1.c
index 0e3e748b8d..7047b72422 100644
--- a/libavcodec/libsvtav1.c
+++ b/libavcodec/libsvtav1.c
@@ -241,10 +241,10 @@ static int config_enc_params(EbSvtAv1EncConfiguration *param,
     }
 
     desc = av_pix_fmt_desc_get(avctx->pix_fmt);
-    param->color_primaries          = avctx->color_primaries;
-    param->matrix_coefficients      = (desc->flags & AV_PIX_FMT_FLAG_RGB) ?
-                                      AVCOL_SPC_RGB : avctx->colorspace;
-    param->transfer_characteristics = avctx->color_trc;
+    param->color_primaries          = (enum EbColorPrimaries)avctx->color_primaries;
+    param->matrix_coefficients      = (enum EbMatrixCoefficients)((desc->flags & AV_PIX_FMT_FLAG_RGB) ?
+                                      AVCOL_SPC_RGB : avctx->colorspace);
+    param->transfer_characteristics = (enum EbTransferCharacteristics)avctx->color_trc;
 
     if (avctx->color_range != AVCOL_RANGE_UNSPECIFIED)
         param->color_range = avctx->color_range == AVCOL_RANGE_JPEG;
-- 
2.52.0


From 3f46a309bebfe0491297ba5bf7cfc54b155cbb8f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 03:53:10 +0100
Subject: [PATCH 265/304] avcodec/libx265: add explicit enum cast to suppress
 compiler warnings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavcodec/libx265.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/libx265.c b/libavcodec/libx265.c
index 2b83a91d00..341868e7cd 100644
--- a/libavcodec/libx265.c
+++ b/libavcodec/libx265.c
@@ -770,7 +770,7 @@ static int libx265_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
                 sei_payload = &sei->payloads[sei->numPayloads];
                 sei_payload->payload = sei_data;
                 sei_payload->payloadSize = sei_size;
-                sei_payload->payloadType = SEI_TYPE_USER_DATA_REGISTERED_ITU_T_T35;
+                sei_payload->payloadType = (SEIPayloadType)SEI_TYPE_USER_DATA_REGISTERED_ITU_T_T35;
                 sei->numPayloads++;
             }
         }
@@ -801,7 +801,7 @@ static int libx265_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
                 }
                 sei_payload->payloadSize = side_data->size;
                 /* Equal to libx265 USER_DATA_UNREGISTERED */
-                sei_payload->payloadType = SEI_TYPE_USER_DATA_UNREGISTERED;
+                sei_payload->payloadType = (SEIPayloadType)SEI_TYPE_USER_DATA_UNREGISTERED;
                 sei->numPayloads++;
             }
         }
-- 
2.52.0


From 5d81a0a5b6d5438159ace5729e7da93d5dcb8ddf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 03:57:25 +0100
Subject: [PATCH 266/304] swresample/soxr_resample: pass initialized data to
 soxr_process() in flush()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libswresample/soxr_resample.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libswresample/soxr_resample.c b/libswresample/soxr_resample.c
index cc5b4db5d4..b71ece61f8 100644
--- a/libswresample/soxr_resample.c
+++ b/libswresample/soxr_resample.c
@@ -72,7 +72,7 @@ static int flush(struct SwrContext *s){
     soxr_process((soxr_t)s->resample, NULL, 0, NULL, NULL, 0, NULL);
 
     {
-        float f;
+        float f = 0;
         size_t idone, odone;
         soxr_process((soxr_t)s->resample, &f, 0, &idone, &f, 0, &odone);
         s->delayed_samples_fixup -= soxr_delay((soxr_t)s->resample);
-- 
2.52.0


From 3e0b13894a79e1c985ae68b5ab8996df5ab9baa7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 04:02:19 +0100
Subject: [PATCH 267/304] avutil/hwcontext_vaapi: mark try_all with av_unused
 to suppress warning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: warning: variable 'try_all' set but not used [-Wunused-but-set-variable]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavutil/hwcontext_vaapi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
index 4f3502797b..98130f9fb1 100644
--- a/libavutil/hwcontext_vaapi.c
+++ b/libavutil/hwcontext_vaapi.c
@@ -1715,7 +1715,7 @@ static int vaapi_device_create(AVHWDeviceContext *ctx, const char *device,
     VAAPIDevicePriv *priv;
     VADisplay display = NULL;
     const AVDictionaryEntry *ent;
-    int try_drm, try_x11, try_win32, try_all;
+    int try_drm, try_x11, try_win32, try_all av_unused;
 
     priv = av_mallocz(sizeof(*priv));
     if (!priv)
-- 
2.52.0


From ee73954a0a571c900122b13d96ee26e00833800a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 20:49:59 +0100
Subject: [PATCH 268/304] avcodec/d3d12va_encode_av1: remove unused variables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavcodec/d3d12va_encode_av1.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libavcodec/d3d12va_encode_av1.c b/libavcodec/d3d12va_encode_av1.c
index 31d3df33bd..d98cbefeca 100644
--- a/libavcodec/d3d12va_encode_av1.c
+++ b/libavcodec/d3d12va_encode_av1.c
@@ -346,8 +346,6 @@ fail:
 static int d3d12va_encode_av1_get_buffer_size(AVCodecContext *avctx,
                                               D3D12VAEncodePicture *pic, size_t *size)
 {
-    D3D12VAEncodeContext                                    *ctx = avctx->priv_data;
-    D3D12_VIDEO_ENCODER_OUTPUT_METADATA                    *meta = NULL;
     D3D12_VIDEO_ENCODER_FRAME_SUBREGION_METADATA *subregion_meta = NULL;
     uint8_t                                                *data = NULL;
     HRESULT                                                   hr = S_OK;
@@ -383,7 +381,6 @@ static int d3d12va_encode_av1_get_coded_data(AVCodecContext *avctx,
     size_t    av1_pic_hd_size = 0;
     int tile_group_extra_size = 0;
     size_t            bit_len = 0;
-    D3D12VAEncodeContext *ctx = avctx->priv_data;
 
     char pic_hd_data[MAX_PARAM_BUFFER_SIZE] = { 0 };
 
-- 
2.52.0


From a84a5e7ab223aa61aab6cad01d9d498f353c9132 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 20:50:21 +0100
Subject: [PATCH 269/304] avcodec/d3d12va_encode_av1: fix size_t format
 specifier

---
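Note (illustrative only, not part of the patch): `%d` expects an `int`, and
passing a `size_t` to it is undefined behaviour; on common 64-bit platforms
the two types do not even have the same width. `%zu` is the C99 conversion
for `size_t`. The helper `format_size` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t into `out` using %zu.
 * Returns what snprintf() returns (length without the NUL). */
static int format_size(char *out, size_t out_size, size_t total_size)
{
    return snprintf(out, out_size, "total_size: %zu", total_size);
}
```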
 libavcodec/d3d12va_encode_av1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/d3d12va_encode_av1.c b/libavcodec/d3d12va_encode_av1.c
index d98cbefeca..762e1c7987 100644
--- a/libavcodec/d3d12va_encode_av1.c
+++ b/libavcodec/d3d12va_encode_av1.c
@@ -425,7 +425,7 @@ static int d3d12va_encode_av1_get_coded_data(AVCodecContext *avctx,
     memcpy(ptr, pic_hd_data, av1_pic_hd_size);
     ptr += av1_pic_hd_size;
     total_size -= av1_pic_hd_size;
-    av_log(avctx, AV_LOG_DEBUG, "AV1 total_size after write picture header: %d.\n", total_size);
+    av_log(avctx, AV_LOG_DEBUG, "AV1 total_size after write picture header: %zu.\n", total_size);
 
     total_size -= tile_group_extra_size;
     err = d3d12va_encode_av1_write_tile_group(avctx, mapped_data, total_size, ptr, &bit_len);
-- 
2.52.0


From 508ed76d87736bcb9dd5fbc222f2b0cec4e2d4f4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Mon, 8 Dec 2025 20:50:42 +0100
Subject: [PATCH 270/304] avcodec/d3d12va_encode_av1: don't ignore return value
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 libavcodec/d3d12va_encode_av1.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavcodec/d3d12va_encode_av1.c b/libavcodec/d3d12va_encode_av1.c
index 762e1c7987..cf19597f0d 100644
--- a/libavcodec/d3d12va_encode_av1.c
+++ b/libavcodec/d3d12va_encode_av1.c
@@ -992,9 +992,7 @@ static int d3d12va_encode_av1_init_picture_params(AVCodecContext *avctx,
             d3d12va_pic->pic_ctl.pAV1PicData->ReferenceIndices[i] = fh->ref_frame_idx[i];
     }
 
-    int ret = av_fifo_write(priv->picture_header_list, &priv->units.raw_frame_header, 1);
-
-    return 0;
+    return av_fifo_write(priv->picture_header_list, &priv->units.raw_frame_header, 1);
 }
 
 
-- 
2.52.0


From 7bd53fbca84df776a89def8bf2fc0c4cc4e11a41 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Mon, 8 Dec 2025 20:07:19 -0300
Subject: [PATCH 271/304] avformat/cbs: add missing license headers

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/cbs.c     | 18 ++++++++++++++++++
 libavformat/cbs_apv.c | 18 ++++++++++++++++++
 libavformat/cbs_av1.c | 18 ++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/libavformat/cbs.c b/libavformat/cbs.c
index 748d298a40..c71220e338 100644
--- a/libavformat/cbs.c
+++ b/libavformat/cbs.c
@@ -1,2 +1,20 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
 #include "cbs.h"
 #include "libavcodec/cbs.c"
diff --git a/libavformat/cbs_apv.c b/libavformat/cbs_apv.c
index 145e5d09bb..3f6df5ed05 100644
--- a/libavformat/cbs_apv.c
+++ b/libavformat/cbs_apv.c
@@ -1,2 +1,20 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
 #include "cbs.h"
 #include "libavcodec/cbs_apv.c"
diff --git a/libavformat/cbs_av1.c b/libavformat/cbs_av1.c
index 3061832967..efaed7dfd0 100644
--- a/libavformat/cbs_av1.c
+++ b/libavformat/cbs_av1.c
@@ -1,3 +1,21 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
 #define CBS_AV1_OBU_TILE_LIST 0
 #define CBS_AV1_OBU_METADATA 0
 #define CBS_AV1_OBU_PADDING 0
-- 
2.52.0


From c86888735b3392f0421db9fc892852b7d9216190 Mon Sep 17 00:00:00 2001
From: Cameron Gutman <aicommander@gmail.com>
Date: Sun, 7 Dec 2025 13:51:05 -0600
Subject: [PATCH 272/304] hwcontext_vulkan: add APIs to get optional extensions

These provide a way for apps that initialize Vulkan themselves to know
which extensions we may be able to use without having to hardcode them.

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
---
 doc/APIchanges               |  4 ++++
 libavutil/hwcontext_vulkan.c | 28 ++++++++++++++++++++++++++++
 libavutil/hwcontext_vulkan.h | 18 ++++++++++++++++++
 libavutil/version.h          |  4 ++--
 4 files changed, 52 insertions(+), 2 deletions(-)
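
The allocation pattern behind the two new getters can be sketched in isolation. This is a minimal, self-contained sketch and not the FFmpeg code: `OptExt`, `demo_exts` and `get_optional_names()` are hypothetical stand-ins, the extension names are made up, and plain `malloc()`/`free()` replace `av_malloc_array()`/`av_free()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an internal table entry (name + feature flag). */
typedef struct OptExt {
    const char *name;
    unsigned    flag;
} OptExt;

/* Made-up names, for illustration only. */
static const OptExt demo_exts[] = {
    { "VK_EXAMPLE_first",  1u << 0 },
    { "VK_EXAMPLE_second", 1u << 1 },
    { "VK_EXAMPLE_third",  1u << 2 },
};

#define DEMO_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Returns a newly allocated array of pointers to the names in the table.
 * The caller owns the array (and must free() it) but not the strings.
 * On allocation failure NULL is returned and *count is left untouched. */
static const char **get_optional_names(int *count)
{
    assert(count);
    const size_t n = DEMO_ELEMS(demo_exts);
    const char **names = malloc(n * sizeof(*names));
    if (!names)
        return NULL;
    for (size_t i = 0; i < n; i++)
        names[i] = demo_exts[i].name;
    *count = (int) n;
    return names;
}
```

A caller would check the result for NULL, pass the array and count on to whatever creates the Vulkan device, and free the array afterwards, much as the ffplay change in the next patch does.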

diff --git a/doc/APIchanges b/doc/APIchanges
index 744c52ff29..e670a08cf9 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -2,6 +2,10 @@ The last version increases of all libraries were on 2025-03-28
 
 API changes, most recent first:
 
+2025-12-xx - xxxxxxxxxx - lavu 60.20.100 - hwcontext_vulkan.h
+  Add av_vk_get_optional_instance_extensions().
+  Add av_vk_get_optional_device_extensions().
+
 2025-12-xx - xxxxxxxxxx - lavc 62.22.101 - avcodec.h
   Add avcodec_receive_frame_flags().
   Add AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS.
diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index a120e6185c..9dfcb503c1 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -693,6 +693,34 @@ static const VulkanOptExtension optional_device_exts[] = {
     { VK_KHR_VIDEO_DECODE_AV1_EXTENSION_NAME,                 FF_VK_EXT_VIDEO_DECODE_AV1       },
 };
 
+const char **av_vk_get_optional_instance_extensions(int *count)
+{
+    const char **exts = av_malloc_array(sizeof(*exts),
+                                        FF_ARRAY_ELEMS(optional_instance_exts));
+    if (!exts)
+        return NULL;
+
+    for (int i = 0; i < FF_ARRAY_ELEMS(optional_instance_exts); i++)
+        exts[i] = optional_instance_exts[i].name;
+
+    *count = FF_ARRAY_ELEMS(optional_instance_exts);
+    return exts;
+}
+
+const char **av_vk_get_optional_device_extensions(int *count)
+{
+    const char **exts = av_malloc_array(sizeof(*exts),
+                                        FF_ARRAY_ELEMS(optional_device_exts));
+    if (!exts)
+        return NULL;
+
+    for (int i = 0; i < FF_ARRAY_ELEMS(optional_device_exts); i++)
+        exts[i] = optional_device_exts[i].name;
+
+    *count = FF_ARRAY_ELEMS(optional_device_exts);
+    return exts;
+}
+
 static VkBool32 VKAPI_CALL vk_dbg_callback(VkDebugUtilsMessageSeverityFlagBitsEXT severity,
                                            VkDebugUtilsMessageTypeFlagsEXT messageType,
                                            const VkDebugUtilsMessengerCallbackDataEXT *data,
diff --git a/libavutil/hwcontext_vulkan.h b/libavutil/hwcontext_vulkan.h
index 0bb536ab3f..77d53289b4 100644
--- a/libavutil/hwcontext_vulkan.h
+++ b/libavutil/hwcontext_vulkan.h
@@ -97,6 +97,8 @@ typedef struct AVVulkanDeviceContext {
      * each entry containing the specified Vulkan extension string to enable.
      * Duplicates are possible and accepted.
      * If no extensions are enabled, set these fields to NULL, and 0 respectively.
+     * av_vk_get_optional_instance_extensions() can be used to enumerate extensions
+     * that FFmpeg may use if enabled.
      */
     const char * const *enabled_inst_extensions;
     int nb_enabled_inst_extensions;
@@ -108,6 +110,8 @@ typedef struct AVVulkanDeviceContext {
      * If supplying your own device context, these fields takes the same format as
      * the above fields, with the same conditions that duplicates are possible
      * and accepted, and that NULL and 0 respectively means no extensions are enabled.
+     * av_vk_get_optional_device_extensions() can be used to enumerate extensions
+     * that FFmpeg may use if enabled.
      */
     const char * const *enabled_dev_extensions;
     int nb_enabled_dev_extensions;
@@ -375,4 +379,18 @@ AVVkFrame *av_vk_frame_alloc(void);
  */
 const VkFormat *av_vkfmt_from_pixfmt(enum AVPixelFormat p);
 
+/**
+ * Returns an array of optional Vulkan instance extensions that FFmpeg
+ * may use if enabled.
+ * @note Must be freed via av_free()
+ */
+const char **av_vk_get_optional_instance_extensions(int *count);
+
+/**
+ * Returns an array of optional Vulkan device extensions that FFmpeg
+ * may use if enabled.
+ * @note Must be freed via av_free()
+ */
+const char **av_vk_get_optional_device_extensions(int *count);
+
 #endif /* AVUTIL_HWCONTEXT_VULKAN_H */
diff --git a/libavutil/version.h b/libavutil/version.h
index d058e94425..fe76affb41 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,8 +79,8 @@
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  60
-#define LIBAVUTIL_VERSION_MINOR  19
-#define LIBAVUTIL_VERSION_MICRO 101
+#define LIBAVUTIL_VERSION_MINOR  20
+#define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
                                                LIBAVUTIL_VERSION_MINOR, \
-- 
2.52.0


From b8396e18d535bd9416596670028f4f5abc8b748e Mon Sep 17 00:00:00 2001
From: Cameron Gutman <aicommander@gmail.com>
Date: Sun, 7 Dec 2025 13:57:42 -0600
Subject: [PATCH 273/304] fftools/ffplay_renderer: use new Vulkan extension API

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
---
 fftools/ffplay_renderer.c | 41 +++++++++------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/fftools/ffplay_renderer.c b/fftools/ffplay_renderer.c
index 699cd6ecd0..1321452ad8 100644
--- a/fftools/ffplay_renderer.c
+++ b/fftools/ffplay_renderer.c
@@ -104,36 +104,6 @@ static void vk_log_cb(void *log_priv, enum pl_log_level level,
         av_log(log_priv, level_map[level], "%s\n", msg);
 }
 
-// Should keep sync with optional_device_exts inside hwcontext_vulkan.c
-static const char *optional_device_exts[] = {
-    /* Misc or required by other extensions */
-    VK_KHR_PORTABILITY_SUBSET_EXTENSION_NAME,
-    VK_KHR_PUSH_DESCRIPTOR_EXTENSION_NAME,
-    VK_KHR_SAMPLER_YCBCR_CONVERSION_EXTENSION_NAME,
-    VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME,
-    VK_EXT_PHYSICAL_DEVICE_DRM_EXTENSION_NAME,
-    VK_EXT_SHADER_ATOMIC_FLOAT_EXTENSION_NAME,
-    VK_KHR_COOPERATIVE_MATRIX_EXTENSION_NAME,
-
-    /* Imports/exports */
-    VK_KHR_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
-    VK_EXT_EXTERNAL_MEMORY_DMA_BUF_EXTENSION_NAME,
-    VK_EXT_IMAGE_DRM_FORMAT_MODIFIER_EXTENSION_NAME,
-    VK_KHR_EXTERNAL_SEMAPHORE_FD_EXTENSION_NAME,
-    VK_EXT_EXTERNAL_MEMORY_HOST_EXTENSION_NAME,
-#ifdef _WIN32
-    VK_KHR_EXTERNAL_MEMORY_WIN32_EXTENSION_NAME,
-    VK_KHR_EXTERNAL_SEMAPHORE_WIN32_EXTENSION_NAME,
-#endif
-
-    /* Video encoding/decoding */
-    VK_KHR_VIDEO_QUEUE_EXTENSION_NAME,
-    VK_KHR_VIDEO_DECODE_QUEUE_EXTENSION_NAME,
-    VK_KHR_VIDEO_DECODE_H264_EXTENSION_NAME,
-    VK_KHR_VIDEO_DECODE_H265_EXTENSION_NAME,
-    "VK_MESA_video_decode_av1",
-};
-
 static inline int enable_debug(const AVDictionary *opt)
 {
     AVDictionaryEntry *entry = av_dict_get(opt, "debug", NULL, 0);
@@ -374,6 +344,8 @@ static int create_vk_by_placebo(VkRenderer *renderer,
     int decode_index;
     int decode_count;
     int ret;
+    const char **dev_exts;
+    int num_dev_exts;
 
     ctx->get_proc_addr = SDL_Vulkan_GetVkGetInstanceProcAddr();
 
@@ -388,16 +360,21 @@ static int create_vk_by_placebo(VkRenderer *renderer,
     }
     ctx->inst = ctx->placebo_instance->instance;
 
+    dev_exts = av_vk_get_optional_device_extensions(&num_dev_exts);
+    if (!dev_exts)
+        return AVERROR(ENOMEM);
+
     ctx->placebo_vulkan = pl_vulkan_create(ctx->vk_log, pl_vulkan_params(
             .instance = ctx->placebo_instance->instance,
             .get_proc_addr = ctx->placebo_instance->get_proc_addr,
             .surface = ctx->vk_surface,
             .allow_software = false,
-            .opt_extensions = optional_device_exts,
-            .num_opt_extensions = FF_ARRAY_ELEMS(optional_device_exts),
+            .opt_extensions = dev_exts,
+            .num_opt_extensions = num_dev_exts,
             .extra_queues = VK_QUEUE_VIDEO_DECODE_BIT_KHR,
             .device_name = select_device(opt),
     ));
+    av_free(dev_exts);
     if (!ctx->placebo_vulkan)
         return AVERROR_EXTERNAL;
     ctx->hw_device_ref = av_hwdevice_ctx_alloc(AV_HWDEVICE_TYPE_VULKAN);
-- 
2.52.0


From 9c9bde944895921f83c14b437fc4f3bade20fec7 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Tue, 9 Dec 2025 00:21:25 +0100
Subject: [PATCH 274/304] tests/ref/fate/source: Fix fate-source after last
 commit

Broken in ac9552bebf9701edaba8def91f003f71a6623504.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/ref/fate/source | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tests/ref/fate/source b/tests/ref/fate/source
index 54af72c008..78d3a2e0fa 100644
--- a/tests/ref/fate/source
+++ b/tests/ref/fate/source
@@ -10,9 +10,6 @@ libavdevice/riscv/cpu_common.c
 libavfilter/file_open.c
 libavfilter/log2_tab.c
 libavfilter/riscv/cpu_common.c
-libavformat/cbs.c
-libavformat/cbs_apv.c
-libavformat/cbs_av1.c
 libavformat/file_open.c
 libavformat/golomb_tab.c
 libavformat/log2_tab.c
-- 
2.52.0


From 3a4bde6c3c093c9dca4119c0d2c0588d2be8d51e Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Date: Sun, 7 Dec 2025 16:58:41 +0100
Subject: [PATCH 275/304] tests/checkasm/checkasm: Don't test 3dnow

The last 3dnow functions have been removed in commit
5ef613bcb0508f16bd5b190168183326391de9b0, so don't test
it in checkasm.

(This will affect only one test, namely scalarproduct_and_madd_int16
from lossless_audiodsp: It does not use an SSSE3 function when
the 3dnow flag is set. So for old AMDs (which advertise support for
3dnow), said SSSE3 function was never tested. Now it will be.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 tests/checkasm/checkasm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 6e066d469a..14faf71275 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -390,8 +390,6 @@ static const struct {
 #elif ARCH_X86
     { "MMX",        "mmx",       AV_CPU_FLAG_MMX|AV_CPU_FLAG_CMOV },
     { "MMXEXT",     "mmxext",    AV_CPU_FLAG_MMXEXT },
-    { "3DNOW",      "3dnow",     AV_CPU_FLAG_3DNOW },
-    { "3DNOWEXT",   "3dnowext",  AV_CPU_FLAG_3DNOWEXT },
     { "SSE",        "sse",       AV_CPU_FLAG_SSE },
     { "SSE2",       "sse2",      AV_CPU_FLAG_SSE2|AV_CPU_FLAG_SSE2SLOW },
     { "SSE3",       "sse3",      AV_CPU_FLAG_SSE3|AV_CPU_FLAG_SSE3SLOW },
-- 
2.52.0


From d37a910c8695e711a993147cdf5cd1492f0ea118 Mon Sep 17 00:00:00 2001
From: Gyan Doshi <ffmpeg@gyani.pro>
Date: Tue, 9 Dec 2025 10:22:36 +0530
Subject: [PATCH 276/304] doc/htmlxref.cnf: add drawvg-reference

Fixes #21125
---
 doc/htmlxref.cnf | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/htmlxref.cnf b/doc/htmlxref.cnf
index 079c848a65..0552ab2a64 100644
--- a/doc/htmlxref.cnf
+++ b/doc/htmlxref.cnf
@@ -4,3 +4,4 @@ ffmpeg-formats mono ./ffmpeg-formats.html
 ffmpeg-resampler mono ./ffmpeg-resampler.html
 ffmpeg-scaler mono ./ffmpeg-scaler.html
 ffmpeg-utils mono ./ffmpeg-utils.html
+drawvg-reference mono ./drawvg-reference.html
-- 
2.52.0


From e6d52a6fceb8ad9ab2c1e6317f185f54c93d2d4a Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 18:28:10 +0100
Subject: [PATCH 277/304] swscale/format: exclude U32 from sws_pixel_type()

This function is supposed to give us representable pixel types; but U32 is not
representable (due only to the AVRational range limit).
---
 libswscale/format.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
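
The range limit mentioned above can be illustrated with a small sketch. It assumes a rational type with `int` numerator and denominator (as AVRational has) and a 32-bit `int`; `Rational` and `max_value_as_rational()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for a rational with int numerator/denominator. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Stores max/1 in *out and returns 1 if the largest value of an unsigned
 * type with `bits` bits fits into an int numerator; returns 0 otherwise. */
static int max_value_as_rational(int bits, Rational *out)
{
    assert(bits > 0 && bits <= 32);
    const uint64_t max = (UINT64_C(1) << bits) - 1;
    if (max > (uint64_t) INT_MAX)
        return 0; /* e.g. UINT32_MAX does not fit into a 32-bit int */
    *out = (Rational) { (int) max, 1 };
    return 1;
}
```

Under these assumptions 8-bit and 16-bit maxima are representable, while the 32-bit maximum is not, which is the situation the new TODO comment describes.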

diff --git a/libswscale/format.c b/libswscale/format.c
index cbb5cb35fe..f9a156e29e 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -616,12 +616,13 @@ static SwsPixelType fmt_pixel_type(enum AVPixelFormat fmt)
     if (desc->flags & AV_PIX_FMT_FLAG_FLOAT) {
         switch (bits) {
         case 32: return SWS_PIXEL_F32;
+        /* TODO: no support for 16-bit float yet */
         }
     } else {
         switch (bits) {
         case  8: return SWS_PIXEL_U8;
         case 16: return SWS_PIXEL_U16;
-        case 32: return SWS_PIXEL_U32;
+        /* TODO: AVRational cannot represent UINT32_MAX */
         }
     }
 
-- 
2.52.0


From 10c9c84df86278e2451947d60fb2a25e0bb28574 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 18:32:10 +0100
Subject: [PATCH 278/304] swscale/format: check SwsPixelType in
 fmt_read_write()

This is the only function that actually has the ability to return an
error, so just move the pixel type assignment here and add a check to
ensure a valid pixel type is found.
---
 libswscale/format.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/libswscale/format.c b/libswscale/format.c
index f9a156e29e..a1afa64f4d 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -788,12 +788,16 @@ static SwsConst fmt_clear(enum AVPixelFormat fmt)
 }
 
 static int fmt_read_write(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
-                          SwsPackOp *pack_op)
+                          SwsPackOp *pack_op, SwsPixelType *pixel_type)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
     if (!desc)
         return AVERROR(EINVAL);
 
+    *pixel_type = fmt_pixel_type(fmt);
+    if (!*pixel_type)
+        return AVERROR(ENOTSUP);
+
     switch (fmt) {
     case AV_PIX_FMT_NONE:
     case AV_PIX_FMT_NB:
@@ -1024,12 +1028,13 @@ static SwsPixelType get_packed_type(SwsPackOp pack)
 int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-    SwsPixelType pixel_type = fmt_pixel_type(fmt);
-    SwsPixelType raw_type = pixel_type;
+    SwsPixelType pixel_type;
     SwsReadWriteOp rw_op;
     SwsPackOp unpack;
 
-    RET(fmt_read_write(fmt, &rw_op, &unpack));
+    RET(fmt_read_write(fmt, &rw_op, &unpack, &pixel_type));
+
+    SwsPixelType raw_type = pixel_type;
     if (unpack.pattern[0])
         raw_type = get_packed_type(unpack);
 
@@ -1085,12 +1090,13 @@ int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
 int ff_sws_encode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-    SwsPixelType pixel_type = fmt_pixel_type(fmt);
-    SwsPixelType raw_type = pixel_type;
+    SwsPixelType pixel_type;
     SwsReadWriteOp rw_op;
     SwsPackOp pack;
 
-    RET(fmt_read_write(fmt, &rw_op, &pack));
+    RET(fmt_read_write(fmt, &rw_op, &pack, &pixel_type));
+
+    SwsPixelType raw_type = pixel_type;
     if (pack.pattern[0])
         raw_type = get_packed_type(pack);
 
-- 
2.52.0


From ecd42004bd19592bb84e13896f710c6e4c87d91f Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 18:45:22 +0100
Subject: [PATCH 279/304] swscale/format: derive fmt_swizzle() from
 AVPixFmtDescriptor when possible

Unfortunately, this is exceptionally difficult to handle in the general case,
when packed/bitstream formats come into play - the actual interpretation of
the offset, shift etc. is so difficult to deal with that I think it's simpler
to continue falling back to a static table of variants for these exceptions.
They are fortunately small in number.
---
 libswscale/format.c | 108 ++++++++++++++++++++++++--------------------
 1 file changed, 59 insertions(+), 49 deletions(-)
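
The derivation for regular formats boils down to sorting the components by (plane, offset) and reading off their original indices. The sketch below reproduces just that step outside of libswscale; `CompPos`, `cmp_pos()` and `memory_order()` are hypothetical names, and the plane/offset values in the usage note are illustrative layouts rather than values read from a real descriptor.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct CompPos {
    int index;  /* component index in descriptor order */
    int plane;
    int offset;
} CompPos;

/* Compare by (plane, offset), as in the patch. */
static int cmp_pos(const void *a, const void *b)
{
    const CompPos *ca = a, *cb = b;
    if (ca->plane != cb->plane)
        return ca->plane - cb->plane;
    return ca->offset - cb->offset;
}

/* order[i] receives the descriptor index of the i-th component in memory
 * order; slots beyond nb keep their identity value. */
static void memory_order(const int *planes, const int *offsets, int nb,
                         int order[4])
{
    assert(nb >= 1 && nb <= 4);
    CompPos sorted[4] = { {0}, {1}, {2}, {3} };
    for (int i = 0; i < nb; i++) {
        sorted[i].plane  = planes[i];
        sorted[i].offset = offsets[i];
    }
    qsort(sorted, nb, sizeof(sorted[0]), cmp_pos);
    for (int i = 0; i < 4; i++)
        order[i] = sorted[i].index;
}
```

For a packed layout whose components sit at byte offsets 2, 1, 0, 3 this yields 2, 1, 0, 3, and for a three-plane layout with the components in planes 2, 0, 1 it yields 1, 2, 0, 3. Those are the same swizzles the removed BGRA and GBRP table entries returned.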

diff --git a/libswscale/format.c b/libswscale/format.c
index a1afa64f4d..7be3685c6c 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -629,41 +629,86 @@ static SwsPixelType fmt_pixel_type(enum AVPixelFormat fmt)
     return SWS_PIXEL_NONE;
 }
 
+/* A regular format is defined as any format that contains only a single
+ * component per elementary data type (i.e. no sub-byte pack/unpack needed),
+ * and whose components map 1:1 onto elementary data units */
+static int is_regular_fmt(enum AVPixelFormat fmt)
+{
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
+    if (desc->flags & (AV_PIX_FMT_FLAG_PAL | AV_PIX_FMT_FLAG_BAYER))
+        return 0; /* no 1:1 correspondence between components and data units */
+    if (desc->flags & (AV_PIX_FMT_FLAG_BITSTREAM))
+        return 0; /* bitstream formats are packed by definition */
+    if ((desc->flags & AV_PIX_FMT_FLAG_PLANAR) || desc->nb_components == 1)
+        return 1; /* planar formats are regular by definition */
+
+    const int step = desc->comp[0].step;
+    int total_bits = 0;
+
+    for (int i = 0; i < desc->nb_components; i++) {
+        if (desc->comp[i].shift || desc->comp[i].step != step)
+            return 0; /* irregular/packed format */
+        total_bits += desc->comp[i].depth;
+    }
+
+    /* Exclude formats with missing components like RGB0, 0RGB, etc. */
+    return total_bits == step * 8;
+}
+
+struct comp {
+    int index;
+    int plane;
+    int offset;
+};
+
+/* Compare by (plane, offset) */
+static int cmp_comp(const void *a, const void *b) {
+    const struct comp *ca = a;
+    const struct comp *cb = b;
+    if (ca->plane != cb->plane)
+        return ca->plane - cb->plane;
+    return ca->offset - cb->offset;
+}
+
 static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-    if (desc->nb_components == 2) /* YA formats */
-        return (SwsSwizzleOp) {{ .x = 0, 3, 1, 2 }};
+    if (desc->nb_components == 2) {
+        /* YA formats */
+        return SWS_SWIZZLE(0, 3, 1, 2);
+    } else if (is_regular_fmt(fmt)) {
+        /* Sort by increasing component order */
+        struct comp sorted[4] = { {0}, {1}, {2}, {3} };
+        for (int i = 0; i < desc->nb_components; i++) {
+            sorted[i].plane  = desc->comp[i].plane;
+            sorted[i].offset = desc->comp[i].offset;
+        }
+
+        qsort(sorted, desc->nb_components, sizeof(struct comp), cmp_comp);
+
+        SwsSwizzleOp swiz = SWS_SWIZZLE(0, 1, 2, 3);
+        for (int i = 0; i < desc->nb_components; i++)
+            swiz.in[i] = sorted[i].index;
+        return swiz;
+    }
 
     switch (fmt) {
-    case AV_PIX_FMT_ARGB:
     case AV_PIX_FMT_0RGB:
-    case AV_PIX_FMT_AYUV64LE:
-    case AV_PIX_FMT_AYUV64BE:
-    case AV_PIX_FMT_AYUV:
     case AV_PIX_FMT_X2RGB10LE:
     case AV_PIX_FMT_X2RGB10BE:
         return (SwsSwizzleOp) {{ .x = 3, 0, 1, 2 }};
-    case AV_PIX_FMT_BGR24:
     case AV_PIX_FMT_BGR8:
     case AV_PIX_FMT_BGR4:
     case AV_PIX_FMT_BGR4_BYTE:
-    case AV_PIX_FMT_BGRA:
     case AV_PIX_FMT_BGR565BE:
     case AV_PIX_FMT_BGR565LE:
     case AV_PIX_FMT_BGR555BE:
     case AV_PIX_FMT_BGR555LE:
     case AV_PIX_FMT_BGR444BE:
     case AV_PIX_FMT_BGR444LE:
-    case AV_PIX_FMT_BGR48BE:
-    case AV_PIX_FMT_BGR48LE:
-    case AV_PIX_FMT_BGRA64BE:
-    case AV_PIX_FMT_BGRA64LE:
     case AV_PIX_FMT_BGR0:
-    case AV_PIX_FMT_VUYA:
     case AV_PIX_FMT_VUYX:
         return (SwsSwizzleOp) {{ .x = 2, 1, 0, 3 }};
-    case AV_PIX_FMT_ABGR:
     case AV_PIX_FMT_0BGR:
     case AV_PIX_FMT_X2BGR10LE:
     case AV_PIX_FMT_X2BGR10BE:
@@ -671,7 +716,6 @@ static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
     case AV_PIX_FMT_XV30BE:
     case AV_PIX_FMT_XV30LE:
         return (SwsSwizzleOp) {{ .x = 3, 2, 0, 1 }};
-    case AV_PIX_FMT_VYU444:
     case AV_PIX_FMT_V30XBE:
     case AV_PIX_FMT_V30XLE:
         return (SwsSwizzleOp) {{ .x = 2, 0, 1, 3 }};
@@ -679,41 +723,7 @@ static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
     case AV_PIX_FMT_XV36LE:
     case AV_PIX_FMT_XV48BE:
     case AV_PIX_FMT_XV48LE:
-    case AV_PIX_FMT_UYVA:
         return (SwsSwizzleOp) {{ .x = 1, 0, 2, 3 }};
-    case AV_PIX_FMT_GBRP:
-    case AV_PIX_FMT_GBRP9BE:
-    case AV_PIX_FMT_GBRP9LE:
-    case AV_PIX_FMT_GBRP10BE:
-    case AV_PIX_FMT_GBRP10LE:
-    case AV_PIX_FMT_GBRP12BE:
-    case AV_PIX_FMT_GBRP12LE:
-    case AV_PIX_FMT_GBRP14BE:
-    case AV_PIX_FMT_GBRP14LE:
-    case AV_PIX_FMT_GBRP16BE:
-    case AV_PIX_FMT_GBRP16LE:
-    case AV_PIX_FMT_GBRPF16BE:
-    case AV_PIX_FMT_GBRPF16LE:
-    case AV_PIX_FMT_GBRAP:
-    case AV_PIX_FMT_GBRAP10LE:
-    case AV_PIX_FMT_GBRAP10BE:
-    case AV_PIX_FMT_GBRAP12LE:
-    case AV_PIX_FMT_GBRAP12BE:
-    case AV_PIX_FMT_GBRAP14LE:
-    case AV_PIX_FMT_GBRAP14BE:
-    case AV_PIX_FMT_GBRAP16LE:
-    case AV_PIX_FMT_GBRAP16BE:
-    case AV_PIX_FMT_GBRPF32BE:
-    case AV_PIX_FMT_GBRPF32LE:
-    case AV_PIX_FMT_GBRAPF16BE:
-    case AV_PIX_FMT_GBRAPF16LE:
-    case AV_PIX_FMT_GBRAPF32BE:
-    case AV_PIX_FMT_GBRAPF32LE:
-    case AV_PIX_FMT_GBRP10MSBBE:
-    case AV_PIX_FMT_GBRP10MSBLE:
-    case AV_PIX_FMT_GBRP12MSBBE:
-    case AV_PIX_FMT_GBRP12MSBLE:
-        return (SwsSwizzleOp) {{ .x = 1, 2, 0, 3 }};
     default:
         return (SwsSwizzleOp) {{ .x = 0, 1, 2, 3 }};
     }
-- 
2.52.0


From bdb8b46381e6a0c6207096d20deb73970845cbe1 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 18:49:54 +0100
Subject: [PATCH 280/304] swscale/format: derive fmt_shift() from
 AVPixFmtDescriptor

XV36 is the odd one out, being a byte-shifted packed format whose components
don't actually cross any byte boundaries.
---
 libswscale/format.c | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)
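
As a reminder of what the shift means for MSB-aligned formats: a 10-bit sample in a 16-bit word leaves 6 unused low bits, a 12-bit sample leaves 4, matching the values in the removed table. The helpers below are a hypothetical sketch of applying such a shift and are not part of libswscale.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts a sample stored MSB-aligned in a 16-bit word, i.e. with `shift`
 * unused low bits below the payload. */
static uint16_t msb_aligned_sample(uint16_t word, int shift)
{
    assert(shift >= 0 && shift < 16);
    return (uint16_t) (word >> shift);
}

/* Inverse operation: moves a sample into the high bits of the word. */
static uint16_t msb_aligned_word(uint16_t sample, int shift)
{
    assert(shift >= 0 && shift < 16);
    return (uint16_t) (sample << shift);
}
```

With a shift of 6, the word 0xFFC0 carries the 10-bit sample 0x3FF, and the inverse helper maps it back.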

diff --git a/libswscale/format.c b/libswscale/format.c
index 7be3685c6c..076990d48d 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -742,36 +742,19 @@ static SwsSwizzleOp swizzle_inv(SwsSwizzleOp swiz) {
 /* Shift factor for MSB aligned formats */
 static int fmt_shift(enum AVPixelFormat fmt)
 {
-    switch (fmt) {
-    case AV_PIX_FMT_P010BE:
-    case AV_PIX_FMT_P010LE:
-    case AV_PIX_FMT_P210BE:
-    case AV_PIX_FMT_P210LE:
-    case AV_PIX_FMT_Y210BE:
-    case AV_PIX_FMT_Y210LE:
-    case AV_PIX_FMT_YUV444P10MSBBE:
-    case AV_PIX_FMT_YUV444P10MSBLE:
-    case AV_PIX_FMT_GBRP10MSBBE:
-    case AV_PIX_FMT_GBRP10MSBLE:
-        return 6;
-    case AV_PIX_FMT_P012BE:
-    case AV_PIX_FMT_P012LE:
-    case AV_PIX_FMT_P212BE:
-    case AV_PIX_FMT_P212LE:
-    case AV_PIX_FMT_P412BE:
-    case AV_PIX_FMT_P412LE:
-    case AV_PIX_FMT_XV36BE:
-    case AV_PIX_FMT_XV36LE:
-    case AV_PIX_FMT_XYZ12BE:
-    case AV_PIX_FMT_XYZ12LE:
-    case AV_PIX_FMT_YUV444P12MSBBE:
-    case AV_PIX_FMT_YUV444P12MSBLE:
-    case AV_PIX_FMT_GBRP12MSBBE:
-    case AV_PIX_FMT_GBRP12MSBLE:
-        return 4;
+    if (is_regular_fmt(fmt)) {
+        const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
+        return desc->comp[0].shift;
     }
 
-    return 0;
+    switch (fmt) {
+    case AV_PIX_FMT_XV36BE:
+    case AV_PIX_FMT_XV36LE:
+        return 4;
+    default:
+        /* Sub-byte irregular formats are always handled by SWS_OP_(UN)PACK */
+        return 0;
+    }
 }
 
 /**
-- 
2.52.0


From a17db0732f60becf808c44f34556d55fc8e91c6e Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 19:12:11 +0100
Subject: [PATCH 281/304] swscale/format: explicitly test for unsupported
 subsampled formats

This includes semiplanar formats. Note that the first check typically
subsumes the second check, but I decided to keep both for clarity.
---
 libswscale/format.c | 9 +++++++++
 1 file changed, 9 insertions(+)
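
The two checks can be exercised on their own with a reduced descriptor. The following is only a sketch: `LayoutDesc` and `layout_supported()` are hypothetical, and the plane count is passed in as a field instead of being computed by `av_pix_fmt_count_planes()`.

```c
#include <assert.h>

/* Reduced, hypothetical view of the descriptor fields the checks look at. */
typedef struct LayoutDesc {
    int log2_chroma_w;
    int log2_chroma_h;
    int planar;         /* non-zero if the planar flag is set */
    int nb_planes;
    int nb_components;
} LayoutDesc;

/* Returns 1 if neither of the two rejection rules applies, 0 otherwise. */
static int layout_supported(const LayoutDesc *d)
{
    assert(d);
    /* Rule 1: any chroma subsampling is rejected. */
    if (d->log2_chroma_w || d->log2_chroma_h)
        return 0;
    /* Rule 2: fewer planes than components means semi-planar. */
    if (d->planar && d->nb_planes < d->nb_components)
        return 0;
    return 1;
}
```

A subsampled semi-planar layout is already caught by the first rule, which is the sense in which it typically subsumes the second. A 4:4:4 layout with two planes and three components is only caught by the second rule.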

diff --git a/libswscale/format.c b/libswscale/format.c
index 076990d48d..ffdde2af01 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -787,6 +787,15 @@ static int fmt_read_write(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
     if (!desc)
         return AVERROR(EINVAL);
 
+    /* No support for subsampled formats at the moment */
+    if (desc->log2_chroma_w || desc->log2_chroma_h)
+        return AVERROR(ENOTSUP);
+
+    /* No support for semi-planar formats at the moment */
+    if (desc->flags & AV_PIX_FMT_FLAG_PLANAR &&
+        av_pix_fmt_count_planes(fmt) < desc->nb_components)
+        return AVERROR(ENOTSUP);
+
     *pixel_type = fmt_pixel_type(fmt);
     if (!*pixel_type)
         return AVERROR(ENOTSUP);
-- 
2.52.0


From ec7c490041018349a8ee4b85a795d2bf9d735f17 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 19:17:30 +0100
Subject: [PATCH 282/304] swscale/format: derive fmt_read_write() for regular
 formats

---
 libswscale/format.c | 129 ++++----------------------------------------
 1 file changed, 9 insertions(+), 120 deletions(-)
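
This patch relies on is_regular_fmt() from earlier in the series. For non-planar formats, its final test compares the summed component depths against the pixel step, which is what keeps padded formats such as RGB0 out of the regular path. A minimal sketch of that comparison, with the hypothetical helper name `fills_step()`:

```c
#include <assert.h>

/* Returns 1 if the component depths (in bits) fill the whole pixel step
 * (in bytes) exactly, 0 if there are leftover padding bits. */
static int fills_step(const int *depths, int nb_components, int step_bytes)
{
    assert(nb_components >= 1 && step_bytes >= 1);
    int total_bits = 0;
    for (int i = 0; i < nb_components; i++)
        total_bits += depths[i];
    return total_bits == step_bytes * 8;
}
```

Four 8-bit components in a 4-byte step pass the test, while three 8-bit components in a 4-byte step (one padding byte) do not.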

diff --git a/libswscale/format.c b/libswscale/format.c
index ffdde2af01..013ff040b4 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -800,11 +800,16 @@ static int fmt_read_write(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
     if (!*pixel_type)
         return AVERROR(ENOTSUP);
 
-    switch (fmt) {
-    case AV_PIX_FMT_NONE:
-    case AV_PIX_FMT_NB:
-        break;
+    if (is_regular_fmt(fmt)) {
+        *pack_op = (SwsPackOp) {0};
+        *rw_op = (SwsReadWriteOp) {
+            .elems  = desc->nb_components,
+            .packed = desc->nb_components > 1 && !(desc->flags & AV_PIX_FMT_FLAG_PLANAR),
+        };
+        return 0;
+    }
 
+    switch (fmt) {
     /* Packed bitstream formats */
     case AV_PIX_FMT_MONOWHITE:
     case AV_PIX_FMT_MONOBLACK:
@@ -888,122 +893,6 @@ static int fmt_read_write(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
         *pack_op = (SwsPackOp) {0};
         *rw_op = (SwsReadWriteOp) { .elems = 4, .packed = true };
         return 0;
-    /* Unpacked byte-aligned 4:4:4 formats */
-    case AV_PIX_FMT_YUV444P:
-    case AV_PIX_FMT_YUVJ444P:
-    case AV_PIX_FMT_YUV444P9BE:
-    case AV_PIX_FMT_YUV444P9LE:
-    case AV_PIX_FMT_YUV444P10BE:
-    case AV_PIX_FMT_YUV444P10LE:
-    case AV_PIX_FMT_YUV444P12BE:
-    case AV_PIX_FMT_YUV444P12LE:
-    case AV_PIX_FMT_YUV444P14BE:
-    case AV_PIX_FMT_YUV444P14LE:
-    case AV_PIX_FMT_YUV444P16BE:
-    case AV_PIX_FMT_YUV444P16LE:
-    case AV_PIX_FMT_YUV444P10MSBBE:
-    case AV_PIX_FMT_YUV444P10MSBLE:
-    case AV_PIX_FMT_YUV444P12MSBBE:
-    case AV_PIX_FMT_YUV444P12MSBLE:
-    case AV_PIX_FMT_YUVA444P:
-    case AV_PIX_FMT_YUVA444P9BE:
-    case AV_PIX_FMT_YUVA444P9LE:
-    case AV_PIX_FMT_YUVA444P10BE:
-    case AV_PIX_FMT_YUVA444P10LE:
-    case AV_PIX_FMT_YUVA444P12BE:
-    case AV_PIX_FMT_YUVA444P12LE:
-    case AV_PIX_FMT_YUVA444P16BE:
-    case AV_PIX_FMT_YUVA444P16LE:
-    case AV_PIX_FMT_AYUV:
-    case AV_PIX_FMT_UYVA:
-    case AV_PIX_FMT_VYU444:
-    case AV_PIX_FMT_AYUV64BE:
-    case AV_PIX_FMT_AYUV64LE:
-    case AV_PIX_FMT_VUYA:
-    case AV_PIX_FMT_RGB24:
-    case AV_PIX_FMT_BGR24:
-    case AV_PIX_FMT_RGB48BE:
-    case AV_PIX_FMT_RGB48LE:
-    case AV_PIX_FMT_BGR48BE:
-    case AV_PIX_FMT_BGR48LE:
-    //case AV_PIX_FMT_RGB96BE: TODO: AVRational can't fit 2^32-1
-    //case AV_PIX_FMT_RGB96LE:
-    //case AV_PIX_FMT_RGBF16BE: TODO: no support for float16 currently
-    //case AV_PIX_FMT_RGBF16LE:
-    case AV_PIX_FMT_RGBF32BE:
-    case AV_PIX_FMT_RGBF32LE:
-    case AV_PIX_FMT_ARGB:
-    case AV_PIX_FMT_RGBA:
-    case AV_PIX_FMT_ABGR:
-    case AV_PIX_FMT_BGRA:
-    case AV_PIX_FMT_RGBA64BE:
-    case AV_PIX_FMT_RGBA64LE:
-    case AV_PIX_FMT_BGRA64BE:
-    case AV_PIX_FMT_BGRA64LE:
-    //case AV_PIX_FMT_RGBA128BE: TODO: AVRational can't fit 2^32-1
-    //case AV_PIX_FMT_RGBA128LE:
-    case AV_PIX_FMT_RGBAF32BE:
-    case AV_PIX_FMT_RGBAF32LE:
-    case AV_PIX_FMT_GBRP:
-    case AV_PIX_FMT_GBRP9BE:
-    case AV_PIX_FMT_GBRP9LE:
-    case AV_PIX_FMT_GBRP10BE:
-    case AV_PIX_FMT_GBRP10LE:
-    case AV_PIX_FMT_GBRP12BE:
-    case AV_PIX_FMT_GBRP12LE:
-    case AV_PIX_FMT_GBRP14BE:
-    case AV_PIX_FMT_GBRP14LE:
-    case AV_PIX_FMT_GBRP16BE:
-    case AV_PIX_FMT_GBRP16LE:
-    //case AV_PIX_FMT_GBRPF16BE: TODO
-    //case AV_PIX_FMT_GBRPF16LE:
-    case AV_PIX_FMT_GBRP10MSBBE:
-    case AV_PIX_FMT_GBRP10MSBLE:
-    case AV_PIX_FMT_GBRP12MSBBE:
-    case AV_PIX_FMT_GBRP12MSBLE:
-    case AV_PIX_FMT_GBRPF32BE:
-    case AV_PIX_FMT_GBRPF32LE:
-    case AV_PIX_FMT_GBRAP:
-    case AV_PIX_FMT_GBRAP10BE:
-    case AV_PIX_FMT_GBRAP10LE:
-    case AV_PIX_FMT_GBRAP12BE:
-    case AV_PIX_FMT_GBRAP12LE:
-    case AV_PIX_FMT_GBRAP14BE:
-    case AV_PIX_FMT_GBRAP14LE:
-    case AV_PIX_FMT_GBRAP16BE:
-    case AV_PIX_FMT_GBRAP16LE:
-    //case AV_PIX_FMT_GBRAPF16BE: TODO
-    //case AV_PIX_FMT_GBRAPF16LE:
-    case AV_PIX_FMT_GBRAPF32BE:
-    case AV_PIX_FMT_GBRAPF32LE:
-    case AV_PIX_FMT_GRAY8:
-    case AV_PIX_FMT_GRAY9BE:
-    case AV_PIX_FMT_GRAY9LE:
-    case AV_PIX_FMT_GRAY10BE:
-    case AV_PIX_FMT_GRAY10LE:
-    case AV_PIX_FMT_GRAY12BE:
-    case AV_PIX_FMT_GRAY12LE:
-    case AV_PIX_FMT_GRAY14BE:
-    case AV_PIX_FMT_GRAY14LE:
-    case AV_PIX_FMT_GRAY16BE:
-    case AV_PIX_FMT_GRAY16LE:
-    //case AV_PIX_FMT_GRAYF16BE: TODO
-    //case AV_PIX_FMT_GRAYF16LE:
-    //case AV_PIX_FMT_YAF16BE:
-    //case AV_PIX_FMT_YAF16LE:
-    case AV_PIX_FMT_GRAYF32BE:
-    case AV_PIX_FMT_GRAYF32LE:
-    case AV_PIX_FMT_YAF32BE:
-    case AV_PIX_FMT_YAF32LE:
-    case AV_PIX_FMT_YA8:
-    case AV_PIX_FMT_YA16LE:
-    case AV_PIX_FMT_YA16BE:
-        *pack_op = (SwsPackOp) {0};
-        *rw_op = (SwsReadWriteOp) {
-            .elems  = desc->nb_components,
-            .packed = desc->nb_components > 1 && !(desc->flags & AV_PIX_FMT_FLAG_PLANAR),
-        };
-        return 0;
     }
 
     return AVERROR(ENOTSUP);
-- 
2.52.0


From fdbabe07468242051d8a294a5b763c03f2b6a5af Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 19:35:39 +0100
Subject: [PATCH 283/304] swscale/format: consolidate format information into a
 single struct

I use a switch/case instead of an array because this is only needed for
irregular formats, which are very sparse.
---
 libswscale/format.c | 240 ++++++++++++++++++++------------------------
 1 file changed, 111 insertions(+), 129 deletions(-)
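
To make the pack patterns in the table below more concrete, here is a hedged sketch of how a pattern such as 5, 6, 5 could split a packed word into fields. It assumes the pattern lists field widths starting from the most significant bits, which is an assumption of this sketch and not something the patch states; `unpack_fields()` is a hypothetical helper, not the SWS_OP_UNPACK implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Splits `word` into up to 4 fields whose widths (in bits) are given by
 * `pattern`, most significant field first. Fields of width 0 are skipped
 * and reported as 0. */
static void unpack_fields(uint32_t word, const int pattern[4], uint32_t out[4])
{
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += pattern[i];
    assert(total > 0 && total <= 32);

    int pos = total; /* bit position just above the next field */
    for (int i = 0; i < 4; i++) {
        out[i] = 0;
        if (!pattern[i])
            continue;
        pos -= pattern[i];
        const uint64_t mask = (UINT64_C(1) << pattern[i]) - 1;
        out[i] = (uint32_t) ((word >> pos) & mask);
    }
}
```

Under that assumption, the 16-bit word 0xF800 with pattern 5, 6, 5 yields 31, 0, 0, and a 32-bit word with pattern 2, 10, 10, 10 keeps its top two bits in the first field.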

diff --git a/libswscale/format.c b/libswscale/format.c
index 013ff040b4..ff41503915 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -655,6 +655,107 @@ static int is_regular_fmt(enum AVPixelFormat fmt)
     return total_bits == step * 8;
 }
 
+typedef struct FmtInfo {
+    SwsReadWriteOp rw;
+    SwsSwizzleOp   swizzle;
+    SwsPackOp      pack;
+    int            shift;
+} FmtInfo;
+
+#define BITSTREAM_FMT(SWIZ, FRAC, PACKED, ...) (FmtInfo) {      \
+    .rw = { .elems = 1, .frac = FRAC, .packed = PACKED },       \
+    .swizzle = SWIZ,                                            \
+    __VA_ARGS__                                                 \
+}
+
+#define SUBPACKED_FMT(SWIZ, ...) (FmtInfo) {                    \
+    .rw = { .elems = 1, .packed = true },                       \
+    .swizzle = SWIZ,                                            \
+    .pack.pattern = {__VA_ARGS__},                              \
+}
+
+#define PACKED_FMT(SWIZ, N, ...) (FmtInfo) {                    \
+    .rw = { .elems = N, .packed = (N) > 1 },                    \
+    .swizzle = SWIZ,                                            \
+    __VA_ARGS__                                                 \
+}
+
+#define RGBA SWS_SWIZZLE(0, 1, 2, 3)
+#define BGRA SWS_SWIZZLE(2, 1, 0, 3)
+#define ARGB SWS_SWIZZLE(3, 0, 1, 2)
+#define ABGR SWS_SWIZZLE(3, 2, 1, 0)
+#define AVYU SWS_SWIZZLE(3, 2, 0, 1)
+#define VYUA SWS_SWIZZLE(2, 0, 1, 3)
+#define UYVA SWS_SWIZZLE(1, 0, 2, 3)
+#define VUYA BGRA
+
+static FmtInfo fmt_info_irregular(enum AVPixelFormat fmt)
+{
+    switch (fmt) {
+    /* Bitstream formats */
+    case AV_PIX_FMT_MONOWHITE:
+    case AV_PIX_FMT_MONOBLACK:
+        return BITSTREAM_FMT(RGBA, 3, false);
+    case AV_PIX_FMT_RGB4: return BITSTREAM_FMT(RGBA, 1, true, .pack = {{ 1, 2, 1 }});
+    case AV_PIX_FMT_BGR4: return BITSTREAM_FMT(BGRA, 1, true, .pack = {{ 1, 2, 1 }});
+
+    /* Sub-packed 8-bit aligned formats */
+    case AV_PIX_FMT_RGB4_BYTE:  return SUBPACKED_FMT(RGBA, 1, 2, 1);
+    case AV_PIX_FMT_BGR4_BYTE:  return SUBPACKED_FMT(BGRA, 1, 2, 1);
+    case AV_PIX_FMT_RGB8:       return SUBPACKED_FMT(RGBA, 3, 3, 2);
+    case AV_PIX_FMT_BGR8:       return SUBPACKED_FMT(BGRA, 2, 3, 3);
+
+    /* Sub-packed 16-bit aligned formats */
+    case AV_PIX_FMT_RGB565LE:
+    case AV_PIX_FMT_RGB565BE:
+        return SUBPACKED_FMT(RGBA, 5, 6, 5);
+    case AV_PIX_FMT_BGR565LE:
+    case AV_PIX_FMT_BGR565BE:
+        return SUBPACKED_FMT(BGRA, 5, 6, 5);
+    case AV_PIX_FMT_RGB555LE:
+    case AV_PIX_FMT_RGB555BE:
+        return SUBPACKED_FMT(RGBA, 5, 5, 5);
+    case AV_PIX_FMT_BGR555LE:
+    case AV_PIX_FMT_BGR555BE:
+        return SUBPACKED_FMT(BGRA, 5, 5, 5);
+    case AV_PIX_FMT_RGB444LE:
+    case AV_PIX_FMT_RGB444BE:
+        return SUBPACKED_FMT(RGBA, 4, 4, 4);
+    case AV_PIX_FMT_BGR444LE:
+    case AV_PIX_FMT_BGR444BE:
+        return SUBPACKED_FMT(BGRA, 4, 4, 4);
+
+    /* Sub-packed 32-bit aligned formats */
+    case AV_PIX_FMT_X2RGB10LE:
+    case AV_PIX_FMT_X2RGB10BE:
+        return SUBPACKED_FMT(ARGB, 2, 10, 10, 10);
+    case AV_PIX_FMT_X2BGR10LE:
+    case AV_PIX_FMT_X2BGR10BE:
+        return SUBPACKED_FMT(ABGR, 2, 10, 10, 10);
+    case AV_PIX_FMT_XV30LE:
+    case AV_PIX_FMT_XV30BE:
+        return SUBPACKED_FMT(AVYU, 2, 10, 10, 10);
+    case AV_PIX_FMT_V30XLE:
+    case AV_PIX_FMT_V30XBE:
+        return SUBPACKED_FMT(VYUA, 10, 10, 10, 2);
+
+    /* 3-component formats with extra padding */
+    case AV_PIX_FMT_RGB0:   return PACKED_FMT(RGBA, 4);
+    case AV_PIX_FMT_BGR0:   return PACKED_FMT(BGRA, 4);
+    case AV_PIX_FMT_0RGB:   return PACKED_FMT(ARGB, 4);
+    case AV_PIX_FMT_0BGR:   return PACKED_FMT(ABGR, 4);
+    case AV_PIX_FMT_VUYX:   return PACKED_FMT(VUYA, 4);
+    case AV_PIX_FMT_XV36LE:
+    case AV_PIX_FMT_XV36BE:
+        return PACKED_FMT(UYVA, 4, .shift = 4);
+    case AV_PIX_FMT_XV48LE:
+    case AV_PIX_FMT_XV48BE:
+        return PACKED_FMT(UYVA, 4);
+    }
+
+    return (FmtInfo) {0};
+}
+
 struct comp {
     int index;
     int plane;
@@ -692,41 +793,8 @@ static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
         return swiz;
     }
 
-    switch (fmt) {
-    case AV_PIX_FMT_0RGB:
-    case AV_PIX_FMT_X2RGB10LE:
-    case AV_PIX_FMT_X2RGB10BE:
-        return (SwsSwizzleOp) {{ .x = 3, 0, 1, 2 }};
-    case AV_PIX_FMT_BGR8:
-    case AV_PIX_FMT_BGR4:
-    case AV_PIX_FMT_BGR4_BYTE:
-    case AV_PIX_FMT_BGR565BE:
-    case AV_PIX_FMT_BGR565LE:
-    case AV_PIX_FMT_BGR555BE:
-    case AV_PIX_FMT_BGR555LE:
-    case AV_PIX_FMT_BGR444BE:
-    case AV_PIX_FMT_BGR444LE:
-    case AV_PIX_FMT_BGR0:
-    case AV_PIX_FMT_VUYX:
-        return (SwsSwizzleOp) {{ .x = 2, 1, 0, 3 }};
-    case AV_PIX_FMT_0BGR:
-    case AV_PIX_FMT_X2BGR10LE:
-    case AV_PIX_FMT_X2BGR10BE:
-        return (SwsSwizzleOp) {{ .x = 3, 2, 1, 0 }};
-    case AV_PIX_FMT_XV30BE:
-    case AV_PIX_FMT_XV30LE:
-        return (SwsSwizzleOp) {{ .x = 3, 2, 0, 1 }};
-    case AV_PIX_FMT_V30XBE:
-    case AV_PIX_FMT_V30XLE:
-        return (SwsSwizzleOp) {{ .x = 2, 0, 1, 3 }};
-    case AV_PIX_FMT_XV36BE:
-    case AV_PIX_FMT_XV36LE:
-    case AV_PIX_FMT_XV48BE:
-    case AV_PIX_FMT_XV48LE:
-        return (SwsSwizzleOp) {{ .x = 1, 0, 2, 3 }};
-    default:
-        return (SwsSwizzleOp) {{ .x = 0, 1, 2, 3 }};
-    }
+    /* This function is only ever called if fmt_read_write() already passes */
+    return fmt_info_irregular(fmt).swizzle;
 }
 
 static SwsSwizzleOp swizzle_inv(SwsSwizzleOp swiz) {
@@ -747,14 +815,8 @@ static int fmt_shift(enum AVPixelFormat fmt)
         return desc->comp[0].shift;
     }
 
-    switch (fmt) {
-    case AV_PIX_FMT_XV36BE:
-    case AV_PIX_FMT_XV36LE:
-        return 4;
-    default:
-        /* Sub-byte irregular formats are always handled by SWS_OP_(UN)PACK */
-        return 0;
-    }
+    /* This function is only ever called if fmt_read_write() already passes */
+    return fmt_info_irregular(fmt).shift;
 }
 
 /**
@@ -809,93 +871,13 @@ static int fmt_read_write(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
         return 0;
     }
 
-    switch (fmt) {
-    /* Packed bitstream formats */
-    case AV_PIX_FMT_MONOWHITE:
-    case AV_PIX_FMT_MONOBLACK:
-        *pack_op = (SwsPackOp) {0};
-        *rw_op = (SwsReadWriteOp) {
-            .elems = 1,
-            .frac  = 3,
-        };
-        return 0;
-    case AV_PIX_FMT_RGB4:
-    case AV_PIX_FMT_BGR4:
-        *pack_op = (SwsPackOp) {{ 1, 2, 1 }};
-        *rw_op = (SwsReadWriteOp) {
-            .elems = 1,
-            .packed = true,
-            .frac  = 1,
-        };
-        return 0;
-    /* Packed 8-bit aligned formats */
-    case AV_PIX_FMT_RGB4_BYTE:
-    case AV_PIX_FMT_BGR4_BYTE:
-        *pack_op = (SwsPackOp) {{ 1, 2, 1 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    case AV_PIX_FMT_BGR8:
-        *pack_op = (SwsPackOp) {{ 2, 3, 3 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    case AV_PIX_FMT_RGB8:
-        *pack_op = (SwsPackOp) {{ 3, 3, 2 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
+    FmtInfo info = fmt_info_irregular(fmt);
+    if (!info.rw.elems)
+        return AVERROR(ENOTSUP);
 
-    /* Packed 16-bit aligned formats */
-    case AV_PIX_FMT_RGB565BE:
-    case AV_PIX_FMT_RGB565LE:
-    case AV_PIX_FMT_BGR565BE:
-    case AV_PIX_FMT_BGR565LE:
-        *pack_op = (SwsPackOp) {{ 5, 6, 5 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    case AV_PIX_FMT_RGB555BE:
-    case AV_PIX_FMT_RGB555LE:
-    case AV_PIX_FMT_BGR555BE:
-    case AV_PIX_FMT_BGR555LE:
-        *pack_op = (SwsPackOp) {{ 5, 5, 5 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    case AV_PIX_FMT_RGB444BE:
-    case AV_PIX_FMT_RGB444LE:
-    case AV_PIX_FMT_BGR444BE:
-    case AV_PIX_FMT_BGR444LE:
-        *pack_op = (SwsPackOp) {{ 4, 4, 4 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    /* Packed 32-bit aligned 4:4:4 formats */
-    case AV_PIX_FMT_X2RGB10BE:
-    case AV_PIX_FMT_X2RGB10LE:
-    case AV_PIX_FMT_X2BGR10BE:
-    case AV_PIX_FMT_X2BGR10LE:
-    case AV_PIX_FMT_XV30BE:
-    case AV_PIX_FMT_XV30LE:
-        *pack_op = (SwsPackOp) {{ 2, 10, 10, 10 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    case AV_PIX_FMT_V30XBE:
-    case AV_PIX_FMT_V30XLE:
-        *pack_op = (SwsPackOp) {{ 10, 10, 10, 2 }};
-        *rw_op = (SwsReadWriteOp) { .elems = 1, .packed = true };
-        return 0;
-    /* 3 component formats with one channel ignored */
-    case AV_PIX_FMT_RGB0:
-    case AV_PIX_FMT_BGR0:
-    case AV_PIX_FMT_0RGB:
-    case AV_PIX_FMT_0BGR:
-    case AV_PIX_FMT_XV36BE:
-    case AV_PIX_FMT_XV36LE:
-    case AV_PIX_FMT_XV48BE:
-    case AV_PIX_FMT_XV48LE:
-    case AV_PIX_FMT_VUYX:
-        *pack_op = (SwsPackOp) {0};
-        *rw_op = (SwsReadWriteOp) { .elems = 4, .packed = true };
-        return 0;
-    }
-
-    return AVERROR(ENOTSUP);
+    *pack_op = info.pack;
+    *rw_op   = info.rw;
+    return 0;
 }
 
 static SwsPixelType get_packed_type(SwsPackOp pack)
-- 
2.52.0


From bf32caafbe32fcb98dcfe6bc9bb7535f60dcabc8 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Fri, 5 Dec 2025 20:57:55 +0100
Subject: [PATCH 284/304] swscale/format: merge fmt_* helpers into a single
 fmt_analyze()

Handles the split between "regular" and "irregular" pixel formats in a single
place, and opens up the door for more complicated formats.
---
 libswscale/format.c | 160 +++++++++++++++++++++-----------------------
 1 file changed, 75 insertions(+), 85 deletions(-)
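
Note: the raw-type selection that fmt_analyze() absorbs from the removed
get_packed_type() can be sketched in isolation as below. This is a minimal
sketch, not the patch's code: the RawType enum and raw_type_for_pattern() are
hypothetical stand-ins for SwsPixelType and the inlined logic, assuming the
pack pattern lists per-component bit widths.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the SWS_PIXEL_U8/U16/U32 values in the patch. */
enum RawType { RAW_U8 = 1, RAW_U16, RAW_U32 };

/* Pick the smallest unsigned word that holds all packed component bits.
 * pattern[] holds per-component bit widths, e.g. {5, 6, 5, 0} for RGB565. */
static enum RawType raw_type_for_pattern(const uint8_t pattern[4])
{
    const int sum = pattern[0] + pattern[1] + pattern[2] + pattern[3];
    if (sum > 16)
        return RAW_U32;
    else if (sum > 8)
        return RAW_U16;
    else
        return RAW_U8;
}
```

With the patterns from fmt_info_irregular(), {3, 3, 2} sums to 8 bits and maps
to the 8-bit type, {5, 6, 5} to the 16-bit type, and {2, 10, 10, 10} to the
32-bit type.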

diff --git a/libswscale/format.c b/libswscale/format.c
index ff41503915..9ef79ed7dc 100644
--- a/libswscale/format.c
+++ b/libswscale/format.c
@@ -771,13 +771,13 @@ static int cmp_comp(const void *a, const void *b) {
     return ca->offset - cb->offset;
 }
 
-static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
+static int fmt_analyze_regular(const AVPixFmtDescriptor *desc, SwsReadWriteOp *rw_op,
+                               SwsSwizzleOp *swizzle, int *shift)
 {
-    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
     if (desc->nb_components == 2) {
         /* YA formats */
-        return SWS_SWIZZLE(0, 3, 1, 2);
-    } else if (is_regular_fmt(fmt)) {
+        *swizzle = SWS_SWIZZLE(0, 3, 1, 2);
+    } else {
         /* Sort by increasing component order */
         struct comp sorted[4] = { {0}, {1}, {2}, {3} };
         for (int i = 0; i < desc->nb_components; i++) {
@@ -790,11 +790,64 @@ static SwsSwizzleOp fmt_swizzle(enum AVPixelFormat fmt)
         SwsSwizzleOp swiz = SWS_SWIZZLE(0, 1, 2, 3);
         for (int i = 0; i < desc->nb_components; i++)
             swiz.in[i] = sorted[i].index;
-        return swiz;
+        *swizzle = swiz;
     }
 
-    /* This function is only ever called if fmt_read_write() already passes */
-    return fmt_info_irregular(fmt).swizzle;
+    *shift = desc->comp[0].shift;
+    *rw_op = (SwsReadWriteOp) {
+        .elems  = desc->nb_components,
+        .packed = desc->nb_components > 1 && !(desc->flags & AV_PIX_FMT_FLAG_PLANAR),
+    };
+    return 0;
+}
+
+static int fmt_analyze(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
+                       SwsPackOp *pack_op, SwsSwizzleOp *swizzle, int *shift,
+                       SwsPixelType *pixel_type, SwsPixelType *raw_type)
+{
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
+    if (!desc)
+        return AVERROR(EINVAL);
+
+    /* No support for subsampled formats at the moment */
+    if (desc->log2_chroma_w || desc->log2_chroma_h)
+        return AVERROR(ENOTSUP);
+
+    /* No support for semi-planar formats at the moment */
+    if (desc->flags & AV_PIX_FMT_FLAG_PLANAR &&
+        av_pix_fmt_count_planes(fmt) < desc->nb_components)
+        return AVERROR(ENOTSUP);
+
+    *pixel_type = *raw_type = fmt_pixel_type(fmt);
+    if (!*pixel_type)
+        return AVERROR(ENOTSUP);
+
+    if (is_regular_fmt(fmt)) {
+        *pack_op = (SwsPackOp) {0};
+        return fmt_analyze_regular(desc, rw_op, swizzle, shift);
+    }
+
+    FmtInfo info = fmt_info_irregular(fmt);
+    if (!info.rw.elems)
+        return AVERROR(ENOTSUP);
+
+    *rw_op   = info.rw;
+    *pack_op = info.pack;
+    *swizzle = info.swizzle;
+    *shift   = info.shift;
+
+    if (info.pack.pattern[0]) {
+        const int sum = info.pack.pattern[0] + info.pack.pattern[1] +
+                        info.pack.pattern[2] + info.pack.pattern[3];
+        if (sum > 16)
+            *raw_type = SWS_PIXEL_U32;
+        else if (sum > 8)
+            *raw_type = SWS_PIXEL_U16;
+        else
+            *raw_type = SWS_PIXEL_U8;
+    }
+
+    return 0;
 }
 
 static SwsSwizzleOp swizzle_inv(SwsSwizzleOp swiz) {
@@ -807,18 +860,6 @@ static SwsSwizzleOp swizzle_inv(SwsSwizzleOp swiz) {
     return (SwsSwizzleOp) {{ .x = out[0], out[1], out[2], out[3] }};
 }
 
-/* Shift factor for MSB aligned formats */
-static int fmt_shift(enum AVPixelFormat fmt)
-{
-    if (is_regular_fmt(fmt)) {
-        const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-        return desc->comp[0].shift;
-    }
-
-    /* This function is only ever called if fmt_read_write() already passes */
-    return fmt_info_irregular(fmt).shift;
-}
-
 /**
  * This initializes all absent components explicitly to zero. There is no
  * need to worry about the correct neutral value as fmt_decode() will
@@ -842,56 +883,6 @@ static SwsConst fmt_clear(enum AVPixelFormat fmt)
     return c;
 }
 
-static int fmt_read_write(enum AVPixelFormat fmt, SwsReadWriteOp *rw_op,
-                          SwsPackOp *pack_op, SwsPixelType *pixel_type)
-{
-    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-    if (!desc)
-        return AVERROR(EINVAL);
-
-    /* No support for subsampled formats at the moment */
-    if (desc->log2_chroma_w || desc->log2_chroma_h)
-        return AVERROR(ENOTSUP);
-
-    /* No support for semi-planar formats at the moment */
-    if (desc->flags & AV_PIX_FMT_FLAG_PLANAR &&
-        av_pix_fmt_count_planes(fmt) < desc->nb_components)
-        return AVERROR(ENOTSUP);
-
-    *pixel_type = fmt_pixel_type(fmt);
-    if (!*pixel_type)
-        return AVERROR(ENOTSUP);
-
-    if (is_regular_fmt(fmt)) {
-        *pack_op = (SwsPackOp) {0};
-        *rw_op = (SwsReadWriteOp) {
-            .elems  = desc->nb_components,
-            .packed = desc->nb_components > 1 && !(desc->flags & AV_PIX_FMT_FLAG_PLANAR),
-        };
-        return 0;
-    }
-
-    FmtInfo info = fmt_info_irregular(fmt);
-    if (!info.rw.elems)
-        return AVERROR(ENOTSUP);
-
-    *pack_op = info.pack;
-    *rw_op   = info.rw;
-    return 0;
-}
-
-static SwsPixelType get_packed_type(SwsPackOp pack)
-{
-    const int sum = pack.pattern[0] + pack.pattern[1] +
-                    pack.pattern[2] + pack.pattern[3];
-    if (sum > 16)
-        return SWS_PIXEL_U32;
-    else if (sum > 8)
-        return SWS_PIXEL_U16;
-    else
-        return SWS_PIXEL_U8;
-}
-
 #if HAVE_BIGENDIAN
 #  define NATIVE_ENDIAN_FLAG AV_PIX_FMT_FLAG_BE
 #else
@@ -901,15 +892,14 @@ static SwsPixelType get_packed_type(SwsPackOp pack)
 int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-    SwsPixelType pixel_type;
+    SwsPixelType pixel_type, raw_type;
     SwsReadWriteOp rw_op;
+    SwsSwizzleOp swizzle;
     SwsPackOp unpack;
+    int shift;
 
-    RET(fmt_read_write(fmt, &rw_op, &unpack, &pixel_type));
-
-    SwsPixelType raw_type = pixel_type;
-    if (unpack.pattern[0])
-        raw_type = get_packed_type(unpack);
+    RET(fmt_analyze(fmt, &rw_op, &unpack, &swizzle, &shift,
+                    &pixel_type, &raw_type));
 
     /* TODO: handle subsampled or semipacked input formats */
     RET(ff_sws_op_list_append(ops, &(SwsOp) {
@@ -942,13 +932,13 @@ int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
     RET(ff_sws_op_list_append(ops, &(SwsOp) {
         .op      = SWS_OP_SWIZZLE,
         .type    = pixel_type,
-        .swizzle = swizzle_inv(fmt_swizzle(fmt)),
+        .swizzle = swizzle_inv(swizzle),
     }));
 
     RET(ff_sws_op_list_append(ops, &(SwsOp) {
         .op   = SWS_OP_RSHIFT,
         .type = pixel_type,
-        .c.u  = fmt_shift(fmt),
+        .c.u  = shift,
     }));
 
     RET(ff_sws_op_list_append(ops, &(SwsOp) {
@@ -963,20 +953,19 @@ int ff_sws_decode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
 int ff_sws_encode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
-    SwsPixelType pixel_type;
+    SwsPixelType pixel_type, raw_type;
     SwsReadWriteOp rw_op;
+    SwsSwizzleOp swizzle;
     SwsPackOp pack;
+    int shift;
 
-    RET(fmt_read_write(fmt, &rw_op, &pack, &pixel_type));
-
-    SwsPixelType raw_type = pixel_type;
-    if (pack.pattern[0])
-        raw_type = get_packed_type(pack);
+    RET(fmt_analyze(fmt, &rw_op, &pack, &swizzle, &shift,
+                    &pixel_type, &raw_type));
 
     RET(ff_sws_op_list_append(ops, &(SwsOp) {
         .op   = SWS_OP_LSHIFT,
         .type = pixel_type,
-        .c.u  = fmt_shift(fmt),
+        .c.u  = shift,
     }));
 
     if (rw_op.elems > desc->nb_components) {
@@ -992,7 +981,7 @@ int ff_sws_encode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
     RET(ff_sws_op_list_append(ops, &(SwsOp) {
         .op      = SWS_OP_SWIZZLE,
         .type    = pixel_type,
-        .swizzle = fmt_swizzle(fmt),
+        .swizzle = swizzle,
     }));
 
     if (pack.pattern[0]) {
@@ -1021,6 +1010,7 @@ int ff_sws_encode_pixfmt(SwsOpList *ops, enum AVPixelFormat fmt)
         .type = raw_type,
         .rw   = rw_op,
     }));
+
     return 0;
 }
 
-- 
2.52.0


From 3702316f517284c4be166caec62c63e31ea1a434 Mon Sep 17 00:00:00 2001
From: Cameron Gutman <aicommander@gmail.com>
Date: Sun, 7 Dec 2025 13:51:05 -0600
Subject: [PATCH 285/304] hwcontext_vulkan: add APIs to get optional extensions

These provide a way for apps that initialize Vulkan themselves to know
which extensions we may be able to use without having to hardcode it.

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
-- 
2.52.0


From fbad2aa63f4f7632749fee5dc8b66dcedf845457 Mon Sep 17 00:00:00 2001
From: Cameron Gutman <aicommander@gmail.com>
Date: Sun, 7 Dec 2025 13:57:42 -0600
Subject: [PATCH 286/304] fftools/ffplay_renderer: use new Vulkan extension API

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
-- 
2.52.0


From 178893479c87b218c3f82edf75b64d700bab58cd Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Fri, 5 Dec 2025 22:40:40 -0300
Subject: [PATCH 287/304] avformat/iamf_parse: fix parsing of Scalable layouts
 with Mono and Stereo layers

An ASAN heap-buffer-overflow in scalable_channel_layout_config was caused by an
unchecked assumption that the channel layout of a scalable audio layer is a
superset of the previous layer's channel layout.

scalable_channel_layout_config constructs a channel layout map by copying
channels from the previous layer and adding new ones. The memory allocation is
based on the target loudspeaker_layout. However, if the target layout doesn't
encompass all previous channels (e.g., Mono to Stereo), copying previous
channels followed by adding current ones could exceed the allocated size,
causing a heap buffer overflow.

This commit adds an exception for the known case of Mono -> Stereo, and a check
to ensure the previous layer's channel layout is a subset of the current
layer's layout by comparing their masks. If the condition isn't met,
an error is returned.

Fixes: https://issues.oss-fuzz.com/issues/464965414

Co-authored-by: Oliver Chang <ochang@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/iamf_parse.c | 79 ++++++++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 32 deletions(-)
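
Note: the mask comparison described above can be sketched on plain 64-bit
masks as below. This is an illustrative sketch only: the CH_* macros and both
helper functions are hypothetical local names, not libavutil API, and the
Mono -> Stereo exception is deliberately left out.

```c
#include <stdint.h>

/* Local stand-ins for channel bits; names are hypothetical. */
#define CH_FRONT_LEFT   (UINT64_C(1) << 0)
#define CH_FRONT_RIGHT  (UINT64_C(1) << 1)
#define CH_FRONT_CENTER (UINT64_C(1) << 2)

/* Returns 1 if every channel present in prev is also present in cur,
 * i.e. the previous layer's layout is a subset of the current one. */
static int layout_contains(uint64_t cur, uint64_t prev)
{
    return (cur & prev) == prev;
}

/* Channels that the current layer adds on top of the previous layer. */
static uint64_t layout_new_channels(uint64_t cur, uint64_t prev)
{
    return cur & ~prev;
}
```

A Mono mask (front center only) is not contained in a Stereo mask (front left
and right), which is why that transition needs the explicit exception rather
than the generic subset check.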

diff --git a/libavformat/iamf_parse.c b/libavformat/iamf_parse.c
index 597d800be0..3d78533faf 100644
--- a/libavformat/iamf_parse.c
+++ b/libavformat/iamf_parse.c
@@ -347,6 +347,41 @@ static int update_extradata(AVCodecParameters *codecpar)
     return 0;
 }
 
+static int parse_coupled_substream(AVChannelLayout *out, AVChannelLayout *in, int n)
+{
+    if (in->u.mask & AV_CH_LAYOUT_STEREO) {
+        out->u.map[n++].id = AV_CHAN_FRONT_LEFT;
+        out->u.map[n++].id = AV_CHAN_FRONT_RIGHT;
+        in->u.mask &= ~AV_CH_LAYOUT_STEREO;
+    } else if (in->u.mask & (AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER)) {
+        out->u.map[n++].id = AV_CHAN_FRONT_LEFT_OF_CENTER;
+        out->u.map[n++].id = AV_CHAN_FRONT_RIGHT_OF_CENTER;
+        in->u.mask &= ~(AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER);
+    } else if (in->u.mask & (AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)) {
+        out->u.map[n++].id = AV_CHAN_SIDE_LEFT;
+        out->u.map[n++].id = AV_CHAN_SIDE_RIGHT;
+        in->u.mask &= ~(AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT);
+    } else if (in->u.mask & (AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)) {
+        out->u.map[n++].id = AV_CHAN_BACK_LEFT;
+        out->u.map[n++].id = AV_CHAN_BACK_RIGHT;
+        in->u.mask &= ~(AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT);
+    } else if (in->u.mask & (AV_CH_TOP_FRONT_LEFT|AV_CH_TOP_FRONT_RIGHT)) {
+        out->u.map[n++].id = AV_CHAN_TOP_FRONT_LEFT;
+        out->u.map[n++].id = AV_CHAN_TOP_FRONT_RIGHT;
+        in->u.mask &= ~(AV_CH_TOP_FRONT_LEFT|AV_CH_TOP_FRONT_RIGHT);
+    } else if (in->u.mask & (AV_CH_TOP_SIDE_LEFT|AV_CH_TOP_SIDE_RIGHT)) {
+        out->u.map[n++].id = AV_CHAN_TOP_SIDE_LEFT;
+        out->u.map[n++].id = AV_CHAN_TOP_SIDE_RIGHT;
+        in->u.mask &= ~(AV_CH_TOP_SIDE_LEFT|AV_CH_TOP_SIDE_RIGHT);
+    } else if (in->u.mask & (AV_CH_TOP_BACK_LEFT|AV_CH_TOP_BACK_RIGHT)) {
+        out->u.map[n++].id = AV_CHAN_TOP_BACK_LEFT;
+        out->u.map[n++].id = AV_CHAN_TOP_BACK_RIGHT;
+        in->u.mask &= ~(AV_CH_TOP_BACK_LEFT|AV_CH_TOP_BACK_RIGHT);
+    }
+
+    return n;
+}
+
 static int scalable_channel_layout_config(void *s, AVIOContext *pb,
                                           IAMFAudioElement *audio_element,
                                           const IAMFCodecConfig *codec_config)
@@ -395,12 +430,19 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
 
         if (!i && loudspeaker_layout == 15)
             expanded_loudspeaker_layout = avio_r8(pb);
-        if (expanded_loudspeaker_layout > 0 && expanded_loudspeaker_layout < 13) {
+        if (expanded_loudspeaker_layout >= 0 && expanded_loudspeaker_layout < 13) {
             av_channel_layout_copy(&ch_layout, &ff_iamf_expanded_scalable_ch_layouts[expanded_loudspeaker_layout]);
         } else if (loudspeaker_layout < 10) {
             av_channel_layout_copy(&ch_layout, &ff_iamf_scalable_ch_layouts[loudspeaker_layout]);
-            if (i)
-                ch_layout.u.mask &= ~av_channel_layout_subset(&audio_element->element->layers[i-1]->ch_layout, UINT64_MAX);
+            if (i) {
+                uint64_t mask = av_channel_layout_subset(&audio_element->element->layers[i-1]->ch_layout, UINT64_MAX);
+                // When the first layer is Mono, the second layer may not have the C channel (e.g. Stereo)
+                if (audio_element->element->layers[i-1]->ch_layout.nb_channels == 1)
+                    n--;
+                else if ((ch_layout.u.mask & mask) != mask)
+                    return AVERROR_INVALIDDATA;
+                ch_layout.u.mask &= ~mask;
+            }
         } else
             ch_layout = (AVChannelLayout){ .order = AV_CHANNEL_ORDER_UNSPEC,
                                                           .nb_channels = substream_count +
@@ -430,38 +472,11 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
 
             coupled_substream_count = audio_element->layers[i].coupled_substream_count;
             while (coupled_substream_count--) {
-                if (ch_layout.u.mask & AV_CH_LAYOUT_STEREO) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_FRONT_LEFT;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_FRONT_RIGHT;
-                    ch_layout.u.mask &= ~AV_CH_LAYOUT_STEREO;
-                } else if (ch_layout.u.mask & (AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER)) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_FRONT_LEFT_OF_CENTER;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_FRONT_RIGHT_OF_CENTER;
-                    ch_layout.u.mask &= ~(AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER);
-                } else if (ch_layout.u.mask & (AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_SIDE_LEFT;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_SIDE_RIGHT;
-                    ch_layout.u.mask &= ~(AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT);
-                } else if (ch_layout.u.mask & (AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_BACK_LEFT;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_BACK_RIGHT;
-                    ch_layout.u.mask &= ~(AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT);
-                } else if (ch_layout.u.mask & (AV_CH_TOP_FRONT_LEFT|AV_CH_TOP_FRONT_RIGHT)) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_TOP_FRONT_LEFT;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_TOP_FRONT_RIGHT;
-                    ch_layout.u.mask &= ~(AV_CH_TOP_FRONT_LEFT|AV_CH_TOP_FRONT_RIGHT);
-                } else if (ch_layout.u.mask & (AV_CH_TOP_SIDE_LEFT|AV_CH_TOP_SIDE_RIGHT)) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_TOP_SIDE_LEFT;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_TOP_SIDE_RIGHT;
-                    ch_layout.u.mask &= ~(AV_CH_TOP_SIDE_LEFT|AV_CH_TOP_SIDE_RIGHT);
-                } else if (ch_layout.u.mask & (AV_CH_TOP_BACK_LEFT|AV_CH_TOP_BACK_RIGHT)) {
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_TOP_BACK_LEFT;
-                    layer->ch_layout.u.map[n++].id = AV_CHAN_TOP_BACK_RIGHT;
-                    ch_layout.u.mask &= ~(AV_CH_TOP_BACK_LEFT|AV_CH_TOP_BACK_RIGHT);
-                }
+                n = parse_coupled_substream(&layer->ch_layout, &ch_layout, n);
             }
 
             substream_count -= audio_element->layers[i].coupled_substream_count;
+            n = parse_coupled_substream(&layer->ch_layout, &ch_layout, n); // In case the first layer is Mono
             while (substream_count--) {
                 if (ch_layout.u.mask & AV_CH_FRONT_CENTER) {
                     layer->ch_layout.u.map[n++].id = AV_CHAN_FRONT_CENTER;
-- 
2.52.0


From 1142aeee5b66d117ac7ad78ee6bd3d45d7f7eec8 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Fri, 5 Dec 2025 22:53:22 -0300
Subject: [PATCH 288/304] avformat/iamf_parse: add a few extra sanity checks

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/iamf_parse.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
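
Note: the per-layer substream count check added in the first hunk can be read
as the standalone predicate below. This is a sketch under assumptions:
check_substream_counts() is a hypothetical helper, k is taken to be the number
of substreams already consumed by earlier layers, and -1 stands in for
AVERROR_INVALIDDATA.

```c
/* Returns 0 if the counts for one layer are plausible, -1 otherwise.
 * substream_count         - substreams declared for this layer
 * coupled_substream_count - how many of those are coupled (stereo pairs)
 * k                       - substreams already used by previous layers
 * nb_substreams           - total substreams in the audio element */
static int check_substream_counts(int substream_count,
                                  int coupled_substream_count,
                                  int k, int nb_substreams)
{
    if (!substream_count)
        return -1; /* a layer must carry at least one substream */
    if (coupled_substream_count > substream_count)
        return -1; /* cannot couple more substreams than exist */
    if (substream_count + k > nb_substreams)
        return -1; /* would read past the element's substreams */
    return 0;
}
```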

diff --git a/libavformat/iamf_parse.c b/libavformat/iamf_parse.c
index 3d78533faf..57ed89e74d 100644
--- a/libavformat/iamf_parse.c
+++ b/libavformat/iamf_parse.c
@@ -418,7 +418,8 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
         substream_count = avio_r8(pb);
         coupled_substream_count = avio_r8(pb);
 
-        if (substream_count + k > audio_element->nb_substreams)
+        if (!substream_count || coupled_substream_count > substream_count ||
+            substream_count + k > audio_element->nb_substreams)
             return AVERROR_INVALIDDATA;
 
         audio_element->layers[i].substream_count         = substream_count;
@@ -428,8 +429,14 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
             layer->output_gain = av_make_q(sign_extend(avio_rb16(pb), 16), 1 << 8);
         }
 
-        if (!i && loudspeaker_layout == 15)
+        if (loudspeaker_layout == 15) {
+            if (i) {
+                av_log(s, AV_LOG_ERROR, "expanded_loudspeaker_layout set with more than one layer in Audio Element #%d\n",
+                       audio_element->audio_element_id);
+                return AVERROR_INVALIDDATA;
+            }
             expanded_loudspeaker_layout = avio_r8(pb);
+        }
         if (expanded_loudspeaker_layout >= 0 && expanded_loudspeaker_layout < 13) {
             av_channel_layout_copy(&ch_layout, &ff_iamf_expanded_scalable_ch_layouts[expanded_loudspeaker_layout]);
         } else if (loudspeaker_layout < 10) {
@@ -488,6 +495,9 @@ static int scalable_channel_layout_config(void *s, AVIOContext *pb,
                 }
             }
 
+            if (n != ch_layout.nb_channels)
+                return AVERROR_INVALIDDATA;
+
             ret = av_channel_layout_retype(&layer->ch_layout, AV_CHANNEL_ORDER_NATIVE, 0);
             if (ret < 0 && ret != AVERROR(ENOSYS))
                 return ret;
-- 
2.52.0


From 378c27ec84e3b597fb942048c90b65a365108331 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Sun, 7 Dec 2025 14:52:25 -0300
Subject: [PATCH 289/304] avformat/iamf_writer: ensure
 expanded_loudspeaker_layout is only written when using a single scalable
 layout layer

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/iamf_writer.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libavformat/iamf_writer.c b/libavformat/iamf_writer.c
index e9dd170380..b917a15978 100644
--- a/libavformat/iamf_writer.c
+++ b/libavformat/iamf_writer.c
@@ -753,8 +753,16 @@ static int iamf_write_audio_element(const IAMFContext *iamf,
         int layout = 0, expanded_layout = 0;
         get_loudspeaker_layout(element->layers[0], &layout, &expanded_layout);
         /* When the loudspeaker_layout = 15, the type PARAMETER_DEFINITION_DEMIXING SHALL NOT be present. */
-        if (layout == 15)
+        if (layout == 15) {
             param_definition_types &= ~AV_IAMF_PARAMETER_DEFINITION_DEMIXING;
+            /* expanded_loudspeaker_layout SHALL only be present when num_layers = 1 and loudspeaker_layout is set to 15 */
+            if (element->nb_layers > 1) {
+                av_log(log_ctx, AV_LOG_ERROR, "expanded_loudspeaker_layout present when using more than one layer in "
+                                              "Stream Group #%u\n",
+                       audio_element->audio_element_id);
+                return AVERROR(EINVAL);
+            }
+        }
         /* When the loudspeaker_layout of the (non-)scalable channel audio (i.e., num_layers = 1) is less than or equal to 3.1.2ch,
          * (i.e., Mono, Stereo, or 3.1.2ch), the type PARAMETER_DEFINITION_DEMIXING SHALL NOT be present. */
         else if (element->nb_layers == 1 && (layout == 0 || layout == 1 || layout == 8))
-- 
2.52.0


From 1f5ba02471a5b7954db27be48f1418040a156df4 Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Thu, 4 Dec 2025 13:25:03 +0100
Subject: [PATCH 290/304] avfilter/avfiltergraph: always retry format
 negotiation after auto-filters

There is an edge case not covered by the current logic: If there is only
a single auto-filter inserted, but the auto-inserted filter is incompatible
with a *different* format attribute (after settling the previous formats),
we may need a second auto-filter (e.g. `scale`) to settle the newly introduced
incompatibility.

A regression test demonstrating the issue is added.
---
 libavfilter/avfiltergraph.c             |  9 +++++----
 tests/fate/filter-video.mak             |  3 +++
 tests/ref/fate/filter-scale-premultiply | 15 +++++++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 tests/ref/fate/filter-scale-premultiply

diff --git a/libavfilter/avfiltergraph.c b/libavfilter/avfiltergraph.c
index d5c2ef54e6..c15d95b91e 100644
--- a/libavfilter/avfiltergraph.c
+++ b/libavfilter/avfiltergraph.c
@@ -701,10 +701,11 @@ retry:
                 }
             }
 
-            /* if there is more than one auto filter, we may need another round
-             * to fully settle formats due to possible cross-incompatibilities
-             * between the auto filters themselves */
-            if (num_conv > 1)
+            /* if there is an auto filter, we may need another round to fully
+             * settle formats due to possible cross-incompatibilities between
+             * the auto filters themselves, or between the auto filters and
+             * a different attribute of the filter they are modifying */
+            if (num_conv)
                 goto retry;
         }
     }
diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index 9f3c92a395..35b2687b27 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -836,6 +836,9 @@ FATE_FILTER-$(call ALLYES, TESTSRC2_FILTER SPLIT_FILTER AVGBLUR_FILTER        \
                            METADATA_FILTER WRAPPED_AVFRAME_ENCODER NULL_MUXER \
                            PIPE_PROTOCOL) += $(FATE_FILTER_REFCMP_METADATA-yes)
 
+FATE_FILTER-$(call FILTERFRAMECRC, TESTSRC SCALE PREMULTIPLY, LAVFI_INDEV) += fate-filter-scale-premultiply
+fate-filter-scale-premultiply: CMD = framecrc -auto_conversion_filters -lavfi "testsrc,format=rgba,setparams=alpha_mode=premultiplied,format=rgba:alpha_modes=straight" -frames:v 10
+
 FATE_SAMPLES_FFPROBE += $(FATE_METADATA_FILTER-yes)
 FATE_SAMPLES_FFMPEG += $(FATE_FILTER_SAMPLES-yes)
 FATE_FFPROBE += $(FATE_FILTER_FFPROBE-yes)
diff --git a/tests/ref/fate/filter-scale-premultiply b/tests/ref/fate/filter-scale-premultiply
new file mode 100644
index 0000000000..13fd79310a
--- /dev/null
+++ b/tests/ref/fate/filter-scale-premultiply
@@ -0,0 +1,15 @@
+#tb 0: 1/25
+#media_type 0: video
+#codec_id 0: rawvideo
+#dimensions 0: 320x240
+#sar 0: 1/1
+0,          0,          0,        1,   307200, 0x73a3b71f
+0,          1,          1,        1,   307200, 0x3f05f047
+0,          2,          2,        1,   307200, 0x50382370
+0,          3,          3,        1,   307200, 0x41a15136
+0,          4,          4,        1,   307200, 0xbc8c78ee
+0,          5,          5,        1,   307200, 0x19839e1b
+0,          6,          6,        1,   307200, 0x8545b93b
+0,          7,          7,        1,   307200, 0x4a4acf07
+0,          8,          8,        1,   307200, 0xefa1dee4
+0,          9,          9,        1,   307200, 0x22dae9ca
-- 
2.52.0


From e1e868713dad9837dc272e2d9d849b99ab8a8fac Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.dev>
Date: Thu, 4 Dec 2025 13:10:47 +0100
Subject: [PATCH 291/304] avfilter/avfiltergraph: add missing newlines to
 format printing

---
 libavfilter/avfiltergraph.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavfilter/avfiltergraph.c b/libavfilter/avfiltergraph.c
index c15d95b91e..ad81d91bbc 100644
--- a/libavfilter/avfiltergraph.c
+++ b/libavfilter/avfiltergraph.c
@@ -485,13 +485,13 @@ static void print_filter_formats(void *log_ctx, int level, const AVFilterContext
     for (int i = 0; i < f->nb_inputs; i++) {
         const AVFilterLink *in = f->inputs[i];
         const AVFilterNegotiation *neg = ff_filter_get_negotiation(in);
-        av_log(log_ctx, level, "  in[%d] '%s':", i, f->input_pads[i].name);
+        av_log(log_ctx, level, "  in[%d] '%s':\n", i, f->input_pads[i].name);
 
         for (unsigned i = 0; i < neg->nb_mergers; i++) {
             const AVFilterFormatsMerger *m = &neg->mergers[i];
             m->print_list(&bp, FF_FIELD_AT(void *, m->offset, in->outcfg));
             if (av_bprint_is_complete(&bp))
-                av_log(log_ctx, level, "    %s: %s", m->name, bp.str);
+                av_log(log_ctx, level, "    %s: %s\n", m->name, bp.str);
             av_bprint_clear(&bp);
         }
     }
@@ -499,13 +499,13 @@ static void print_filter_formats(void *log_ctx, int level, const AVFilterContext
     for (int i = 0; i < f->nb_outputs; i++) {
         const AVFilterLink *out = f->outputs[i];
         const AVFilterNegotiation *neg = ff_filter_get_negotiation(out);
-        av_log(log_ctx, level, "  out[%d] '%s':", i, f->output_pads[i].name);
+        av_log(log_ctx, level, "  out[%d] '%s':\n", i, f->output_pads[i].name);
 
         for (unsigned i = 0; i < neg->nb_mergers; i++) {
             const AVFilterFormatsMerger *m = &neg->mergers[i];
             m->print_list(&bp, FF_FIELD_AT(void *, m->offset, out->incfg));
             if (av_bprint_is_complete(&bp))
-                av_log(log_ctx, level, "    %s: %s", m->name, bp.str);
+                av_log(log_ctx, level, "    %s: %s\n", m->name, bp.str);
             av_bprint_clear(&bp);
         }
     }
-- 
2.52.0


From 9f49a95e4ee3610f3c7ff199af1e6c561e2b06d8 Mon Sep 17 00:00:00 2001
From: Georgii Zagoruiko <george.zaguri@gmail.com>
Date: Mon, 8 Dec 2025 19:44:57 +0000
Subject: [PATCH 292/304] Makefile: add missing variables in Makefile

---
 Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 4fc0aebd33..f3fed75954 100644
--- a/Makefile
+++ b/Makefile
@@ -110,7 +110,8 @@ SUBDIR_VARS := CLEANFILES FFLIBS HOSTPROGS TESTPROGS TOOLS               \
                ALTIVEC-OBJS VSX-OBJS X86ASM-OBJS                         \
                MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSP-OBJS MSA-OBJS         \
                MMI-OBJS LSX-OBJS LASX-OBJS RV-OBJS RVV-OBJS RVVB-OBJS    \
-               OBJS SHLIBOBJS STLIBOBJS HOSTOBJS TESTOBJS SIMD128-OBJS
+               OBJS SHLIBOBJS STLIBOBJS HOSTOBJS TESTOBJS SIMD128-OBJS   \
+               SVE-OBJS SVE2-OBJS
 
 define RESET
 $(1) :=
-- 
2.52.0


From e3d7a1f5031a41ee8d532c87fc2adf7cb5e6d7db Mon Sep 17 00:00:00 2001
From: Georgii Zagoruiko <george.zaguri@gmail.com>
Date: Mon, 8 Dec 2025 19:45:32 +0000
Subject: [PATCH 293/304] configure: add detection of assembler support for SME

All changes were made during development and testing of SVE/SME for FFmpeg (VVC). Tested on Apple M4.
---
 Makefile                    |  2 +-
 configure                   |  8 +++++++-
 ffbuild/arch.mak            |  1 +
 libavutil/aarch64/Makefile  |  2 ++
 libavutil/aarch64/asm.S     |  9 +++++++++
 libavutil/aarch64/cpu.c     | 12 ++++++++++++
 libavutil/aarch64/cpu.h     |  5 +++++
 libavutil/aarch64/cpu_sme.S | 31 +++++++++++++++++++++++++++++++
 libavutil/cpu.c             |  1 +
 libavutil/cpu.h             |  1 +
 libavutil/tests/cpu.c       |  8 +++++++-
 tests/checkasm/checkasm.c   |  9 ++++++++-
 12 files changed, 85 insertions(+), 4 deletions(-)
 create mode 100644 libavutil/aarch64/cpu_sme.S

diff --git a/Makefile b/Makefile
index f3fed75954..f563a37fca 100644
--- a/Makefile
+++ b/Makefile
@@ -111,7 +111,7 @@ SUBDIR_VARS := CLEANFILES FFLIBS HOSTPROGS TESTPROGS TOOLS               \
                MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSP-OBJS MSA-OBJS         \
                MMI-OBJS LSX-OBJS LASX-OBJS RV-OBJS RVV-OBJS RVVB-OBJS    \
                OBJS SHLIBOBJS STLIBOBJS HOSTOBJS TESTOBJS SIMD128-OBJS   \
-               SVE-OBJS SVE2-OBJS
+               SVE-OBJS SVE2-OBJS SME-OBJS
 
 define RESET
 $(1) :=
diff --git a/configure b/configure
index 189e973501..15b33bb870 100755
--- a/configure
+++ b/configure
@@ -478,6 +478,7 @@ Optimization options (experts only):
   --disable-i8mm           disable I8MM optimizations
   --disable-sve            disable SVE optimizations
   --disable-sve2           disable SVE2 optimizations
+  --disable-sme            disable SME optimizations
   --disable-inline-asm     disable use of inline assembly
   --disable-x86asm         disable use of standalone x86 assembly
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
@@ -2224,6 +2225,7 @@ ARCH_EXT_LIST_ARM="
     setend
     sve
     sve2
+    sme
 "
 
 ARCH_EXT_LIST_MIPS="
@@ -2491,6 +2493,7 @@ TOOLCHAIN_FEATURES="
     as_archext_i8mm_directive
     as_archext_sve_directive
     as_archext_sve2_directive
+    as_archext_sme_directive
     as_dn_directive
     as_fpu_directive
     as_func
@@ -2823,6 +2826,7 @@ dotprod_deps="aarch64 neon"
 i8mm_deps="aarch64 neon"
 sve_deps="aarch64 neon"
 sve2_deps="aarch64 neon sve"
+sme_deps="aarch64 neon sve sve2"
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
@@ -6447,11 +6451,12 @@ if enabled aarch64; then
     # internal assembler in clang 3.3 does not support this instruction
     enabled neon && check_insn neon 'ext   v0.8B, v0.8B, v1.8B, #1'
 
-    archext_list="dotprod i8mm sve sve2"
+    archext_list="dotprod i8mm sve sve2 sme"
     enabled dotprod && check_archext_insn dotprod 'udot v0.4s, v0.16b, v0.16b'
     enabled i8mm    && check_archext_insn i8mm    'usdot v0.4s, v0.16b, v0.16b'
     enabled sve     && check_archext_insn sve     'whilelt p0.s, x0, x1'
     enabled sve2    && check_archext_insn sve2    'sqrdmulh z0.s, z0.s, z0.s'
+    enabled sme     && check_archext_insn sme     'smstop'
 
     # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external
     # assembly support the feature out of the box. Skip this for the features
@@ -8211,6 +8216,7 @@ if enabled aarch64; then
     echo "I8MM enabled              ${i8mm-no}"
     echo "SVE enabled               ${sve-no}"
     echo "SVE2 enabled              ${sve2-no}"
+    echo "SME enabled               ${sme-no}"
 fi
 if enabled arm; then
     echo "ARMv5TE enabled           ${armv5te-no}"
diff --git a/ffbuild/arch.mak b/ffbuild/arch.mak
index ec79ae7866..83d6bf276f 100644
--- a/ffbuild/arch.mak
+++ b/ffbuild/arch.mak
@@ -5,6 +5,7 @@ OBJS-$(HAVE_VFP)     += $(VFP-OBJS)     $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON)    += $(NEON-OBJS)    $(NEON-OBJS-yes)
 OBJS-$(HAVE_SVE)     += $(SVE-OBJS)     $(SVE-OBJS-yes)
 OBJS-$(HAVE_SVE2)    += $(SVE2-OBJS)    $(SVE2-OBJS-yes)
+OBJS-$(HAVE_SME)     += $(SME-OBJS)     $(SME-OBJS-yes)
 
 OBJS-$(HAVE_MIPSFPU)   += $(MIPSFPU-OBJS)    $(MIPSFPU-OBJS-yes)
 OBJS-$(HAVE_MIPSDSP)   += $(MIPSDSP-OBJS)    $(MIPSDSP-OBJS-yes)
diff --git a/libavutil/aarch64/Makefile b/libavutil/aarch64/Makefile
index 992e95e4df..744c2c53d7 100644
--- a/libavutil/aarch64/Makefile
+++ b/libavutil/aarch64/Makefile
@@ -6,3 +6,5 @@ NEON-OBJS += aarch64/float_dsp_neon.o                                 \
              aarch64/tx_float_neon.o                                  \
 
 SVE-OBJS += aarch64/cpu_sve.o                                         \
+
+SME-OBJS += aarch64/cpu_sme.o                                         \
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index 2e4e451ec2..77cea57cfc 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -72,10 +72,19 @@
 #define DISABLE_SVE2
 #endif
 
+#if HAVE_AS_ARCHEXT_SME_DIRECTIVE
+#define ENABLE_SME   .arch_extension sme
+#define DISABLE_SME  .arch_extension nosme
+#else
+#define ENABLE_SME
+#define DISABLE_SME
+#endif
+
 DISABLE_DOTPROD
 DISABLE_I8MM
 DISABLE_SVE
 DISABLE_SVE2
+DISABLE_SME
 
 
 /* Support macros for
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index e82c0f19ab..3394963303 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -28,6 +28,7 @@
 #define HWCAP_AARCH64_SVE     (1 << 22)
 #define HWCAP2_AARCH64_SVE2   (1 << 1)
 #define HWCAP2_AARCH64_I8MM   (1 << 13)
+#define HWCAP2_AARCH64_SME    (1 << 23)
 
 static int detect_flags(void)
 {
@@ -44,6 +45,8 @@ static int detect_flags(void)
         flags |= AV_CPU_FLAG_SVE2;
     if (hwcap2 & HWCAP2_AARCH64_I8MM)
         flags |= AV_CPU_FLAG_I8MM;
+    if (hwcap & HWCAP2_AARCH64_SME)
+        flags |= AV_CPU_FLAG_SME;
 
     return flags;
 }
@@ -67,6 +70,8 @@ static int detect_flags(void)
         flags |= AV_CPU_FLAG_DOTPROD;
     if (have_feature("hw.optional.arm.FEAT_I8MM"))
         flags |= AV_CPU_FLAG_I8MM;
+    if (have_feature("hw.optional.arm.FEAT_SME"))
+        flags |= AV_CPU_FLAG_SME;
 
     return flags;
 }
@@ -133,6 +138,10 @@ static int detect_flags(void)
 #ifdef PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE
     if (IsProcessorFeaturePresent(PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE))
         flags |= AV_CPU_FLAG_SVE2;
+#endif
+#ifdef PF_ARM_SME_INSTRUCTIONS_AVAILABLE
+    if (IsProcessorFeaturePresent(PF_ARM_SME_INSTRUCTIONS_AVAILABLE))
+        flags |= AV_CPU_FLAG_SME;
 #endif
     return flags;
 }
@@ -162,6 +171,9 @@ int ff_get_cpu_flags_aarch64(void)
 #ifdef __ARM_FEATURE_SVE2
     flags |= AV_CPU_FLAG_SVE2;
 #endif
+#ifdef __ARM_FEATURE_SME
+    flags |= AV_CPU_FLAG_SME;
+#endif
 
     flags |= detect_flags();
 
diff --git a/libavutil/aarch64/cpu.h b/libavutil/aarch64/cpu.h
index a41b729659..62d5eb768f 100644
--- a/libavutil/aarch64/cpu.h
+++ b/libavutil/aarch64/cpu.h
@@ -29,9 +29,14 @@
 #define have_i8mm(flags)    CPUEXT(flags, I8MM)
 #define have_sve(flags)     CPUEXT(flags, SVE)
 #define have_sve2(flags)    CPUEXT(flags, SVE2)
+#define have_sme(flags)     CPUEXT(flags, SME)
 
 #if HAVE_SVE
 int ff_aarch64_sve_length(void);
 #endif
 
+#if HAVE_SME
+int ff_aarch64_sme_length(void);
+#endif
+
 #endif /* AVUTIL_AARCH64_CPU_H */
diff --git a/libavutil/aarch64/cpu_sme.S b/libavutil/aarch64/cpu_sme.S
new file mode 100644
index 0000000000..ba79d483a1
--- /dev/null
+++ b/libavutil/aarch64/cpu_sme.S
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2025 Georgii Zagoruiko
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "asm.S"
+
+ENABLE_SME
+
+function ff_aarch64_sme_length, export=1
+        smstart
+        cntb            x0
+        smstop
+        ret
+endfunc
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 8f9b785ebc..5aed2f39dc 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -186,6 +186,7 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
         { "i8mm",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_I8MM     },    .unit = "flags" },
         { "sve",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_SVE      },    .unit = "flags" },
         { "sve2",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_SVE2     },    .unit = "flags" },
+        { "sme",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_SME      },    .unit = "flags" },
 #elif ARCH_MIPS
         { "mmi",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MMI      },    .unit = "flags" },
         { "msa",      NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MSA      },    .unit = "flags" },
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index a06fc08e56..87cecd0424 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -76,6 +76,7 @@
 #define AV_CPU_FLAG_I8MM         (1 << 9)
 #define AV_CPU_FLAG_SVE          (1 <<10)
 #define AV_CPU_FLAG_SVE2         (1 <<11)
+#define AV_CPU_FLAG_SME          (1 <<12)
 #define AV_CPU_FLAG_SETEND       (1 <<16)
 
 #define AV_CPU_FLAG_MMI          (1 << 0)
diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
index fd2e32901d..c63b7e7d53 100644
--- a/libavutil/tests/cpu.c
+++ b/libavutil/tests/cpu.c
@@ -48,6 +48,7 @@ static const struct {
     { AV_CPU_FLAG_I8MM,      "i8mm"       },
     { AV_CPU_FLAG_SVE,       "sve"        },
     { AV_CPU_FLAG_SVE2,      "sve2"       },
+    { AV_CPU_FLAG_SME,       "sme"        },
 #elif ARCH_ARM
     { AV_CPU_FLAG_ARMV5TE,   "armv5te"    },
     { AV_CPU_FLAG_ARMV6,     "armv6"      },
@@ -174,7 +175,12 @@ int main(int argc, char **argv)
 #if ARCH_AARCH64 && HAVE_SVE
     if (cpu_flags_raw & AV_CPU_FLAG_SVE)
         printf("sve_vector_length = %d\n", 8 * ff_aarch64_sve_length());
-#elif ARCH_RISCV && HAVE_RVV
+#endif
+#if ARCH_AARCH64 && HAVE_SME
+    if (cpu_flags_raw & AV_CPU_FLAG_SME)
+        printf("sme_vector_length = %d\n", 8 * ff_aarch64_sme_length());
+#endif
+#if ARCH_RISCV && HAVE_RVV
     if (cpu_flags_raw & AV_CPU_FLAG_RVV_I32) {
         size_t bytes = ff_get_rv_vlenb();
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 14faf71275..54665c2fad 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -362,6 +362,7 @@ static const struct {
     { "I8MM",     "i8mm",     AV_CPU_FLAG_I8MM },
     { "SVE",      "sve",      AV_CPU_FLAG_SVE },
     { "SVE2",     "sve2",     AV_CPU_FLAG_SVE2 },
+    { "SME",      "sme",      AV_CPU_FLAG_SME },
 #elif ARCH_ARM
     { "ARMV5TE",  "armv5te",  AV_CPU_FLAG_ARMV5TE },
     { "ARMV6",    "armv6",    AV_CPU_FLAG_ARMV6 },
@@ -1039,7 +1040,13 @@ int main(int argc, char *argv[])
     if (have_sve(av_get_cpu_flags()))
         snprintf(arch_info_buf, sizeof(arch_info_buf),
                  "SVE %d bits, ", 8 * ff_aarch64_sve_length());
-#elif ARCH_RISCV && HAVE_RVV
+#endif
+#if ARCH_AARCH64 && HAVE_SME
+    if (have_sme(av_get_cpu_flags()))
+        snprintf(arch_info_buf, sizeof(arch_info_buf),
+                 "SME %d bits, ", 8 * ff_aarch64_sme_length());
+#endif
+#if ARCH_RISCV && HAVE_RVV
     if (av_get_cpu_flags() & AV_CPU_FLAG_RVV_I32)
         snprintf(arch_info_buf, sizeof (arch_info_buf),
                  "%zu-bit vectors, ", 8 * ff_get_rv_vlenb());
-- 
2.52.0


From 34f66e0b676f082e4425c7be5c65ede7fa865c49 Mon Sep 17 00:00:00 2001
From: Hyunjun Ko <zzoon@igalia.com>
Date: Mon, 8 Dec 2025 16:01:45 +0100
Subject: [PATCH 294/304] vulkan_vp9: fix subsampling source and show_frame
 flag

---
 libavcodec/vulkan_vp9.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/libavcodec/vulkan_vp9.c b/libavcodec/vulkan_vp9.c
index 5b592c1443..6d0a5ce46f 100644
--- a/libavcodec/vulkan_vp9.c
+++ b/libavcodec/vulkan_vp9.c
@@ -115,6 +115,7 @@ static int vk_vp9_start_frame(AVCodecContext          *avctx,
     uint32_t frame_id_alloc_mask = 0;
 
     const VP9Frame *pic = &s->frames[CUR_FRAME];
+    const AVPixFmtDescriptor *pixdesc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
     FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
     uint8_t profile = (pic->frame_header->profile_high_bit << 1) | pic->frame_header->profile_low_bit;
 
@@ -222,8 +223,9 @@ static int vk_vp9_start_frame(AVCodecContext          *avctx,
         },
         .BitDepth = profile < 2 ? 8 :
                     pic->frame_header->ten_or_twelve_bit ? 12 : 10,
-        .subsampling_x = pic->frame_header->subsampling_x,
-        .subsampling_y = pic->frame_header->subsampling_y,
+        .subsampling_x = pixdesc->log2_chroma_w,
+        .subsampling_y = pixdesc->log2_chroma_h,
+
         .color_space = pic->frame_header->color_space,
     };
 
@@ -235,7 +237,7 @@ static int vk_vp9_start_frame(AVCodecContext          *avctx,
            .refresh_frame_context = pic->frame_header->refresh_frame_context,
            .frame_parallel_decoding_mode = pic->frame_header->frame_parallel_decoding_mode,
            .segmentation_enabled = pic->frame_header->segmentation_enabled,
-           .show_frame = pic->frame_header->segmentation_enabled,
+           .show_frame = !s->h.invisible,
            .UsePrevFrameMvs = s->h.use_last_frame_mvs,
         },
         .profile = profile,
-- 
2.52.0


From 1bf22c2a3e74e8febf5819c2d49b3617d41f56ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <kasper93@gmail.com>
Date: Sun, 30 Nov 2025 04:31:19 +0100
Subject: [PATCH 295/304] configure: make --disable-optimizations work on
 MSVC/ICC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

_cflags_noopt was previously set to optimize for size as a workaround
for issues in MSVC builds without any optimizations, specifically the
lack of DCE and constant merging. This is no longer needed, as those
issues have been fixed in our codebase.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
---
 configure | 2 --
 1 file changed, 2 deletions(-)

diff --git a/configure b/configure
index 15b33bb870..9ec421cd23 100755
--- a/configure
+++ b/configure
@@ -5160,7 +5160,6 @@ probe_cc(){
         _depflags='-MMD'
         _cflags_speed='-O3'
         _cflags_size='-Os'
-        _cflags_noopt='-O1'
         _flags_filter=icc_flags
     elif $_cc -v 2>&1 | grep -q xlc; then
         _type=xlc
@@ -5274,7 +5273,6 @@ probe_cc(){
         _DEPCXXFLAGS='$(CXXFLAGS)'
         _cflags_speed="-O2"
         _cflags_size="-O1"
-        _cflags_noopt="-O1"
         if $_cc -nologo- 2>&1 | grep -q Linker; then
             _ld_o='-out:$@'
             _flags_filter=msvc_flags_link
-- 
2.52.0


From d7db7460800968a0d2a3716f881e773a43c028b5 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Wed, 10 Dec 2025 21:51:11 -0300
Subject: [PATCH 296/304] avutil/aarch64/cpu: fix check for SME on Linux

SME is an AT_HWCAP2 entry, not AT_HWCAP.

Signed-off-by: James Almer <jamrial@gmail.com>
---
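Note (illustration only, not part of the patch): a minimal sketch of why
the capability word being tested matters. The HWCAP bit values are copied
from libavutil/aarch64/cpu.c in this series; the FLAG_* values and the
name flags_from_hwcaps() are hypothetical stand-ins for the AV_CPU_FLAG_*
constants and detect_flags(), and both words are passed in as parameters
so the sketch builds on any platform.

```c
/* Bit positions as defined in libavutil/aarch64/cpu.c in this series. */
#define HWCAP_AARCH64_SVE     (1UL << 22)
#define HWCAP2_AARCH64_SVE2   (1UL << 1)
#define HWCAP2_AARCH64_I8MM   (1UL << 13)
#define HWCAP2_AARCH64_SME    (1UL << 23)

/* Hypothetical flag values, used only by this sketch. */
enum {
    FLAG_SVE  = 1 << 0,
    FLAG_SVE2 = 1 << 1,
    FLAG_I8MM = 1 << 2,
    FLAG_SME  = 1 << 3,
};

/* Map the two auxiliary-vector words to feature flags.
 * Each HWCAP2_* bit must be tested against hwcap2, never hwcap. */
static int flags_from_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;

    if (hwcap & HWCAP_AARCH64_SVE)
        flags |= FLAG_SVE;
    if (hwcap2 & HWCAP2_AARCH64_SVE2)
        flags |= FLAG_SVE2;
    if (hwcap2 & HWCAP2_AARCH64_I8MM)
        flags |= FLAG_I8MM;
    if (hwcap2 & HWCAP2_AARCH64_SME)
        flags |= FLAG_SME;

    return flags;
}
```

On Linux the two words would normally come from getauxval(AT_HWCAP) and
getauxval(AT_HWCAP2) in <sys/auxv.h>. Bit 23 of AT_HWCAP describes an
unrelated feature, so testing the wrong word can both miss SME and
report it falsely.
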
 libavutil/aarch64/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 3394963303..f93ff08fb5 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -45,7 +45,7 @@ static int detect_flags(void)
         flags |= AV_CPU_FLAG_SVE2;
     if (hwcap2 & HWCAP2_AARCH64_I8MM)
         flags |= AV_CPU_FLAG_I8MM;
-    if (hwcap & HWCAP2_AARCH64_SME)
+    if (hwcap2 & HWCAP2_AARCH64_SME)
         flags |= AV_CPU_FLAG_SME;
 
     return flags;
-- 
2.52.0


From 87fb3644dfb85278816059d577c4744286293189 Mon Sep 17 00:00:00 2001
From: James Almer <jamrial@gmail.com>
Date: Wed, 10 Dec 2025 22:01:33 -0300
Subject: [PATCH 297/304] tests/fate/filter-video: add missing lavfi_indev
 dependency to fate-filter-feedback-hflip

Signed-off-by: James Almer <jamrial@gmail.com>
---
 tests/fate/filter-video.mak | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index 35b2687b27..0c6cbb2ee2 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -175,7 +175,7 @@ FATE_FILTER-$(call FILTERFRAMECRC, TESTSRC FORMAT CONCAT SCALE, LAVFI_INDEV FILE
 fate-filter-lavd-scalenorm: tests/data/filtergraphs/scalenorm
 fate-filter-lavd-scalenorm: CMD = framecrc -f lavfi -graph_file $(TARGET_PATH)/tests/data/filtergraphs/scalenorm -i dummy
 
-FATE_FILTER-$(call FILTERFRAMECRC, TESTSRC2 FEEDBACK HFLIP) += fate-filter-feedback-hflip
+FATE_FILTER-$(call FILTERFRAMECRC, TESTSRC2 FEEDBACK HFLIP, LAVFI_INDEV) += fate-filter-feedback-hflip
 fate-filter-feedback-hflip: CMD = framecrc -f lavfi -i testsrc2=d=1 -vf "[in][hflipin]feedback=x=0:y=0:w=100:h=100[out][hflipout];[hflipout]hflip[hflipin]"
 
 FATE_FILTER-$(call FILTERFRAMECRC, FRAMERATE TESTSRC2) += fate-filter-framerate-up fate-filter-framerate-down
-- 
2.52.0


From b984161566ee326c62ba92717c0935a68ccc5841 Mon Sep 17 00:00:00 2001
From: Leo Izen <leo.izen@gmail.com>
Date: Mon, 1 Dec 2025 06:11:58 -0500
Subject: [PATCH 298/304] avcodec/libjxlenc: give display matrix sidedata
 priority

Before this commit, we ignored the display matrix side data if any EXIF
side data was present, even if that EXIF data contained no orientation
tag. This commit calculates the orientation from the display matrix
side data first, if present. Ideally the decoder will have removed the
orientation tag upon decoding and attached the data as display matrix
side data instead, so this makes our orientation code respect that
behavior.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
---
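Note (illustration only, not part of the patch): the priority order
described above can be sketched as a small pure function. The enum and
the name pick_orientation_source() are hypothetical and do not exist in
libjxlenc.c; the real code works on AVFrameSideData and AVExifMetadata
rather than on booleans.

```c
#include <stdbool.h>

/* Hypothetical labels for where the orientation matrix comes from. */
typedef enum {
    SRC_IDENTITY,        /* fallback: no usable orientation information */
    SRC_DISPLAY_MATRIX,  /* AV_FRAME_DATA_DISPLAYMATRIX side data */
    SRC_EXIF             /* EXIF orientation tag in the range 1..8 */
} OrientationSource;

/* Display matrix wins if present; otherwise a valid EXIF orientation tag
 * is used; otherwise the identity matrix is the default. */
static OrientationSource pick_orientation_source(bool has_display_matrix,
                                                 bool has_exif_tag,
                                                 unsigned exif_orientation)
{
    if (has_display_matrix)
        return SRC_DISPLAY_MATRIX;
    if (has_exif_tag && exif_orientation >= 1 && exif_orientation <= 8)
        return SRC_EXIF;
    return SRC_IDENTITY;
}
```

In the patch itself the EXIF orientation tag is removed whenever it is
found, whichever source is chosen, so that it cannot disagree with the
codestream orientation.
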
 libavcodec/libjxlenc.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/libavcodec/libjxlenc.c b/libavcodec/libjxlenc.c
index a2fec89560..8ddbfaa098 100644
--- a/libavcodec/libjxlenc.c
+++ b/libavcodec/libjxlenc.c
@@ -325,7 +325,7 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
     LibJxlEncodeContext *ctx = avctx->priv_data;
     AVFrameSideData *sd;
     int32_t *matrix = (int32_t[9]){ 0 };
-    int ret = 0;
+    int ret = 0, have_matrix = 0;
     const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(frame->format);
     JxlBasicInfo info;
     JxlPixelFormat *jxl_fmt = &ctx->jxl_fmt;
@@ -383,6 +383,11 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
     /* bitexact lossless requires there to be no XYB transform */
     info.uses_original_profile = ctx->distance == 0.0 || !ctx->xyb;
 
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DISPLAYMATRIX);
+    if (sd) {
+        matrix = (int32_t *) sd->data;
+        have_matrix = 1;
+    }
     sd = av_frame_get_side_data(frame, AV_FRAME_DATA_EXIF);
     if (sd) {
         AVExifMetadata ifd = { 0 };
@@ -393,25 +398,26 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
             ret = ff_exif_sanitize_ifd(avctx, frame, &ifd);
         if (ret >= 0)
             ret = av_exif_get_entry(avctx, &ifd, tag, 0, &orient);
-        if (ret >= 0 && orient && orient->value.uint[0] >= 1 && orient->value.uint[0] <= 8) {
-            av_exif_orientation_to_matrix(matrix, orient->value.uint[0]);
+        if (ret >= 0 && orient) {
+            if (!have_matrix && orient->value.uint[0] >= 1 && orient->value.uint[0] <= 8) {
+                av_exif_orientation_to_matrix(matrix, orient->value.uint[0]);
+                have_matrix = 1;
+            }
+            /* pop the orientation tag anyway, because it only creates */
+            /* ambiguity with the codestream orientation taking precedence */
             ret = av_exif_remove_entry(avctx, &ifd, tag, 0);
-        } else {
-            av_exif_orientation_to_matrix(matrix, 1);
         }
         if (ret >= 0)
             ret = av_exif_write(avctx, &ifd, &exif_buffer, AV_EXIF_TIFF_HEADER);
         if (ret < 0)
             av_log(avctx, AV_LOG_WARNING, "unable to process EXIF frame data\n");
         av_exif_free(&ifd);
-    } else {
-        sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DISPLAYMATRIX);
-        if (sd)
-            matrix = (int32_t *) sd->data;
-        else
-            av_exif_orientation_to_matrix(matrix, 1);
     }
 
+    /* use identity matrix as default */
+    if (!have_matrix)
+        av_exif_orientation_to_matrix(matrix, 1);
+
     /* av_display_matrix_flip is a right-multipilcation */
     /* i.e. flip is applied before the previous matrix */
     if (frame->linesize < 0)
-- 
2.52.0


From f1b359c52b78086fe081df7e9a4671771b3a96ce Mon Sep 17 00:00:00 2001
From: Leo Izen <leo.izen@gmail.com>
Date: Mon, 1 Dec 2025 06:20:58 -0500
Subject: [PATCH 299/304] avcodec/libjxlenc: avoid calling functions inside if
 statements

Calling functions inside if conditions leads to messier, less readable
code, and can also lead to bugs. I prefer storing the result in a
variable first and testing that.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
---
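Note (illustration only, not part of the patch): a minimal sketch of the
style this commit moves to, with the status stored in a variable before
it is tested and every failure routed through one cleanup label. All
names here (Status, step(), configure_encoder(), live_buffers) are
hypothetical; none of them come from libjxl or libjxlenc.c.

```c
#include <stdlib.h>

typedef enum { STATUS_OK = 0, STATUS_ERROR = 1 } Status; /* hypothetical */

/* Stand-in for an external call that may fail. */
static Status step(int fail)
{
    return fail ? STATUS_ERROR : STATUS_OK;
}

/* Counts buffers that are allocated and not yet freed. */
static int live_buffers = 0;

static int configure_encoder(int fail_first, int fail_second)
{
    Status st;
    int ret = 0;
    char *buf = malloc(16);

    if (!buf)
        return -1;
    live_buffers++;

    st = step(fail_first);      /* call first, then test the stored status */
    if (st != STATUS_OK) {
        ret = -1;
        goto end;               /* an early return here would leak buf */
    }

    st = step(fail_second);
    if (st != STATUS_OK) {
        ret = -2;
        goto end;
    }

end:
    free(buf);
    live_buffers--;
    return ret;
}
```

The patch makes the same kind of change for the alpha extra-channel
error path, where a direct return is replaced by ret = AVERROR_EXTERNAL
followed by goto end.
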
 libavcodec/libjxlenc.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/libavcodec/libjxlenc.c b/libavcodec/libjxlenc.c
index 8ddbfaa098..67650da7e9 100644
--- a/libavcodec/libjxlenc.c
+++ b/libavcodec/libjxlenc.c
@@ -326,6 +326,7 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
     AVFrameSideData *sd;
     int32_t *matrix = (int32_t[9]){ 0 };
     int ret = 0, have_matrix = 0;
+    JxlEncoderStatus jret = JXL_ENC_SUCCESS;
     const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(frame->format);
     JxlBasicInfo info;
     JxlPixelFormat *jxl_fmt = &ctx->jxl_fmt;
@@ -445,7 +446,8 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
         info.animation.tps_denominator = avctx->time_base.num;
     }
 
-    if (JxlEncoderSetBasicInfo(ctx->encoder, &info) != JXL_ENC_SUCCESS) {
+    jret = JxlEncoderSetBasicInfo(ctx->encoder, &info);
+    if (jret != JXL_ENC_SUCCESS) {
         av_log(avctx, AV_LOG_ERROR, "Failed to set JxlBasicInfo\n");
         ret = AVERROR_EXTERNAL;
         goto end;
@@ -457,36 +459,42 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
         extra_info.bits_per_sample = info.alpha_bits;
         extra_info.exponent_bits_per_sample = info.alpha_exponent_bits;
         extra_info.alpha_premultiplied = info.alpha_premultiplied;
-
-        if (JxlEncoderSetExtraChannelInfo(ctx->encoder, 0, &extra_info) != JXL_ENC_SUCCESS) {
+        jret = JxlEncoderSetExtraChannelInfo(ctx->encoder, 0, &extra_info);
+        if (jret != JXL_ENC_SUCCESS) {
             av_log(avctx, AV_LOG_ERROR, "Failed to set JxlExtraChannelInfo for alpha!\n");
-            return AVERROR_EXTERNAL;
+            ret = AVERROR_EXTERNAL;
+            goto end;
         }
     }
 
     sd = av_frame_get_side_data(frame, AV_FRAME_DATA_ICC_PROFILE);
-    if (sd && sd->size && JxlEncoderSetICCProfile(ctx->encoder, sd->data, sd->size) != JXL_ENC_SUCCESS) {
-        av_log(avctx, AV_LOG_WARNING, "Could not set ICC Profile\n");
-        sd = NULL;
+    if (sd && sd->size) {
+        jret = JxlEncoderSetICCProfile(ctx->encoder, sd->data, sd->size);
+        if (jret != JXL_ENC_SUCCESS)
+            av_log(avctx, AV_LOG_WARNING, "Could not set ICC Profile\n");
     }
 
-    if (!sd || !sd->size)
+    /* jret != JXL_ENC_SUCCESS means fallthrough from above */
+    if (!sd || !sd->size || jret != JXL_ENC_SUCCESS)
         libjxl_populate_colorspace(avctx, frame, pix_desc, &info);
 
 #if JPEGXL_NUMERIC_VERSION >= JPEGXL_COMPUTE_NUMERIC_VERSION(0, 8, 0)
-    if (JxlEncoderSetFrameBitDepth(ctx->options, &jxl_bit_depth) != JXL_ENC_SUCCESS)
+    jret = JxlEncoderSetFrameBitDepth(ctx->options, &jxl_bit_depth);
+    if (jret != JXL_ENC_SUCCESS)
         av_log(avctx, AV_LOG_WARNING, "Failed to set JxlBitDepth\n");
 #endif
 
     if (exif_buffer) {
-        if (JxlEncoderUseBoxes(ctx->encoder) != JXL_ENC_SUCCESS)
+        jret = JxlEncoderUseBoxes(ctx->encoder);
+        if (jret != JXL_ENC_SUCCESS)
             av_log(avctx, AV_LOG_WARNING, "Couldn't enable UseBoxes\n");
     }
 
     /* depending on basic info, level 10 might
      * be required instead of level 5 */
     if (JxlEncoderGetRequiredCodestreamLevel(ctx->encoder) > 5) {
-        if (JxlEncoderSetCodestreamLevel(ctx->encoder, 10) != JXL_ENC_SUCCESS)
+        jret = JxlEncoderSetCodestreamLevel(ctx->encoder, 10);
+        if (jret != JXL_ENC_SUCCESS)
             av_log(avctx, AV_LOG_WARNING, "Could not increase codestream level\n");
     }
 
-- 
2.52.0


From ad2e06c6f53582caf622186bd07864a5e107feee Mon Sep 17 00:00:00 2001
From: Leo Izen <leo.izen@gmail.com>
Date: Mon, 1 Dec 2025 06:43:32 -0500
Subject: [PATCH 300/304] avcodec/libjxlenc: add EXIF box to output

We already parse the EXIF side data to extract the orientation, so we
should add it to the output file as an EXIF box.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
---
 libavcodec/libjxlenc.c | 48 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/libavcodec/libjxlenc.c b/libavcodec/libjxlenc.c
index 67650da7e9..ca46061545 100644
--- a/libavcodec/libjxlenc.c
+++ b/libavcodec/libjxlenc.c
@@ -59,6 +59,7 @@ typedef struct LibJxlEncodeContext {
     uint8_t *buffer;
     size_t buffer_size;
     JxlPixelFormat jxl_fmt;
+    AVBufferRef *exif_buffer;
 
     /* animation stuff */
     AVFrame *frame;
@@ -316,6 +317,40 @@ static int libjxl_populate_colorspace(AVCodecContext *avctx, const AVFrame *fram
     return 0;
 }
 
+static int libjxl_add_boxes(AVCodecContext *avctx)
+{
+    LibJxlEncodeContext *ctx = avctx->priv_data;
+    JxlEncoderStatus jret = JXL_ENC_SUCCESS;
+    int ret = 0, opened = 0;
+
+    /* no boxes need to be added */
+    if (!ctx->exif_buffer)
+        goto end;
+
+    jret = JxlEncoderUseBoxes(ctx->encoder);
+    if (jret != JXL_ENC_SUCCESS) {
+        av_log(avctx, AV_LOG_WARNING, "Could not enable UseBoxes\n");
+        ret = AVERROR_EXTERNAL;
+        goto end;
+    }
+    opened = 1;
+
+    jret = JxlEncoderAddBox(ctx->encoder, "Exif", ctx->exif_buffer->data, ctx->exif_buffer->size, JXL_TRUE);
+    if (jret != JXL_ENC_SUCCESS)
+        jret = JxlEncoderAddBox(ctx->encoder, "Exif", ctx->exif_buffer->data, ctx->exif_buffer->size, JXL_FALSE);
+    if (jret != JXL_ENC_SUCCESS) {
+        av_log(avctx, AV_LOG_WARNING, "Failed to add Exif box\n");
+        ret = AVERROR_EXTERNAL;
+        goto end;
+    }
+
+end:
+    if (opened)
+        JxlEncoderCloseBoxes(ctx->encoder);
+
+    return ret;
+}
+
 /**
  * Sends metadata to libjxl based on the first frame of the stream, such as pixel format,
  * orientation, bit depth, and that sort of thing.
@@ -332,7 +367,6 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
     JxlPixelFormat *jxl_fmt = &ctx->jxl_fmt;
     int bits_per_sample;
     int orientation;
-    AVBufferRef *exif_buffer = NULL;
 #if JPEGXL_NUMERIC_VERSION >= JPEGXL_COMPUTE_NUMERIC_VERSION(0, 8, 0)
     JxlBitDepth jxl_bit_depth;
 #endif
@@ -409,7 +443,7 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
             ret = av_exif_remove_entry(avctx, &ifd, tag, 0);
         }
         if (ret >= 0)
-            ret = av_exif_write(avctx, &ifd, &exif_buffer, AV_EXIF_TIFF_HEADER);
+            ret = av_exif_write(avctx, &ifd, &ctx->exif_buffer, AV_EXIF_T_OFF);
         if (ret < 0)
             av_log(avctx, AV_LOG_WARNING, "unable to process EXIF frame data\n");
         av_exif_free(&ifd);
@@ -484,12 +518,6 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
         av_log(avctx, AV_LOG_WARNING, "Failed to set JxlBitDepth\n");
 #endif
 
-    if (exif_buffer) {
-        jret = JxlEncoderUseBoxes(ctx->encoder);
-        if (jret != JXL_ENC_SUCCESS)
-            av_log(avctx, AV_LOG_WARNING, "Couldn't enable UseBoxes\n");
-    }
-
     /* depending on basic info, level 10 might
      * be required instead of level 5 */
     if (JxlEncoderGetRequiredCodestreamLevel(ctx->encoder) > 5) {
@@ -498,8 +526,10 @@ static int libjxl_preprocess_stream(AVCodecContext *avctx, const AVFrame *frame,
             av_log(avctx, AV_LOG_WARNING, "Could not increase codestream level\n");
     }
 
+    libjxl_add_boxes(avctx);
+
 end:
-    av_buffer_unref(&exif_buffer);
+    av_buffer_unref(&ctx->exif_buffer);
     return ret;
 }
 
-- 
2.52.0


From cf16eb74c59d016af65418c78f041c5c6a17036e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Denis-Courmont?= <remi@remlab.net>
Date: Wed, 10 Dec 2025 20:35:59 +0200
Subject: [PATCH 301/304] lavc/llvidencdsp: R-V V sub_left_predict

SpacemiT X60:
sub_left_predict_c:                                  51836.0 ( 1.00x)
sub_left_predict_rvv_i32:                             5843.1 ( 8.87x)
---
 libavcodec/riscv/llvidencdsp_init.c |  4 ++++
 libavcodec/riscv/llvidencdsp_rvv.S  | 27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/libavcodec/riscv/llvidencdsp_init.c b/libavcodec/riscv/llvidencdsp_init.c
index e35406dc41..bd2ffef42f 100644
--- a/libavcodec/riscv/llvidencdsp_init.c
+++ b/libavcodec/riscv/llvidencdsp_init.c
@@ -26,6 +26,9 @@
 
 void ff_llvidenc_diff_bytes_rvv(uint8_t *dst, const uint8_t *src1,
                                 const uint8_t *src2, intptr_t w);
+void ff_llvidenc_sub_left_predict_rvv(uint8_t *dst, const uint8_t *src,
+                                      ptrdiff_t stride, ptrdiff_t width,
+                                      int height);
 
 av_cold void ff_llvidencdsp_init_riscv(LLVidEncDSPContext *c)
 {
@@ -34,6 +37,7 @@ av_cold void ff_llvidencdsp_init_riscv(LLVidEncDSPContext *c)
 
     if (flags & AV_CPU_FLAG_RVV_I32) {
         c->diff_bytes = ff_llvidenc_diff_bytes_rvv;
+        c->sub_left_predict = ff_llvidenc_sub_left_predict_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/llvidencdsp_rvv.S b/libavcodec/riscv/llvidencdsp_rvv.S
index 44bf3ac7e5..a862b776e0 100644
--- a/libavcodec/riscv/llvidencdsp_rvv.S
+++ b/libavcodec/riscv/llvidencdsp_rvv.S
@@ -36,3 +36,30 @@ func ff_llvidenc_diff_bytes_rvv, zve32x
 
         ret
 endfunc
+
+func ff_llvidenc_sub_left_predict_rvv, zve32x
+        lpad    0
+        li      a5, -0x80
+        sub     a2, a2, a3
+1:
+        mv      t3, a3
+        addi    a4, a4, -1
+2:
+        vsetvli t0, t3, e8, m8, ta, ma
+        vle8.v  v16, (a1)
+        sub     t3, t3, t0
+        vle8.v  v8, (a0)
+        add     a1, a1, t0
+        vslide1up.vx    v24, v16, a5
+        vadd.vv v8, v8, v16
+        lb      a5, -1(a1)
+        vsub.vv v8, v8, v24
+        vse8.v  v8, (a0)
+        add     a0, a0, t0
+        bnez    t3, 2b
+
+        add     a1, a1, a2
+        bnez    a4, 1b
+
+        ret
+endfunc
-- 
2.52.0


From 5e3af66f6e8d1932daa14bc32581426ff8ca053a Mon Sep 17 00:00:00 2001
From: Ruikai Peng <ruikai@pwno.io>
Date: Thu, 11 Dec 2025 06:27:48 +0000
Subject: [PATCH 302/304] avformat/sierravmd: fix header read error check

The header read check stored the comparison result into ret, so read
failures became ret=1 and were treated as success, leaving the VMD header
uninitialized and letting parsing continue with bogus state.

Regression since: ee623a43e3.
Found-by: Pwno
---
 libavformat/sierravmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/sierravmd.c b/libavformat/sierravmd.c
index bb1d1c5df7..2519756533 100644
--- a/libavformat/sierravmd.c
+++ b/libavformat/sierravmd.c
@@ -103,7 +103,7 @@ static int vmd_read_header(AVFormatContext *s)
 
     /* fetch the main header, including the 2 header length bytes */
     avio_seek(pb, 0, SEEK_SET);
-    if ((ret = ffio_read_size(pb, vmd->vmd_header, VMD_HEADER_SIZE) < 0))
+    if ((ret = ffio_read_size(pb, vmd->vmd_header, VMD_HEADER_SIZE)) < 0)
         return ret;
 
     width = AV_RL16(&vmd->vmd_header[12]);
-- 
2.52.0


From 4b82093623c5c6ae1bc9525646cd4f415f03d696 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
Date: Thu, 11 Dec 2025 22:11:07 +0200
Subject: [PATCH 303/304] fate/ffmpeg: remove comparison against ref from
 fix_sub_duration_heartbeat

After the full ffmpeg CLI multithreading changes went in, this
test started depending on how far the input side read and decoded
the input compared to how quickly the output encoded things, causing
spurious failures on the CI.

To my knowledge, all of the failures so far have produced valid,
correct results, but FATE's built-in checks mostly test whether the
output differs from an exact reference.

This way the CI and valgrind still run the code, but the comparison
is skipped. The reference file is left in place so that the previous
reference is still available.
---
 tests/fate/ffmpeg.mak | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/fate/ffmpeg.mak b/tests/fate/ffmpeg.mak
index c0da2da7f8..cd00275638 100644
--- a/tests/fate/ffmpeg.mak
+++ b/tests/fate/ffmpeg.mak
@@ -133,6 +133,11 @@ fate-ffmpeg-fix_sub_duration_heartbeat: CMD = fmtstdout srt -fix_sub_duration \
   -c:v mpeg2video -b:v 2M -g 30 -sc_threshold 1000000000 \
   -c:s srt \
   -f null -
+# FIXME: disabling comparison against reference as after ffmpeg multithreading
+#        went in, this test started depending on how far the input side
+#        progressed compared to how quickly the output encoded packets,
+#        causing spurious failures on the CI.
+fate-ffmpeg-fix_sub_duration_heartbeat: CMP = null
 
 # FIXME: the integer AAC decoder does not produce the same output on all platforms
 # so until that is fixed we use the volume filter to silence the data
-- 
2.52.0


From 2805061010ad0858f6b9f2d0884bf786c46f7dbd Mon Sep 17 00:00:00 2001
From: Anthony Bajoua <anthonybajoua@meta.com>
Date: Fri, 30 Jan 2026 13:43:03 -0800
Subject: [PATCH 304/304] libavformat/movenc: Uses dynamic buffers for
 fragmented chunks

---
 libavformat/movenc.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 8d8acd2aff..f3b5f2d758 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -6386,7 +6386,7 @@ static int mov_flush_fragment_interleaving(AVFormatContext *s, MOVTrack *track)
 
     offset = avio_tell(mov->mdat_buf);
     avio_write(mov->mdat_buf, buf, buf_size);
-    ffio_free_dyn_buf(&track->mdat_buf);
+    ffio_reset_dyn_buf(track->mdat_buf);
 
     for (i = track->entries_flushed; i < track->entry; i++)
         track->cluster[i].pos += offset;
@@ -6591,7 +6591,7 @@ static int mov_flush_fragment(AVFormatContext *s, int force)
         avio_wb32(s->pb, buf_size + 8);
         ffio_wfourcc(s->pb, "mdat");
         avio_write(s->pb, buf, buf_size);
-        ffio_free_dyn_buf(&mov->mdat_buf);
+        ffio_reset_dyn_buf(mov->mdat_buf);
 
         if (mov->flags & FF_MOV_FLAG_GLOBAL_SIDX)
             mov->reserved_header_pos = avio_tell(s->pb);
@@ -6678,17 +6678,16 @@ static int mov_flush_fragment(AVFormatContext *s, int force)
         if (!mov->frag_interleave) {
             if (!track->mdat_buf)
                 continue;
-            buf_size = avio_close_dyn_buf(track->mdat_buf, &buf);
-            track->mdat_buf = NULL;
+            buf_size = avio_get_dyn_buf(track->mdat_buf, &buf);
+            avio_write(s->pb, buf, buf_size);
+            ffio_reset_dyn_buf(track->mdat_buf);
         } else {
             if (!mov->mdat_buf)
                 continue;
-            buf_size = avio_close_dyn_buf(mov->mdat_buf, &buf);
-            mov->mdat_buf = NULL;
+            buf_size = avio_get_dyn_buf(mov->mdat_buf, &buf);
+            avio_write(s->pb, buf, buf_size);
+            ffio_reset_dyn_buf(mov->mdat_buf);
         }
-
-        avio_write(s->pb, buf, buf_size);
-        av_free(buf);
     }
 
     mov->mdat_size = 0;
-- 
2.52.0

