Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel
@ 2025-04-12  7:22 Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 02/18] vulkan_ffv1: enable acceleration " Lynne
                   ` (17 more replies)
  0 siblings, 18 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Temporary workaround. Will be replaced with a version check once a fix is
in the works and the Mesa version that will carry it is known.
---
 libavutil/hwcontext_vulkan.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index 319b71ed04..d11c0274d2 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -773,6 +773,11 @@ static int check_extensions(AVHWDeviceContext *ctx, int dev, AVDictionary *opts,
         tstr = optional_exts[i].name;
         found = 0;
 
+        /* Intel has had a bad descriptor buffer implementation for a while */
+        if (p->vkctx.driver_props.driverID == VK_DRIVER_ID_INTEL_OPEN_SOURCE_MESA &&
+            !strcmp(tstr, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME))
+            continue;
+
         if (dev &&
             ((debug_mode == FF_VULKAN_DEBUG_VALIDATE) ||
              (debug_mode == FF_VULKAN_DEBUG_PRINTF) ||
-- 
2.47.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 02/18] vulkan_ffv1: enable acceleration on Intel
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 03/18] vulkan_ffv1: remove unused define Lynne
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Fixed by the previous commit.
---
 libavcodec/vulkan_ffv1.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 17bfc943d4..1156d6749b 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -1142,20 +1142,6 @@ static int vk_decode_ffv1_init(AVCodecContext *avctx)
         return err;
     ctx = dec->shared_ctx;
 
-    switch (ctx->s.driver_props.driverID) {
-    case VK_DRIVER_ID_INTEL_PROPRIETARY_WINDOWS:
-    case VK_DRIVER_ID_INTEL_OPEN_SOURCE_MESA:
-        if (avctx->strict_std_compliance < FF_COMPLIANCE_UNOFFICIAL) {
-            av_log(avctx, AV_LOG_ERROR,
-                   "Intel's drivers are unsupported, use -strict -1 to enable acceleration.\n");
-            return AVERROR(ENOTSUP);
-        } else {
-            av_log(avctx, AV_LOG_WARNING,
-                   "Enabling acceleration on Intel's drivers.\n");
-        }
-        break;
-    };
-
     fv = ctx->sd_ctx = av_mallocz(sizeof(*fv));
     if (!fv) {
         err = AVERROR(ENOMEM);
-- 
2.47.2

* [FFmpeg-devel] [PATCH 03/18] vulkan_ffv1: remove unused define
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 02/18] vulkan_ffv1: enable acceleration " Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 04/18] vulkan_ffv1: slightly optimize the range decoder Lynne
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Leftover debug macro.
---
 libavcodec/vulkan_ffv1.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 1156d6749b..b6c9320ec2 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -41,8 +41,6 @@ const FFVulkanDecodeDescriptor ff_vk_dec_ffv1_desc = {
     .queue_flags      = VK_QUEUE_COMPUTE_BIT,
 };
 
-#define HOST_MAP
-
 typedef struct FFv1VulkanDecodePicture {
     FFVulkanDecodePicture vp;
 
-- 
2.47.2

* [FFmpeg-devel] [PATCH 04/18] vulkan_ffv1: slightly optimize the range decoder
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 02/18] vulkan_ffv1: enable acceleration " Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 03/18] vulkan_ffv1: remove unused define Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 05/18] vulkan_ffv1: optimize symbol reader Lynne
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

GPUs have conditional moves (cmovs) as standard, so use plain selects
instead of mask arithmetic.
---
 libavcodec/vulkan/rangecoder.comp | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
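
Note on the equivalence: the C sketch below is not part of the patch. It
mirrors the before/after arithmetic from the hunks that follow to show that
the select form computes the same values as the mask form. The `Sketch*`
names are hypothetical, and the refill helpers use an unsigned type (an
assumption made here only to keep the shift well defined in C).

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the shader's RangeCoder: only the fields used here. */
typedef struct SketchRangeCoder {
    int low;
    int range;
} SketchRangeCoder;

/* Old form: derive an all-ones or all-zeros mask from the decoded bit. */
static void sketch_update_mask(SketchRangeCoder *c, int ranged, int range1, bool bit)
{
    int bv = bit ? -1 : 0; /* same bit pattern as 0xFFFFFFFF */
    c->low   = c->low - (bv & ranged);
    c->range = (ranged & ~bv) - (range1 & bv);
}

/* New form: plain selects, which map directly to conditional moves. */
static void sketch_update_select(SketchRangeCoder *c, int ranged, int range1, bool bit)
{
    c->low   = c->low - (bit ? ranged : 0);
    c->range = (bit ? 0 : ranged) - (bit ? range1 : 0);
}

/* State table index: (bv & 0x100) + val versus val + (bit ? 256 : 0). */
static int sketch_index_mask(int val, bool bit)
{
    int bv = bit ? -1 : 0;
    return (bv & 0x100) + val;
}

static int sketch_index_select(int val, bool bit)
{
    return val + (bit ? 256 : 0);
}

/* refill(): after the shift by 8 the low byte is zero, so += and |= agree. */
static uint32_t sketch_refill_add(uint32_t low, uint8_t byte)
{
    return (low << 8) + byte;
}

static uint32_t sketch_refill_or(uint32_t low, uint8_t byte)
{
    return (low << 8) | byte;
}
```

The input values used to exercise these helpers are illustrative only; the
point is that both forms agree for either value of the bit.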

diff --git a/libavcodec/vulkan/rangecoder.comp b/libavcodec/vulkan/rangecoder.comp
index 4272b2a42f..ba8a6cfd9d 100644
--- a/libavcodec/vulkan/rangecoder.comp
+++ b/libavcodec/vulkan/rangecoder.comp
@@ -219,7 +219,7 @@ void refill(inout RangeCoder c)
     c.range <<= 8;
     c.low   <<= 8;
     if (c.bytestream < c.bytestream_end) {
-        c.low += u8buf(c.bytestream).v;
+        c.low |= u8buf(c.bytestream).v;
         c.bytestream++;
     } else {
         overread++;
@@ -234,11 +234,10 @@ bool get_rac(inout RangeCoder c, uint64_t state)
     int ranged = c.range + range1;
 
     bool bit = c.low >= ranged;
-    int bv = bit ? 0xFFFFFFFF : 0;
-    sb.v = zero_one_state[(bv & 0x100) + val];
+    sb.v = zero_one_state[val + (bit ? 256 : 0)];
 
-    c.low = c.low - (bv & ranged);
-    c.range = (ranged & ~bv) - (range1 & bv);
+    c.low = c.low - (bit ? ranged : 0);
+    c.range = (bit ? 0 : ranged) - (bit ? range1 : 0);
 
     if (c.range < 0x100)
         refill(c);
-- 
2.47.2

* [FFmpeg-devel] [PATCH 05/18] vulkan_ffv1: optimize symbol reader
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (2 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 04/18] vulkan_ffv1: slightly optimize the range decoder Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 06/18] vulkan_ffv1: allocate just as much memory for slice state as needed Lynne
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

This was the fastest variant tested.
---
 libavcodec/vulkan/ffv1_dec.comp | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
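
Note on the rewrite: the C sketch below is not part of the patch. It
replays the old two-loop reader and the new single-loop reader against the
same recorded bits, to show that they return the same value and query the
same state offsets in the same order. `SketchBits` and `sketch_get_bit` are
hypothetical stand-ins for the range coder and `get_rac`; only the offset
relative to `state` is logged.

```c
#include <stdbool.h>

/* Hypothetical bit source: returns pre-recorded bits and logs the state
 * offset passed on each call. Holds up to 32 calls. */
typedef struct SketchBits {
    const bool *bits;
    int pos;
    int states[32];
} SketchBits;

static bool sketch_get_bit(SketchBits *s, int state_off)
{
    s->states[s->pos] = state_off;
    return s->bits[s->pos++];
}

static int sketch_min(int a, int b)
{
    return a < b ? a : b;
}

/* Old form: two loops, each bit OR-ed in at its final position. */
static int sketch_mantissa_old(SketchBits *s, int e)
{
    int a = 1 << e;
    int i;
    for (i = e - 1; i >= 9; i--)
        a |= (int)sketch_get_bit(s, 9) << i;
    for (; i >= 0; i--)
        a |= (int)sketch_get_bit(s, i) << i;
    return a;
}

/* New form: one loop, shifting the accumulator left by one per bit. */
static int sketch_mantissa_new(SketchBits *s, int e)
{
    int a = 1;
    for (int i = e - 1; i >= 0; i--) {
        a <<= 1;
        a |= (int)sketch_get_bit(s, sketch_min(i, 9));
    }
    return a;
}
```

Both forms read the bits most significant first, so appending each bit to a
shifting accumulator yields the same integer as placing it at position `i`.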

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index f9ffe1cee1..7d3150ed63 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -78,13 +78,11 @@ int get_isymbol(inout RangeCoder c, uint64_t state)
 
     state += 21;
 
-    int a = 1 << e;
-    int i;
-    for (i = e - 1; i >= 9; i--)
-        a |= int(get_rac(c, state + 9)) << i;  // 22..31
-
-    for (; i >= 0; i--)
-        a |= int(get_rac(c, state + i)) << i;  // 22..31
+    int a = 1;
+    for (int i = e - 1; i >= 0; i--) {
+        a <<= 1;
+        a |= int(get_rac(c, state + min(i, 9)));  // 22..31
+    }
 
     return get_rac(c, state - 11 + min(e, 10)) ? -a : a;
 }
-- 
2.47.2

* [FFmpeg-devel] [PATCH 06/18] vulkan_ffv1: allocate just as much memory for slice state as needed
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (3 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 05/18] vulkan_ffv1: optimize symbol reader Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 07/18] vulkan_ffv1: init overread/corrupt fields Lynne
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Rather than always allocating for the maximum allowed number of slices,
allocate only for the number of slices present in this frame.
---
 libavcodec/vulkan_ffv1.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index b6c9320ec2..9747721f0d 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -197,7 +197,7 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
                                       &fp->slice_state,
                                       VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                                       VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
-                                      NULL, f->max_slice_count*fp->slice_state_size,
+                                      NULL, f->slice_count*fp->slice_state_size,
                                       VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
         if (err < 0)
             return err;
@@ -213,7 +213,7 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
                                   &fp->tmp_data,
                                   VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                                   VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
-                                  NULL, f->max_slice_count*CONTEXT_SIZE,
+                                  NULL, f->slice_count*CONTEXT_SIZE,
                                   VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
     if (err < 0)
         return err;
@@ -223,7 +223,7 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
                                   &fp->slice_offset_buf,
                                   VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                                   VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
-                                  NULL, 2*f->max_slice_count*sizeof(uint32_t),
+                                  NULL, 2*f->slice_count*sizeof(uint32_t),
                                   VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
                                   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
     if (err < 0)
@@ -234,7 +234,7 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
                                   &fp->slice_status_buf,
                                   VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                                   VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
-                                  NULL, f->max_slice_count*sizeof(uint32_t),
+                                  NULL, f->slice_count*sizeof(uint32_t),
                                   VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
                                   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
     if (err < 0)
-- 
2.47.2

* [FFmpeg-devel] [PATCH 07/18] vulkan_ffv1: init overread/corrupt fields
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (4 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 06/18] vulkan_ffv1: allocate just as much memory for slice state as needed Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 08/18] vulkan_ffv1: fallback to upload if mapping packet fails, fix fallback Lynne
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Forgotten.
---
 libavcodec/vulkan/rangecoder.comp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/vulkan/rangecoder.comp b/libavcodec/vulkan/rangecoder.comp
index ba8a6cfd9d..e332bce8a5 100644
--- a/libavcodec/vulkan/rangecoder.comp
+++ b/libavcodec/vulkan/rangecoder.comp
@@ -193,8 +193,8 @@ void rac_init(out RangeCoder r, u8buf data, uint buf_size)
 }
 
 /* Decoder */
-uint overread;
-bool corrupt;
+uint overread = 0;
+bool corrupt = false;
 
 void rac_init_dec(out RangeCoder r, u8buf data, uint buf_size)
 {
-- 
2.47.2

* [FFmpeg-devel] [PATCH 08/18] vulkan_ffv1: fallback to upload if mapping packet fails, fix fallback
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (5 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 07/18] vulkan_ffv1: init overread/corrupt fields Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 09/18] vulkan_ffv1: fix reset shader dependencies Lynne
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

The commit that added support for host mapping accidentally broke the
original upload route.
Fix it for drivers without host mapping (very few), and fall back to
uploading if mapping the packet fails.
---
 libavcodec/vulkan_ffv1.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)
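
Note on the control flow: the C sketch below is not part of the patch. It
illustrates the "try to map, otherwise upload" pattern under simplifying
assumptions: all `Sketch*` types and functions are hypothetical, and the
staging copy stands in for an upload path that this diff does not show.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer: host_mapped is set only when mapping succeeded. */
typedef struct SketchBuf {
    const unsigned char *mapped;      /* start of the packet when host-mapped */
    unsigned char        staging[64]; /* upload fallback storage */
    size_t               staged;      /* bytes already copied into staging */
    int                  host_mapped;
} SketchBuf;

/* Hypothetical mapper: fails when can_map is 0, like a driver that rejects
 * the pointer. Callers may ignore the result and check host_mapped later. */
static int sketch_try_host_map(SketchBuf *b, const unsigned char *pkt, int can_map)
{
    if (!can_map)
        return -1;
    b->mapped = pkt;
    b->host_mapped = 1;
    return 0;
}

/* Per slice: record the offset into the mapped packet, or copy the slice
 * into staging. Returns the slice offset in the buffer that ends up being
 * used, or -1 if the staging storage is too small. */
static long sketch_add_slice(SketchBuf *b, const unsigned char *data, size_t size)
{
    if (b->host_mapped)
        return (long)(data - b->mapped);
    if (size > sizeof(b->staging) - b->staged)
        return -1;
    memcpy(b->staging + b->staged, data, size);
    b->staged += size;
    return (long)(b->staged - size);
}
```

Keying the per-slice decision on the state of the buffer, rather than on
whether the extension is advertised, is what lets a failed mapping fall
through to the upload route.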

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 9747721f0d..ccff927200 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -182,14 +182,11 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
     fp->crc_checked = f->ec && (avctx->err_recognition & AV_EF_CRCCHECK);
 
     /* Host map the input slices data if supported */
-    if (ctx->s.extensions & FF_VK_EXT_EXTERNAL_HOST_MEMORY) {
-        err = ff_vk_host_map_buffer(&ctx->s, &vp->slices_buf, buffer_ref->data,
-                                    buffer_ref,
-                                    VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
-                                    VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
-        if (err < 0)
-            return err;
-    }
+    if (ctx->s.extensions & FF_VK_EXT_EXTERNAL_HOST_MEMORY)
+        ff_vk_host_map_buffer(&ctx->s, &vp->slices_buf, buffer_ref->data,
+                              buffer_ref,
+                              VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
+                              VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
 
     /* Allocate slice state data */
     if (f->picture.f->flags & AV_FRAME_FLAG_KEY) {
@@ -266,16 +263,14 @@ static int vk_ffv1_decode_slice(AVCodecContext *avctx,
                                 uint32_t        size)
 {
     FFV1Context *f = avctx->priv_data;
-    FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
-    FFVulkanDecodeShared *ctx = dec->shared_ctx;
 
     FFv1VulkanDecodePicture *fp = f->hwaccel_picture_private;
     FFVulkanDecodePicture *vp = &fp->vp;
 
     FFVkBuffer *slice_offset = (FFVkBuffer *)fp->slice_offset_buf->data;
+    FFVkBuffer *slices_buf = vp->slices_buf ? (FFVkBuffer *)vp->slices_buf->data : NULL;
 
-    if (ctx->s.extensions & FF_VK_EXT_EXTERNAL_HOST_MEMORY) {
-        FFVkBuffer *slices_buf = (FFVkBuffer *)vp->slices_buf->data;
+    if (slices_buf && slices_buf->host_ref) {
         AV_WN32(slice_offset->mapped_mem + (2*fp->slice_num + 0)*sizeof(uint32_t),
                 data - slices_buf->mapped_mem);
         AV_WN32(slice_offset->mapped_mem + (2*fp->slice_num + 1)*sizeof(uint32_t),
-- 
2.47.2

* [FFmpeg-devel] [PATCH 09/18] vulkan_ffv1: fix reset shader dependencies
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (6 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 08/18] vulkan_ffv1: fallback to upload if mapping packet fails, fix fallback Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 10/18] vulkan_ffv1: improve buffer barrier correctness for slice state Lynne
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Without a barrier upfront, the reset shader may read data fields not
yet set by the setup shader.
---
 libavcodec/vulkan_ffv1.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index ccff927200..d90db291aa 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -375,31 +375,29 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     fp->tmp_data = NULL;
 
     /* Entry barrier for the slice state */
-    if (!(f->picture.f->flags & AV_FRAME_FLAG_KEY)) {
-        buf_bar[nb_buf_bar++] = (VkBufferMemoryBarrier2) {
-            .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER_2,
-            .srcStageMask = slice_state->stage,
-            .dstStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-            .srcAccessMask = slice_state->access,
-            .dstAccessMask = VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
-            .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
-            .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
-            .buffer = slice_state->buf,
-            .offset = 0,
-            .size = VK_WHOLE_SIZE,
-        };
-    }
+    buf_bar[nb_buf_bar++] = (VkBufferMemoryBarrier2) {
+        .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER_2,
+        .srcStageMask = slice_state->stage,
+        .dstStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
+        .srcAccessMask = slice_state->access,
+        .dstAccessMask = VK_ACCESS_2_SHADER_STORAGE_READ_BIT |
+                         VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
+        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
+        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
+        .buffer = slice_state->buf,
+        .offset = 0,
+        .size = fp->slice_data_size*f->slice_count,
+    };
 
     vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
         .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
         .pBufferMemoryBarriers = buf_bar,
         .bufferMemoryBarrierCount = nb_buf_bar,
     });
-    if (nb_buf_bar) {
-        slice_state->stage = buf_bar[1].dstStageMask;
-        slice_state->access = buf_bar[1].dstAccessMask;
-        nb_buf_bar = 0;
-    }
+    slice_state->stage = buf_bar[0].dstStageMask;
+    slice_state->access = buf_bar[0].dstAccessMask;
+    nb_buf_bar = 0;
+    nb_img_bar = 0;
 
     /* Setup shader */
     ff_vk_shader_update_desc_buffer(&ctx->s, exec, &fv->setup,
-- 
2.47.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 10/18] vulkan_ffv1: improve buffer barrier correctness for slice state
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (7 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 09/18] vulkan_ffv1: fix reset shader dependencies Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 11/18] vulkan_ffv1: fix left-2 sample addressing Lynne
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

This is likely a nano-optimization, but it's more correct.
---
 libavcodec/vulkan_ffv1.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
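
Note on the sizes: the C sketch below is not part of the patch. It works
through the arithmetic of the two barrier ranges on the slice state buffer,
assuming (from the offsets used in the barriers) that the first
`slice_count * slice_data_size` bytes form one region and the rest of each
slice's state forms the other. `SketchRegion` and the helper names are
hypothetical.

```c
#include <stdint.h>

typedef struct SketchRegion {
    uint64_t offset;
    uint64_t size;
} SketchRegion;

/* First region: slice data for every slice in the frame. */
static SketchRegion sketch_data_region(uint64_t slice_count,
                                       uint64_t slice_data_size)
{
    return (SketchRegion){ 0, slice_count * slice_data_size };
}

/* Second region, new size expression: only what this frame's slices use. */
static SketchRegion sketch_state_region(uint64_t slice_count,
                                        uint64_t slice_state_size,
                                        uint64_t slice_data_size)
{
    return (SketchRegion){
        slice_count * slice_data_size,
        slice_count * (slice_state_size - slice_data_size),
    };
}

/* Second region, old size expression: everything up to the end of the
 * buffer, whatever its allocated size happens to be. */
static SketchRegion sketch_state_region_old(uint64_t buf_size,
                                            uint64_t slice_count,
                                            uint64_t slice_data_size)
{
    return (SketchRegion){
        slice_count * slice_data_size,
        buf_size - slice_count * slice_data_size,
    };
}
```

With 4 slices, a 1024-byte state and a 256-byte data part, the two regions
end exactly at 4096 bytes. The old expression matches only when the buffer
is exactly that large; if the buffer were bigger (an assumption, for
example a reused allocation), it would also cover the unused tail.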

diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index d90db291aa..e511840a01 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -484,8 +484,7 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
         .srcStageMask = slice_state->stage,
         .dstStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
         .srcAccessMask = slice_state->access,
-        .dstAccessMask = VK_ACCESS_2_SHADER_STORAGE_READ_BIT |
-                         VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT,
+        .dstAccessMask = VK_ACCESS_2_SHADER_STORAGE_READ_BIT,
         .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
         .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
         .buffer = slice_state->buf,
@@ -534,7 +533,7 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
         .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
         .buffer = slice_state->buf,
         .offset = fp->slice_data_size*f->slice_count,
-        .size = slice_state->size - fp->slice_data_size*f->slice_count,
+        .size = f->slice_count*(fp->slice_state_size - fp->slice_data_size),
     };
 
     /* Input frame barrier */
-- 
2.47.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [FFmpeg-devel] [PATCH 11/18] vulkan_ffv1: fix left-2 sample addressing
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (8 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 10/18] vulkan_ffv1: improve buffer barrier correctness for slice state Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 12/18] vulkan_ffv1: cache only 2 lines when decoding RGB Lynne
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Typo.
Not enough to fix context=1, but it's a start.
---
 libavcodec/vulkan/ffv1_dec.comp | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index 7d3150ed63..1954c050f8 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -44,15 +44,17 @@ ivec2 get_pred(ivec2 pos, ivec2 off, int p, int sw, uint8_t quant_table_idx)
 
     if ((quant_table[quant_table_idx][3][127] != 0) ||
         (quant_table[quant_table_idx][4][127] != 0)) {
+        TYPE cur2 = TYPE(0);
         if (off.x > 0 && off != ivec2(1, 0)) {
             const ivec2 yoff_border2 = off.x == 1 ? ivec2(1, -1) : ivec2(0, 0);
-            TYPE cur2 = TYPE(imageLoad(dst[p], pos + ivec2(-2,  0) + yoff_border2)[0]);
-            base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
-        }
-        if (off.y > 1) {
-            TYPE top2 = TYPE(imageLoad(dst[p], pos + ivec2(0, -2))[0]);
-            base += quant_table[quant_table_idx][4][(top2 - top[1]) & MAX_QUANT_TABLE_MASK];
+            cur2 = TYPE(imageLoad(dst[p], pos + ivec2(-2,  0) + yoff_border2)[0]);
         }
+        base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
+
+        TYPE top2 = TYPE(0);
+        if (off.y > 1)
+            top2 = TYPE(imageLoad(dst[p], pos + ivec2(0, -2))[0]);
+        base += quant_table[quant_table_idx][4][(top2 - top[1]) & MAX_QUANT_TABLE_MASK];
     }
 
     /* context, prediction */
-- 
2.47.2

* [FFmpeg-devel] [PATCH 12/18] vulkan_ffv1: cache only 2 lines when decoding RGB
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (9 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 11/18] vulkan_ffv1: fix left-2 sample addressing Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 13/18] ffv1/vulkan: redo context count tracking and quant_table_idx management Lynne
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

This reduces the intermediate VRAM used for RGB decoding by a
factor of 100 for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.

This is equivalent to what the software decoder does, but with fewer pointers.
---
 libavcodec/vulkan/Makefile          |   3 +-
 libavcodec/vulkan/ffv1_dec.comp     | 158 ++++++++++++----
 libavcodec/vulkan/ffv1_dec_rct.comp |  88 ---------
 libavcodec/vulkan_ffv1.c            | 283 ++++++++--------------------
 libavutil/vulkan_functions.h        |   1 +
 5 files changed, 203 insertions(+), 330 deletions(-)
 delete mode 100644 libavcodec/vulkan/ffv1_dec_rct.comp
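
Note on the line cache: the C sketch below is not part of the patch. It
mirrors the `LADDR(p)` macro from the RGB branch of the diff to show how
row indices wrap into a two-line cache. `SketchVec2` and `sketch_laddr` are
hypothetical names, and the result for negative rows assumes two's
complement integers.

```c
#define SKETCH_LINECACHE 2
#define SKETCH_LBUF (SKETCH_LINECACHE - 1)

typedef struct SketchVec2 {
    int x;
    int y;
} SketchVec2;

/* Mirrors LADDR(p): x is unchanged, y is wrapped onto the cached rows by
 * masking, which works because the cache height is a power of two. */
static SketchVec2 sketch_laddr(SketchVec2 p)
{
    return (SketchVec2){ p.x, p.y & SKETCH_LBUF };
}
```

With two cached lines, row `y` and row `y - 1` always land on different
cache rows, while row `y - 2` lands on the same cache row as `y`. This
matches the "top-2 became current upon swap" comment in the diff: until it
is overwritten, the current cache row still holds the sample from two rows
above.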

diff --git a/libavcodec/vulkan/Makefile b/libavcodec/vulkan/Makefile
index e6bad486bd..feb5d2ea51 100644
--- a/libavcodec/vulkan/Makefile
+++ b/libavcodec/vulkan/Makefile
@@ -14,8 +14,7 @@ OBJS-$(CONFIG_FFV1_VULKAN_ENCODER)  +=  vulkan/common.o \
 OBJS-$(CONFIG_FFV1_VULKAN_HWACCEL)  +=  vulkan/common.o \
 					vulkan/rangecoder.o vulkan/ffv1_vlc.o \
 					vulkan/ffv1_common.o vulkan/ffv1_reset.o \
-					vulkan/ffv1_dec_setup.o vulkan/ffv1_dec.o \
-					vulkan/ffv1_dec_rct.o
+					vulkan/ffv1_dec_setup.o vulkan/ffv1_dec.o
 
 VULKAN = $(subst $(SRC_PATH)/,,$(wildcard $(SRC_PATH)/libavcodec/vulkan/*.comp))
 .SECONDARY: $(VULKAN:.comp=.c)
diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index 1954c050f8..ae0324cb26 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -20,23 +20,69 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-ivec2 get_pred(ivec2 pos, ivec2 off, int p, int sw, uint8_t quant_table_idx)
+#ifndef RGB
+#define LADDR(p) (p)
+#else
+#define RGB_LINECACHE 2
+#define RGB_LBUF (RGB_LINECACHE - 1)
+#define LADDR(p) (ivec2((p).x, ((p).y & RGB_LBUF)))
+#endif
+
+#ifdef RGB
+ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
+{
+    const ivec2 yoff_border1 = off.x == 0 ? ivec2(1, -1) : ivec2(0, 0);
+
+    /* Thanks to the same coincidence as below, we can skip checking if off == 0, 1 */
+    VTYPE3 top  = VTYPE3(TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(-1, -1) + yoff_border1))[0]),
+                         TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(0, -1)))[0]),
+                         TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(min(1, sw - off.x - 1), -1)))[0]));
+
+    /* Normally, we'd need to check if off != ivec2(0, 0) here, since otherwise, we must
+     * return zero. However, ivec2(-1,  0) + ivec2(1, -1) == ivec2(0, -1), e.g. previous
+     * row, 0 offset, same slice, which is zero since we zero out the buffer for RGB */
+    TYPE cur = TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(-1,  0) + yoff_border1))[0]);
+
+    int base = quant_table[quant_table_idx][0][(cur    - top[0]) & MAX_QUANT_TABLE_MASK] +
+               quant_table[quant_table_idx][1][(top[0] - top[1]) & MAX_QUANT_TABLE_MASK] +
+               quant_table[quant_table_idx][2][(top[1] - top[2]) & MAX_QUANT_TABLE_MASK];
+
+    if ((quant_table[quant_table_idx][3][127] != 0) ||
+        (quant_table[quant_table_idx][4][127] != 0)) {
+        TYPE cur2 = TYPE(0);
+        if (off.x > 0) {
+            const ivec2 yoff_border2 = off.x == 1 ? ivec2(1, -1) : ivec2(0, 0);
+            cur2 = TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(-2,  0) + yoff_border2))[0]);
+        }
+        base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
+
+        /* top-2 became current upon swap */
+        TYPE top2 = TYPE(imageLoad(dec[p], sp + LADDR(off))[0]);
+        base += quant_table[quant_table_idx][4][(top2 - top[1]) & MAX_QUANT_TABLE_MASK];
+    }
+
+    /* context, prediction */
+    return ivec2(base, predict(cur, VTYPE2(top)));
+}
+#else
+ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
 {
     const ivec2 yoff_border1 = off.x == 0 ? ivec2(1, -1) : ivec2(0, 0);
+    sp += off;
 
     VTYPE3 top  = VTYPE3(TYPE(0),
                          TYPE(0),
                          TYPE(0));
     if (off.y > 0 && off != ivec2(0, 1))
-        top[0] = TYPE(imageLoad(dst[p], pos + ivec2(-1, -1) + yoff_border1)[0]);
+        top[0] = TYPE(imageLoad(dec[p], sp + ivec2(-1, -1) + yoff_border1)[0]);
     if (off.y > 0) {
-        top[1] = TYPE(imageLoad(dst[p], pos + ivec2(0, -1))[0]);
-        top[2] = TYPE(imageLoad(dst[p], pos + ivec2(min(1, sw - off.x - 1), -1))[0]);
+        top[1] = TYPE(imageLoad(dec[p], sp + ivec2(0, -1))[0]);
+        top[2] = TYPE(imageLoad(dec[p], sp + ivec2(min(1, sw - off.x - 1), -1))[0]);
     }
 
     TYPE cur = TYPE(0);
     if (off != ivec2(0, 0))
-        cur = TYPE(imageLoad(dst[p], pos + ivec2(-1,  0) + yoff_border1)[0]);
+        cur = TYPE(imageLoad(dec[p], sp + ivec2(-1,  0) + yoff_border1)[0]);
 
     int base = quant_table[quant_table_idx][0][(cur - top[0]) & MAX_QUANT_TABLE_MASK] +
                quant_table[quant_table_idx][1][(top[0] - top[1]) & MAX_QUANT_TABLE_MASK] +
@@ -47,19 +93,20 @@ ivec2 get_pred(ivec2 pos, ivec2 off, int p, int sw, uint8_t quant_table_idx)
         TYPE cur2 = TYPE(0);
         if (off.x > 0 && off != ivec2(1, 0)) {
             const ivec2 yoff_border2 = off.x == 1 ? ivec2(1, -1) : ivec2(0, 0);
-            cur2 = TYPE(imageLoad(dst[p], pos + ivec2(-2,  0) + yoff_border2)[0]);
+            cur2 = TYPE(imageLoad(dec[p], sp + ivec2(-2,  0) + yoff_border2)[0]);
         }
         base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
 
         TYPE top2 = TYPE(0);
         if (off.y > 1)
-            top2 = TYPE(imageLoad(dst[p], pos + ivec2(0, -2))[0]);
+            top2 = TYPE(imageLoad(dec[p], sp + ivec2(0, -2))[0]);
         base += quant_table[quant_table_idx][4][(top2 - top[1]) & MAX_QUANT_TABLE_MASK];
     }
 
     /* context, prediction */
     return ivec2(base, predict(cur, VTYPE2(top)));
 }
+#endif
 
 #ifndef GOLOMB
 int get_isymbol(inout RangeCoder c, uint64_t state)
@@ -89,11 +136,8 @@ int get_isymbol(inout RangeCoder c, uint64_t state)
     return get_rac(c, state - 11 + min(e, 10)) ? -a : a;
 }
 
-void decode_line_pcm(inout SliceContext sc, int y, int p, int bits)
+void decode_line_pcm(inout SliceContext sc, ivec2 sp, int w, int y, int p, int bits)
 {
-    ivec2 sp = sc.slice_pos;
-    int w = sc.slice_dim.x;
-
 #ifndef RGB
     if (p > 0 && p < 3) {
         w >>= chroma_shift.x;
@@ -106,16 +150,14 @@ void decode_line_pcm(inout SliceContext sc, int y, int p, int bits)
         for (int i = (bits - 1); i >= 0; i--)
             v |= uint(get_rac_equi(sc.c)) << i;
 
-        imageStore(dst[p], sp + ivec2(x, y), uvec4(v));
+        imageStore(dec[p], sp + LADDR(ivec2(x, y)), uvec4(v));
     }
 }
 
-void decode_line(inout SliceContext sc, uint64_t state,
-                 int y, int p, int bits, const int run_index)
+void decode_line(inout SliceContext sc, ivec2 sp, int w,
+                 int y, int p, int bits, uint64_t state,
+                 const int run_index)
 {
-    ivec2 sp = sc.slice_pos;
-    int w = sc.slice_dim.x;
-
 #ifndef RGB
     if (p > 0 && p < 3) {
         w >>= chroma_shift.x;
@@ -124,7 +166,7 @@ void decode_line(inout SliceContext sc, uint64_t state,
 #endif
 
     for (int x = 0; x < w; x++) {
-        ivec2 pr = get_pred(sp + ivec2(x, y), ivec2(x, y), p, w,
+        ivec2 pr = get_pred(sp, ivec2(x, y), p, w,
                             sc.quant_table_idx[p]);
 
         int diff = get_isymbol(sc.c, state + CONTEXT_SIZE*abs(pr[0]));
@@ -132,18 +174,16 @@ void decode_line(inout SliceContext sc, uint64_t state,
             diff = -diff;
 
         uint v = zero_extend(pr[1] + diff, bits);
-        imageStore(dst[p], sp + ivec2(x, y), uvec4(v));
+        imageStore(dec[p], sp + LADDR(ivec2(x, y)), uvec4(v));
     }
 }
 
 #else /* GOLOMB */
 
-void decode_line(inout SliceContext sc, uint64_t state,
-                 int y, int p, int bits, inout int run_index)
+void decode_line(inout SliceContext sc, ivec2 sp, int w,
+                 int y, int p, int bits, uint64_t state,
+                 inout int run_index)
 {
-    ivec2 sp = sc.slice_pos;
-    int w = sc.slice_dim.x;
-
 #ifndef RGB
     if (p > 0 && p < 3) {
         w >>= chroma_shift.x;
@@ -157,7 +197,7 @@ void decode_line(inout SliceContext sc, uint64_t state,
     for (int x = 0; x < w; x++) {
         ivec2 pos = sp + ivec2(x, y);
         int diff;
-        ivec2 pr = get_pred(sp + ivec2(x, y), ivec2(x, y), p, w,
+        ivec2 pr = get_pred(sp, ivec2(x, y), p, w,
                             sc.quant_table_idx[p]);
 
         VlcState sb = VlcState(state + VLC_STATE_SIZE*abs(pr[0]));
@@ -202,7 +242,44 @@ void decode_line(inout SliceContext sc, uint64_t state,
             diff = -diff;
 
         uint v = zero_extend(pr[1] + diff, bits);
-        imageStore(dst[p], sp + ivec2(x, y), uvec4(v));
+        imageStore(dec[p], sp + LADDR(ivec2(x, y)), uvec4(v));
+    }
+}
+#endif
+
+#ifdef RGB
+ivec4 transform_sample(ivec4 pix, ivec2 rct_coef)
+{
+    pix.b -= rct_offset;
+    pix.r -= rct_offset;
+    pix.g -= (pix.b*rct_coef.y + pix.r*rct_coef.x) >> 2;
+    pix.b += pix.g;
+    pix.r += pix.g;
+    return ivec4(pix[fmt_lut[0]], pix[fmt_lut[1]],
+                 pix[fmt_lut[2]], pix[fmt_lut[3]]);
+}
+
+void writeout_rgb(in SliceContext sc, ivec2 sp, int w, int y, bool apply_rct)
+{
+    for (int x = 0; x < w; x++) {
+        ivec2 lpos = sp + LADDR(ivec2(x, y));
+        ivec2 pos = sc.slice_pos + ivec2(x, y);
+
+        ivec4 pix;
+        pix.r = int(imageLoad(dec[2], lpos)[0]);
+        pix.g = int(imageLoad(dec[0], lpos)[0]);
+        pix.b = int(imageLoad(dec[1], lpos)[0]);
+        if (transparency != 0)
+            pix.a = int(imageLoad(dec[3], lpos)[0]);
+
+        if (apply_rct)
+            pix = transform_sample(pix, sc.slice_rct_coef);
+
+        imageStore(dst[0], pos, pix);
+        if (planar_rgb != 0) {
+            for (int i = 1; i < color_planes; i++)
+                imageStore(dst[i], pos, ivec4(pix[i]));
+        }
     }
 }
 #endif
@@ -210,6 +287,8 @@ void decode_line(inout SliceContext sc, uint64_t state,
 void decode_slice(inout SliceContext sc, const uint slice_idx)
 {
     int run_index = 0;
+    int w = sc.slice_dim.x;
+    ivec2 sp = sc.slice_pos;
 
 #ifndef RGB
     int bits = bits_per_raw_sample;
@@ -217,6 +296,8 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
     int bits = 9;
     if (bits != 8 || sc.slice_coding_mode != 0)
         bits = bits_per_raw_sample + int(sc.slice_coding_mode != 1);
+
+    sp.y = int(gl_WorkGroupID.y)*RGB_LINECACHE;
 #endif
 
     /* PCM coding */
@@ -229,12 +310,14 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
                 h >>= chroma_shift.y;
 
             for (int y = 0; y < h; y++)
-                decode_line_pcm(sc, y, p, bits);
+                decode_line_pcm(sc, sp, w, y, p, bits);
         }
 #else
         for (int y = 0; y < sc.slice_dim.y; y++) {
             for (int p = 0; p < color_planes; p++)
-                decode_line_pcm(sc, y, p, bits);
+                decode_line_pcm(sc, sp, w, y, p, bits);
+
+            writeout_rgb(sc, sp, w, y, false);
         }
 #endif
     } else
@@ -242,8 +325,9 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
     /* Arithmetic coding */
 #endif
     {
-        uint64_t slice_state_off = uint64_t(slice_state) +
-                                   slice_idx*plane_state_size*codec_planes;
+        u64vec4 slice_state_off = (uint64_t(slice_state) +
+                                   slice_idx*plane_state_size*codec_planes) +
+                                  plane_state_size*uvec4(0, 1, 1, 2);
 
 #ifndef RGB
         for (int p = 0; p < planes; p++) {
@@ -252,18 +336,16 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
                 h >>= chroma_shift.y;
 
             for (int y = 0; y < h; y++)
-                decode_line(sc, slice_state_off, y, p, bits, run_index);
-
-            /* For the second chroma plane, reuse the first plane's state */
-            if (p != 1)
-                slice_state_off += plane_state_size;
+                decode_line(sc, sp, w, y, p, bits,
+                            slice_state_off[p], run_index);
         }
 #else
         for (int y = 0; y < sc.slice_dim.y; y++) {
             for (int p = 0; p < color_planes; p++)
-                decode_line(sc,
-                            slice_state_off + plane_state_size*((p + 1) >> 1),
-                            y, p, bits, run_index);
+                decode_line(sc, sp, w, y, p, bits,
+                            slice_state_off[p], run_index);
+
+            writeout_rgb(sc, sp, w, y, true);
         }
 #endif
     }
diff --git a/libavcodec/vulkan/ffv1_dec_rct.comp b/libavcodec/vulkan/ffv1_dec_rct.comp
deleted file mode 100644
index a550a5fcb8..0000000000
--- a/libavcodec/vulkan/ffv1_dec_rct.comp
+++ /dev/null
@@ -1,88 +0,0 @@
-/*
- * FFv1 codec
- *
- * Copyright (c) 2025 Lynne <dev@lynne.ee>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-void bypass_block(in SliceContext sc)
-{
-    ivec2 start = ivec2(gl_LocalInvocationID) + sc.slice_pos;
-    ivec2 end = sc.slice_pos + sc.slice_dim;
-
-    for (uint y = start.y; y < end.y; y += gl_WorkGroupSize.y) {
-        for (uint x = start.x; x < end.x; x += gl_WorkGroupSize.x) {
-            ivec2 pos = ivec2(x, y);
-            ivec4 pix;
-            for (int i = 0; i < color_planes; i++)
-                pix[i] = int(imageLoad(src[i], pos)[0]);
-
-            imageStore(dst[0], pos, pix);
-            if (planar_rgb != 0) {
-                for (int i = 1; i < color_planes; i++)
-                    imageStore(dst[i], pos, ivec4(pix[i]));
-            }
-        }
-    }
-}
-
-void transform_sample(ivec2 pos, ivec2 rct_coef)
-{
-    ivec4 pix;
-    pix.r = int(imageLoad(src[2], pos)[0]);
-    pix.g = int(imageLoad(src[0], pos)[0]);
-    pix.b = int(imageLoad(src[1], pos)[0]);
-    if (transparency != 0)
-        pix.a = int(imageLoad(src[3], pos)[0]);
-
-    pix.b -= offset;
-    pix.r -= offset;
-    pix.g -= (pix.b*rct_coef.y + pix.r*rct_coef.x) >> 2;
-    pix.b += pix.g;
-    pix.r += pix.g;
-
-    pix = ivec4(pix[fmt_lut[0]], pix[fmt_lut[1]],
-                pix[fmt_lut[2]], pix[fmt_lut[3]]);
-
-    imageStore(dst[0], pos, pix);
-    if (planar_rgb != 0) {
-        for (int i = 1; i < color_planes; i++)
-            imageStore(dst[i], pos, ivec4(pix[i]));
-    }
-}
-
-void transform_block(in SliceContext sc)
-{
-    const ivec2 rct_coef = sc.slice_rct_coef;
-    const ivec2 start = ivec2(gl_LocalInvocationID) + sc.slice_pos;
-    const ivec2 end = sc.slice_pos + sc.slice_dim;
-
-    for (uint y = start.y; y < end.y; y += gl_WorkGroupSize.y)
-        for (uint x = start.x; x < end.x; x += gl_WorkGroupSize.x)
-            transform_sample(ivec2(x, y), rct_coef);
-}
-
-void main()
-{
-    const uint slice_idx = gl_WorkGroupID.y*gl_NumWorkGroups.x + gl_WorkGroupID.x;
-
-    if (slice_ctx[slice_idx].slice_coding_mode == 1)
-        bypass_block(slice_ctx[slice_idx]);
-    else
-        transform_block(slice_ctx[slice_idx]);
-}
diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index e511840a01..5584b72385 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -33,7 +33,6 @@ extern const char *ff_source_ffv1_common_comp;
 extern const char *ff_source_ffv1_dec_setup_comp;
 extern const char *ff_source_ffv1_reset_comp;
 extern const char *ff_source_ffv1_dec_comp;
-extern const char *ff_source_ffv1_dec_rct_comp;
 
 const FFVulkanDecodeDescriptor ff_vk_dec_ffv1_desc = {
     .codec_id         = AV_CODEC_ID_FFV1,
@@ -66,7 +65,6 @@ typedef struct FFv1VulkanDecodeContext {
     FFVulkanShader setup;
     FFVulkanShader reset[2]; /* AC/Golomb */
     FFVulkanShader decode[2][2][2]; /* 16/32 bit, AC/Golomb, Normal/RGB */
-    FFVulkanShader rct[2]; /* 16/32 bit */
 
     FFVkBuffer rangecoder_static_buf;
     FFVkBuffer quant_buf;
@@ -85,11 +83,13 @@ typedef struct FFv1VkParameters {
     VkDeviceAddress slice_state;
     VkDeviceAddress scratch_data;
 
+    int fmt_lut[4];
     uint32_t img_size[2];
     uint32_t chroma_shift[2];
 
     uint32_t plane_state_size;
     uint32_t crcref;
+    int rct_offset;
 
     uint8_t bits_per_raw_sample;
     uint8_t quant_table_count;
@@ -100,6 +100,7 @@ typedef struct FFv1VkParameters {
     uint8_t codec_planes;
     uint8_t color_planes;
     uint8_t transparency;
+    uint8_t planar_rgb;
     uint8_t colorspace;
     uint8_t ec;
     uint8_t golomb;
@@ -116,11 +117,13 @@ static void add_push_data(FFVulkanShader *shd)
     GLSLC(1,    u8buf slice_state;                                  );
     GLSLC(1,    u8buf scratch_data;                                 );
     GLSLC(0,                                                        );
+    GLSLC(1,    ivec4 fmt_lut;                                      );
     GLSLC(1,    uvec2 img_size;                                     );
     GLSLC(1,    uvec2 chroma_shift;                                 );
     GLSLC(0,                                                        );
     GLSLC(1,    uint plane_state_size;                              );
     GLSLC(1,    uint32_t crcref;                                    );
+    GLSLC(1,    int rct_offset;                                     );
     GLSLC(0,                                                        );
     GLSLC(1,    uint8_t bits_per_raw_sample;                        );
     GLSLC(1,    uint8_t quant_table_count;                          );
@@ -131,6 +134,7 @@ static void add_push_data(FFVulkanShader *shd)
     GLSLC(1,    uint8_t codec_planes;                               );
     GLSLC(1,    uint8_t color_planes;                               );
     GLSLC(1,    uint8_t transparency;                               );
+    GLSLC(1,    uint8_t planar_rgb;                                 );
     GLSLC(1,    uint8_t colorspace;                                 );
     GLSLC(1,    uint8_t ec;                                         );
     GLSLC(1,    uint8_t golomb;                                     );
@@ -349,11 +353,17 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
         return err;
 
     if (is_rgb) {
-        RET(ff_vk_exec_add_dep_frame(&ctx->s, exec, vp->dpb_frame,
-                                     VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
-                                     VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT));
         RET(ff_vk_create_imageviews(&ctx->s, exec, rct_image_views,
                                     vp->dpb_frame, FF_VK_REP_NATIVE));
+        RET(ff_vk_exec_add_dep_frame(&ctx->s, exec, vp->dpb_frame,
+                                     VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                                     VK_PIPELINE_STAGE_2_CLEAR_BIT));
+        ff_vk_frame_barrier(&ctx->s, exec, decode_dst, img_bar, &nb_img_bar,
+                            VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                            VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                            VK_ACCESS_2_TRANSFER_WRITE_BIT,
+                            VK_IMAGE_LAYOUT_GENERAL,
+                            VK_QUEUE_FAMILY_IGNORED);
     }
 
     if (!(f->picture.f->flags & AV_FRAME_FLAG_KEY)) {
@@ -391,6 +401,8 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
 
     vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
         .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
+        .pImageMemoryBarriers = img_bar,
+        .imageMemoryBarrierCount = nb_img_bar,
         .pBufferMemoryBarriers = buf_bar,
         .bufferMemoryBarrierCount = nb_buf_bar,
     });
@@ -431,6 +443,7 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
 
         .plane_state_size = fp->plane_state_size,
         .crcref = f->crcref,
+        .rct_offset = 1 << bits,
 
         .bits_per_raw_sample = bits,
         .quant_table_count = f->quant_table_count,
@@ -441,11 +454,23 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
         .codec_planes = f->plane_count,
         .color_planes = color_planes,
         .transparency = f->transparency,
+        .planar_rgb = ff_vk_mt_is_np_rgb(sw_format) &&
+                      (ff_vk_count_images((AVVkFrame *)f->picture.f->data[0]) > 1),
         .colorspace = f->colorspace,
         .ec = f->ec,
         .golomb = f->ac == AC_GOLOMB_RICE,
         .check_crc = !!(avctx->err_recognition & AV_EF_CRCCHECK),
     };
+
+    /* For some reason the C FFv1 encoder/decoder treats these differently */
+    if (sw_format == AV_PIX_FMT_GBRP10 || sw_format == AV_PIX_FMT_GBRP12 ||
+        sw_format == AV_PIX_FMT_GBRP14)
+        memcpy(pd.fmt_lut, (int [4]) { 2, 1, 0, 3 }, 4*sizeof(int));
+    else if (sw_format == AV_PIX_FMT_X2BGR10)
+        memcpy(pd.fmt_lut, (int [4]) { 0, 2, 1, 3 }, 4*sizeof(int));
+    else
+        ff_vk_set_perm(sw_format, pd.fmt_lut, 0);
+
     for (int i = 0; i < MAX_QUANT_TABLES; i++)
         pd.context_count[i] = f->context_count[i];
 
@@ -455,6 +480,18 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
 
     vk->CmdDispatch(exec->buf, f->num_h_slices, f->num_v_slices, 1);
 
+    if (is_rgb) {
+        AVVkFrame *vkf = (AVVkFrame *)vp->dpb_frame->data[0];
+        for (int i = 0; i < color_planes; i++)
+            vk->CmdClearColorImage(exec->buf, vkf->img[i], VK_IMAGE_LAYOUT_GENERAL,
+                                   &((VkClearColorValue) { 0 }),
+                                   1, &((VkImageSubresourceRange) {
+                                       .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+                                       .levelCount = 1,
+                                       .layerCount = 1,
+                                   }));
+    }
+
     /* Reset shader */
     reset_shader = &fv->reset[f->ac == AC_GOLOMB_RICE];
     ff_vk_shader_update_desc_buffer(&ctx->s, exec, reset_shader,
@@ -493,12 +530,15 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     };
     vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
         .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
+        .pImageMemoryBarriers = img_bar,
+        .imageMemoryBarrierCount = nb_img_bar,
         .pBufferMemoryBarriers = buf_bar,
         .bufferMemoryBarrierCount = nb_buf_bar,
     });
     slice_state->stage = buf_bar[0].dstStageMask;
     slice_state->access = buf_bar[0].dstAccessMask;
     nb_buf_bar = 0;
+    nb_img_bar = 0;
 
     vk->CmdDispatch(exec->buf, f->num_h_slices, f->num_v_slices,
                     f->plane_count);
@@ -515,6 +555,12 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
                                   1, 1,
                                   VK_IMAGE_LAYOUT_GENERAL,
                                   VK_NULL_HANDLE);
+    if (is_rgb)
+        ff_vk_shader_update_img_array(&ctx->s, exec, decode_shader,
+                                      f->picture.f, vp->view.out,
+                                      1, 2,
+                                      VK_IMAGE_LAYOUT_GENERAL,
+                                      VK_NULL_HANDLE);
 
     ff_vk_exec_bind_shader(&ctx->s, exec, decode_shader);
     ff_vk_shader_update_push_const(&ctx->s, exec, decode_shader,
@@ -537,12 +583,20 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     };
 
     /* Input frame barrier */
-    ff_vk_frame_barrier(&ctx->s, exec, decode_dst, img_bar, &nb_img_bar,
+    ff_vk_frame_barrier(&ctx->s, exec, f->picture.f, img_bar, &nb_img_bar,
                         VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
                         VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-                        VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT,
+                        VK_ACCESS_SHADER_WRITE_BIT |
+                        (!is_rgb ? VK_ACCESS_SHADER_READ_BIT : 0),
                         VK_IMAGE_LAYOUT_GENERAL,
                         VK_QUEUE_FAMILY_IGNORED);
+    if (is_rgb)
+        ff_vk_frame_barrier(&ctx->s, exec, vp->dpb_frame, img_bar, &nb_img_bar,
+                            VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
+                            VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
+                            VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT,
+                            VK_IMAGE_LAYOUT_GENERAL,
+                            VK_QUEUE_FAMILY_IGNORED);
 
     vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
         .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
@@ -558,74 +612,6 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
 
     vk->CmdDispatch(exec->buf, f->num_h_slices, f->num_v_slices, 1);
 
-    /* RCT */
-    if (is_rgb) {
-        FFVulkanShader *rct_shader = &fv->rct[f->use32bit];
-        FFv1VkRCTParameters pd_rct;
-
-        ff_vk_shader_update_desc_buffer(&ctx->s, exec, rct_shader,
-                                        1, 0, 0,
-                                        slice_state,
-                                        0, fp->slice_data_size*f->slice_count,
-                                        VK_FORMAT_UNDEFINED);
-        ff_vk_shader_update_img_array(&ctx->s, exec, rct_shader,
-                                      decode_dst, decode_dst_view,
-                                      1, 1,
-                                      VK_IMAGE_LAYOUT_GENERAL,
-                                      VK_NULL_HANDLE);
-        ff_vk_shader_update_img_array(&ctx->s, exec, rct_shader,
-                                      f->picture.f, vp->view.out,
-                                      1, 2,
-                                      VK_IMAGE_LAYOUT_GENERAL,
-                                      VK_NULL_HANDLE);
-
-        ff_vk_exec_bind_shader(&ctx->s, exec, rct_shader);
-
-        pd_rct = (FFv1VkRCTParameters) {
-            .offset = 1 << bits,
-            .bits = bits,
-            .planar_rgb = ff_vk_mt_is_np_rgb(sw_format) &&
-                          (ff_vk_count_images((AVVkFrame *)f->picture.f->data[0]) > 1),
-            .color_planes = color_planes,
-            .transparency = f->transparency,
-        };
-
-        /* For some reason the C FFv1 encoder/decoder treats these differently */
-        if (sw_format == AV_PIX_FMT_GBRP10 || sw_format == AV_PIX_FMT_GBRP12 ||
-            sw_format == AV_PIX_FMT_GBRP14)
-            memcpy(pd_rct.fmt_lut, (int [4]) { 2, 1, 0, 3 }, 4*sizeof(int));
-        else if (sw_format == AV_PIX_FMT_X2BGR10)
-            memcpy(pd_rct.fmt_lut, (int [4]) { 0, 2, 1, 3 }, 4*sizeof(int));
-        else
-            ff_vk_set_perm(sw_format, pd_rct.fmt_lut, 0);
-
-        ff_vk_shader_update_push_const(&ctx->s, exec, rct_shader,
-                                       VK_SHADER_STAGE_COMPUTE_BIT,
-                                       0, sizeof(pd_rct), &pd_rct);
-
-        ff_vk_frame_barrier(&ctx->s, exec, decode_dst, img_bar, &nb_img_bar,
-                            VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
-                            VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-                            VK_ACCESS_SHADER_READ_BIT,
-                            VK_IMAGE_LAYOUT_GENERAL,
-                            VK_QUEUE_FAMILY_IGNORED);
-        ff_vk_frame_barrier(&ctx->s, exec, f->picture.f, img_bar, &nb_img_bar,
-                            VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
-                            VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT,
-                            VK_ACCESS_SHADER_WRITE_BIT,
-                            VK_IMAGE_LAYOUT_GENERAL,
-                            VK_QUEUE_FAMILY_IGNORED);
-
-        vk->CmdPipelineBarrier2(exec->buf, &(VkDependencyInfo) {
-            .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
-            .pImageMemoryBarriers = img_bar,
-            .imageMemoryBarrierCount = nb_img_bar,
-        });
-        nb_img_bar = 0;
-
-        vk->CmdDispatch(exec->buf, f->num_h_slices, f->num_v_slices, 1);
-    }
-
     err = ff_vk_exec_submit(&ctx->s, exec);
     if (err < 0)
         return err;
@@ -845,7 +831,9 @@ fail:
 
 static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
                               FFVkExecPool *pool, FFVkSPIRVCompiler *spv,
-                              FFVulkanShader *shd, AVHWFramesContext *frames_ctx,
+                              FFVulkanShader *shd,
+                              AVHWFramesContext *dec_frames_ctx,
+                              AVHWFramesContext *out_frames_ctx,
                               int use32bit, int ac, int rgb)
 {
     int err;
@@ -910,127 +898,28 @@ static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
             .buf_elems   = f->max_slice_count,
         },
         {
-            .name       = "dst",
-            .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
-            .dimensions = 2,
-            .mem_layout = ff_vk_shader_rep_fmt(frames_ctx->sw_format,
-                                               FF_VK_REP_NATIVE),
-            .elems      = av_pix_fmt_count_planes(frames_ctx->sw_format),
-            .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
-        },
-    };
-    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 2, 0, 0));
-
-    GLSLD(ff_source_ffv1_dec_comp);
-
-    RET(spv->compile_shader(s, spv, shd, &spv_data, &spv_len, "main",
-                            &spv_opaque));
-    RET(ff_vk_shader_link(s, shd, spv_data, spv_len, "main"));
-
-    RET(ff_vk_shader_register_exec(s, pool, shd));
-
-fail:
-    if (spv_opaque)
-        spv->free_shader(spv, &spv_opaque);
-
-    return err;
-}
-
-static int init_rct_shader(FFV1Context *f, FFVulkanContext *s,
-                           FFVkExecPool *pool, FFVkSPIRVCompiler *spv,
-                           FFVulkanShader *shd, int use32bit,
-                           AVHWFramesContext *src_ctx, AVHWFramesContext *dst_ctx)
-{
-    int err;
-    FFVulkanDescriptorSetBinding *desc_set;
-
-    uint8_t *spv_data;
-    size_t spv_len;
-    void *spv_opaque = NULL;
-    int wg_count = sqrt(s->props.properties.limits.maxComputeWorkGroupInvocations);
-
-    RET(ff_vk_shader_init(s, shd, "ffv1_rct",
-                          VK_SHADER_STAGE_COMPUTE_BIT,
-                          (const char *[]) { "GL_EXT_buffer_reference",
-                                             "GL_EXT_buffer_reference2" }, 2,
-                          wg_count, wg_count, 1,
-                          0));
-
-    /* Common codec header */
-    GLSLD(ff_source_common_comp);
-
-    GLSLC(0, layout(push_constant, scalar) uniform pushConstants {             );
-    GLSLC(1,    ivec4 fmt_lut;                                                 );
-    GLSLC(1,    int offset;                                                    );
-    GLSLC(1,    uint8_t bits;                                                  );
-    GLSLC(1,    uint8_t planar_rgb;                                            );
-    GLSLC(1,    uint8_t color_planes;                                          );
-    GLSLC(1,    uint8_t transparency;                                          );
-    GLSLC(1,    uint8_t version;                                               );
-    GLSLC(1,    uint8_t micro_version;                                         );
-    GLSLC(1,    uint8_t padding[2];                                            );
-    GLSLC(0, };                                                                );
-    ff_vk_shader_add_push_const(shd, 0, sizeof(FFv1VkRCTParameters),
-                                VK_SHADER_STAGE_COMPUTE_BIT);
-
-    av_bprintf(&shd->src, "#define MAX_QUANT_TABLES %i\n", MAX_QUANT_TABLES);
-    av_bprintf(&shd->src, "#define MAX_CONTEXT_INPUTS %i\n", MAX_CONTEXT_INPUTS);
-    av_bprintf(&shd->src, "#define MAX_QUANT_TABLE_SIZE %i\n", MAX_QUANT_TABLE_SIZE);
-
-    desc_set = (FFVulkanDescriptorSetBinding []) {
-        {
-            .name        = "rangecoder_static_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "uint8_t zero_one_state[512];",
-        },
-        {
-            .name        = "quant_buf",
-            .type        = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .mem_layout  = "scalar",
-            .buf_content = "int16_t quant_table[MAX_QUANT_TABLES]"
-                           "[MAX_CONTEXT_INPUTS][MAX_QUANT_TABLE_SIZE];",
-        },
-    };
-    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 2, 1, 0));
-
-    define_shared_code(shd, use32bit);
-
-    desc_set = (FFVulkanDescriptorSetBinding []) {
-        {
-            .name        = "slice_data_buf",
-            .type        = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-            .mem_quali   = "readonly",
-            .stages      = VK_SHADER_STAGE_COMPUTE_BIT,
-            .buf_content = "SliceContext slice_ctx",
-            .buf_elems   = f->max_slice_count,
-        },
-        {
-            .name       = "src",
+            .name       = "dec",
             .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
             .dimensions = 2,
-            .mem_layout = ff_vk_shader_rep_fmt(src_ctx->sw_format,
+            .mem_layout = ff_vk_shader_rep_fmt(dec_frames_ctx->sw_format,
                                                FF_VK_REP_NATIVE),
-            .mem_quali  = "readonly",
-            .elems      = av_pix_fmt_count_planes(src_ctx->sw_format),
+            .elems      = av_pix_fmt_count_planes(dec_frames_ctx->sw_format),
             .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
         },
         {
             .name       = "dst",
             .type       = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
             .dimensions = 2,
-            .mem_layout = ff_vk_shader_rep_fmt(dst_ctx->sw_format,
+            .mem_layout = ff_vk_shader_rep_fmt(out_frames_ctx->sw_format,
                                                FF_VK_REP_NATIVE),
             .mem_quali  = "writeonly",
-            .elems      = av_pix_fmt_count_planes(dst_ctx->sw_format),
+            .elems      = av_pix_fmt_count_planes(out_frames_ctx->sw_format),
             .stages     = VK_SHADER_STAGE_COMPUTE_BIT,
         },
     };
-    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 3, 0, 0));
+    RET(ff_vk_shader_add_descriptor_set(s, shd, desc_set, 2 + rgb, 0, 0));
 
-    GLSLD(ff_source_ffv1_dec_rct_comp);
+    GLSLD(ff_source_ffv1_dec_comp);
 
     RET(spv->compile_shader(s, spv, shd, &spv_data, &spv_len, "main",
                             &spv_opaque));
@@ -1051,6 +940,7 @@ static int init_indirect(AVCodecContext *avctx, FFVulkanContext *s,
     int err;
     AVHWFramesContext *frames_ctx;
     AVVulkanFramesContext *vk_frames;
+    FFV1Context *f = avctx->priv_data;
 
     *dst = av_hwframe_ctx_alloc(s->device_ref);
     if (!(*dst))
@@ -1059,13 +949,14 @@ static int init_indirect(AVCodecContext *avctx, FFVulkanContext *s,
     frames_ctx = (AVHWFramesContext *)((*dst)->data);
     frames_ctx->format    = AV_PIX_FMT_VULKAN;
     frames_ctx->sw_format = sw_format;
-    frames_ctx->width     = FFALIGN(s->frames->width, 32);
-    frames_ctx->height    = FFALIGN(s->frames->height, 32);
+    frames_ctx->width     = s->frames->width;
+    frames_ctx->height    = f->num_v_slices*2;
 
     vk_frames = frames_ctx->hwctx;
     vk_frames->tiling    = VK_IMAGE_TILING_OPTIMAL;
-    vk_frames->usage     = VK_IMAGE_USAGE_STORAGE_BIT;
     vk_frames->img_flags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT;
+    vk_frames->usage     = VK_IMAGE_USAGE_STORAGE_BIT |
+                           VK_IMAGE_USAGE_TRANSFER_DST_BIT;
 
     err = av_hwframe_ctx_init(*dst);
     if (err < 0) {
@@ -1095,9 +986,6 @@ static void vk_decode_ffv1_uninit(FFVulkanDecodeShared *ctx)
             for (int k = 0; k < 2; k++) /* Normal/RGB */
                 ff_vk_shader_free(&ctx->s, &fv->decode[i][j][k]);
 
-    for (int i = 0; i < 2; i++) /* 16/32 bit */
-        ff_vk_shader_free(&ctx->s, &fv->rct[i]);
-
     ff_vk_free_buf(&ctx->s, &fv->quant_buf);
     ff_vk_free_buf(&ctx->s, &fv->rangecoder_static_buf);
     ff_vk_free_buf(&ctx->s, &fv->crc_tab_buf);
@@ -1165,12 +1053,13 @@ static int vk_decode_ffv1_init(AVCodecContext *avctx)
     for (int i = 0; i < 2; i++) { /* 16/32 bit */
         for (int j = 0; j < 2; j++) { /* AC/Golomb */
             for (int k = 0; k < 2; k++) { /* Normal/RGB */
-                AVHWFramesContext *frames_ctx;
-                frames_ctx = k ? (AVHWFramesContext *)fv->intermediate_frames_ref[i]->data :
-                                 (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+                AVHWFramesContext *dec_frames_ctx;
+                dec_frames_ctx = k ? (AVHWFramesContext *)fv->intermediate_frames_ref[i]->data :
+                                     (AVHWFramesContext *)avctx->hw_frames_ctx->data;
                 err = init_decode_shader(f, &ctx->s, &ctx->exec_pool,
                                          spv, &fv->decode[i][j][k],
-                                         frames_ctx,
+                                         dec_frames_ctx,
+                                         (AVHWFramesContext *)avctx->hw_frames_ctx->data,
                                          i,
                                          !j ? AC_RANGE_CUSTOM_TAB : AC_GOLOMB_RICE,
                                          k);
@@ -1180,16 +1069,6 @@ static int vk_decode_ffv1_init(AVCodecContext *avctx)
         }
     }
 
-    /* RCT shaders */
-    for (int i = 0; i < 2; i++) { /* 16/32 bit */
-        err = init_rct_shader(f, &ctx->s, &ctx->exec_pool,
-                              spv, &fv->rct[i], i,
-                              (AVHWFramesContext *)fv->intermediate_frames_ref[i]->data,
-                              (AVHWFramesContext *)avctx->hw_frames_ctx->data);
-        if (err < 0)
-            return err;
-    }
-
     /* Range coder data */
     err = ff_ffv1_vk_init_state_transition_data(&ctx->s,
                                                 &fv->rangecoder_static_buf,
diff --git a/libavutil/vulkan_functions.h b/libavutil/vulkan_functions.h
index 85279dd082..8f2bbb38c9 100644
--- a/libavutil/vulkan_functions.h
+++ b/libavutil/vulkan_functions.h
@@ -147,6 +147,7 @@ typedef uint64_t FFVulkanExtensions;
     MACRO(1, 1, FF_VK_EXT_NO_FLAG,              CmdPipelineBarrier)                      \
     MACRO(1, 1, FF_VK_EXT_NO_FLAG,              CmdCopyBufferToImage)                    \
     MACRO(1, 1, FF_VK_EXT_NO_FLAG,              CmdCopyImageToBuffer)                    \
+    MACRO(1, 1, FF_VK_EXT_NO_FLAG,              CmdClearColorImage)                                    \
     MACRO(1, 1, FF_VK_EXT_NO_FLAG,              CmdCopyBuffer)                                         \
                                                                                          \
     /* Buffer */                                                                         \
-- 
2.47.2

* [FFmpeg-devel] [PATCH 13/18] ffv1/vulkan: redo context count tracking and quant_table_idx management
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (10 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 12/18] vulkan_ffv1: cache only 2 lines when decoding RGB Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-13 20:39   ` Jerome Martinez
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 14/18] vulkan_ffv1: externalize extended lookup check Lynne
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

This commit makes it possible for the encoder to choose a different
quantization table on a per-slice basis, and adds the same capability
to the decoder.

It also fully fixes decoding of files encoded with context=1.
---
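Note for readers: the state layout this patch moves to can be sketched on
the host side roughly as below. This is only an illustration, not code from
the patch. It assumes MAX_QUANT_TABLES is 8 and CONTEXT_SIZE is 32 bytes,
covers only the range-coder (non-Golomb) path, and the helper names
plane_state_bytes() and reset_plane() are hypothetical. Each plane gets a
region sized for the largest table, while only the contexts of the table
that plane actually uses are reset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_QUANT_TABLES 8   /* assumption: matches the FFv1 constant */
#define CONTEXT_SIZE     32  /* assumption: bytes per range-coder context */

/* Size of one plane's state region: large enough for the biggest table. */
static uint32_t plane_state_bytes(const uint32_t *context_count, int nb_tables)
{
    uint32_t max_contexts = 0;

    assert(nb_tables >= 0 && nb_tables <= MAX_QUANT_TABLES);
    for (int i = 0; i < nb_tables; i++)
        if (context_count[i] > max_contexts)
            max_contexts = context_count[i];

    return max_contexts * CONTEXT_SIZE;
}

/* Reset one plane of one slice. The plane offset uses the fixed per-plane
 * stride, but only the contexts of the selected table are touched. */
static void reset_plane(uint8_t *slice_state, uint32_t plane_size, int plane,
                        const uint32_t *context_count, uint8_t qidx)
{
    uint8_t *start = slice_state + (size_t)plane * plane_size;

    assert(qidx < MAX_QUANT_TABLES);
    memset(start, 128, (size_t)context_count[qidx] * CONTEXT_SIZE);
}
```

With two tables of 4 and 2 contexts, every plane is 4*CONTEXT_SIZE bytes
apart, and a plane using the second table only has its first 2 contexts
reset.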
 libavcodec/ffv1_vulkan.h              |  2 +-
 libavcodec/ffv1enc_vulkan.c           |  6 ++++--
 libavcodec/vulkan/ffv1_common.comp    |  3 +--
 libavcodec/vulkan/ffv1_dec.comp       | 17 +++++++++--------
 libavcodec/vulkan/ffv1_dec_setup.comp |  1 -
 libavcodec/vulkan/ffv1_enc_setup.comp |  3 ++-
 libavcodec/vulkan/ffv1_reset.comp     | 11 ++++++-----
 libavcodec/vulkan_ffv1.c              | 22 ++++++++--------------
 8 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/libavcodec/ffv1_vulkan.h b/libavcodec/ffv1_vulkan.h
index 1e0e6dd228..372478f4b7 100644
--- a/libavcodec/ffv1_vulkan.h
+++ b/libavcodec/ffv1_vulkan.h
@@ -49,9 +49,9 @@ typedef struct FFv1VkRCTParameters {
 } FFv1VkRCTParameters;
 
 typedef struct FFv1VkResetParameters {
+    uint32_t context_count[MAX_QUANT_TABLES];
     VkDeviceAddress slice_state;
     uint32_t plane_state_size;
-    uint32_t context_count;
     uint8_t codec_planes;
     uint8_t key_frame;
     uint8_t version;
diff --git a/libavcodec/ffv1enc_vulkan.c b/libavcodec/ffv1enc_vulkan.c
index 5409927589..688c14fb81 100644
--- a/libavcodec/ffv1enc_vulkan.c
+++ b/libavcodec/ffv1enc_vulkan.c
@@ -542,10 +542,12 @@ static int vulkan_encode_ffv1_submit_frame(AVCodecContext *avctx,
         pd_reset = (FFv1VkResetParameters) {
             .slice_state = slice_data_buf->address + f->slice_count*256,
             .plane_state_size = plane_state_size,
-            .context_count = context_count,
             .codec_planes = f->plane_count,
             .key_frame = f->key_frame,
         };
+        for (int i = 0; i < f->quant_table_count; i++)
+            pd_reset.context_count[i] = f->context_count[i];
+
         ff_vk_shader_update_push_const(&fv->s, exec, &fv->reset,
                                        VK_SHADER_STAGE_COMPUTE_BIT,
                                        0, sizeof(pd_reset), &pd_reset);
@@ -1071,9 +1073,9 @@ static int init_reset_shader(AVCodecContext *avctx, FFVkSPIRVCompiler *spv)
     GLSLD(ff_source_common_comp);
 
     GLSLC(0, layout(push_constant, scalar) uniform pushConstants {             );
+    GLSLF(1,    uint context_count[%i];                                        ,MAX_QUANT_TABLES);
     GLSLC(1,    u8buf slice_state;                                             );
     GLSLC(1,    uint plane_state_size;                                         );
-    GLSLC(1,    uint context_count;                                            );
     GLSLC(1,    uint8_t codec_planes;                                          );
     GLSLC(1,    uint8_t key_frame;                                             );
     GLSLC(1,    uint8_t version;                                               );
diff --git a/libavcodec/vulkan/ffv1_common.comp b/libavcodec/vulkan/ffv1_common.comp
index d2bd7e736e..64c1c2ce80 100644
--- a/libavcodec/vulkan/ffv1_common.comp
+++ b/libavcodec/vulkan/ffv1_common.comp
@@ -32,8 +32,7 @@ struct SliceContext {
     ivec2 slice_dim;
     ivec2 slice_pos;
     ivec2 slice_rct_coef;
-    u8vec4 quant_table_idx;
-    uint context_count;
+    u8vec3 quant_table_idx;
 
     uint hdr_len; // only used for golomb
 
diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index ae0324cb26..a6272d4832 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -51,8 +51,8 @@ ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
         (quant_table[quant_table_idx][4][127] != 0)) {
         TYPE cur2 = TYPE(0);
         if (off.x > 0) {
-            const ivec2 yoff_border2 = off.x == 1 ? ivec2(1, -1) : ivec2(0, 0);
-            cur2 = TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(-2,  0) + yoff_border2))[0]);
+            const ivec2 yoff_border2 = off.x == 1 ? ivec2(-1, -1) : ivec2(-2, 0);
+            cur2 = TYPE(imageLoad(dec[p], sp + LADDR(off + yoff_border2))[0]);
         }
         base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
 
@@ -156,7 +156,7 @@ void decode_line_pcm(inout SliceContext sc, ivec2 sp, int w, int y, int p, int b
 
 void decode_line(inout SliceContext sc, ivec2 sp, int w,
                  int y, int p, int bits, uint64_t state,
-                 const int run_index)
+                 uint8_t quant_table_idx, const int run_index)
 {
 #ifndef RGB
     if (p > 0 && p < 3) {
@@ -167,7 +167,7 @@ void decode_line(inout SliceContext sc, ivec2 sp, int w,
 
     for (int x = 0; x < w; x++) {
         ivec2 pr = get_pred(sp, ivec2(x, y), p, w,
-                            sc.quant_table_idx[p]);
+                            quant_table_idx);
 
         int diff = get_isymbol(sc.c, state + CONTEXT_SIZE*abs(pr[0]));
         if (pr[0] < 0)
@@ -182,7 +182,7 @@ void decode_line(inout SliceContext sc, ivec2 sp, int w,
 
 void decode_line(inout SliceContext sc, ivec2 sp, int w,
                  int y, int p, int bits, uint64_t state,
-                 inout int run_index)
+                 uint8_t quant_table_idx, inout int run_index)
 {
 #ifndef RGB
     if (p > 0 && p < 3) {
@@ -198,7 +198,7 @@ void decode_line(inout SliceContext sc, ivec2 sp, int w,
         ivec2 pos = sp + ivec2(x, y);
         int diff;
         ivec2 pr = get_pred(sp, ivec2(x, y), p, w,
-                            sc.quant_table_idx[p]);
+                            quant_table_idx);
 
         VlcState sb = VlcState(state + VLC_STATE_SIZE*abs(pr[0]));
 
@@ -325,6 +325,7 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
     /* Arithmetic coding */
 #endif
     {
+        u8vec4 quant_table_idx = sc.quant_table_idx.xyyz;
         u64vec4 slice_state_off = (uint64_t(slice_state) +
                                    slice_idx*plane_state_size*codec_planes) +
                                   plane_state_size*uvec4(0, 1, 1, 2);
@@ -337,13 +338,13 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
 
             for (int y = 0; y < h; y++)
                 decode_line(sc, sp, w, y, p, bits,
-                            slice_state_off[p], run_index);
+                            slice_state_off[p], quant_table_idx[p], run_index);
         }
 #else
         for (int y = 0; y < sc.slice_dim.y; y++) {
             for (int p = 0; p < color_planes; p++)
                 decode_line(sc, sp, w, y, p, bits,
-                            slice_state_off[p], run_index);
+                            slice_state_off[p], quant_table_idx[p], run_index);
 
             writeout_rgb(sc, sp, w, y, true);
         }
diff --git a/libavcodec/vulkan/ffv1_dec_setup.comp b/libavcodec/vulkan/ffv1_dec_setup.comp
index a10163a8d6..5da63be56d 100644
--- a/libavcodec/vulkan/ffv1_dec_setup.comp
+++ b/libavcodec/vulkan/ffv1_dec_setup.comp
@@ -76,7 +76,6 @@ bool decode_slice_header(inout SliceContext sc, uint64_t state)
         if (idx >= quant_table_count)
             return true;
         sc.quant_table_idx[i] = uint8_t(idx);
-        sc.context_count = context_count[idx];
     }
 
     get_usymbol(sc.c, state);
diff --git a/libavcodec/vulkan/ffv1_enc_setup.comp b/libavcodec/vulkan/ffv1_enc_setup.comp
index 23f09b2af6..44c13404d8 100644
--- a/libavcodec/vulkan/ffv1_enc_setup.comp
+++ b/libavcodec/vulkan/ffv1_enc_setup.comp
@@ -38,6 +38,7 @@ void init_slice(out SliceContext sc, const uint slice_idx)
     sc.slice_rct_coef = ivec2(1, 1);
     sc.slice_coding_mode = int(force_pcm == 1);
     sc.slice_reset_contexts = sc.slice_coding_mode == 1;
+    sc.quant_table_idx = u8vec3(context_model);
 
     rac_init(sc.c,
              OFFBUF(u8buf, out_data, slice_idx * slice_size_max),
@@ -84,7 +85,7 @@ void write_slice_header(inout SliceContext sc, uint64_t state)
     put_symbol_unsigned(sc.c, state, 0);
 
     for (int i = 0; i < codec_planes; i++)
-        put_symbol_unsigned(sc.c, state, context_model);
+        put_symbol_unsigned(sc.c, state, sc.quant_table_idx[i]);
 
     put_symbol_unsigned(sc.c, state, pic_mode);
     put_symbol_unsigned(sc.c, state, sar.x);
diff --git a/libavcodec/vulkan/ffv1_reset.comp b/libavcodec/vulkan/ffv1_reset.comp
index 1b87ca754e..cfb7dcc444 100644
--- a/libavcodec/vulkan/ffv1_reset.comp
+++ b/libavcodec/vulkan/ffv1_reset.comp
@@ -28,14 +28,15 @@ void main(void)
         slice_ctx[slice_idx].slice_reset_contexts == false)
         return;
 
+    const uint8_t qidx = slice_ctx[slice_idx].quant_table_idx[gl_WorkGroupID.z];
+    uint contexts = context_count[qidx];
     uint64_t slice_state_off = uint64_t(slice_state) +
                                slice_idx*plane_state_size*codec_planes;
 
 #ifdef GOLOMB
     uint64_t start = slice_state_off +
-                     (gl_WorkGroupID.z*context_count +
-                      gl_LocalInvocationID.x)*VLC_STATE_SIZE;
-    for (uint x = gl_LocalInvocationID.x; x < context_count; x += gl_WorkGroupSize.x) {
+                     (gl_WorkGroupID.z*(plane_state_size/VLC_STATE_SIZE) + gl_LocalInvocationID.x)*VLC_STATE_SIZE;
+    for (uint x = gl_LocalInvocationID.x; x < contexts; x += gl_WorkGroupSize.x) {
         VlcState sb = VlcState(start);
         sb.drift     =  int16_t(0);
         sb.error_sum = uint16_t(4);
@@ -45,9 +46,9 @@ void main(void)
     }
 #else
     uint64_t start = slice_state_off +
-                     (gl_WorkGroupID.z*context_count)*CONTEXT_SIZE +
+                     gl_WorkGroupID.z*plane_state_size +
                      (gl_LocalInvocationID.x << 2 /* dwords */); /* Bytes */
-    uint count_total = context_count*(CONTEXT_SIZE /* bytes */ >> 2 /* dwords */);
+    uint count_total = contexts*(CONTEXT_SIZE /* bytes */ >> 2 /* dwords */);
     for (uint x = gl_LocalInvocationID.x; x < count_total; x += gl_WorkGroupSize.x) {
         u32buf(start).v = 0x80808080;
         start += gl_WorkGroupSize.x*(CONTEXT_SIZE >> 3 /* 1/8th of context */);
diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 5584b72385..aaebcd53b5 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -49,7 +49,6 @@ typedef struct FFv1VulkanDecodePicture {
     uint32_t plane_state_size;
     uint32_t slice_state_size;
     uint32_t slice_data_size;
-    uint32_t max_context_count;
 
     AVBufferRef *slice_offset_buf;
     uint32_t    *slice_offset;
@@ -77,8 +76,6 @@ typedef struct FFv1VulkanDecodeContext {
 } FFv1VulkanDecodeContext;
 
 typedef struct FFv1VkParameters {
-    uint32_t context_count[MAX_QUANT_TABLES];
-
     VkDeviceAddress slice_data;
     VkDeviceAddress slice_state;
     VkDeviceAddress scratch_data;
@@ -111,8 +108,6 @@ typedef struct FFv1VkParameters {
 static void add_push_data(FFVulkanShader *shd)
 {
     GLSLC(0, layout(push_constant, scalar) uniform pushConstants {  );
-    GLSLF(1,    uint context_count[%i];                             ,MAX_QUANT_TABLES);
-    GLSLC(0,                                                        );
     GLSLC(1,    u8buf slice_data;                                   );
     GLSLC(1,    u8buf slice_state;                                  );
     GLSLC(1,    u8buf scratch_data;                                 );
@@ -162,13 +157,15 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
     AVHWFramesContext *hwfc = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
     enum AVPixelFormat sw_format = hwfc->sw_format;
 
+    int max_contexts;
     int is_rgb = !(f->colorspace == 0 && sw_format != AV_PIX_FMT_YA8) &&
                  !(sw_format == AV_PIX_FMT_YA8);
 
     fp->slice_num = 0;
 
+    max_contexts = 0;
     for (int i = 0; i < f->quant_table_count; i++)
-        fp->max_context_count = FFMAX(f->context_count[i], fp->max_context_count);
+        max_contexts = FFMAX(f->context_count[i], max_contexts);
 
     /* Allocate slice buffer data */
     if (f->ac == AC_GOLOMB_RICE)
@@ -176,7 +173,7 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
     else
         fp->plane_state_size = CONTEXT_SIZE;
 
-    fp->plane_state_size *= fp->max_context_count;
+    fp->plane_state_size *= max_contexts;
     fp->slice_state_size = fp->plane_state_size*f->plane_count;
 
     fp->slice_data_size = 256; /* Overestimation for the SliceContext struct */
@@ -430,8 +427,6 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
 
     ff_vk_exec_bind_shader(&ctx->s, exec, &fv->setup);
     pd = (FFv1VkParameters) {
-        /* context_count */
-
         .slice_data = slices_buf->address,
         .slice_state = slice_state->address + f->slice_count*fp->slice_data_size,
         .scratch_data = tmp_data->address,
@@ -471,9 +466,6 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     else
         ff_vk_set_perm(sw_format, pd.fmt_lut, 0);
 
-    for (int i = 0; i < MAX_QUANT_TABLES; i++)
-        pd.context_count[i] = f->context_count[i];
-
     ff_vk_shader_update_push_const(&ctx->s, exec, &fv->setup,
                                    VK_SHADER_STAGE_COMPUTE_BIT,
                                    0, sizeof(pd), &pd);
@@ -505,12 +497,14 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     pd_reset = (FFv1VkResetParameters) {
         .slice_state = slice_state->address + f->slice_count*fp->slice_data_size,
         .plane_state_size = fp->plane_state_size,
-        .context_count = fp->max_context_count,
         .codec_planes = f->plane_count,
         .key_frame = f->picture.f->flags & AV_FRAME_FLAG_KEY,
         .version = f->version,
         .micro_version = f->micro_version,
     };
+    for (int i = 0; i < f->quant_table_count; i++)
+        pd_reset.context_count[i] = f->context_count[i];
+
     ff_vk_shader_update_push_const(&ctx->s, exec, reset_shader,
                                    VK_SHADER_STAGE_COMPUTE_BIT,
                                    0, sizeof(pd_reset), &pd_reset);
@@ -763,9 +757,9 @@ static int init_reset_shader(FFV1Context *f, FFVulkanContext *s,
     GLSLD(ff_source_common_comp);
 
     GLSLC(0, layout(push_constant, scalar) uniform pushConstants {             );
+    GLSLF(1,    uint context_count[%i];                                        ,MAX_QUANT_TABLES);
     GLSLC(1,    u8buf slice_state;                                             );
     GLSLC(1,    uint plane_state_size;                                         );
-    GLSLC(1,    uint context_count;                                            );
     GLSLC(1,    uint8_t codec_planes;                                          );
     GLSLC(1,    uint8_t key_frame;                                             );
     GLSLC(1,    uint8_t version;                                               );
-- 
2.47.2

* [FFmpeg-devel] [PATCH 14/18] vulkan_ffv1: externalize extended lookup check
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (11 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 13/18] ffv1/vulkan: redo context count tracking and quant_table_idx management Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 15/18] vulkan_ffv1: remove need for scratch data during setup Lynne
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

8% speedup on NVIDIA at 4K.
---
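Note for readers: the host-side precomputation amounts to something like
the sketch below. It is an illustration only. The table dimensions (8
tables, 5 context inputs, 256 entries) are assumptions about the FFv1
layout, and compute_extend_lookup() is a hypothetical name. The point is
that the two table probes are done once per frame on the CPU, so the
shader only has to test a single byte per predicted sample.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUANT_TABLES   8    /* assumption: number of quant tables */
#define MAX_CONTEXT_INPUTS 5    /* assumption: inputs per table */
#define QUANT_ENTRIES      256  /* assumption: entries per input */

/* Fill one flag per quantization table: nonzero when inputs 3 or 4 are in
 * use, i.e. when the predictor needs the extended neighbourhood. */
static void compute_extend_lookup(uint8_t extend_lookup[MAX_QUANT_TABLES],
                                  int16_t quant_tables[][MAX_CONTEXT_INPUTS][QUANT_ENTRIES],
                                  int quant_table_count)
{
    assert(quant_table_count >= 0 && quant_table_count <= MAX_QUANT_TABLES);

    /* Clear all flags first so unused tables never read as "extended". */
    for (int i = 0; i < MAX_QUANT_TABLES; i++)
        extend_lookup[i] = 0;

    for (int i = 0; i < quant_table_count; i++)
        extend_lookup[i] = quant_tables[i][3][127] != 0 ||
                           quant_tables[i][4][127] != 0;
}
```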
 libavcodec/vulkan/ffv1_dec.comp | 3 +--
 libavcodec/vulkan_ffv1.c        | 6 ++++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index a6272d4832..4cc3b9987f 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -47,8 +47,7 @@ ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
                quant_table[quant_table_idx][1][(top[0] - top[1]) & MAX_QUANT_TABLE_MASK] +
                quant_table[quant_table_idx][2][(top[1] - top[2]) & MAX_QUANT_TABLE_MASK];
 
-    if ((quant_table[quant_table_idx][3][127] != 0) ||
-        (quant_table[quant_table_idx][4][127] != 0)) {
+    if (extend_lookup[quant_table_idx] > 0) {
         TYPE cur2 = TYPE(0);
         if (off.x > 0) {
             const ivec2 yoff_border2 = off.x == 1 ? ivec2(-1, -1) : ivec2(-2, 0);
diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index aaebcd53b5..72cacb1678 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -88,6 +88,7 @@ typedef struct FFv1VkParameters {
     uint32_t crcref;
     int rct_offset;
 
+    uint8_t extend_lookup[8];
     uint8_t bits_per_raw_sample;
     uint8_t quant_table_count;
     uint8_t version;
@@ -120,6 +121,7 @@ static void add_push_data(FFVulkanShader *shd)
     GLSLC(1,    uint32_t crcref;                                    );
     GLSLC(1,    int rct_offset;                                     );
     GLSLC(0,                                                        );
+    GLSLC(1,    uint8_t extend_lookup[8];                           );
     GLSLC(1,    uint8_t bits_per_raw_sample;                        );
     GLSLC(1,    uint8_t quant_table_count;                          );
     GLSLC(1,    uint8_t version;                                    );
@@ -456,6 +458,10 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
         .golomb = f->ac == AC_GOLOMB_RICE,
         .check_crc = !!(avctx->err_recognition & AV_EF_CRCCHECK),
     };
+    for (int i = 0; i < f->quant_table_count; i++)
+        pd.extend_lookup[i] = (f->quant_tables[i][3][127] != 0) ||
+                              (f->quant_tables[i][4][127] != 0);
+
 
     /* For some reason the C FFv1 encoder/decoder treats these differently */
     if (sw_format == AV_PIX_FMT_GBRP10 || sw_format == AV_PIX_FMT_GBRP12 ||
-- 
2.47.2

* [FFmpeg-devel] [PATCH 15/18] vulkan_ffv1: remove need for scratch data during setup
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (12 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 14/18] vulkan_ffv1: externalize extended lookup check Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 16/18] vulkan_ffv1: shortcut +-1 coeffs in symbol reading Lynne
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

This saves some VRAM, but mainly allows for a more unified code path.
---
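Note for readers: a C rendering of the idea behind get_rac_direct() might
look like the sketch below. It is an approximation for illustration, not
the shader code. The RangeCoder struct is reduced to the fields needed
here, refill() is simplified, and the 512-entry zero_one_state table (zero
transitions in the first half, one transitions in the second) is passed in
by the caller as an assumption. The adaptive state is a plain byte owned by
the caller, so the header decoder can keep its CONTEXT_SIZE states in a
local array instead of a scratch buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONTEXT_SIZE 32 /* assumption: number of states used for headers */

typedef struct RangeCoder {
    int low, range;
    const uint8_t *bytestream, *bytestream_end;
} RangeCoder;

/* Simplified renormalization: pull in one byte when the range gets small. */
static void refill(RangeCoder *c)
{
    if (c->range < 0x100) {
        c->range <<= 8;
        c->low   <<= 8;
        if (c->bytestream < c->bytestream_end)
            c->low += *c->bytestream++;
    }
}

/* Decode one bit, updating the caller-owned state byte in place. */
static int get_rac_direct(RangeCoder *c, uint8_t *state,
                          const uint8_t zero_one_state[512])
{
    int range1 = (c->range * *state) >> 8;
    int ranged = c->range - range1;
    int bit    = c->low >= ranged;

    assert(ranged > 0);
    *state = zero_one_state[*state + (bit ? 256 : 0)];

    if (bit) {
        c->low  -= ranged;
        c->range = range1;
    } else {
        c->range = ranged;
    }

    refill(c);
    return bit;
}
```

A caller would declare `uint8_t setup_state[CONTEXT_SIZE]`, fill it with
128, and pass `&setup_state[i]` for each decoded bit.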
 libavcodec/vulkan/ffv1_dec_setup.comp | 55 ++++++++++++++-------------
 libavcodec/vulkan/rangecoder.comp     | 17 +++++++++
 libavcodec/vulkan_ffv1.c              | 23 +----------
 3 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/libavcodec/vulkan/ffv1_dec_setup.comp b/libavcodec/vulkan/ffv1_dec_setup.comp
index 5da63be56d..a27a878927 100644
--- a/libavcodec/vulkan/ffv1_dec_setup.comp
+++ b/libavcodec/vulkan/ffv1_dec_setup.comp
@@ -20,13 +20,15 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-uint get_usymbol(inout RangeCoder c, uint64_t state)
+uint8_t setup_state[CONTEXT_SIZE];
+
+uint get_usymbol(inout RangeCoder c)
 {
-    if (get_rac(c, state + 0))
+    if (get_rac_direct(c, setup_state[0]))
         return 0;
 
     int e = 0;
-    while (get_rac(c, state + 1 + min(e, 9))) { // 1..10
+    while (get_rac_direct(c, setup_state[1 + min(e, 9)])) { // 1..10
         e++;
         if (e > 31) {
             corrupt = true;
@@ -35,24 +37,24 @@ uint get_usymbol(inout RangeCoder c, uint64_t state)
     }
 
     uint a = 1;
-    for (int i = e - 1; i >= 0; i--)
-        a += a + uint(get_rac(c, state + 22 + min(i, 9)));  // 22..31
+    for (int i = e - 1; i >= 0; i--) {
+        a <<= 1;
+        a |= uint(get_rac_direct(c, setup_state[22 + min(i, 9)]));  // 22..31
+    }
 
     return a;
 }
 
-bool decode_slice_header(inout SliceContext sc, uint64_t state)
+bool decode_slice_header(inout SliceContext sc)
 {
-    u8buf sb = u8buf(state);
-
     [[unroll]]
     for (int i = 0; i < CONTEXT_SIZE; i++)
-        sb[i].v = uint8_t(128);
+        setup_state[i] = uint8_t(128);
 
-    uint sx = get_usymbol(sc.c, state);
-    uint sy = get_usymbol(sc.c, state);
-    uint sw = get_usymbol(sc.c, state) + 1;
-    uint sh = get_usymbol(sc.c, state) + 1;
+    uint sx = get_usymbol(sc.c);
+    uint sy = get_usymbol(sc.c);
+    uint sw = get_usymbol(sc.c) + 1;
+    uint sh = get_usymbol(sc.c) + 1;
 
     if (sx < 0 || sy < 0 || sw <= 0 || sh <= 0 ||
         sx > (gl_NumWorkGroups.x - sw) || sy > (gl_NumWorkGroups.y - sh) ||
@@ -72,22 +74,22 @@ bool decode_slice_header(inout SliceContext sc, uint64_t state)
     sc.slice_coding_mode = int(0);
 
     for (uint i = 0; i < codec_planes; i++) {
-        uint idx = get_usymbol(sc.c, state);
+        uint idx = get_usymbol(sc.c);
         if (idx >= quant_table_count)
             return true;
         sc.quant_table_idx[i] = uint8_t(idx);
     }
 
-    get_usymbol(sc.c, state);
-    get_usymbol(sc.c, state);
-    get_usymbol(sc.c, state);
+    get_usymbol(sc.c);
+    get_usymbol(sc.c);
+    get_usymbol(sc.c);
 
     if (version >= 4) {
-        sc.slice_reset_contexts = get_rac(sc.c, state);
-        sc.slice_coding_mode = get_usymbol(sc.c, state);
+        sc.slice_reset_contexts = get_rac_direct(sc.c, setup_state[0]);
+        sc.slice_coding_mode = get_usymbol(sc.c);
         if (sc.slice_coding_mode != 1 && colorspace == 1) {
-            sc.slice_rct_coef.x = int(get_usymbol(sc.c, state));
-            sc.slice_rct_coef.y = int(get_usymbol(sc.c, state));
+            sc.slice_rct_coef.x = int(get_usymbol(sc.c));
+            sc.slice_rct_coef.y = int(get_usymbol(sc.c));
             if (sc.slice_rct_coef.x + sc.slice_rct_coef.y > 4)
                 return true;
         }
@@ -96,11 +98,11 @@ bool decode_slice_header(inout SliceContext sc, uint64_t state)
     return false;
 }
 
-void golomb_init(inout SliceContext sc, uint64_t state)
+void golomb_init(inout SliceContext sc)
 {
     if (version == 3 && micro_version > 1 || version > 3) {
-        u8buf(state).v = uint8_t(129);
-        get_rac(sc.c, state);
+        setup_state[0] = uint8_t(129);
+        get_rac_direct(sc.c, setup_state[0]);
     }
 
     uint64_t ac_byte_count = sc.c.bytestream - sc.c.bytestream_start - 1;
@@ -111,7 +113,6 @@ void golomb_init(inout SliceContext sc, uint64_t state)
 void main(void)
 {
     const uint slice_idx = gl_WorkGroupID.y*gl_NumWorkGroups.x + gl_WorkGroupID.x;
-    uint64_t scratch_state = uint64_t(scratch_data) + slice_idx*CONTEXT_SIZE;
 
     u8buf bs = u8buf(slice_data + slice_offsets[2*slice_idx + 0]);
     uint32_t slice_size = slice_offsets[2*slice_idx + 1];
@@ -122,10 +123,10 @@ void main(void)
     if (slice_idx == (gl_NumWorkGroups.x*gl_NumWorkGroups.y - 1))
         get_rac_equi(slice_ctx[slice_idx].c);
 
-    decode_slice_header(slice_ctx[slice_idx], scratch_state);
+    decode_slice_header(slice_ctx[slice_idx]);
 
     if (golomb == 1)
-        golomb_init(slice_ctx[slice_idx], scratch_state);
+        golomb_init(slice_ctx[slice_idx]);
 
     if (ec != 0 && check_crc != 0) {
         uint32_t crc = crcref;
diff --git a/libavcodec/vulkan/rangecoder.comp b/libavcodec/vulkan/rangecoder.comp
index e332bce8a5..ff0432511d 100644
--- a/libavcodec/vulkan/rangecoder.comp
+++ b/libavcodec/vulkan/rangecoder.comp
@@ -245,6 +245,23 @@ bool get_rac(inout RangeCoder c, uint64_t state)
     return bit;
 }
 
+bool get_rac_direct(inout RangeCoder c, inout uint8_t state)
+{
+    int range1 = -int(c.range * state >> 8);
+    int ranged = c.range + range1;
+
+    bool bit = c.low >= ranged;
+    state = zero_one_state[state + (bit ? 256 : 0)];
+
+    c.low = c.low - (bit ? ranged : 0);
+    c.range = (bit ? 0 : ranged) - (bit ? range1 : 0);
+
+    if (c.range < 0x100)
+        refill(c);
+
+    return bit;
+}
+
 bool get_rac_equi(inout RangeCoder c)
 {
     int range1 = c.range >> 1;
diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index 72cacb1678..c1875711bc 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -43,8 +43,6 @@ const FFVulkanDecodeDescriptor ff_vk_dec_ffv1_desc = {
 typedef struct FFv1VulkanDecodePicture {
     FFVulkanDecodePicture vp;
 
-    AVBufferRef *tmp_data;
-
     AVBufferRef *slice_state;
     uint32_t plane_state_size;
     uint32_t slice_state_size;
@@ -70,7 +68,6 @@ typedef struct FFv1VulkanDecodeContext {
     FFVkBuffer crc_tab_buf;
 
     AVBufferPool *slice_state_pool;
-    AVBufferPool *tmp_data_pool;
     AVBufferPool *slice_offset_pool;
     AVBufferPool *slice_status_pool;
 } FFv1VulkanDecodeContext;
@@ -78,7 +75,6 @@ typedef struct FFv1VulkanDecodeContext {
 typedef struct FFv1VkParameters {
     VkDeviceAddress slice_data;
     VkDeviceAddress slice_state;
-    VkDeviceAddress scratch_data;
 
     int fmt_lut[4];
     uint32_t img_size[2];
@@ -111,7 +107,6 @@ static void add_push_data(FFVulkanShader *shd)
     GLSLC(0, layout(push_constant, scalar) uniform pushConstants {  );
     GLSLC(1,    u8buf slice_data;                                   );
     GLSLC(1,    u8buf slice_state;                                  );
-    GLSLC(1,    u8buf scratch_data;                                 );
     GLSLC(0,                                                        );
     GLSLC(1,    ivec4 fmt_lut;                                      );
     GLSLC(1,    uvec2 img_size;                                     );
@@ -208,16 +203,6 @@ static int vk_ffv1_start_frame(AVCodecContext          *avctx,
             return AVERROR(ENOMEM);
     }
 
-    /* Allocate temporary data buffer */
-    err = ff_vk_get_pooled_buffer(&ctx->s, &fv->tmp_data_pool,
-                                  &fp->tmp_data,
-                                  VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
-                                  VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
-                                  NULL, f->slice_count*CONTEXT_SIZE,
-                                  VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
-    if (err < 0)
-        return err;
-
     /* Allocate slice offsets buffer */
     err = ff_vk_get_pooled_buffer(&ctx->s, &fv->slice_offset_pool,
                                   &fp->slice_offset_buf,
@@ -327,7 +312,6 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     FFVkBuffer *slice_offset = (FFVkBuffer *)fp->slice_offset_buf->data;
     FFVkBuffer *slice_status = (FFVkBuffer *)fp->slice_status_buf->data;
 
-    FFVkBuffer *tmp_data = (FFVkBuffer *)fp->tmp_data->data;
     VkImageView rct_image_views[AV_NUM_DATA_POINTERS];
 
     AVFrame *decode_dst = is_rgb ? vp->dpb_frame : f->picture.f;
@@ -380,8 +364,6 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     vp->slices_buf = NULL;
     RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &fp->slice_offset_buf, 1, 0));
     fp->slice_offset_buf = NULL;
-    RET(ff_vk_exec_add_dep_buf(&ctx->s, exec, &fp->tmp_data, 1, 0));
-    fp->tmp_data = NULL;
 
     /* Entry barrier for the slice state */
     buf_bar[nb_buf_bar++] = (VkBufferMemoryBarrier2) {
@@ -430,8 +412,7 @@ static int vk_ffv1_end_frame(AVCodecContext *avctx)
     ff_vk_exec_bind_shader(&ctx->s, exec, &fv->setup);
     pd = (FFv1VkParameters) {
         .slice_data = slices_buf->address,
-        .slice_state = slice_state->address + f->slice_count*fp->slice_data_size,
-        .scratch_data = tmp_data->address,
+        .slice_state  = slice_state->address + f->slice_count*fp->slice_data_size,
 
         .img_size[0] = f->picture.f->width,
         .img_size[1] = f->picture.f->height,
@@ -990,7 +971,6 @@ static void vk_decode_ffv1_uninit(FFVulkanDecodeShared *ctx)
     ff_vk_free_buf(&ctx->s, &fv->rangecoder_static_buf);
     ff_vk_free_buf(&ctx->s, &fv->crc_tab_buf);
 
-    av_buffer_pool_uninit(&fv->tmp_data_pool);
     av_buffer_pool_uninit(&fv->slice_state_pool);
     av_buffer_pool_uninit(&fv->slice_offset_pool);
     av_buffer_pool_uninit(&fv->slice_status_pool);
@@ -1148,7 +1128,6 @@ static void vk_ffv1_free_frame_priv(AVRefStructOpaque _hwctx, void *data)
     av_buffer_unref(&fp->slice_state);
     av_buffer_unref(&fp->slice_offset_buf);
     av_buffer_unref(&fp->slice_status_buf);
-    av_buffer_unref(&fp->tmp_data);
 }
 
 const FFHWAccel ff_ffv1_vulkan_hwaccel = {
-- 
2.47.2

* [FFmpeg-devel] [PATCH 16/18] vulkan_ffv1: shortcut +-1 coeffs in symbol reading
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (13 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 15/18] vulkan_ffv1: remove need for scratch data during setup Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 17/18] vulkan: add support for expect/assume Lynne
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Slightly faster, and allows for further optimizations.
---
 libavcodec/vulkan/ffv1_dec.comp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index 4cc3b9987f..fd9b98023c 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -119,7 +119,10 @@ int get_isymbol(inout RangeCoder c, uint64_t state)
     for (e = 0; e < 32; e++)
         if (!get_rac(c, state + min(e, 9)))
             break;
-    if (e > 31) {
+
+    if (e == 0) {
+        return get_rac(c, state + 10) ? -1 : 1;
+    } else if (e > 31) {
         corrupt = true;
         return 0;
     }
-- 
2.47.2
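
Editorial sketch, not part of the patch: the shortcut above does not change
what is decoded. When the exponent run length e is 0, the mantissa loop is
empty, the magnitude stays at 1, and the only remaining read is the sign bit.
The C model below assumes a hypothetical BitSource that replays bits from a
plain array in place of the real range coder. Context indices are written
relative to the start of the context and follow the offsets visible in the
shader. The names BitSource, read_bit and read_symbol are illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-in for the range coder: bits are replayed from an
 * array, and the context index of the most recent read is recorded. */
typedef struct BitSource {
    const unsigned char *bits;
    size_t nb_bits;
    size_t pos;
    int last_ctx;
} BitSource;

static int read_bit(BitSource *s, int ctx)
{
    s->last_ctx = ctx;
    return s->pos < s->nb_bits ? s->bits[s->pos++] : 0;
}

/* Signed symbol reader with the same structure as get_isymbol().
 * With use_shortcut set, e == 0 returns +-1 directly. */
static int read_symbol(BitSource *s, int use_shortcut, int *corrupt)
{
    int e, a = 1;

    if (read_bit(s, 0))                          /* zero flag */
        return 0;

    for (e = 0; e < 32; e++)                     /* exponent, contexts 1..10 */
        if (!read_bit(s, 1 + (e < 9 ? e : 9)))
            break;

    if (use_shortcut && e == 0)
        return read_bit(s, 11) ? -1 : 1;         /* sign only */
    if (e > 31) {
        *corrupt = 1;
        return 0;
    }

    for (int i = e - 1; i >= 0; i--)             /* mantissa, contexts 22..31 */
        a = (a << 1) | read_bit(s, 22 + (i < 9 ? i : 9));

    return read_bit(s, 11 + (e < 10 ? e : 10)) ? -a : a; /* sign, 11..21 */
}
```

Feeding the same bit sequence through both settings of use_shortcut should
give the same value, consume the same number of bits, and end on the same
context index.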

* [FFmpeg-devel] [PATCH 17/18] vulkan: add support for expect/assume
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (14 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 16/18] vulkan_ffv1: shortcut +-1 coeffs in symbol reading Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 18/18] vulkan_ffv1: add cached symbol reader for AMD Lynne
  2025-04-13 13:38 ` [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Jerome Martinez
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

This commit adds support for compiler hints.
AMD neither uses nor needs them, but Nvidia benefits from them, with a
sizeable 10% speedup on 4k.
---
 libavcodec/vulkan/ffv1_dec.comp   | 16 ++++++++--------
 libavcodec/vulkan/rangecoder.comp | 12 ++++++------
 libavutil/hwcontext_vulkan.c      |  7 +++++++
 libavutil/vulkan.c                |  6 ++++++
 libavutil/vulkan_functions.h      |  1 +
 libavutil/vulkan_loader.h         |  1 +
 6 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index fd9b98023c..9eba322b27 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -31,7 +31,7 @@
 #ifdef RGB
 ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
 {
-    const ivec2 yoff_border1 = off.x == 0 ? ivec2(1, -1) : ivec2(0, 0);
+    const ivec2 yoff_border1 = expectEXT(off.x == 0, false) ? ivec2(1, -1) : ivec2(0, 0);
 
     /* Thanks to the same coincidence as below, we can skip checking if off == 0, 1 */
     VTYPE3 top  = VTYPE3(TYPE(imageLoad(dec[p], sp + LADDR(off + ivec2(-1, -1) + yoff_border1))[0]),
@@ -47,10 +47,10 @@ ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
                quant_table[quant_table_idx][1][(top[0] - top[1]) & MAX_QUANT_TABLE_MASK] +
                quant_table[quant_table_idx][2][(top[1] - top[2]) & MAX_QUANT_TABLE_MASK];
 
-    if (extend_lookup[quant_table_idx] > 0) {
+    if (expectEXT(extend_lookup[quant_table_idx] > 0, false)) {
         TYPE cur2 = TYPE(0);
-        if (off.x > 0) {
-            const ivec2 yoff_border2 = off.x == 1 ? ivec2(-1, -1) : ivec2(-2, 0);
+        if (expectEXT(off.x > 0, true)) {
+            const ivec2 yoff_border2 = expectEXT(off.x == 1, false) ? ivec2(-1, -1) : ivec2(-2, 0);
             cur2 = TYPE(imageLoad(dec[p], sp + LADDR(off + yoff_border2))[0]);
         }
         base += quant_table[quant_table_idx][3][(cur2 - cur) & MAX_QUANT_TABLE_MASK];
@@ -110,7 +110,7 @@ ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
 #ifndef GOLOMB
 int get_isymbol(inout RangeCoder c, uint64_t state)
 {
-    if (get_rac(c, state))
+    if (expectEXT(get_rac(c, state), false))
         return 0;
 
     state += 1;
@@ -120,9 +120,9 @@ int get_isymbol(inout RangeCoder c, uint64_t state)
         if (!get_rac(c, state + min(e, 9)))
             break;
 
-    if (e == 0) {
+    if (expectEXT(e == 0, false)) {
         return get_rac(c, state + 10) ? -1 : 1;
-    } else if (e > 31) {
+    } else if (expectEXT(e > 31, false)) {
         corrupt = true;
         return 0;
     }
@@ -274,7 +274,7 @@ void writeout_rgb(in SliceContext sc, ivec2 sp, int w, int y, bool apply_rct)
         if (transparency != 0)
             pix.a = int(imageLoad(dec[3], lpos)[0]);
 
-        if (apply_rct)
+        if (expectEXT(apply_rct, true))
             pix = transform_sample(pix, sc.slice_rct_coef);
 
         imageStore(dst[0], pos, pix);
diff --git a/libavcodec/vulkan/rangecoder.comp b/libavcodec/vulkan/rangecoder.comp
index ff0432511d..b95c722a5c 100644
--- a/libavcodec/vulkan/rangecoder.comp
+++ b/libavcodec/vulkan/rangecoder.comp
@@ -141,7 +141,7 @@ void put_rac_equi(inout RangeCoder c, bool bit)
         c.range -= range1;
     }
 
-    if (c.range < 0x100)
+    if (expectEXT(c.range < 0x100, false))
         renorm_encoder(c);
 }
 
@@ -157,7 +157,7 @@ void put_rac_terminate(inout RangeCoder c)
 #endif
 
     c.range -= range1;
-    if (c.range < 0x100)
+    if (expectEXT(c.range < 0x100, false))
         renorm_encoder(c);
 }
 
@@ -218,7 +218,7 @@ void refill(inout RangeCoder c)
 {
     c.range <<= 8;
     c.low   <<= 8;
-    if (c.bytestream < c.bytestream_end) {
+    if (expectEXT(c.bytestream < c.bytestream_end, false)) {
         c.low |= u8buf(c.bytestream).v;
         c.bytestream++;
     } else {
@@ -239,7 +239,7 @@ bool get_rac(inout RangeCoder c, uint64_t state)
     c.low = c.low - (bit ? ranged : 0);
     c.range = (bit ? 0 : ranged) - (bit ? range1 : 0);
 
-    if (c.range < 0x100)
+    if (expectEXT(c.range < 0x100, false))
         refill(c);
 
     return bit;
@@ -256,7 +256,7 @@ bool get_rac_direct(inout RangeCoder c, inout uint8_t state)
     c.low = c.low - (bit ? ranged : 0);
     c.range = (bit ? 0 : ranged) - (bit ? range1 : 0);
 
-    if (c.range < 0x100)
+    if (expectEXT(c.range < 0x100, false))
         refill(c);
 
     return bit;
@@ -274,7 +274,7 @@ bool get_rac_equi(inout RangeCoder c)
         c.range = range1;
     }
 
-    if (c.range < 0x100)
+    if (expectEXT(c.range < 0x100, false))
         refill(c);
 
     return bit;
diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index d11c0274d2..f7d43248e8 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -79,6 +79,7 @@ typedef struct VulkanDeviceFeatures {
     VkPhysicalDeviceVulkan12Features vulkan_1_2;
     VkPhysicalDeviceVulkan13Features vulkan_1_3;
     VkPhysicalDeviceTimelineSemaphoreFeatures timeline_semaphore;
+    VkPhysicalDeviceShaderExpectAssumeFeatures expect_assume;
 
     VkPhysicalDeviceVideoMaintenance1FeaturesKHR video_maintenance_1;
 #ifdef VK_KHR_video_maintenance2
@@ -209,6 +210,9 @@ static void device_features_init(AVHWDeviceContext *ctx, VulkanDeviceFeatures *f
     OPT_CHAIN(&feats->timeline_semaphore, FF_VK_EXT_PORTABILITY_SUBSET,
               VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_TIMELINE_SEMAPHORE_FEATURES);
 
+    OPT_CHAIN(&feats->expect_assume, FF_VK_EXT_EXPECT_ASSUME,
+              VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_EXPECT_ASSUME_FEATURES_KHR);
+
     OPT_CHAIN(&feats->video_maintenance_1, FF_VK_EXT_VIDEO_MAINTENANCE_1,
               VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VIDEO_MAINTENANCE_1_FEATURES_KHR);
 #ifdef VK_KHR_video_maintenance2
@@ -301,6 +305,8 @@ static void device_features_copy_needed(VulkanDeviceFeatures *dst, VulkanDeviceF
     COPY_VAL(relaxed_extended_instruction.shaderRelaxedExtendedInstruction);
 #endif
 
+    COPY_VAL(expect_assume.shaderExpectAssume);
+
     COPY_VAL(optical_flow.opticalFlow);
 #undef COPY_VAL
 }
@@ -615,6 +621,7 @@ static const VulkanOptExtension optional_device_exts[] = {
     { VK_KHR_COOPERATIVE_MATRIX_EXTENSION_NAME,               FF_VK_EXT_COOP_MATRIX            },
     { VK_NV_OPTICAL_FLOW_EXTENSION_NAME,                      FF_VK_EXT_OPTICAL_FLOW           },
     { VK_EXT_SHADER_OBJECT_EXTENSION_NAME,                    FF_VK_EXT_SHADER_OBJECT          },
+    { VK_KHR_SHADER_EXPECT_ASSUME_EXTENSION_NAME,             FF_VK_EXT_EXPECT_ASSUME          },
     { VK_KHR_VIDEO_MAINTENANCE_1_EXTENSION_NAME,              FF_VK_EXT_VIDEO_MAINTENANCE_1    },
 #ifdef VK_KHR_video_maintenance2
     { VK_KHR_VIDEO_MAINTENANCE_2_EXTENSION_NAME,              FF_VK_EXT_VIDEO_MAINTENANCE_2    },
diff --git a/libavutil/vulkan.c b/libavutil/vulkan.c
index 7650e83d1d..bee9d3da23 100644
--- a/libavutil/vulkan.c
+++ b/libavutil/vulkan.c
@@ -2046,6 +2046,12 @@ int ff_vk_shader_init(FFVulkanContext *s, FFVulkanShader *shd, const char *name,
     GLSLC(0, #extension GL_EXT_scalar_block_layout : require                  );
     GLSLC(0, #extension GL_EXT_shader_explicit_arithmetic_types : require     );
     GLSLC(0, #extension GL_EXT_control_flow_attributes : require              );
+    if (s->extensions & FF_VK_EXT_EXPECT_ASSUME) {
+        GLSLC(0, #extension GL_EXT_expect_assume : require                    );
+    } else {
+        GLSLC(0, #define assumeEXT(x) (x)                                     );
+        GLSLC(0, #define expectEXT(x, c) (x)                                  );
+    }
     if ((s->extensions & FF_VK_EXT_DEBUG_UTILS) &&
         (s->extensions & FF_VK_EXT_RELAXED_EXTENDED_INSTR)) {
         GLSLC(0, #extension GL_EXT_debug_printf : require                     );
diff --git a/libavutil/vulkan_functions.h b/libavutil/vulkan_functions.h
index 8f2bbb38c9..cd61d71577 100644
--- a/libavutil/vulkan_functions.h
+++ b/libavutil/vulkan_functions.h
@@ -47,6 +47,7 @@ typedef uint64_t FFVulkanExtensions;
 #define FF_VK_EXT_SHADER_OBJECT          (1ULL << 13) /* VK_EXT_shader_object */
 #define FF_VK_EXT_PUSH_DESCRIPTOR        (1ULL << 14) /* VK_KHR_push_descriptor */
 #define FF_VK_EXT_RELAXED_EXTENDED_INSTR (1ULL << 15) /* VK_KHR_shader_relaxed_extended_instruction */
+#define FF_VK_EXT_EXPECT_ASSUME          (1ULL << 16) /* VK_KHR_shader_expect_assume */
 
 /* Video extensions */
 #define FF_VK_EXT_VIDEO_QUEUE            (1ULL << 36) /* VK_KHR_video_queue */
diff --git a/libavutil/vulkan_loader.h b/libavutil/vulkan_loader.h
index 6d5bbf057a..3641fcb22e 100644
--- a/libavutil/vulkan_loader.h
+++ b/libavutil/vulkan_loader.h
@@ -76,6 +76,7 @@ static inline uint64_t ff_vk_extensions_to_mask(const char * const *extensions,
         { VK_KHR_VIDEO_DECODE_H265_EXTENSION_NAME,         FF_VK_EXT_VIDEO_DECODE_H265      },
         { VK_KHR_VIDEO_DECODE_AV1_EXTENSION_NAME,          FF_VK_EXT_VIDEO_DECODE_AV1       },
         { VK_KHR_PUSH_DESCRIPTOR_EXTENSION_NAME,           FF_VK_EXT_PUSH_DESCRIPTOR        },
+        { VK_KHR_SHADER_EXPECT_ASSUME_EXTENSION_NAME,      FF_VK_EXT_EXPECT_ASSUME          },
     };
 
     FFVulkanExtensions mask = 0x0;
-- 
2.47.2
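
Editorial sketch, not part of the patch: a C analogue of the hint-with-
fallback pattern used for expectEXT(). It assumes a GCC or Clang compiler for
the real hint (__builtin_expect) and falls back to a pass-through macro
elsewhere. The fallback has to accept the same two arguments as the real
hint, otherwise two-argument call sites stop compiling when the hint is
unavailable. EXPECT and shifts_needed are hypothetical names, and the loop
is only loosely modelled on the range coder's "range < 0x100" check.

```c
/* Branch hint with a pass-through fallback. Both variants take the
 * condition and the expected value, so call sites never change. */
#if defined(__GNUC__) || defined(__clang__)
#define EXPECT(x, c) __builtin_expect(!!(x), (c))
#else
#define EXPECT(x, c) (!!(x))
#endif

/* Counts how many 8-bit shifts bring range back to at least 0x100.
 * The loop condition is marked as rarely true, as in the shader.
 * Returns -1 for range == 0, which no amount of shifting can fix. */
static int shifts_needed(unsigned range)
{
    int n = 0;

    if (range == 0)
        return -1;

    while (EXPECT(range < 0x100, 0)) {
        range <<= 8;
        n++;
    }
    return n;
}
```

The hint only influences code layout, so the function returns the same result
with either definition of EXPECT.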

* [FFmpeg-devel] [PATCH 18/18] vulkan_ffv1: add cached symbol reader for AMD
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (15 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 17/18] vulkan: add support for expect/assume Lynne
@ 2025-04-12  7:22 ` Lynne
  2025-04-13 13:38 ` [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Jerome Martinez
  17 siblings, 0 replies; 20+ messages in thread
From: Lynne @ 2025-04-12  7:22 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Lynne

Speeds up everything on AMD by 3x.
This uses 32 local invocations to load state into cache, as well
as to do the RCT faster.
---
 libavcodec/vulkan/ffv1_dec.comp | 71 ++++++++++++++++++++-------------
 libavcodec/vulkan_ffv1.c        |  7 +++-
 2 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/libavcodec/vulkan/ffv1_dec.comp b/libavcodec/vulkan/ffv1_dec.comp
index 9eba322b27..3c46ee1771 100644
--- a/libavcodec/vulkan/ffv1_dec.comp
+++ b/libavcodec/vulkan/ffv1_dec.comp
@@ -108,34 +108,37 @@ ivec2 get_pred(ivec2 sp, ivec2 off, int p, int sw, uint8_t quant_table_idx)
 #endif
 
 #ifndef GOLOMB
-int get_isymbol(inout RangeCoder c, uint64_t state)
+#ifdef CACHED_SYMBOL_READER
+shared uint8_t state[CONTEXT_SIZE];
+#define READ(c, off) get_rac_direct(c, state[off])
+#else
+#define READ(c, off) get_rac(c, uint64_t(slice_state) + state_off + off)
+#endif
+
+int get_isymbol(inout RangeCoder c, uint state_off)
 {
-    if (expectEXT(get_rac(c, state), false))
+    if (expectEXT(READ(c, 0), false))
         return 0;
 
-    state += 1;
-
-    int e;
-    for (e = 0; e < 32; e++)
-        if (!get_rac(c, state + min(e, 9)))
+    int e = 1;
+    for (; e < 33; e++)
+        if (!READ(c, min(e, 10)))
             break;
 
-    if (expectEXT(e == 0, false)) {
-        return get_rac(c, state + 10) ? -1 : 1;
-    } else if (expectEXT(e > 31, false)) {
+    if (expectEXT(e == 1, false)) {
+        return READ(c, 11) ? -1 : 1;
+    } else if (expectEXT(e == 33, false)) {
         corrupt = true;
         return 0;
     }
 
-    state += 21;
-
     int a = 1;
-    for (int i = e - 1; i >= 0; i--) {
+    for (int i = e + 20; i >= 22; i--) {
         a <<= 1;
-        a |= int(get_rac(c, state + min(i, 9)));  // 22..31
+        a |= int(READ(c, min(i, 31)));
     }
 
-    return get_rac(c, state - 11 + min(e, 10)) ? -a : a;
+    return READ(c, min(e + 10, 21)) ? -a : a;
 }
 
 void decode_line_pcm(inout SliceContext sc, ivec2 sp, int w, int y, int p, int bits)
@@ -157,7 +160,7 @@ void decode_line_pcm(inout SliceContext sc, ivec2 sp, int w, int y, int p, int b
 }
 
 void decode_line(inout SliceContext sc, ivec2 sp, int w,
-                 int y, int p, int bits, uint64_t state,
+                 int y, int p, int bits, uint state_off,
                  uint8_t quant_table_idx, const int run_index)
 {
 #ifndef RGB
@@ -171,19 +174,33 @@ void decode_line(inout SliceContext sc, ivec2 sp, int w,
         ivec2 pr = get_pred(sp, ivec2(x, y), p, w,
                             quant_table_idx);
 
-        int diff = get_isymbol(sc.c, state + CONTEXT_SIZE*abs(pr[0]));
-        if (pr[0] < 0)
-            diff = -diff;
+        uint context_off = state_off + CONTEXT_SIZE*abs(pr[0]);
+#ifdef CACHED_SYMBOL_READER
+        u8buf sb = u8buf(uint64_t(slice_state) + context_off + gl_LocalInvocationID.x);
+        state[gl_LocalInvocationID.x] = sb.v;
+        barrier();
+        if (gl_LocalInvocationID.x == 0) {
 
-        uint v = zero_extend(pr[1] + diff, bits);
-        imageStore(dec[p], sp + LADDR(ivec2(x, y)), uvec4(v));
+#endif
+
+            int diff = get_isymbol(sc.c, context_off);
+            if (pr[0] < 0)
+                diff = -diff;
+
+            uint v = zero_extend(pr[1] + diff, bits);
+            imageStore(dec[p], sp + LADDR(ivec2(x, y)), uvec4(v));
+
+#ifdef CACHED_SYMBOL_READER
+        }
+        sb.v = state[gl_LocalInvocationID.x];
+#endif
     }
 }
 
 #else /* GOLOMB */
 
 void decode_line(inout SliceContext sc, ivec2 sp, int w,
-                 int y, int p, int bits, uint64_t state,
+                 int y, int p, int bits, uint state_off,
                  uint8_t quant_table_idx, inout int run_index)
 {
 #ifndef RGB
@@ -202,7 +219,7 @@ void decode_line(inout SliceContext sc, ivec2 sp, int w,
         ivec2 pr = get_pred(sp, ivec2(x, y), p, w,
                             quant_table_idx);
 
-        VlcState sb = VlcState(state + VLC_STATE_SIZE*abs(pr[0]));
+        VlcState sb = VlcState(uint64_t(slice_state) + state_off + VLC_STATE_SIZE*abs(pr[0]));
 
         if (pr[0] == 0 && run_mode == 0)
             run_mode = 1;
@@ -263,7 +280,7 @@ ivec4 transform_sample(ivec4 pix, ivec2 rct_coef)
 
 void writeout_rgb(in SliceContext sc, ivec2 sp, int w, int y, bool apply_rct)
 {
-    for (int x = 0; x < w; x++) {
+    for (uint x = gl_LocalInvocationID.x; x < w; x += gl_WorkGroupSize.x) {
         ivec2 lpos = sp + LADDR(ivec2(x, y));
         ivec2 pos = sc.slice_pos + ivec2(x, y);
 
@@ -305,6 +322,8 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
     /* PCM coding */
 #ifndef GOLOMB
     if (sc.slice_coding_mode == 1) {
+        if (gl_LocalInvocationID.x > 0)
+            return;
 #ifndef RGB
         for (int p = 0; p < planes; p++) {
             int h = sc.slice_dim.y;
@@ -328,9 +347,7 @@ void decode_slice(inout SliceContext sc, const uint slice_idx)
 #endif
     {
         u8vec4 quant_table_idx = sc.quant_table_idx.xyyz;
-        u64vec4 slice_state_off = (uint64_t(slice_state) +
-                                   slice_idx*plane_state_size*codec_planes) +
-                                  plane_state_size*uvec4(0, 1, 1, 2);
+        u32vec4 slice_state_off = (slice_idx*codec_planes + uvec4(0, 1, 1, 2))*plane_state_size;
 
 #ifndef RGB
         for (int p = 0; p < planes; p++) {
diff --git a/libavcodec/vulkan_ffv1.c b/libavcodec/vulkan_ffv1.c
index c1875711bc..33c4e9114d 100644
--- a/libavcodec/vulkan_ffv1.c
+++ b/libavcodec/vulkan_ffv1.c
@@ -823,12 +823,14 @@ static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
     uint8_t *spv_data;
     size_t spv_len;
     void *spv_opaque = NULL;
+    int use_cached_reader = ac != AC_GOLOMB_RICE &&
+                            s->driver_props.driverID == VK_DRIVER_ID_MESA_RADV;
 
     RET(ff_vk_shader_init(s, shd, "ffv1_dec",
                           VK_SHADER_STAGE_COMPUTE_BIT,
                           (const char *[]) { "GL_EXT_buffer_reference",
                                              "GL_EXT_buffer_reference2" }, 2,
-                          1, 1, 1,
+                          use_cached_reader ? 32 : 1, 1, 1,
                           0));
 
     if (ac == AC_GOLOMB_RICE)
@@ -837,6 +839,9 @@ static int init_decode_shader(FFV1Context *f, FFVulkanContext *s,
     if (rgb)
         av_bprintf(&shd->src, "#define RGB\n");
 
+    if (use_cached_reader)
+        av_bprintf(&shd->src, "#define CACHED_SYMBOL_READER 1\n");
+
     /* Common codec header */
     GLSLD(ff_source_common_comp);
 
-- 
2.47.2
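
Editorial sketch, not part of the patch: the rewritten get_isymbol() addresses
the context by absolute offsets 0..31 instead of advancing a state pointer, so
the exponent counter now starts at 1 and the mantissa index at 22. The C
check below confirms that the old and new index expressions from the diff
select the same context entries for every valid exponent. All function names
are hypothetical, and only the index arithmetic is modelled, not the range
coder.

```c
static int imin(int a, int b) { return a < b ? a : b; }

/* Old layout, relative to the start of the context. e counts from 0;
 * the pointer was advanced by 1 and then by 21 before these reads. */
static int old_exp_ctx(int e)  { return 1 + imin(e, 9); }
static int old_mant_ctx(int i) { return 22 + imin(i, 9); }
static int old_sign_ctx(int e) { return 11 + imin(e, 10); }

/* New layout: e1 = e + 1 and i1 = i + 22 are used directly as offsets. */
static int new_exp_ctx(int e1)  { return imin(e1, 10); }
static int new_mant_ctx(int i1) { return imin(i1, 31); }
static int new_sign_ctx(int e1) { return imin(e1 + 10, 21); }

/* Returns 1 if both layouts agree for every exponent 0..31, else 0. */
static int layouts_match(void)
{
    for (int e = 0; e < 32; e++) {
        if (old_exp_ctx(e) != new_exp_ctx(e + 1))
            return 0;
        if (old_sign_ctx(e) != new_sign_ctx(e + 1))
            return 0;
        for (int i = e - 1; i >= 0; i--)
            if (old_mant_ctx(i) != new_mant_ctx(i + 22))
                return 0;
    }
    return 1;
}
```

The e == 0 shortcut is covered by the sign check: it reads offset 11 in both
versions.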

* Re: [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel
  2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
                   ` (16 preceding siblings ...)
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 18/18] vulkan_ffv1: add cached symbol reader for AMD Lynne
@ 2025-04-13 13:38 ` Jerome Martinez
  17 siblings, 0 replies; 20+ messages in thread
From: Jerome Martinez @ 2025-04-13 13:38 UTC (permalink / raw)
  To: ffmpeg-devel

I tested the patches as a whole, focusing on performance for now.

Tested with an RTX 4070 Ti on Linux and an RTX 3050 on Windows: no overall
speed regression on either platform, and an average speedup of about 25%
(up to 35% in some cases, especially at high resolutions) with 10-bit or
16-bit content in most cases. Oddly, some cases show no speed improvement;
I'll check.
Complex 4K 10-bit content now decodes easily in real time on the RTX 4070 Ti,
which is good performance for losslessly compressed content.

Additionally, the cards can handle higher resolutions, e.g. 12k9k 16-bit
now succeeds even on the low-end RTX 3050 (at 0.5 fps :) ).


On 12/04/2025 at 09:22, Lynne wrote:
> Temporary workaround. Will be replaced with a version check once a fix is
> in the works and a known next version for Mesa with a fix is known.
> ---
>   libavutil/hwcontext_vulkan.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
> index 319b71ed04..d11c0274d2 100644
> --- a/libavutil/hwcontext_vulkan.c
> +++ b/libavutil/hwcontext_vulkan.c
> @@ -773,6 +773,11 @@ static int check_extensions(AVHWDeviceContext *ctx, int dev, AVDictionary *opts,
>           tstr = optional_exts[i].name;
>           found = 0;
>   
> +        /* Intel has had a bad descriptor buffer implementation for a while */
> +        if (p->vkctx.driver_props.driverID == VK_DRIVER_ID_INTEL_OPEN_SOURCE_MESA &&
> +            !strcmp(tstr, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME))
> +            continue;
> +
>           if (dev &&
>               ((debug_mode == FF_VULKAN_DEBUG_VALIDATE) ||
>                (debug_mode == FF_VULKAN_DEBUG_PRINTF) ||



* Re: [FFmpeg-devel] [PATCH 13/18] ffv1/vulkan: redo context count tracking and quant_table_idx management
  2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 13/18] ffv1/vulkan: redo context count tracking and quant_table_idx management Lynne
@ 2025-04-13 20:39   ` Jerome Martinez
  0 siblings, 0 replies; 20+ messages in thread
From: Jerome Martinez @ 2025-04-13 20:39 UTC (permalink / raw)
  To: ffmpeg-devel

On 12/04/2025 at 09:22, Lynne wrote:
> Also, this commit fully fixes decoding of context=1 encoded files.

I confirm that it passes all my tests with these patches.

end of thread, other threads:[~2025-04-13 20:39 UTC | newest]

Thread overview: 20+ messages
2025-04-12  7:22 [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 02/18] vulkan_ffv1: enable acceleration " Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 03/18] vulkan_ffv1: remove unused define Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 04/18] vulkan_ffv1: slightly optimize the range decoder Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 05/18] vulkan_ffv1: optimize symbol reader Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 06/18] vulkan_ffv1: allocate just as much memory for slice state as needed Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 07/18] vulkan_ffv1: init overread/corrupt fields Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 08/18] vulkan_ffv1: fallback to upload if mapping packet fails, fix fallback Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 09/18] vulkan_ffv1: fix reset shader dependencies Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 10/18] vulkan_ffv1: improve buffer barrier correctness for slice state Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 11/18] vulkan_ffv1: fix left-2 sample addressing Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 12/18] vulkan_ffv1: cache only 2 lines when decoding RGB Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 13/18] ffv1/vulkan: redo context count tracking and quant_table_idx management Lynne
2025-04-13 20:39   ` Jerome Martinez
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 14/18] vulkan_ffv1: externalize extended lookup check Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 15/18] vulkan_ffv1: remove need for scratch data during setup Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 16/18] vulkan_ffv1: shortcut +-1 coeffs in symbol reading Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 17/18] vulkan: add support for expect/assume Lynne
2025-04-12  7:22 ` [FFmpeg-devel] [PATCH 18/18] vulkan_ffv1: add cached symbol reader for AMD Lynne
2025-04-13 13:38 ` [FFmpeg-devel] [PATCH 01/18] hwcontext_vulkan: disable descriptor buffer extension on Intel Jerome Martinez
