From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 19 Jan 2026 10:59:49 -0000
Message-ID: <176882038955.25.15147601132068838814@4457048688e7>
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PR] vulkan_dpx: remove host image upload path (PR #21514)
List-Id: FFmpeg development discussions and patches
From: Lynne via ffmpeg-devel
Cc: Lynne
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #21514 opened by Lynne
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21514
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21514.patch

The host image upload path was written mainly because of Nvidia.
Nvidia's upload path has always been fickle, and the driver seemed to
have a shortcut for host image uploads. That shortcut appears to have
been patched out of recent driver versions.

This upload path also relies on the driver keeping the same layout, down
to the stride of the images, which is not a portable assumption.

Rather than relying on this fickle upload path, what we want when we need
pure bandwidth is to decouple uploads onto a separate queue and let the
GPU pull the data from RAM. This will be slower with a single-threaded
decoder, but currently all of our compute-based decoders, and the decoders
that sit underneath them, support frame threading.
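To illustrate why the stride assumption described above is not portable, here is a minimal sketch in plain C. It is not FFmpeg or Vulkan code: `copy_flat`, `copy_rows` and their parameters are hypothetical names made up for this illustration. It contrasts a single flat memcpy, which is only correct when the destination row pitch happens to equal the packed row size, with a row-by-row copy that honors whatever pitch the destination actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy assuming the destination is tightly packed,
 * i.e. that its row pitch equals row_bytes. If the destination rows are
 * padded, every row after the first lands at the wrong offset. */
void copy_flat(uint8_t *dst, const uint8_t *src,
               size_t row_bytes, size_t rows)
{
    memcpy(dst, src, row_bytes * rows);
}

/* Hypothetical helper: copy one row at a time, honoring the destination
 * row pitch, which may include padding chosen by the implementation. */
void copy_rows(uint8_t *dst, size_t dst_pitch,
               const uint8_t *src, size_t row_bytes, size_t rows)
{
    /* A pitch smaller than the row itself would make rows overlap. */
    assert(dst_pitch >= row_bytes);

    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * row_bytes, row_bytes);
}
```

For example, with 5-byte rows, 3 rows, an 8-byte destination pitch and a source holding the values 1 to 15, `copy_flat` leaves the value 9 at the start of row 1 of the padded layout, whereas `copy_rows` leaves the expected 6 there.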
>>From 2b309836266b969ec0f4f6d7feefdf4c54e7dcc1 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Fri, 16 Jan 2026 16:09:05 +0100
Subject: [PATCH 1/3] vulkan: remove IS_WITHIN macro

This is the more correct GLSL solution.
---
 libavcodec/vulkan/common.comp          | 3 ---
 libavcodec/vulkan/dpx_unpack.comp.glsl | 2 +-
 libavfilter/vulkan/avgblur.comp.glsl   | 3 +--
 libavfilter/vulkan/bwdif.comp.glsl     | 4 ++--
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/libavcodec/vulkan/common.comp b/libavcodec/vulkan/common.comp
index 8a658f8524..1ec9ae7e7c 100644
--- a/libavcodec/vulkan/common.comp
+++ b/libavcodec/vulkan/common.comp
@@ -107,9 +107,6 @@ layout(buffer_reference, buffer_reference_align = 8) buffer u64buf {
 #define ceil_rshift(a, b) \
     (-((-(a)) >> (b)))
 
-#define IS_WITHIN(v1, v2) \
-    ((v1.x < v2.x) && (v1.y < v2.y))
-
 /* TODO: optimize */
 uint align(uint src, uint a)
 {
diff --git a/libavcodec/vulkan/dpx_unpack.comp.glsl b/libavcodec/vulkan/dpx_unpack.comp.glsl
index 93fda6142d..3850cbf3e9 100644
--- a/libavcodec/vulkan/dpx_unpack.comp.glsl
+++ b/libavcodec/vulkan/dpx_unpack.comp.glsl
@@ -91,7 +91,7 @@ i16vec4 parse_packed_in_32(ivec2 pos, int stride)
 void main(void)
 {
     ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
-    if (!IS_WITHIN(pos, imageSize(dst[0])))
+    if (any(greaterThanEqual(pos, imageSize(dst[0]))))
         return;
 
     i16vec4 p;
diff --git a/libavfilter/vulkan/avgblur.comp.glsl b/libavfilter/vulkan/avgblur.comp.glsl
index e7a476b98c..b53ec4092c 100644
--- a/libavfilter/vulkan/avgblur.comp.glsl
+++ b/libavfilter/vulkan/avgblur.comp.glsl
@@ -40,9 +40,8 @@ void main()
 {
     const ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
 
-#define IS_WITHIN(v1, v2) ((v1.x < v2.x) && (v1.y < v2.y))
     ivec2 size = imageSize(output_img[nonuniformEXT(gl_LocalInvocationID.z)]);
-    if (!IS_WITHIN(pos, size))
+    if (any(greaterThanEqual(pos, size)))
         return;
 
     if ((planes & (1 << gl_LocalInvocationID.z)) == 0) {
diff --git a/libavfilter/vulkan/bwdif.comp.glsl b/libavfilter/vulkan/bwdif.comp.glsl
index 043ded0d24..fb18af3915 100644
--- a/libavfilter/vulkan/bwdif.comp.glsl
+++ b/libavfilter/vulkan/bwdif.comp.glsl
@@ -151,8 +151,8 @@ void main()
     bool filter_field = ((pos.y ^ parity) & 1) == 1;
     bool is_intra = filter_field && (current_field == 0);
 
-#define IS_WITHIN(v1, v2) ((v1.x < v2.x) && (v1.y < v2.y))
-    if (!IS_WITHIN(pos, imageSize(dst[nonuniformEXT(gl_LocalInvocationID.z)]))) {
+    ivec2 size = imageSize(dst[nonuniformEXT(gl_LocalInvocationID.z)]);
+    if (any(greaterThanEqual(pos, size))) {
         return;
     } else if (is_intra) {
         process_plane_intra(pos);
-- 
2.52.0


>>From a6e641ab6e08bf8a794e256e5082335794fc6b68 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Mon, 19 Jan 2026 11:33:02 +0100
Subject: [PATCH 2/3] vulkan_decode: do not align single-plane images to
 subsampling

Unlike multiplane images, single-plane images do not need to be aligned
to chroma width. This saves a bit of memory.
---
 libavcodec/vulkan_decode.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/libavcodec/vulkan_decode.c b/libavcodec/vulkan_decode.c
index 5ed963eacc..9ab8d45aa9 100644
--- a/libavcodec/vulkan_decode.c
+++ b/libavcodec/vulkan_decode.c
@@ -1150,7 +1150,10 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
     if (err < 0)
         return err;
 
+    frames_ctx->format = AV_PIX_FMT_VULKAN;
     frames_ctx->sw_format = avctx->sw_pix_fmt;
+    frames_ctx->width = avctx->coded_width;
+    frames_ctx->height = avctx->coded_height;
 
     if (!DECODER_IS_SDR(avctx->codec_id)) {
         prof = av_mallocz(sizeof(FFVulkanDecodeProfileData));
@@ -1166,6 +1169,9 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
             return err;
         }
 
+        const AVPixFmtDescriptor *pdesc = av_pix_fmt_desc_get(frames_ctx->sw_format);
+        frames_ctx->width = FFALIGN(frames_ctx->width, 1 << pdesc->log2_chroma_w);
+        frames_ctx->height = FFALIGN(frames_ctx->height, 1 << pdesc->log2_chroma_h);
         frames_ctx->user_opaque = prof;
         frames_ctx->free = free_profile_data;
 
@@ -1211,11 +1217,6 @@ int ff_vk_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx)
         }
     }
 
-    const AVPixFmtDescriptor *pdesc = av_pix_fmt_desc_get(frames_ctx->sw_format);
-    frames_ctx->width = FFALIGN(avctx->coded_width, 1 << pdesc->log2_chroma_w);
-    frames_ctx->height = FFALIGN(avctx->coded_height, 1 << pdesc->log2_chroma_h);
-    frames_ctx->format = AV_PIX_FMT_VULKAN;
-
     hwfc->tiling = VK_IMAGE_TILING_OPTIMAL;
     hwfc->usage = VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
                   VK_IMAGE_USAGE_STORAGE_BIT |
-- 
2.52.0


>>From f0d2e7de30ff8d9b7b0b1c8210337fc849eb82ef Mon Sep 17 00:00:00 2001
From: Lynne
Date: Mon, 19 Jan 2026 11:51:30 +0100
Subject: [PATCH 3/3] vulkan_dpx: remove host image upload path

The host image upload path was written mainly because of Nvidia.
Nvidia's upload path has always been fickle, and the driver seemed to
have a shortcut for host image uploads. That shortcut appears to have
been patched out of recent driver versions.

This upload path also relies on the driver keeping the same layout, down
to the stride of the images, which is not a portable assumption.

Rather than relying on this fickle upload path, what we want when we need
pure bandwidth is to decouple uploads onto a separate queue and let the
GPU pull the data from RAM. This will be slower with a single-threaded
decoder, but currently all of our compute-based decoders, and the decoders
that sit underneath them, support frame threading.
---
 libavcodec/vulkan_dpx.c | 103 ----------------------------------------
 1 file changed, 103 deletions(-)

diff --git a/libavcodec/vulkan_dpx.c b/libavcodec/vulkan_dpx.c
index cf53a0f4df..17f91c6ce4 100644
--- a/libavcodec/vulkan_dpx.c
+++ b/libavcodec/vulkan_dpx.c
@@ -54,106 +54,6 @@ typedef struct DecodePushData {
     int shift;
 } DecodePushData;
 
-static int host_upload_image(AVCodecContext *avctx,
-                             FFVulkanDecodeContext *dec, DPXDecContext *dpx,
-                             const uint8_t *src, uint32_t size)
-{
-    int err;
-    VkImage temp;
-
-    FFVulkanDecodeShared *ctx = dec->shared_ctx;
-    DPXVulkanDecodeContext *dxv = ctx->sd_ctx;
-    VkPhysicalDeviceLimits *limits = &ctx->s.props.properties.limits;
-    FFVulkanFunctions *vk = &ctx->s.vkfn;
-
-    DPXVulkanDecodePicture *pp = dpx->hwaccel_picture_private;
-    FFVulkanDecodePicture *vp = &pp->vp;
-
-    int unpack = (avctx->bits_per_raw_sample == 12 && !dpx->packing) ||
-                 avctx->bits_per_raw_sample == 10;
-    if (unpack)
-        return 0;
-
-    VkImageCreateInfo create_info = {
-        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
-        .imageType = VK_IMAGE_TYPE_2D,
-        .format = avctx->bits_per_raw_sample == 8 ? VK_FORMAT_R8_UINT :
-                  avctx->bits_per_raw_sample == 32 ? VK_FORMAT_R32_UINT :
-                  VK_FORMAT_R16_UINT,
-        .extent.width = dpx->frame->width*dpx->components,
-        .extent.height = dpx->frame->height,
-        .extent.depth = 1,
-        .mipLevels = 1,
-        .arrayLayers = 1,
-        .tiling = VK_IMAGE_TILING_LINEAR,
-        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
-        .usage = VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT,
-        .samples = VK_SAMPLE_COUNT_1_BIT,
-        .pQueueFamilyIndices = &ctx->qf[0].idx,
-        .queueFamilyIndexCount = 1,
-        .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
-    };
-
-    if (create_info.extent.width >= limits->maxImageDimension2D ||
-        create_info.extent.height >= limits->maxImageDimension2D)
-        return 0;
-
-    vk->CreateImage(ctx->s.hwctx->act_dev, &create_info, ctx->s.hwctx->alloc,
-                    &temp);
-
-    err = ff_vk_get_pooled_buffer(&ctx->s, &dxv->frame_data_pool,
-                                  &vp->slices_buf,
-                                  VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
-                                  VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT,
-                                  NULL, size,
-                                  VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-                                  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
-    if (err < 0)
-        return err;
-
-    FFVkBuffer *vkb = (FFVkBuffer *)vp->slices_buf->data;
-    VkBindImageMemoryInfo bind_info = {
-        .sType = VK_STRUCTURE_TYPE_BIND_IMAGE_MEMORY_INFO,
-        .image = temp,
-        .memory = vkb->mem,
-    };
-    vk->BindImageMemory2(ctx->s.hwctx->act_dev, 1, &bind_info);
-
-    VkHostImageLayoutTransitionInfo layout_change = {
-        .sType = VK_STRUCTURE_TYPE_HOST_IMAGE_LAYOUT_TRANSITION_INFO,
-        .image = temp,
-        .oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
-        .newLayout = VK_IMAGE_LAYOUT_GENERAL,
-        .subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
-        .subresourceRange.layerCount = 1,
-        .subresourceRange.levelCount = 1,
-    };
-    vk->TransitionImageLayoutEXT(ctx->s.hwctx->act_dev, 1, &layout_change);
-
-    VkMemoryToImageCopy copy_region = {
-        .sType = VK_STRUCTURE_TYPE_MEMORY_TO_IMAGE_COPY,
-        .pHostPointer = src,
-        .imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
-        .imageSubresource.layerCount = 1,
-        .imageExtent = (VkExtent3D){ dpx->frame->width*dpx->components,
-                                     dpx->frame->height,
-                                     1 },
-    };
-    VkCopyMemoryToImageInfo copy_info = {
-        .sType = VK_STRUCTURE_TYPE_COPY_MEMORY_TO_IMAGE_INFO,
-        .flags = VK_HOST_IMAGE_COPY_MEMCPY_EXT,
-        .dstImage = temp,
-        .dstImageLayout = VK_IMAGE_LAYOUT_GENERAL,
-        .regionCount = 1,
-        .pRegions = &copy_region,
-    };
-    vk->CopyMemoryToImageEXT(ctx->s.hwctx->act_dev, &copy_info);
-
-    vk->DestroyImage(ctx->s.hwctx->act_dev, temp, ctx->s.hwctx->alloc);
-
-    return 0;
-}
-
 static int vk_dpx_start_frame(AVCodecContext *avctx,
                               const AVBufferRef *buffer_ref,
                               av_unused const uint8_t *buffer,
@@ -167,9 +67,6 @@ static int vk_dpx_start_frame(AVCodecContext *avctx,
     DPXVulkanDecodePicture *pp = dpx->hwaccel_picture_private;
     FFVulkanDecodePicture *vp = &pp->vp;
 
-    if (ctx->s.extensions & FF_VK_EXT_HOST_IMAGE_COPY)
-        host_upload_image(avctx, dec, dpx, buffer, size);
-
     /* Host map the frame data if supported */
     if (!vp->slices_buf &&
         ctx->s.extensions & FF_VK_EXT_EXTERNAL_HOST_MEMORY)
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org