MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 29 Nov 2025 17:06:56 -0000
Message-ID: <176443601670.39.7545910665151083121@2cb04c0e5124>
Message-ID-Hash: MVEA3KSOWTWLZTT76A53QBFH3U2WFB2G
X-Message-ID-Hash: MVEA3KSOWTWLZTT76A53QBFH3U2WFB2G
X-MailFrom: code@ffmpeg.org
X-Mailman-Rule-Hits: nonmember-moderation
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; header-match-ffmpeg-devel.ffmpeg.org-0; header-match-ffmpeg-devel.ffmpeg.org-1; header-match-ffmpeg-devel.ffmpeg.org-2; header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] vulkan/prores: normalize coefficients during IDCT (PR #21045)
List-Id: FFmpeg development discussions and patches
From: averne via ffmpeg-devel
Cc: averne
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #21045 opened by averne
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21045
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21045.patch

Also fix dequant for 4:2:2 subsample.


From 1982add48595db4891b16131928b9eb25fb85e2f Mon Sep 17 00:00:00 2001
From: averne
Date: Sat, 29 Nov 2025 17:26:51 +0100
Subject: [PATCH 1/2] vulkan/prores: fix dequantization for 4:2:2 subsampling

Bug introduced in d00f41f due to an oversight.
---
 libavcodec/vulkan/prores_idct.comp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/vulkan/prores_idct.comp b/libavcodec/vulkan/prores_idct.comp
index 05ba8e4967..5d0d41cfa5 100644
--- a/libavcodec/vulkan/prores_idct.comp
+++ b/libavcodec/vulkan/prores_idct.comp
@@ -127,7 +127,7 @@ void main(void)
         uint8_t[64] qmat = comp == 0 ? qmat_luma : qmat_chroma;
 
         /* Table 15 */
-        uint8_t qidx = quant_idx[(gid.y >> 1) * mb_width + (gid.x >> 4)];
+        uint8_t qidx = quant_idx[(gid.y >> 1) * mb_width + (gid.x >> (4 - chroma_shift))];
         int qscale = qidx > 128 ? (qidx - 96) << 2 : qidx;
 
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-- 
2.49.1


From 1c5bb1b12da142ae111b35565420ffd1ccc9a029 Mon Sep 17 00:00:00 2001
From: averne
Date: Sat, 29 Nov 2025 17:25:17 +0100
Subject: [PATCH 2/2] vulkan/prores: normalize coefficients during IDCT

This allows increased internal precision.
In addition, we can introduce an offset to the DC coefficient
during the second IDCT step, to remove a per-element addition
in the output codepath.
Finally, by processing columns first we can remove the barrier
after loading coefficients.

Signed-off-by: averne
---
 libavcodec/vulkan/prores_idct.comp | 57 +++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/libavcodec/vulkan/prores_idct.comp b/libavcodec/vulkan/prores_idct.comp
index 5d0d41cfa5..5eef61e57a 100644
--- a/libavcodec/vulkan/prores_idct.comp
+++ b/libavcodec/vulkan/prores_idct.comp
@@ -37,19 +37,27 @@ void put_px(uint tex_idx, ivec2 pos, uint v)
 #endif
 }
 
-const float idct_8x8_scales[] = {
-    0.353553390593274f, // cos(4 * pi/16) / 2
-    0.490392640201615f, // cos(1 * pi/16) / 2
-    0.461939766255643f, // cos(2 * pi/16) / 2
-    0.415734806151273f, // cos(3 * pi/16) / 2
-    0.353553390593274f, // cos(4 * pi/16) / 2
-    0.277785116509801f, // cos(5 * pi/16) / 2
-    0.191341716182545f, // cos(6 * pi/16) / 2
-    0.097545161008064f, // cos(7 * pi/16) / 2
+const float idct_scale[64] = {
+    0.1250000000000000, 0.1733799806652684, 0.1633203706095471, 0.1469844503024199,
+    0.1250000000000000, 0.0982118697983878, 0.0676495125182746, 0.0344874224103679,
+    0.1733799806652684, 0.2404849415639108, 0.2265318615882219, 0.2038732892122293,
+    0.1733799806652684, 0.1362237766939547, 0.0938325693794663, 0.0478354290456362,
+    0.1633203706095471, 0.2265318615882219, 0.2133883476483184, 0.1920444391778541,
+    0.1633203706095471, 0.1283199917898342, 0.0883883476483185, 0.0450599888754343,
+    0.1469844503024199, 0.2038732892122293, 0.1920444391778541, 0.1728354290456362,
+    0.1469844503024199, 0.1154849415639109, 0.0795474112858021, 0.0405529186026822,
+    0.1250000000000000, 0.1733799806652684, 0.1633203706095471, 0.1469844503024199,
+    0.1250000000000000, 0.0982118697983878, 0.0676495125182746, 0.0344874224103679,
+    0.0982118697983878, 0.1362237766939547, 0.1283199917898342, 0.1154849415639109,
+    0.0982118697983878, 0.0771645709543638, 0.0531518809229535, 0.0270965939155924,
+    0.0676495125182746, 0.0938325693794663, 0.0883883476483185, 0.0795474112858021,
+    0.0676495125182746, 0.0531518809229535, 0.0366116523516816, 0.0186644585125857,
+    0.0344874224103679, 0.0478354290456362, 0.0450599888754343, 0.0405529186026822,
+    0.0344874224103679, 0.0270965939155924, 0.0186644585125857, 0.0095150584360892,
 };
 
 /* 7.4 Inverse Transform */
-void idct(uint block, uint offset, uint stride)
+void idct8(uint block, uint offset, uint stride)
 {
     float t0, t1, t2, t3, t4, t5, t6, t7, u8;
     float u0, u1, u2, u3, u4, u5, u6, u7;
@@ -117,6 +125,12 @@ void main(void)
     uint chroma_shift = comp != 0 ? log2_chroma_w : 0;
     bool act = gid.x < mb_width << (4 - chroma_shift);
 
+    /**
+     * Normalize coefficients to [-1, 1] for increased precision during the iDCT.
+     * DCT coeffs have the range of a 12-bit signed integer (7.4 Inverse Transform).
+     */
+    const float norm = 1.0f / (1 << 11);
+
     /* Coalesced load of DCT coeffs in shared memory, inverse quantization */
     if (act) {
         /**
@@ -131,28 +145,31 @@ void main(void)
         int qscale = qidx > 128 ? (qidx - 96) << 2 : qidx;
 
         [[unroll]] for (uint i = 0; i < 8; ++i) {
+            uint cidx = (i << 3) + idx;
             int c = sign_extend(int(get_px(comp, ivec2(gid.x, (gid.y << 3) + i))), 16);
-            float v = float(c * qscale * int(qmat[(i << 3) + idx]));
-            blocks[block][i * 9 + idx] = v * idct_8x8_scales[idx] * idct_8x8_scales[i];
+            float v = float(c * qscale * int(qmat[cidx])) * norm;
+            blocks[block][i * 9 + idx] = v * idct_scale[cidx];
         }
     }
 
-    /* Row-wise iDCT */
-    barrier();
-    idct(block, idx * 9, 1);
-
     /* Column-wise iDCT */
+    idct8(block, idx, 9);
     barrier();
-    idct(block, idx, 9);
 
-    float fact = 1.0f / (1 << (12 - depth)), off = 1 << (depth - 1);
+    /* Remap [-1, 1] to [0, 2] to remove a per-element addition in the output loop */
+    blocks[block][idx * 9] += 1.0f;
+
+    /* Row-wise iDCT */
+    idct8(block, idx * 9, 1);
+    barrier();
+
+    float fact = 1 << (depth - 1);
     int maxv = (1 << depth) - 1;
 
     /* 7.5.1 Color Component Samples. Rescale, clamp and write back to global memory */
-    barrier();
     if (act) {
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-            float v = round(blocks[block][i * 9 + idx] * fact + off);
+            float v = round(blocks[block][i * 9 + idx] * fact);
             put_px(comp, ivec2(gid.x, (gid.y << 3) + i), clamp(int(v), 0, maxv));
         }
     }
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
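
Note on patch 1/2: a minimal Python sketch of the quant_idx lookup, written to illustrate why the shift has to depend on chroma_shift. It assumes, based on the `act` bound `mb_width << (4 - chroma_shift)`, that gid.x counts sample columns of the current plane and gid.y counts 8-row blocks. The function name and the `fixed` flag are hypothetical and exist only for this illustration.

```python
def quant_index(gid_x, gid_y, mb_width, chroma_shift, fixed=True):
    """Model of the quant_idx subscript used in prores_idct.comp.

    gid_x: sample column in the current plane (assumption, see note above)
    gid_y: 8-row block index, so gid_y >> 1 is the macroblock row
    chroma_shift: 0 for luma and 4:4:4 chroma, 1 for 4:2:2 chroma
    fixed: True models the patched shift, False models the old constant 4
    """
    # A macroblock covers 16 >> chroma_shift columns of this plane,
    # so the column-to-macroblock shift is 4 - chroma_shift.
    shift = 4 - chroma_shift if fixed else 4
    return (gid_y >> 1) * mb_width + (gid_x >> shift)


if __name__ == "__main__":
    mb_width = 4
    for col in (0, 7, 8, 31):
        print(col,
              quant_index(col, 0, mb_width, 1, fixed=False),
              quant_index(col, 0, mb_width, 1, fixed=True))
```

With mb_width = 4 and chroma_shift = 1 the chroma plane is 32 columns wide. Column 31 belongs to macroblock 3, which the patched shift returns, while the old constant shift returns macroblock 1. Luma (chroma_shift = 0) is unaffected.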
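
Note on the idct_scale table in patch 2/2: the 64 entries appear to be the outer product of the eight per-axis factors from the removed idct_8x8_scales table, which matches the old expression `idct_8x8_scales[idx] * idct_8x8_scales[i]` and the new subscript `cidx = (i << 3) + idx`. The Python sketch below regenerates the table under that assumption. The helper names are hypothetical and this is not how the table in the patch was necessarily produced.

```python
import math


def axis_scales():
    """Per-axis factors from the removed table: cos(k * pi/16) / 2,
    with k = 4 used for index 0 and k = index otherwise."""
    return [math.cos((4 if k == 0 else k) * math.pi / 16) / 2 for k in range(8)]


def build_idct_scale():
    """Flattened 8x8 table: entry (i << 3) + idx is scale[i] * scale[idx]."""
    s = axis_scales()
    return [s[i] * s[idx] for i in range(8) for idx in range(8)]


table = build_idct_scale()

if __name__ == "__main__":
    # Print in the same 4-per-row layout as the shader source.
    for row in range(0, 64, 4):
        print(", ".join(f"{v:.16f}" for v in table[row:row + 4]) + ",")
```

Spot checks against the patch agree to within floating-point rounding, for example entries 0, 1, 9, 18 and 63.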
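
Note on the DC offset in patch 2/2: a small Python check of the output arithmetic. It compares the old per-element `x * fact + off` with the new form, in which the value is pre-scaled by norm = 2^-11 and shifted by 1.0 before the final multiply. It assumes that adding 1.0 to the first element of each row before the row pass adds 1.0 to every output sample of that row (unit DC gain in idct8, which the diff does not show). The function names are hypothetical.

```python
def old_output(x, depth):
    """Old codepath: unnormalized iDCT result x, rescaled then offset."""
    fact = 1.0 / (1 << (12 - depth))
    off = 1 << (depth - 1)
    return x * fact + off


def new_output(x, depth):
    """New codepath: x is scaled by norm up front, the +1.0 comes from the
    DC term (assumed unit gain), and a single multiply remains at output."""
    norm = 1.0 / (1 << 11)
    fact = 1 << (depth - 1)
    return (x * norm + 1.0) * fact


if __name__ == "__main__":
    for depth in (10, 12):
        same = all(old_output(x, depth) == new_output(x, depth)
                   for x in range(-2048, 2048))
        print(depth, same)
```

Both forms reduce to x * 2^(depth - 12) + 2^(depth - 1). Every factor is a power of two, so the two results are bit-identical for integer inputs in the 12-bit range.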