From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 02 Nov 2025 19:27:33 -0000
Message-ID: <176211165375.25.10975636295279296971@2cb04c0e5124>
Message-ID-Hash: IOZ73RTRCCDUZ4BA4QK4FWZCUXLOWXAG
X-Message-ID-Hash: IOZ73RTRCCDUZ4BA4QK4FWZCUXLOWXAG
X-MailFrom: code@ffmpeg.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; header-match-ffmpeg-devel.ffmpeg.org-0;
 header-match-ffmpeg-devel.ffmpeg.org-1;
 header-match-ffmpeg-devel.ffmpeg.org-2;
 header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation;
 nonmember-moderation; administrivia; implicit-dest; max-recipients;
 max-size; news-moderation; no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] vulkan/prores: Adopt the same IDCT routine as the prores-raw hwaccel (PR #20819)
List-Id: FFmpeg development discussions and patches
From: averne via ffmpeg-devel
Cc: averne
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20819 opened by averne
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20819
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20819.patch

The added rounding at the final output conforms to the SMPTE document and
reduces the deviation against the software decoder.


>>From 7639b6fd0cec3e7ae31f1d0c2d1fc491dbd937e5 Mon Sep 17 00:00:00 2001
From: averne
Date: Sun, 2 Nov 2025 20:23:28 +0100
Subject: [PATCH] vulkan/prores: Adopt the same IDCT routine as the prores-raw hwaccel

The added rounding at the final output conforms to the SMPTE document and
reduces the deviation against the software decoder.
---
 libavcodec/vulkan/prores_idct.comp | 105 +++++++++++++++++++----------
 1 file changed, 68 insertions(+), 37 deletions(-)

diff --git a/libavcodec/vulkan/prores_idct.comp b/libavcodec/vulkan/prores_idct.comp
index 642fcb5bd5..8ad3b7f58b 100644
--- a/libavcodec/vulkan/prores_idct.comp
+++ b/libavcodec/vulkan/prores_idct.comp
@@ -37,47 +37,77 @@ void put_px(uint tex_idx, ivec2 pos, uint v)
 #endif
 }
 
+const float idct_8x8_scales[] = {
+    0.353553390593274f, // cos(4 * pi/16) / 2
+    0.490392640201615f, // cos(1 * pi/16) / 2
+    0.461939766255643f, // cos(2 * pi/16) / 2
+    0.415734806151273f, // cos(3 * pi/16) / 2
+    0.353553390593274f, // cos(4 * pi/16) / 2
+    0.277785116509801f, // cos(5 * pi/16) / 2
+    0.191341716182545f, // cos(6 * pi/16) / 2
+    0.097545161008064f, // cos(7 * pi/16) / 2
+};
+
 /* 7.4 Inverse Transform */
 void idct(uint block, uint offset, uint stride)
 {
-    float c0 = blocks[block][0*stride + offset];
-    float c1 = blocks[block][1*stride + offset];
-    float c2 = blocks[block][2*stride + offset];
-    float c3 = blocks[block][3*stride + offset];
-    float c4 = blocks[block][4*stride + offset];
-    float c5 = blocks[block][5*stride + offset];
-    float c6 = blocks[block][6*stride + offset];
-    float c7 = blocks[block][7*stride + offset];
+    float t0, t1, t2, t3, t4, t5, t6, t7, u8;
+    float u0, u1, u2, u3, u4, u5, u6, u7;
 
-    float tmp1 = c6 * 1.4142134189605712891 + (c2 - c6);
-    float tmp2 = c6 * 1.4142134189605712891 - (c2 - c6);
+    /* Input */
+    t0 = blocks[block][0*stride + offset];
+    u4 = blocks[block][1*stride + offset];
+    t2 = blocks[block][2*stride + offset];
+    u6 = blocks[block][3*stride + offset];
+    t1 = blocks[block][4*stride + offset];
+    u5 = blocks[block][5*stride + offset];
+    t3 = blocks[block][6*stride + offset];
+    u7 = blocks[block][7*stride + offset];
 
-    float a1 = (c0 + c4) * 0.35355341434478759766 + tmp1 * 0.46193981170654296875;
-    float a4 = (c0 + c4) * 0.35355341434478759766 - tmp1 * 0.46193981170654296875;
+    /* Embedded scaled inverse 4-point Type-II DCT */
+    u0 = t0 + t1;
+    u1 = t0 - t1;
+    u3 = t2 + t3;
+    u2 = (t2 - t3)*(1.4142135623730950488016887242097f) - u3;
+    t0 = u0 + u3;
+    t3 = u0 - u3;
+    t1 = u1 + u2;
+    t2 = u1 - u2;
 
-    float a3 = (c0 - c4) * 0.35355341434478759766 + tmp2 * 0.19134169816970825195;
-    float a2 = (c0 - c4) * 0.35355341434478759766 - tmp2 * 0.19134169816970825195;
+    /* Embedded scaled inverse 4-point Type-IV DST */
+    t5 = u5 + u6;
+    t6 = u5 - u6;
+    t7 = u4 + u7;
+    t4 = u4 - u7;
+    u7 = t7 + t5;
+    u5 = (t7 - t5)*(1.4142135623730950488016887242097f);
+    u8 = (t4 + t6)*(1.8477590650225735122563663787936f);
+    u4 = u8 - t4*(1.0823922002923939687994464107328f);
+    u6 = u8 - t6*(2.6131259297527530557132863468544f);
+    t7 = u7;
+    t6 = t7 - u6;
+    t5 = t6 + u5;
+    t4 = t5 - u4;
 
-    float tmp3 = (c3 - c5) * 0.70710682868957519531 + c7;
-    float tmp4 = (c3 - c5) * 0.70710682868957519531 - c7;
+    /* Butterflies */
+    u0 = t0 + t7;
+    u7 = t0 - t7;
+    u6 = t1 + t6;
+    u1 = t1 - t6;
+    u2 = t2 + t5;
+    u5 = t2 - t5;
+    u4 = t3 + t4;
+    u3 = t3 - t4;
 
-    float tmp5 = (c5 - c7) * 1.4142134189605712891 + (c5 - c7) + (c1 - c3);
-    float tmp6 = (c5 - c7) * -1.4142134189605712891 + (c5 - c7) + (c1 - c3);
-
-    float m1 = tmp3 * 2.6131260395050048828 + tmp5;
-    float m4 = tmp3 * -2.6131260395050048828 + tmp5;
-
-    float m2 = tmp4 * 1.0823919773101806641 + tmp6;
-    float m3 = tmp4 * -1.0823919773101806641 + tmp6;
-
-    blocks[block][0*stride + offset] = m1 * 0.49039259552955627441 + a1;
-    blocks[block][7*stride + offset] = m1 * -0.49039259552955627441 + a1;
-    blocks[block][1*stride + offset] = m2 * 0.41573479771614074707 + a2;
-    blocks[block][6*stride + offset] = m2 * -0.41573479771614074707 + a2;
-    blocks[block][2*stride + offset] = m3 * 0.27778509259223937988 + a3;
-    blocks[block][5*stride + offset] = m3 * -0.27778509259223937988 + a3;
-    blocks[block][3*stride + offset] = m4 * 0.097545139491558074951 + a4;
-    blocks[block][4*stride + offset] = m4 * -0.097545139491558074951 + a4;
+    /* Output */
+    blocks[block][0*stride + offset] = u0;
+    blocks[block][1*stride + offset] = u1;
+    blocks[block][2*stride + offset] = u2;
+    blocks[block][3*stride + offset] = u3;
+    blocks[block][4*stride + offset] = u4;
+    blocks[block][5*stride + offset] = u5;
+    blocks[block][6*stride + offset] = u6;
+    blocks[block][7*stride + offset] = u7;
 }
 
 void main(void)
@@ -90,14 +120,15 @@ void main(void)
     /* Coalesced load of DCT coeffs in shared memory, second part of inverse quantization */
     if (act) {
         /**
-         * According to spec indexing an array in push constant memory with
+         * According to the VK spec indexing an array in push constant memory with
          * a non-dynamically uniform value is illegal ($15.9.1 in v1.4.326),
          * so copy the whole matrix locally.
          */
         uint8_t[64] qmat = comp == 0 ? qmat_luma : qmat_chroma;
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-            int v = sign_extend(int(get_px(comp, ivec2(gid.x, (gid.y << 3) | i))), 16);
-            blocks[block][i * 9 + idx] = float(v * int(qmat[(i << 3) + idx]));
+            int c = sign_extend(int(get_px(comp, ivec2(gid.x, (gid.y << 3) | i))), 16);
+            float v = float(c * int(qmat[(i << 3) + idx]));
+            blocks[block][i * 9 + idx] = v * idct_8x8_scales[idx] * idct_8x8_scales[i];
         }
     }
 
@@ -116,7 +147,7 @@ void main(void)
     barrier();
     if (act) {
         [[unroll]] for (uint i = 0; i < 8; ++i) {
-            float v = blocks[block][i * 9 + idx] * fact + off;
+            float v = round(blocks[block][i * 9 + idx] * fact + off);
             put_px(comp, ivec2(gid.x, (gid.y << 3) | i), clamp(int(v), 0, maxv) << shift);
         }
     }
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
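
Reviewer's note (not part of the patch): the new idct() only gives correct samples because main() now multiplies every coefficient by idct_8x8_scales[idx] * idct_8x8_scales[i] before the two 1-D passes. The sketch below is a plain Python port of that flow, checked against a direct cosine-sum inverse DCT. The function names (scaled_idct8, reference_idct8, idct8x8) are made up for illustration, and the orthonormal DCT-II normalization used for the reference is an assumption inferred from the scale table, not something the patch states.

```python
import math

# Prescale factors, same values as idct_8x8_scales in the patch.
SCALES = [math.cos(4 * math.pi / 16) / 2] + [
    math.cos(k * math.pi / 16) / 2 for k in range(1, 8)
]


def scaled_idct8(y):
    """1-D pass ported from idct(); y must already be prescaled by SCALES."""
    t0, u4, t2, u6, t1, u5, t3, u7 = y  # same input mapping as the shader

    # Embedded scaled inverse 4-point Type-II DCT (even coefficients)
    u0 = t0 + t1
    u1 = t0 - t1
    u3 = t2 + t3
    u2 = (t2 - t3) * 1.4142135623730950488 - u3
    t0, t3 = u0 + u3, u0 - u3
    t1, t2 = u1 + u2, u1 - u2

    # Embedded scaled inverse 4-point Type-IV DST (odd coefficients)
    t5 = u5 + u6
    t6 = u5 - u6
    t7 = u4 + u7
    t4 = u4 - u7
    u7 = t7 + t5
    u5 = (t7 - t5) * 1.4142135623730950488
    u8 = (t4 + t6) * 1.8477590650225735123
    u4 = u8 - t4 * 1.0823922002923939688
    u6 = u8 - t6 * 2.6131259297527530557
    t7 = u7
    t6 = t7 - u6
    t5 = t6 + u5
    t4 = t5 - u4

    # Butterflies, returned in output order 0..7
    return [t0 + t7, t1 - t6, t2 + t5, t3 - t4,
            t3 + t4, t2 - t5, t1 + t6, t0 - t7]


def reference_idct8(x):
    """Direct orthonormal 8-point inverse DCT (assumed normalization)."""
    out = []
    for n in range(8):
        acc = x[0] / (2.0 * math.sqrt(2.0))
        for k in range(1, 8):
            acc += 0.5 * x[k] * math.cos((2 * n + 1) * k * math.pi / 16)
        out.append(acc)
    return out


def idct8x8(block, idct1d, prescale=None):
    """Separable 2-D transform: optional prescale, then rows, then columns."""
    if prescale is not None:
        block = [[block[r][c] * prescale[r] * prescale[c] for c in range(8)]
                 for r in range(8)]
    rows = [idct1d(row) for row in block]
    cols = [idct1d([rows[r][c] for r in range(8)]) for c in range(8)]
    # cols[c][r] holds the sample at (r, c); transpose back to row-major.
    return [[cols[c][r] for c in range(8)] for r in range(8)]


if __name__ == "__main__":
    coeffs = [12.0, -3.0, 7.5, 0.0, -1.25, 4.0, 2.0, -6.0]
    fast = scaled_idct8([c * s for c, s in zip(coeffs, SCALES)])
    ref = reference_idct8(coeffs)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

Folding the per-coefficient factors into the dequantization step is what lets the 1-D pass get by with five multiplications; the shader applies the same idea in float32, so its results will differ from this double-precision sketch in the low-order bits.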