Date: Mon, 12 Dec 2022 20:08:28 +0000
Message-ID: <20221212200828.329839-1-asdunne@google.com>
From: Drew Dunne
To: ffmpeg-devel@ffmpeg.org
Cc: Drew Dunne
Subject: [FFmpeg-devel] [PATCH] swscale/output: Use av_sat_add32 in yuv2rgba64 templates to avoid underflow

I previously sent this patch to fix an overflow in these templates.
It was not applied; a patch that biased the output calculations to
avoid the double clipping was used instead. I have now found an
underflow case. The command that reproduces it is below, and I'll
attach an input image in a reply.

./ffmpeg \
    -f rawvideo -video_size 64x64 -pixel_format yuva420p10le \
    -i yuv2rgb_underflow_w64h64.yuva \
    -filter_complex \
    "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=mpeg:out_range=mpeg,format=rgba64[out]" \
    -f rawvideo -codec:v:0 rawvideo -pixel_format rgba64 -map '[out]' \
    -y underflow_w64h64.rgba64

The previous overflow case was in this thread:
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-November/303532.html
---
 libswscale/output.c | 96 ++++++++++++++++++++++-----------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index 5c85bff971..8abf043d73 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -1109,20 +1109,20 @@ yuv2rgba64_X_c_template(SwsContext *c, const int16_t *lumFilter,
         B = U * c->yuv2rgb_u2b_coeff;
 
         // 8 bits: 30 - 22 = 8 bits, 16 bits: 30 bits - 14 = 16 bits
-        output_pixel(&dest[0], av_clip_uintp2(((R_B + Y1) >> 14) + (1<<15), 16));
-        output_pixel(&dest[1], av_clip_uintp2(((  G + Y1) >> 14) + (1<<15), 16));
-        output_pixel(&dest[2], av_clip_uintp2(((B_R + Y1) >> 14) + (1<<15), 16));
+        output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y1)) >> 14) + (1<<15), 16));
+        output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y1)) >> 14) + (1<<15), 16));
+        output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y1)) >> 14) + (1<<15), 16));
         if (eightbytes) {
             output_pixel(&dest[3], av_clip_uintp2(A1 , 30) >> 14);
-            output_pixel(&dest[4], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[5], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[6], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+            output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[6], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
             output_pixel(&dest[7], av_clip_uintp2(A2 , 30) >> 14);
             dest += 8;
         } else {
-            output_pixel(&dest[3], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[4], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[5], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+            output_pixel(&dest[3], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
             dest += 6;
         }
     }
@@ -1175,20 +1175,20 @@ yuv2rgba64_2_c_template(SwsContext *c, const int32_t *buf[2],
             A2 += 1 << 13;
         }
 
-        output_pixel(&dest[0], av_clip_uintp2(((R_B + Y1) >> 14) + (1<<15), 16));
-        output_pixel(&dest[1], av_clip_uintp2(((  G + Y1) >> 14) + (1<<15), 16));
-        output_pixel(&dest[2], av_clip_uintp2(((B_R + Y1) >> 14) + (1<<15), 16));
+        output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y1)) >> 14) + (1<<15), 16));
+        output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y1)) >> 14) + (1<<15), 16));
+        output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y1)) >> 14) + (1<<15), 16));
         if (eightbytes) {
             output_pixel(&dest[3], av_clip_uintp2(A1 , 30) >> 14);
-            output_pixel(&dest[4], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[5], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[6], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+            output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[6], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
             output_pixel(&dest[7], av_clip_uintp2(A2 , 30) >> 14);
             dest += 8;
         } else {
-            output_pixel(&dest[3], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[4], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-            output_pixel(&dest[5], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+            output_pixel(&dest[3], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
             dest += 6;
         }
     }
@@ -1232,20 +1232,20 @@ yuv2rgba64_1_c_template(SwsContext *c, const int32_t *buf0,
             G = V * c->yuv2rgb_v2g_coeff + U * c->yuv2rgb_u2g_coeff;
             B = U * c->yuv2rgb_u2b_coeff;
 
-            output_pixel(&dest[0], av_clip_uintp2(((R_B + Y1) >> 14) + (1<<15), 16));
-            output_pixel(&dest[1], av_clip_uintp2(((  G + Y1) >> 14) + (1<<15), 16));
-            output_pixel(&dest[2], av_clip_uintp2(((B_R + Y1) >> 14) + (1<<15), 16));
+            output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y1)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y1)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y1)) >> 14) + (1<<15), 16));
             if (eightbytes) {
                 output_pixel(&dest[3], av_clip_uintp2(A1 , 30) >> 14);
-                output_pixel(&dest[4], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[5], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[6], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+                output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[6], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
                 output_pixel(&dest[7], av_clip_uintp2(A2 , 30) >> 14);
                 dest += 8;
             } else {
-                output_pixel(&dest[3], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[4], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[5], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+                output_pixel(&dest[3], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
                 dest += 6;
             }
         }
@@ -1278,20 +1278,20 @@ yuv2rgba64_1_c_template(SwsContext *c, const int32_t *buf0,
             G = V * c->yuv2rgb_v2g_coeff + U * c->yuv2rgb_u2g_coeff;
             B = U * c->yuv2rgb_u2b_coeff;
 
-            output_pixel(&dest[0], av_clip_uintp2(((R_B + Y1) >> 14) + (1<<15), 16));
-            output_pixel(&dest[1], av_clip_uintp2(((  G + Y1) >> 14) + (1<<15), 16));
-            output_pixel(&dest[2], av_clip_uintp2(((B_R + Y1) >> 14) + (1<<15), 16));
+            output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y1)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y1)) >> 14) + (1<<15), 16));
+            output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y1)) >> 14) + (1<<15), 16));
             if (eightbytes) {
                 output_pixel(&dest[3], av_clip_uintp2(A1 , 30) >> 14);
-                output_pixel(&dest[4], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[5], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[6], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+                output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[6], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
                 output_pixel(&dest[7], av_clip_uintp2(A2 , 30) >> 14);
                 dest += 8;
             } else {
-                output_pixel(&dest[3], av_clip_uintp2(((R_B + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[4], av_clip_uintp2(((  G + Y2) >> 14) + (1<<15), 16));
-                output_pixel(&dest[5], av_clip_uintp2(((B_R + Y2) >> 14) + (1<<15), 16));
+                output_pixel(&dest[3], av_clip_uintp2(((av_sat_add32(R_B, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[4], av_clip_uintp2(((av_sat_add32(  G, Y2)) >> 14) + (1<<15), 16));
+                output_pixel(&dest[5], av_clip_uintp2(((av_sat_add32(B_R, Y2)) >> 14) + (1<<15), 16));
                 dest += 6;
             }
         }
@@ -1351,9 +1351,9 @@ yuv2rgba64_full_X_c_template(SwsContext *c, const int16_t *lumFilter,
         B = U * c->yuv2rgb_u2b_coeff;
 
         // 8bit: 30 - 22 = 8bit, 16bit: 30bit - 14 = 16bit
-        output_pixel(&dest[0], av_clip_uintp2(((R_B + Y)>>14) + (1<<15), 16));
-        output_pixel(&dest[1], av_clip_uintp2(((  G + Y)>>14) + (1<<15), 16));
-        output_pixel(&dest[2], av_clip_uintp2(((B_R + Y)>>14) + (1<<15), 16));
+        output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y))>>14) + (1<<15), 16));
+        output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y))>>14) + (1<<15), 16));
+        output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y))>>14) + (1<<15), 16));
         if (eightbytes) {
             output_pixel(&dest[3], av_clip_uintp2(A, 30) >> 14);
             dest += 4;
@@ -1404,9 +1404,9 @@ yuv2rgba64_full_2_c_template(SwsContext *c, const int32_t *buf[2],
             A += 1 << 13;
         }
 
-        output_pixel(&dest[0], av_clip_uintp2(((R_B + Y) >> 14) + (1<<15), 16));
-        output_pixel(&dest[1], av_clip_uintp2(((  G + Y) >> 14) + (1<<15), 16));
-        output_pixel(&dest[2], av_clip_uintp2(((B_R + Y) >> 14) + (1<<15), 16));
+        output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y))>>14) + (1<<15), 16));
+        output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y))>>14) + (1<<15), 16));
+        output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y))>>14) + (1<<15), 16));
         if (eightbytes) {
             output_pixel(&dest[3], av_clip_uintp2(A, 30) >> 14);
             dest += 4;
@@ -1448,9 +1448,9 @@ yuv2rgba64_full_1_c_template(SwsContext *c, const int32_t *buf0,
             G = V * c->yuv2rgb_v2g_coeff + U * c->yuv2rgb_u2g_coeff;
             B = U * c->yuv2rgb_u2b_coeff;
 
-            output_pixel(&dest[0], av_clip_uintp2(((R_B + Y) >> 14) + (1<<15), 16));
-            output_pixel(&dest[1], av_clip_uintp2(((  G + Y) >> 14) + (1<<15), 16));
-            output_pixel(&dest[2], av_clip_uintp2(((B_R + Y) >> 14) + (1<<15), 16));
+            output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y))>>14) + (1<<15), 16));
+            output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y))>>14) + (1<<15), 16));
+            output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y))>>14) + (1<<15), 16));
             if (eightbytes) {
                 output_pixel(&dest[3], av_clip_uintp2(A, 30) >> 14);
                 dest += 4;
@@ -1481,9 +1481,9 @@ yuv2rgba64_full_1_c_template(SwsContext *c, const int32_t *buf0,
             G = V * c->yuv2rgb_v2g_coeff + U * c->yuv2rgb_u2g_coeff;
             B = U * c->yuv2rgb_u2b_coeff;
 
-            output_pixel(&dest[0], av_clip_uintp2(((R_B + Y) >> 14) + (1<<15), 16));
-            output_pixel(&dest[1], av_clip_uintp2(((  G + Y) >> 14) + (1<<15), 16));
-            output_pixel(&dest[2], av_clip_uintp2(((B_R + Y) >> 14) + (1<<15), 16));
+            output_pixel(&dest[0], av_clip_uintp2(((av_sat_add32(R_B, Y))>>14) + (1<<15), 16));
+            output_pixel(&dest[1], av_clip_uintp2(((av_sat_add32(  G, Y))>>14) + (1<<15), 16));
+            output_pixel(&dest[2], av_clip_uintp2(((av_sat_add32(B_R, Y))>>14) + (1<<15), 16));
             if (eightbytes) {
                 output_pixel(&dest[3], av_clip_uintp2(A, 30) >> 14);
                 dest += 4;
-- 
2.39.0.rc1.256.g54fd8350bd-goog
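
For reference, here is a small standalone sketch of the arithmetic the patch changes. It is not code from libswscale: sat_add32(), wrap_add32() and clip_u16() are hypothetical stand-ins written for this illustration (for av_sat_add32(), a plain int32 addition that wraps, and av_clip_uintp2(x, 16) respectively), and the input values are made up rather than taken from the attached image. It assumes an arithmetic right shift on negative values, as the templates themselves do.

```c
#include <stdint.h>

/* Hypothetical stand-in for av_sat_add32(): add in 64 bits, clamp to int32_t. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Emulates the two's-complement wraparound of a plain 32-bit addition
 * without invoking signed-overflow undefined behavior. */
static int32_t wrap_add32(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;
    if (u <= INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - INT32_MAX - 1) + INT32_MIN;
}

/* Hypothetical stand-in for av_clip_uintp2(x, 16): clamp to [0, 65535]. */
static unsigned clip_u16(int32_t x)
{
    if (x < 0)     return 0;
    if (x > 65535) return 65535;
    return (unsigned)x;
}

/* Output expression before the patch, with the wrap made explicit. */
static unsigned pixel_wrapping(int32_t chroma, int32_t luma)
{
    return clip_u16((wrap_add32(chroma, luma) >> 14) + (1 << 15));
}

/* Output expression after the patch, using the saturating add. */
static unsigned pixel_saturating(int32_t chroma, int32_t luma)
{
    return clip_u16((sat_add32(chroma, luma) >> 14) + (1 << 15));
}
```

With the made-up inputs chroma = -1500000000 and luma = -1000000000, the true sum is below INT32_MIN. The wrapping version ends up with a large positive value and clips to 65535, while the saturating version clips to 0, which is the intended result for a sum that far below the range.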