From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id 8FBC1421C6 for ; Tue, 29 Mar 2022 12:41:29 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E210368B247; Tue, 29 Mar 2022 15:41:25 +0300 (EEST) Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 500F768B098 for ; Tue, 29 Mar 2022 15:41:20 +0300 (EEST) Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21]) by mail8.parnet.fi with ESMTP id 22TCfJl5022264-22TCfJl6022264; Tue, 29 Mar 2022 15:41:19 +0300 Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187]) by mail9.parnet.fi (Postfix) with ESMTPS id 6600EA143A; Tue, 29 Mar 2022 15:41:19 +0300 (EEST) Date: Tue, 29 Mar 2022 15:41:18 +0300 (EEST) From: =?ISO-8859-15?Q?Martin_Storsj=F6?= To: FFmpeg development discussions and patches In-Reply-To: <20220325185257.513933-3-bavison@riscosopen.org> Message-ID: References: <20220317185819.466470-1-bavison@riscosopen.org> <20220325185257.513933-1-bavison@riscosopen.org> <20220325185257.513933-3-bavison@riscosopen.org> MIME-Version: 1.0 X-FE-Policy-ID: 3:14:2:SYSTEM Subject: Re: [FFmpeg-devel] [PATCH 02/10] checkasm: Add vc1dsp inverse transform tests X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Ben Avison Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: On Fri, 25 Mar 2022, Ben Avison wrote: > This test deliberately doesn't exercise the full range of inputs described in > the committee draft VC-1 standard. It says: > > input coefficients in frequency domain, D, satisfy -2048 <= D < 2047 > intermediate coefficients, E, satisfy -4096 <= E < 4095 > fully inverse-transformed coefficients, R, satisfy -512 <= R < 511 > > For one thing, the inequalities look odd. Did they mean them to go the > other way round? That would make more sense because the equations generally > both add and subtract coefficients multiplied by constants, including powers > of 2. Requiring the most-negative values to be valid extends the number of > bits to represent the intermediate values just for the sake of that one case! > > For another thing, the extreme values don't look to occur in real streams - > both in my experience and supported by the following comment in the AArch32 > decoder: > > tNhalf is half of the value of tN (as described in vc1_inv_trans_8x8_c). > This is done because sometimes files have input that causes tN + tM to > overflow. To avoid this overflow, we compute tNhalf, then compute > tNhalf + tM (which doesn't overflow), and then we use vhadd to compute > (tNhalf + (tNhalf + tM)) >> 1 which does not overflow because it is > one instruction. > > My AArch64 decoder goes further than this. It calculates tNhalf and tM > then does an SRA (essentially a fused halve and add) to compute > (tN + tM) >> 1 without ever having to hold (tNhalf + tM) in a 16-bit element > without overflowing. It only encounters difficulties if either tNhalf or > tM overflow in isolation. > > I haven't had sight of the final standard, so it's possible that these > issues were dealt with during finalisation, which could explain the lack > of usage of extreme inputs in real streams. Or a preponderance of decoders > that only support 16-bit intermediate values in their inverse transforms > might have caused encoders to steer clear of such cases. > > I have effectively followed this approach in the test, and limited the > scale of the coefficients sufficient that both the existing AArch32 decoder > and my new AArch64 decoder both pass. > > Signed-off-by: Ben Avison > --- > tests/checkasm/vc1dsp.c | 258 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 258 insertions(+) The reasoning sounds sensible to me. I didn't try to follow the exact logic for how the input data is produced, but it seems reasonable. It'd be nice to unmacro the function and wrap it in a separate standalone test function like check_idct() in vp8dsp, check_itxfm in vp9dsp or check_idct in hevc_idct.c. You may want to have a deeply nested loop to check e.g. for (int w = 4; w <= 8; w += 4) { for (int h = 4; w <= 8; w += 4) { for (int dc = 0; dc <= 1; dc++) { if (w == 8 && h == 8 && dc == 0) continue; // Tested separately [... actual test ...] (or call a separate check_idct_func(w,h,dc) function to avoid unnecessarily deep indentation of a lot of code) } } } // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".