From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Cc: Martin Storsjö, "J. Dekker", "Pop, Sebastian"
Date: Mon, 13 Jun 2022 16:36:24 +0000
Message-ID: <005de8b06dea40c4a60fdad9a084138f@EX13D07UWB004.ant.amazon.com>
Subject: [FFmpeg-devel] [PATCH 1/2] checkasm: updated tests for sw_scale

- added a test for yuv2plane1 (currently disabled for x86_64)
- fixed test for yuv2planeX for aarch64 which was previously not working at all

Signed-off-by: Jonathan Swinney
---
 tests/checkasm/sw_scale.c | 176 +++++++++++++++++++++++++++++++++-----
 1 file changed, 156 insertions(+), 20 deletions(-)

diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 31d9a525e9..537cbd3265 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -35,12 +35,13 @@
             AV_WN32(buf + j, rnd());      \
     } while (0)
 
-// This reference function is the same approximate algorithm employed by the
-// SIMD functions
-static void ref_function(const int16_t *filter, int filterSize,
-                         const int16_t **src, uint8_t *dest, int dstW,
-                         const uint8_t *dither, int offset)
+static void yuv2planeX_8_ref(const int16_t *filter, int filterSize,
+                             const int16_t **src, uint8_t *dest, int dstW,
+                             const uint8_t *dither, int offset)
 {
+#if ARCH_X86_64
+    // This reference function is the same approximate algorithm employed by the
+    // SIMD functions on x86.
     int i, d;
     d = ((filterSize - 1) * 8 + dither[0]) >> 4;
     for ( i = 0; i < dstW; i++) {
@@ -56,6 +57,120 @@ static void ref_function(const int16_t *filter, int filterSize,
         }
         dest[i]= av_clip_uint8(val>>3);
     }
+#else
+    // Other architectures use the default implementation as the reference.
+    int i;
+    for (i=0; i<dstW; i++) {
+        int val = dither[(i + offset) & 7] << 12;
+        int j;
+        for (j=0; j<filterSize; j++)
+            val += src[j][i] * filter[j];
+
+        dest[i]= av_clip_uint8(val>>19);
+    }
+#endif
+}
+static void yuv2plane1_8_ref(const int16_t *src, uint8_t *dest, int dstW,
+                             const uint8_t *dither, int offset)
+{
+    int i;
+    for (i=0; i<dstW; i++) {
+        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
+        dest[i]= av_clip_uint8(val);
+    }
+}
+
+static void print_data(uint8_t *p, size_t len, size_t offset)
+{
+    size_t i = 0;
+    for (; i < len; i++) {
+        if (i % 8 == 0) {
+            printf("0x%04lx: ", i+offset);
+        }
+        printf("0x%02x ", (uint32_t) p[i]);
+        if (i % 8 == 7) {
+            printf("\n");
+        }
+    }
+    if (i % 8 != 0) {
+        printf("\n");
+    }
+}
+
+static size_t show_differences(uint8_t *a, uint8_t *b, size_t len)
+{
+    for (size_t i = 0; i < len; i++) {
+        if (a[i] != b[i]) {
+            size_t offset_of_mismatch = i;
+            size_t offset;
+            if (i >= 8) i-=8;
+            offset = i & (~7);
+            printf("test a:\n");
+            print_data(&a[offset], 32, offset);
+            printf("\ntest b:\n");
+            print_data(&b[offset], 32, offset);
+            printf("\n");
+            return offset_of_mismatch;
+        }
+    }
+    return len;
+}
+
+static void check_yuv2yuv1(void)
+{
+    struct SwsContext *ctx;
+    int osi, isi;
+    int dstW, offset;
+    size_t fail_offset;
+    const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    const int INPUT_SIZES = sizeof(input_sizes)/sizeof(input_sizes[0]);
+    #define LARGEST_INPUT_SIZE 512
+
+    const int offsets[] = {0, 3, 8, 11, 16, 19};
+    const int OFFSET_SIZES = sizeof(offsets)/sizeof(offsets[0]);
+
+    declare_func_emms(AV_CPU_FLAG_MMX, void,
+                      const int16_t *src, uint8_t *dest,
+                      int dstW, const uint8_t *dither, int offset);
+
+    LOCAL_ALIGNED_8(int16_t, src_pixels, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dst0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dst1, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dither, [8]);
+
+    randomize_buffers((uint8_t*)dither, 8);
+    randomize_buffers((uint8_t*)src_pixels, LARGEST_INPUT_SIZE * sizeof(int16_t));
+    ctx = sws_alloc_context();
+    if (sws_init_context(ctx, NULL, NULL) < 0)
+        fail();
+
+    ff_sws_init_scale(ctx);
+    for(isi = 0; isi < INPUT_SIZES; ++isi){
+        dstW = input_sizes[isi];
+        for(osi = 0; osi < OFFSET_SIZES; osi++){
+            offset = offsets[osi];
+            if (check_func(ctx->yuv2plane1, "yuv2yuv1_%d_%d", offset, dstW)){
+                memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
+
+                yuv2plane1_8_ref(src_pixels, dst0, dstW, dither, offset);
+                call_new(src_pixels, dst1, dstW, dither, offset);
+                if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]))) {
+                    fail();
+                    printf("failed: yuv2yuv1_%d_%d\n", offset, dstW);
+                    fail_offset = show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                    printf("failing values: src: 0x%04x dither: 0x%02x dst-c: %02x dst-asm: %02x\n",
+                            (int) src_pixels[fail_offset],
+                            (int) dither[(fail_offset + fail_offset) & 7],
+                            (int) dst0[fail_offset],
+                            (int) dst1[fail_offset]);
+                }
+                if(dstW == LARGEST_INPUT_SIZE)
+                    bench_new(src_pixels, dst1, dstW, dither, offset);
+            }
+        }
+    }
+    sws_freeContext(ctx);
 }
 
 static void check_yuv2yuvX(void)
@@ -64,11 +179,11 @@ static void check_yuv2yuvX(void)
     int fsi, osi, isi, i, j;
     int dstW;
 #define LARGEST_FILTER 16
-#define FILTER_SIZES 4
-    static const int filter_sizes[FILTER_SIZES] = {1, 4, 8, 16};
+    const int filter_sizes[] = {1, 2, 3, 4, 8, 16};
+    const int FILTER_SIZES = sizeof(filter_sizes)/sizeof(filter_sizes[0]);
 #define LARGEST_INPUT_SIZE 512
-#define INPUT_SIZES 6
-    static const int input_sizes[INPUT_SIZES] = {8, 24, 128, 144, 256, 512};
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    const int INPUT_SIZES = sizeof(input_sizes)/sizeof(input_sizes[0]);
 
     declare_func_emms(AV_CPU_FLAG_MMX, void, const int16_t *filter,
                       int filterSize, const int16_t **src, uint8_t *dest,
@@ -95,7 +210,7 @@ static void check_yuv2yuvX(void)
     ff_sws_init_scale(ctx);
     for(isi = 0; isi < INPUT_SIZES; ++isi){
         dstW = input_sizes[isi];
-        for(osi = 0; osi < 64; osi += 16){
+        for(osi = 0; osi < 1; osi += 16){
             for(fsi = 0; fsi < FILTER_SIZES; ++fsi){
                 src = av_malloc(sizeof(int16_t*) * filter_sizes[fsi]);
                 vFilterData = av_malloc((filter_sizes[fsi] + 2) * sizeof(union VFilterData));
@@ -110,18 +225,35 @@ static void check_yuv2yuvX(void)
                     memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
                     memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
 
-                    // The reference function is not the scalar function selected when mmx
-                    // is deactivated as the SIMD functions do not give the same result as
-                    // the scalar ones due to rounding. The SIMD functions are activated by
-                    // the flag SWS_ACCURATE_RND
-                    ref_function(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
-                    // There's no point in calling new for the reference function
-                    if(ctx->use_mmx_vfilter){
-                        call_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
-                        if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0])))
+                    if (ARCH_X86_64) {
+                        // The reference function is not the scalar function selected when mmx
+                        // is deactivated as the SIMD functions do not give the same result as
+                        // the scalar ones due to rounding. The SIMD functions are activated by
+                        // the flag SWS_ACCURATE_RND
+                        yuv2planeX_8_ref(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
+                        // There's no point in calling new for the reference function
+                        if(ctx->use_mmx_vfilter) {
+                            call_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                            if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]))) {
+                                fail();
+                                printf("failed: yuv2yuvX_%d_%d_%d\n", filter_sizes[fsi], osi, dstW);
+                                show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                            }
+                            if(dstW == LARGEST_INPUT_SIZE)
+                                bench_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                        }
+                    }
+
+                    if (ARCH_AARCH64) {
+                        yuv2planeX_8_ref(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
+                        call_new(&filter_coeff[0], filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                        if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]))) {
                             fail();
+                            printf("failed: yuv2yuvX_%d_%d_%d\n", filter_sizes[fsi], osi, dstW);
+                            show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                        }
                         if(dstW == LARGEST_INPUT_SIZE)
-                            bench_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                            bench_new(&filter_coeff[0], filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
                     }
                 }
                 av_freep(&src);
@@ -245,6 +377,10 @@ void checkasm_check_sw_scale(void)
 {
     check_hscale();
     report("hscale");
+    if (!ARCH_X86_64) {
+        check_yuv2yuv1();
+        report("yuv2yuv1");
+    }
     check_yuv2yuvX();
     report("yuv2yuvX");
 }
-- 
2.32.0
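
For anyone who wants to check the non-x86 reference arithmetic outside the checkasm harness, here is a minimal standalone sketch of the two 8-bit computations the patch uses as references. It is an illustration under stated assumptions, not the code from the patch: `clip_uint8`, `plane1_8_sketch` and `planeX_8_sketch` are hypothetical names, `clip_uint8` stands in for libavutil's `av_clip_uint8`, and the code assumes that right-shifting a negative `int` is an arithmetic shift, as the reference functions above also do.

```c
#include <stdint.h>

/* Hypothetical stand-in for av_clip_uint8(): clamp an int to [0, 255]. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Single-source output: add the dither entry selected by (i + offset) & 7,
 * drop the 7 fractional bits, then clip to 8 bits.
 * Assumes arithmetic right shift for negative intermediate values. */
static void plane1_8_sketch(const int16_t *src, uint8_t *dest, int dstW,
                            const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
        dest[i] = clip_uint8(val);
    }
}

/* Multi-tap vertical filter: start from the dither entry scaled by 1 << 12,
 * accumulate src[j][i] * filter[j] over all taps, drop 19 bits, then clip. */
static void planeX_8_sketch(const int16_t *filter, int filterSize,
                            const int16_t **src, uint8_t *dest, int dstW,
                            const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_uint8(val >> 19);
    }
}
```

With a constant dither of 64 and offset 0, a source sample of 12800 gives 100 from `plane1_8_sketch`, and the same sample fed through two taps of 2048 each (summing to 4096) gives 100 from `planeX_8_sketch`. The test in the patch compares this kind of output byte for byte against the assembly result.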