Message-ID: <76aa6de2-aef3-413e-b356-d5640d59910d@gmail.com>
Date: Mon, 3 Jun 2024 15:47:06 -0300
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [WIP PATCH 1/2] checkasm/sw_rgb: test rgb24 to yuv

On 6/3/2024 10:02 AM, Zhao Zhili wrote:
> From: Zhao Zhili
>
> ---
> The test still failed on x86, but success on arm64 and longarch.
>
> I have tried to call rgb24ToY_c and ff_rgb24ToY_avx directly and
> compare the results, they don't match.

You're using an incomplete table. See below.

>
> https://github.com/quink-black/FFmpeg/actions/runs/9347753270
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/tencent_90E6136AF5D6E919AEA9254393048855B305@qq.com/
>
>  tests/checkasm/sw_rgb.c | 123 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 123 insertions(+)
>
> diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c
> index 7cd815e5be..18fd4255a6 100644
> --- a/tests/checkasm/sw_rgb.c
> +++ b/tests/checkasm/sw_rgb.c
> @@ -24,6 +24,8 @@
>  #include "libavutil/mem_internal.h"
>
>  #include "libswscale/rgb2rgb.h"
> +#include "libswscale/swscale.h"
> +#include "libswscale/swscale_internal.h"
>
>  #include "checkasm.h"
>
> @@ -41,6 +43,7 @@ static const struct {uint8_t w, h, s;} planes[] = {
>
>  #define MAX_STRIDE 128
>  #define MAX_HEIGHT 128
> +#define LARGEST_INPUT_SIZE 4096
>
>  static void check_shuffle_bytes(void * func, const char * report)
>  {
> @@ -111,6 +114,120 @@ static void check_uyvy_to_422p(void)
>      }
>  }
>
> +static void check_rgb_to_y(void)
> +{
> +    struct SwsContext *ctx;
> +    static const int input_sizes[] = {8, 128, 1280, 1080, LARGEST_INPUT_SIZE};
> +    int32_t rgb2yuv[9] = {0};
> +
> +    declare_func(void, uint8_t *dst, const uint8_t *src,
> +                 const uint8_t *unused1, const uint8_t *unused2, int width,
> +                 uint32_t *rgb2yuv, void *opq);
> +
> +    LOCAL_ALIGNED_32(uint8_t, src, [LARGEST_INPUT_SIZE * 3]);
> +    LOCAL_ALIGNED_32(uint8_t, dst0_y, [LARGEST_INPUT_SIZE * 2]);
> +    LOCAL_ALIGNED_32(uint8_t, dst1_y, [LARGEST_INPUT_SIZE * 2]);
> +
> +    randomize_buffers(src, LARGEST_INPUT_SIZE * 3);
> +    rgb2yuv[BY_IDX] = ((int)(0.114 * 219 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[BV_IDX] = (-(int)(0.081 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[BU_IDX] = ((int)(0.500 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[GY_IDX] = ((int)(0.587 * 219 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[GV_IDX] = (-(int)(0.419 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[GU_IDX] = (-(int)(0.331 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[RY_IDX] = ((int)(0.299 * 219 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[RV_IDX] = ((int)(0.500 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[RU_IDX] = (-(int)(0.169 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +
> +    ctx = sws_alloc_context();
> +    if (sws_init_context(ctx, NULL, NULL) < 0)
> +        fail();

Allocate and initialize this once in checkasm_check_sw_rgb() and reuse it.

> +
> +    for (int i = 0; i < FF_ARRAY_ELEMS(input_sizes); i++) {
> +        int w = input_sizes[i];
> +
> +        ctx->srcFormat = AV_PIX_FMT_RGB24;
> +        ctx->dstFormat = AV_PIX_FMT_YUV420P;
> +
> +        ff_sws_init_scale(ctx);
> +        if (check_func(ctx->lumToYV12, "rgb24_to_y_%d", w)) {
> +            memset(dst0_y, 0xFF, LARGEST_INPUT_SIZE * 2);
> +            memset(dst1_y, 0xFF, LARGEST_INPUT_SIZE * 2);
> +
> +            call_ref(dst0_y, src, NULL, NULL, w, rgb2yuv, NULL);

Don't use a custom-filled table, especially one that's smaller than
needed. Use ctx->input_rgb2yuv_table directly here and everywhere else.
It's filled with the values that the C and any SIMD version may need.
With that change, the tests pass on x86.

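To illustrate why a short table can break only some implementations,
here is a minimal, self-contained sketch. It does not use libswscale:
demo_ctx, demo_init(), luma_scalar(), luma_packed() and the PACKED_OFF
layout are all hypothetical stand-ins. The only assumptions are that an
optimized routine may read its coefficients from a region of the table
that the scalar slots don't cover, and that the shift is 15 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not the real libswscale one: scalar coefficients
 * sit in slots 0..2, and a second copy for an optimized routine starts
 * at PACKED_OFF. */
enum { RY, GY, BY, N_COEF };
enum { PACKED_OFF = 16, TABLE_LEN = 32, SHIFT = 15 };

struct demo_ctx {
    int32_t table[TABLE_LEN];
};

/* Fill only the scalar slots, like a hand-written table would. */
static void demo_fill_scalar(int32_t *t)
{
    t[RY] = 8414;   /* (int)(0.299 * 219 / 255 * (1 << 15) + 0.5) */
    t[GY] = 16519;  /* (int)(0.587 * 219 / 255 * (1 << 15) + 0.5) */
    t[BY] = 3208;   /* (int)(0.114 * 219 / 255 * (1 << 15) + 0.5) */
}

/* Fill every region, the way a context initializer would. */
static void demo_init(struct demo_ctx *c)
{
    assert(PACKED_OFF + N_COEF <= TABLE_LEN);
    memset(c, 0, sizeof(*c));
    demo_fill_scalar(c->table);
    memcpy(c->table + PACKED_OFF, c->table, N_COEF * sizeof(c->table[0]));
}

/* Weighted sum of one RGB pixel, rounded and shifted down to 8 bits. */
static uint8_t luma_from(const int32_t *coef, const uint8_t *rgb)
{
    int32_t y = coef[RY] * rgb[0] + coef[GY] * rgb[1] + coef[BY] * rgb[2];
    return (uint8_t)((y + (1 << (SHIFT - 1))) >> SHIFT);
}

/* Reference path: reads the scalar slots. */
static uint8_t luma_scalar(const int32_t *t, const uint8_t *rgb)
{
    return luma_from(t, rgb);
}

/* Stand-in for an optimized path: reads the extra region instead. */
static uint8_t luma_packed(const int32_t *t, const uint8_t *rgb)
{
    return luma_from(t + PACKED_OFF, rgb);
}

/* Returns 1 when both paths agree for one sample pixel. */
static int demo_paths_match(const int32_t *t)
{
    static const uint8_t px[3] = { 200, 100, 50 };
    return luma_scalar(t, px) == luma_packed(t, px);
}
```

With a table filled by demo_init(), demo_paths_match() returns 1. With
a TABLE_LEN-sized buffer where only demo_fill_scalar() has run, the
packed path sees zeros and the comparison fails. The sketch keeps the
buffer full-sized so the read stays defined; a real 9-entry array would
be read out of bounds instead.
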
> +            call_new(dst1_y, src, NULL, NULL, w, rgb2yuv, NULL);
> +
> +            if (memcmp(dst0_y, dst1_y, w * 2))
> +                fail();
> +
> +            bench_new(dst1_y, src, NULL, NULL, w, rgb2yuv, NULL);
> +        }
> +    }
> +
> +    sws_freeContext(ctx);
> +}
> +
> +static void check_rgb_to_uv(void)
> +{
> +    struct SwsContext *ctx;
> +    static const int input_sizes[] = {8, 128, 1280, 1080, LARGEST_INPUT_SIZE};
> +    int32_t rgb2yuv[9] = {0};
> +
> +    declare_func(void, uint8_t *dstU, uint8_t *dstV,
> +                 const uint8_t *src1, const uint8_t *src2, const uint8_t *src3,
> +                 int width, uint32_t *pal, void *opq);
> +
> +    LOCAL_ALIGNED_32(uint8_t, src, [LARGEST_INPUT_SIZE * 3]);
> +    LOCAL_ALIGNED_32(uint8_t, dst0_u, [LARGEST_INPUT_SIZE * 2]);
> +    LOCAL_ALIGNED_32(uint8_t, dst0_v, [LARGEST_INPUT_SIZE * 2]);
> +    LOCAL_ALIGNED_32(uint8_t, dst1_u, [LARGEST_INPUT_SIZE * 2]);
> +    LOCAL_ALIGNED_32(uint8_t, dst1_v, [LARGEST_INPUT_SIZE * 2]);
> +
> +    randomize_buffers(src, LARGEST_INPUT_SIZE * 3);
> +    rgb2yuv[BY_IDX] = ((int)(0.114 * 219 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[BV_IDX] = (-(int)(0.081 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[BU_IDX] = ((int)(0.500 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[GY_IDX] = ((int)(0.587 * 219 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[GV_IDX] = (-(int)(0.419 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[GU_IDX] = (-(int)(0.331 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[RY_IDX] = ((int)(0.299 * 219 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[RV_IDX] = ((int)(0.500 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +    rgb2yuv[RU_IDX] = (-(int)(0.169 * 224 / 255 * (1 << RGB2YUV_SHIFT) + 0.5));
> +
> +    ctx = sws_alloc_context();
> +    if (sws_init_context(ctx, NULL, NULL) < 0)
> +        fail();
> +
> +    for (int i = 0; i < 2; i++) {
> +        for (int j = 0; j < FF_ARRAY_ELEMS(input_sizes); j++) {
> +            int w = input_sizes[j] >> i;
> +
> +            ctx->chrSrcHSubSample = i ? 1 : 0;
> +            ctx->srcFormat = AV_PIX_FMT_RGB24;
> +            ctx->dstFormat = i ? AV_PIX_FMT_YUV420P : AV_PIX_FMT_YUV444P;
> +
> +            ff_sws_init_scale(ctx);
> +
> +            if (check_func(ctx->chrToYV12, "rgb24_to_uv%s_%d", i ? "_half" : "", w)) {
> +                memset(dst0_u, 0xFF, LARGEST_INPUT_SIZE * 2);
> +                memset(dst0_v, 0xFF, LARGEST_INPUT_SIZE * 2);
> +                memset(dst1_u, 0xFF, LARGEST_INPUT_SIZE * 2);
> +                memset(dst1_v, 0xFF, LARGEST_INPUT_SIZE * 2);
> +
> +                call_ref(dst0_u, dst0_v, NULL, src, src, w, rgb2yuv, NULL);
> +                call_new(dst1_u, dst1_v, NULL, src, src, w, rgb2yuv, NULL);
> +
> +                if (memcmp(dst0_u, dst1_u, w * 2) || memcmp(dst0_v, dst1_v, w * 2))
> +                    fail();
> +
> +                bench_new(dst1_u, dst1_v, NULL, src, src, w, rgb2yuv, NULL);
> +            }
> +        }
> +    }
> +
> +    sws_freeContext(ctx);
> +}
> +
>  static void check_interleave_bytes(void)
>  {
>      LOCAL_ALIGNED_16(uint8_t, src0_buf, [MAX_STRIDE*MAX_HEIGHT+1]);
> @@ -201,6 +318,12 @@ void checkasm_check_sw_rgb(void)
>      check_uyvy_to_422p();
>      report("uyvytoyuv422");
>
> +    check_rgb_to_y();
> +    report("rgb_to_y");
> +
> +    check_rgb_to_uv();
> +    report("rgb_to_uv");
> +
>      check_interleave_bytes();
>      report("interleave_bytes");
>  }