From: Shreesh Adiga <16567adigashreesh@gmail.com>
Date: Sat, 25 Jan 2025 20:41:01 +0530
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

> Thanks for the patch. Could you please compile and run
> tests/checkasm/checkasm with "--test=sw_rgb --bench" and paste the
> results for the shuffle_bytes functions, to see if there's a speed up
> compared to the AVX2 implementation?

I ran the command "tests/checkasm/checkasm --test=sw_rgb --bench" and I
see the output below:

benchmarking with native FFmpeg timers
nop: 45.0
checkasm: using random seed 17575157
checkasm: bench runs 1024 (1 << 10)
SSE2:
 - sw_rgb.uyvytoyuv422       [OK]
 - sw_rgb.interleave_bytes   [OK]
 - sw_rgb.deinterleave_bytes [OK]
 - sw_rgb.rgb_to_y           [OK]
 - sw_rgb.rgb_to_uv          [OK]
SSSE3:
 - sw_rgb.shuffle_bytes_2103 [OK]
 - sw_rgb.shuffle_bytes_0321 [OK]
 - sw_rgb.shuffle_bytes_1230 [OK]
 - sw_rgb.shuffle_bytes_3012 [OK]
 - sw_rgb.shuffle_bytes_3210 [OK]
 - sw_rgb.rgb_to_y           [OK]
 - sw_rgb.rgb_to_uv          [OK]
AVX:
 - sw_rgb.uyvytoyuv422       [OK]
 - sw_rgb.deinterleave_bytes [OK]
 - sw_rgb.rgb_to_y           [OK]
 - sw_rgb.rgb_to_uv          [OK]
AVX2:
 - sw_rgb.shuffle_bytes_2103 [OK]
 - sw_rgb.shuffle_bytes_0321 [OK]
 - sw_rgb.shuffle_bytes_1230 [OK]
 - sw_rgb.shuffle_bytes_3012 [OK]
 - sw_rgb.shuffle_bytes_3210 [OK]
 - sw_rgb.uyvytoyuv422       [OK]
 - sw_rgb.rgb_to_y           [OK]
 - sw_rgb.rgb_to_uv          [OK]
AVX-512ICL:
 - sw_rgb.shuffle_bytes_2103 [OK]
 - sw_rgb.shuffle_bytes_0321 [OK]
 - sw_rgb.shuffle_bytes_1230 [OK]
 - sw_rgb.shuffle_bytes_3012 [OK]
 - sw_rgb.shuffle_bytes_3210 [OK]
checkasm: all 184 tests passed
shuffle_bytes_0321_c:          45.0 ( 1.00x)
shuffle_bytes_0321_ssse3:      11.2 ( 4.00x)
shuffle_bytes_0321_avx2:       11.2 ( 4.00x)
shuffle_bytes_0321_avx512icl:  11.2 ( 4.00x)
shuffle_bytes_1230_c:          67.5 ( 1.00x)
shuffle_bytes_1230_ssse3:      11.2 ( 6.00x)
shuffle_bytes_1230_avx2:       11.2 ( 6.00x)
shuffle_bytes_1230_avx512icl:   0.0 ( 0.00x)
shuffle_bytes_2103_c:          45.0 ( 1.00x)
shuffle_bytes_2103_ssse3:      11.2 ( 4.00x)
shuffle_bytes_2103_avx2:        0.0 ( 0.00x)
shuffle_bytes_2103_avx512icl:   0.0 ( 0.00x)
shuffle_bytes_3012_c:          67.5 ( 1.00x)
shuffle_bytes_3012_ssse3:      11.2 ( 6.00x)
shuffle_bytes_3012_avx2:       11.2 ( 6.00x)
shuffle_bytes_3012_avx512icl:   0.0 ( 0.00x)
shuffle_bytes_3210_c:          67.5 ( 1.00x)
shuffle_bytes_3210_ssse3:      11.2 ( 6.00x)
shuffle_bytes_3210_avx2:       11.2 ( 6.00x)
shuffle_bytes_3210_avx512icl:   0.0 ( 0.00x)

I have not included the other functions printed by the bench command.

I'm not sure if I'm missing something, but the output doesn't look
consistent to me. There are many 0.0 entries, and I don't see any
difference between ssse3 and avx2 either. I'm running this on an AMD
Ryzen 7950X (Zen 4) machine. I've inspected the assembly output for one
of the ssse3/avx2/avx512 functions, and it matches my expectation, so
I'm not sure whether checkasm is measuring accurately here.

Please let me know if I'm missing something. I'm new to FFmpeg
development and this is my first patch submission.

Thanks,
Shreesh
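
For reference, the inconsistency described above can be made concrete by parsing the benchmark lines. This is only a rough sketch: the helper names (parse_bench, zero_entries, distinct_values) are made up for illustration and are not part of checkasm, and the sample is a subset of the lines pasted in this message. It shows which entries report 0.0 and how few distinct timing values appear; it does not by itself say why the timer behaves this way.

```python
import re

# Subset of the checkasm benchmark lines from this message, copied as printed.
BENCH = """\
shuffle_bytes_0321_c:          45.0 ( 1.00x)
shuffle_bytes_0321_ssse3:      11.2 ( 4.00x)
shuffle_bytes_1230_c:          67.5 ( 1.00x)
shuffle_bytes_1230_avx512icl:   0.0 ( 0.00x)
shuffle_bytes_2103_avx2:        0.0 ( 0.00x)
"""

# Matches "<name>: <time> ( <ratio>x)" as it appears in the output above.
LINE = re.compile(r"^(\w+):\s+([\d.]+)\s+\(\s*([\d.]+)x\)$")


def parse_bench(text):
    """Return {function_name: reported_time} for each benchmark line."""
    results = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            results[m.group(1)] = float(m.group(2))
    return results


def zero_entries(results):
    """Names whose reported time is exactly 0.0 (suspicious measurements)."""
    return sorted(name for name, t in results.items() if t == 0.0)


def distinct_values(results):
    """Sorted set of reported times; very few values hints at a coarse timer."""
    return sorted(set(results.values()))


if __name__ == "__main__":
    r = parse_bench(BENCH)
    print("zero timings:", zero_entries(r))
    print("distinct values:", distinct_values(r))
```

Only a handful of distinct values show up across all functions, which is what makes the numbers hard to compare between ssse3, avx2 and avx512icl.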