Date: Fri, 7 Feb 2025 12:48:01 +0200 (EET)
From: Martin Storsjö
To: Krzysztof Pyrkosz via ffmpeg-devel
Cc: Krzysztof Pyrkosz
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64/rgb2rgb: Implemented NEON shuf routines
Message-ID: <7717cf1-9391-b3bd-6fb5-bbd7aa6c1b8@martin.st>
In-Reply-To: <20250128180132.12969-2-ffmpeg@szaka.eu>
References: <20250128180132.12969-2-ffmpeg@szaka.eu>

On Tue, 28 Jan 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> The key idea is to pass the pre-generated tables to the TBL instruction
> and churn through the data 16 bytes at a time. The remaining 4 elements
> are handled with a specialized block located at the end of the routine.
>
> The 3210 variant can be implemented using rev32, but surprisingly it is
> slower than the generic TBL on A78, but much faster on A72. I wrapped it
> in #if 0 block.

So the tradeoff is essentially this:

A78:
shuffle_bytes_3210_c:                                  138.0 ( 1.00x)
shuffle_bytes_3210_tbl_neon:                            22.0 ( 6.27x)
shuffle_bytes_3210_rev32_neon:                          28.5 ( 4.88x)

A72:
shuffle_bytes_3210_c:                                  195.8 ( 1.00x)
shuffle_bytes_3210_tbl_neon:                            37.8 ( 5.19x)
shuffle_bytes_3210_rev32_neon:                          30.8 ( 6.33x)

Yeah, it doesn't really make much of a difference which one we pick here;
we're much faster than the C code in any case. I guess favouring tbl for
the newer cores is the right choice to make.

> diff --git a/libswscale/aarch64/rgb2rgb_neon.S b/libswscale/aarch64/rgb2rgb_neon.S
> index 1382e00261..a69a211ad4 100644
> --- a/libswscale/aarch64/rgb2rgb_neon.S
> +++ b/libswscale/aarch64/rgb2rgb_neon.S
> @@ -296,3 +359,99 @@ function ff_deinterleave_bytes_neon, export=1
>  0:
>          ret
>  endfunc
> +
> +.macro neon_shuf shuf
> +function ff_shuffle_bytes_\shuf\()_neon, export=1
> +        movrel          x9, shuf_\shuf\()_tbl
> +        ld1             {v1.16b}, [x9]
> +        and             w5, w2, #~15
> +        and             w3, w2, #8
> +        and             w4, w2, #4
> +        cbz             w5, 2f
> +1:
> +        subs            w5, w5, #16
> +        ld1             {v0.16b}, [x0], #16
> +        tbl             v0.16b, {v0.16b}, v1.16b
> +        st1             {v0.16b}, [x1], #16
> +        b.gt            1b

By moving the subs to after the ld1, I get the runtime on the Cortex A53
lowered from this:

shuffle_bytes_0321_c:                                  283.0 ( 1.00x)
shuffle_bytes_0321_neon:                                68.0 ( 4.16x)

to this:

shuffle_bytes_0321_neon:                                60.8 ( 4.66x)

So I'm squashing such a change into it.
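
For readers who don't have the NEON semantics at hand, here is a rough C model of what the quoted loop computes. It is only a sketch, not code from the patch. The helper names (tbl16_model, make_shuf_table, shuffle_bytes_model) are made up for illustration. It assumes that the digits in a name like 0321 give the source byte index for each output byte within a 4-byte group, and that the size is a multiple of 4. It also uses a plain per-group tail, where the patch has dedicated 8- and 4-byte blocks.

```c
#include <stdint.h>
#include <string.h>

/* Model of a single-register AArch64 TBL: out[i] = in[idx[i]],
 * or 0 when the index is out of range (>= 16). */
static void tbl16_model(uint8_t out[16], const uint8_t in[16],
                        const uint8_t idx[16])
{
    uint8_t tmp[16]; /* temporary, so that out may alias in */
    for (int i = 0; i < 16; i++)
        tmp[i] = idx[i] < 16 ? in[idx[i]] : 0;
    memcpy(out, tmp, 16);
}

/* Hypothetical helper: expand a 4-entry pattern such as {0, 3, 2, 1}
 * into a 16-byte index table covering four 4-byte groups. */
static void make_shuf_table(uint8_t tbl[16], const uint8_t pat[4])
{
    for (int i = 0; i < 16; i++)
        tbl[i] = (uint8_t)((i & ~3) + pat[i & 3]);
}

/* Model of the routine: 16 bytes per iteration through the table,
 * then the remaining 4-byte groups one at a time. */
static void shuffle_bytes_model(const uint8_t *src, uint8_t *dst,
                                int size, const uint8_t pat[4])
{
    uint8_t tbl[16];
    int i = 0;

    make_shuf_table(tbl, pat);

    for (; i + 16 <= size; i += 16)
        tbl16_model(dst + i, src + i, tbl);

    for (; i + 4 <= size; i += 4)
        for (int j = 0; j < 4; j++)
            dst[i + j] = src[i + pat[j]];
}
```

With the pattern {3, 2, 1, 0} the table reverses the bytes within each 32-bit word, which is why rev32 can stand in for tbl in the 3210 case. The model says nothing about instruction scheduling, which is all that the subs/ld1 reordering changes.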

> +#
> +#if 0
> +function ff_shuffle_bytes_3210_neon, export=1
> +        and             w5, w2, #~(15)

While it is nice to keep this as a reference, it's kinda dead code here, so
I would suggest we just drop it for now. Good that you investigated it
though!

Other than that, this looks really good, so I'll push it with those
changes.

// Martin