From: Martin Storsjö
To: FFmpeg development discussions and patches
Cc: Ben Avison
Date: Wed, 30 Mar 2022 17:14:37 +0300 (EEST)
Subject: Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths
Message-ID: <83954566-3ee5-532e-2ff3-f726d9ab52cc@martin.st>
In-Reply-To: <20220325185257.513933-9-bavison@riscosopen.org>
References: <20220317185819.466470-1-bavison@riscosopen.org> <20220325185257.513933-1-bavison@riscosopen.org> <20220325185257.513933-9-bavison@riscosopen.org>

On Fri, 25 Mar 2022, Ben Avison wrote:

> checkasm benchmarks on 1.5 GHz
> Cortex-A72 are as follows.
>
> idctdsp.add_pixels_clamped_c: 323.0
> idctdsp.add_pixels_clamped_neon: 41.5
> idctdsp.put_pixels_clamped_c: 243.0
> idctdsp.put_pixels_clamped_neon: 30.0
> idctdsp.put_signed_pixels_clamped_c: 225.7
> idctdsp.put_signed_pixels_clamped_neon: 37.7
>
> Signed-off-by: Ben Avison
> ---
> libavcodec/aarch64/Makefile               |   3 +-
> libavcodec/aarch64/idctdsp_init_aarch64.c |  26 +++--
> libavcodec/aarch64/idctdsp_neon.S         | 130 ++++++++++++++++++++++
> 3 files changed, 150 insertions(+), 9 deletions(-)
> create mode 100644 libavcodec/aarch64/idctdsp_neon.S

Generally LGTM

> +// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128)
> +// On entry:
> +//   x0 -> array of 64x 16-bit coefficients
> +//   x1 -> 8-bit results
> +//   x2 = row stride for results, bytes
> +function ff_put_signed_pixels_clamped_neon, export=1
> +        ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
> +        movi            v4.8b, #128
> +        ld1             {v16.16b, v17.16b, v18.16b, v19.16b}, [x0]
> +        sqxtn           v0.8b, v0.8h
> +        sqxtn           v1.8b, v1.8h
> +        sqxtn           v2.8b, v2.8h
> +        sqxtn           v3.8b, v3.8h
> +        sqxtn           v5.8b, v16.8h
> +        add             v0.8b, v0.8b, v4.8b

Here you could save 4 add instructions with sqxtn2 and adding .16b
vectors, but I'm not sure if it's worthwhile. (It reduces the checkasm
numbers by 0.7 for Cortex A72, by 0.3 for A73, but increases the runtime
by 1.0 on A53.)

Strangely enough, I get much smaller numbers on my A72 than you got. I
get these:

idctdsp.add_pixels_clamped_c: 306.7
idctdsp.add_pixels_clamped_neon: 25.7
idctdsp.put_pixels_clamped_c: 217.2
idctdsp.put_pixels_clamped_neon: 15.2
idctdsp.put_signed_pixels_clamped_c: 216.7
idctdsp.put_signed_pixels_clamped_neon: 19.2

(The _c numbers are of course highly compiler dependent, but the assembly
numbers should generally match quite closely. And AFAIK they should be
measured in clock cycles, so CPU frequency shouldn't really play a role
either.)
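To make the sqxtn2 suggestion concrete, here is a scalar C sketch of the per-lane arithmetic, not the patch itself and not FFmpeg's own C code. It assumes the semantics stated in the patch comment (clamp 16-bit signed coefficients to signed 8-bit, then bias by 128): sqxtn saturates each lane to [-128, 127], and the plain .8b/.16b add of 128 wraps modulo 256, which maps that range onto [0, 255]. All function names and the 16-byte output stride are hypothetical. put_signed_8b models one 8-lane add per row (8 adds per block, as in the patch), while put_signed_16b models sqxtn for one row plus sqxtn2 for the next row followed by a single 16-lane add (4 adds per block); both are checked against a straightforward clamp over every int16 value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one sqxtn lane: saturate int16 to [-128, 127]. */
static int8_t sqxtn_lane(int16_t x)
{
    if (x < INT8_MIN)
        return INT8_MIN;
    if (x > INT8_MAX)
        return INT8_MAX;
    return (int8_t)x;
}

/* Hypothetical straightforward clamp: out-of-range inputs pin to
 * 0 or 255, in-range inputs are biased by 128. */
static void put_signed_ref(const int16_t *block, uint8_t *pixels,
                           ptrdiff_t stride)
{
    assert(stride >= 8);
    for (int row = 0; row < 8; row++, pixels += stride) {
        for (int col = 0; col < 8; col++) {
            int v = block[row * 8 + col];
            pixels[col] = v < -128 ? 0 : v > 127 ? 255 : (uint8_t)(v + 128);
        }
    }
}

/* Model of the quoted sequence: per row, sqxtn into 8 lanes, then one
 * 8-lane byte add of 128 (8 adds per block). */
static void put_signed_8b(const int16_t *block, uint8_t *pixels,
                          ptrdiff_t stride)
{
    assert(stride >= 8);
    for (int row = 0; row < 8; row++, pixels += stride) {
        for (int col = 0; col < 8; col++) {
            uint8_t lane = (uint8_t)sqxtn_lane(block[row * 8 + col]);
            pixels[col] = (uint8_t)(lane + 128u); /* wraps modulo 256 */
        }
    }
}

/* Model of the sqxtn2 variant: lanes 0-7 come from one row (sqxtn),
 * lanes 8-15 from the next row (sqxtn2), and a single 16-lane add
 * biases both rows at once (4 adds per block). */
static void put_signed_16b(const int16_t *block, uint8_t *pixels,
                           ptrdiff_t stride)
{
    assert(stride >= 8);
    for (int pair = 0; pair < 4; pair++) {
        uint8_t v[16];
        for (int i = 0; i < 16; i++)
            v[i] = (uint8_t)sqxtn_lane(block[pair * 16 + i]);
        for (int i = 0; i < 16; i++) /* the single .16b add */
            v[i] = (uint8_t)(v[i] + 128u);
        for (int i = 0; i < 8; i++) {
            pixels[(2 * pair) * stride + i]     = v[i];
            pixels[(2 * pair + 1) * stride + i] = v[8 + i];
        }
    }
}

/* Feed every int16 value through all three versions and return the
 * number of differing output bytes (expected to be 0). */
static long count_mismatches_all_int16(void)
{
    enum { STRIDE = 16 };
    int16_t block[64];
    uint8_t a[8 * STRIDE], b[8 * STRIDE], c[8 * STRIDE];
    long mismatches = 0;
    for (long base = INT16_MIN; base <= INT16_MAX; base += 64) {
        for (int i = 0; i < 64; i++)
            block[i] = (int16_t)(base + i);
        put_signed_ref(block, a, STRIDE);
        put_signed_8b(block, b, STRIDE);
        put_signed_16b(block, c, STRIDE);
        for (int r = 0; r < 8; r++)
            for (int i = 0; i < 8; i++) {
                mismatches += a[r * STRIDE + i] != b[r * STRIDE + i];
                mismatches += a[r * STRIDE + i] != c[r * STRIDE + i];
            }
    }
    return mismatches;
}
```

Only the arithmetic is modelled here: the two forms produce identical bytes, so the choice between them comes down purely to per-core scheduling, which is what the A72/A73 versus A53 measurements in the review are about.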
// Martin