Date: Mon, 24 Oct 2022 16:19:18 +0300 (EEST)
From: Martin Storsjö
To: Hubert Mazur
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, ffmpeg-devel@ffmpeg.org, mw@semihalf.com, spop@amazon.com
Subject: Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19
Message-ID: <757bb051-c5c3-91cb-cb7a-f5854bc6d26e@martin.st>
In-Reply-To: <20221017130715.30896-5-hum@semihalf.com>
References: <20221017130715.30896-1-hum@semihalf.com> <20221017130715.30896-5-hum@semihalf.com>

On Mon, 17 Oct 2022, Hubert Mazur wrote:

> Provide arm64 neon optimized
> implementations for hscale16To19 with
> filter sizes 4, 8 and X4.
>
> The tests and benchmarks run on AWS Graviton 2 instances.
> The results from a checkasm tool are shown below.
>
> hscale_16_to_19__fs_4_dstW_512_c: 6216.0
> hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
> hscale_16_to_19__fs_8_dstW_512_c: 10417.7
> hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
> hscale_16_to_19__fs_12_dstW_512_c: 14890.5
> hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
> hscale_16_to_19__fs_16_dstW_512_c: 19006.5
> hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
> hscale_16_to_19__fs_32_dstW_512_c: 36629.5
> hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
> hscale_16_to_19__fs_40_dstW_512_c: 45477.5
> hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
>
> Signed-off-by: Hubert Mazur
> ---
>  libswscale/aarch64/hscale.S  | 402 +++++++++++++++++++++++++++++++++++
>  libswscale/aarch64/swscale.c |  70 +++++-
>  2 files changed, 471 insertions(+), 1 deletion(-)
>
> +void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
> +                                const uint8_t *_src, const int16_t *filter,
> +                                const int32_t *filterPos, int filterSize);
> +void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
> +                                 const uint8_t *_src, const int16_t *filter,
> +                                 const int32_t *filterPos, int filterSize);
> +void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
> +                                 const uint8_t *_src, const int16_t *filter,
> +                                 const int32_t *filterPos, int filterSize);
> +
>  #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
>  void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
>      SwsContext *c, int16_t *data, \
> @@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
>  #define SCALE_FUNCS(filter_n, opt) \
>      SCALE_FUNC(filter_n, 8, 15, opt); \
>      SCALE_FUNC(filter_n, 8, 19, opt); \
> -    SCALE_FUNC(filter_n, 16, 15, opt);
> +    SCALE_FUNC(filter_n, 16, 15, opt); \
> +    SCALE_FUNC(filter_n, 16, 19, opt);

So this declares the functions we're implementing as C wrappers
below, and the manual declarations further up declare the actual asm
functions? I guess that works, although it creates unnecessary extern
functions. In such cases, we usually make the C functions static and
place them above the code that uses them. But it's not a big deal.

Other than that, this patchset mostly seems fine. However, I tested the
patches on x86, and the new checkasm tests fail there (on both i386 and
x86_64), so that needs to be fixed in any case.

Since we'll need another round anyway, please do try to fix up the
minor cosmetics I mentioned.

// Martin
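
For illustration, the layout suggested above (only the asm entry point is extern, and the C wrapper is a static function placed above the code that uses it) could look roughly like the sketch below. Everything in it is a hypothetical stand-in rather than the patch's code: DemoContext replaces SwsContext, demo_hscale16to19_asm is written in plain C so the sketch builds on any platform, the pointer types are simplified, and the shift derivation is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for SwsContext. */
typedef struct DemoContext {
    int src_depth; /* bits per source component, e.g. 16 */
    void (*hscale)(struct DemoContext *c, int32_t *dst, int dstW,
                   const uint16_t *src, const int16_t *filter,
                   const int32_t *filterPos, int filterSize);
} DemoContext;

/* Manual declaration of the extern kernel. In a real build this symbol
 * would come from an assembly file; here it is defined in C below. */
void demo_hscale16to19_asm(int shift, int32_t *dst, int dstW,
                           const uint16_t *src, const int16_t *filter,
                           const int32_t *filterPos, int filterSize);

/* Plain C stand-in for the assembly routine (assumption, not the NEON code):
 * apply the filter, shift down, and clamp to 19 bits. */
void demo_hscale16to19_asm(int shift, int32_t *dst, int dstW,
                           const uint16_t *src, const int16_t *filter,
                           const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int32_t val = 0;
        for (int j = 0; j < filterSize; j++)
            val += src[filterPos[i] + j] * filter[filterSize * i + j];
        val >>= shift;
        dst[i] = val < (1 << 19) - 1 ? val : (1 << 19) - 1;
    }
}

/* Static wrapper: visible only in this file, defined above its only user.
 * The shift formula is a placeholder for whatever the real wrapper does. */
static void demo_hscale16to19(DemoContext *c, int32_t *dst, int dstW,
                              const uint16_t *src, const int16_t *filter,
                              const int32_t *filterPos, int filterSize)
{
    int shift = c->src_depth - 5;
    demo_hscale16to19_asm(shift, dst, dstW, src, filter, filterPos, filterSize);
}

/* Init code that installs the wrapper as a function pointer. */
void demo_init(DemoContext *c)
{
    assert(c != NULL);
    c->hscale = demo_hscale16to19;
}
```

With this arrangement no separate prototype for the wrapper is needed, and the wrapper does not add another externally visible symbol.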