From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
 by master.gitmailbox.com (Postfix) with ESMTP id 91CC64322B
 for ; Wed, 25 May 2022 08:41:00 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 8B86D68B4E7;
 Wed, 25 May 2022 11:40:57 +0300 (EEST)
Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1C4E168B428
 for ; Wed, 25 May 2022 11:40:51 +0300 (EEST)
Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21])
 by mail8.parnet.fi with ESMTP id 24P8eiEL021523-24P8eiEM021523;
 Wed, 25 May 2022 11:40:44 +0300
Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187])
 by mail9.parnet.fi (Postfix) with ESMTPS id 837F1A142D;
 Wed, 25 May 2022 11:40:44 +0300 (EEST)
Date: Wed, 25 May 2022 11:40:44 +0300 (EEST)
From: =?ISO-8859-15?Q?Martin_Storsj=F6?=
To: "Swinney, Jonathan"
In-Reply-To:
Message-ID:
References:
MIME-Version: 1.0
X-FE-Policy-ID: 3:14:2:SYSTEM
Subject: Re: [FFmpeg-devel] [PATCH v2 2/2] swscale/aarch64: add hscale specializations
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: "Pop, Sebastian" , "ffmpeg-devel@ffmpeg.org"
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
Archived-At:
List-Archive:
List-Post:

On Wed, 25 May 2022, Swinney, Jonathan wrote:

> This patch adds code to support specializations of the hscale function and adds
> a specialization for filterSize == 4.
>
> ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is
> loading the data from src, this data is loaded a whole block ahead and stored
> back to the stack to be loaded again with ld4. This arranges the data for most
> efficient use of the vector instructions and removes the need for completion
> adds at the end. The number of iterations of the C per iteration of the assembly
> is increased from 4 to 8, but because of the prefetching, there must be a
> special section without prefetching when dstW < 16.
>
> This improves speed on Graviton 2 (Neoverse N1) dramatically in the case where
> previously fs=8 would have been required.
>
> before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
> after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9
>
> Signed-off-by: Jonathan Swinney
> ---
> libswscale/aarch64/hscale.S  | 172 ++++++++++++++++++++++++++++++++++-
> libswscale/aarch64/swscale.c |  40 ++++++--
> libswscale/utils.c           |   2 +-
> 3 files changed, 203 insertions(+), 11 deletions(-)
>
> -void ff_hscale_8_to_15_neon(SwsContext *c, int16_t *dst, int dstW,
> -                            const uint8_t *src, const int16_t *filter,
> -                            const int32_t *filterPos, int filterSize);
> +#define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
> +void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
> +    SwsContext *c, int16_t *data, \
> +    int dstW, const uint8_t *src, \
> +    const int16_t *filter, \
> +    const int32_t *filterPos, int filterSize)
> +#define SCALE_FUNCS(filter_n, opt) \
> +    SCALE_FUNC(filter_n, 8, 15, opt);
> +#define ALL_SCALE_FUNCS(opt) \
> +    SCALE_FUNCS(4, opt); \
> +    SCALE_FUNCS(8, opt); \
> +    SCALE_FUNCS(X8, opt)

Here, you still declare the -8 function which no longer is implemented.

Other than that, this patch looks fine I think.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
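
To illustrate the remark about the stale declaration, here is a minimal standalone sketch of how the quoted macros expand once the plain 8 entry is dropped from ALL_SCALE_FUNCS. It is only one possible way to address the comment, not the actual follow-up patch. The SwsContext stand-in, the SCALE_NAME and STR helpers, the declared_names table and is_declared() are hypothetical names that exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Opaque stand-in so the sketch compiles outside of FFmpeg (assumption). */
typedef struct SwsContext SwsContext;

/* Hypothetical helper: pastes together the same identifier as SCALE_FUNC. */
#define SCALE_NAME(filter_n, from_bpc, to_bpc, opt) \
    ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt

/* Prototype macro with the parameter list quoted from the patch. */
#define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt)      \
void SCALE_NAME(filter_n, from_bpc, to_bpc, opt)(        \
    SwsContext *c, int16_t *data,                        \
    int dstW, const uint8_t *src,                        \
    const int16_t *filter,                               \
    const int32_t *filterPos, int filterSize)

#define SCALE_FUNCS(filter_n, opt) \
    SCALE_FUNC(filter_n, 8, 15, opt)

/* Only the variants assumed to still have an implementation: no plain 8. */
#define ALL_SCALE_FUNCS(opt) \
    SCALE_FUNCS(4, opt);     \
    SCALE_FUNCS(X8, opt)

/* Expands to prototypes for ff_hscale8to15_4_neon and ff_hscale8to15_X8_neon. */
ALL_SCALE_FUNCS(neon);

/* Two-level stringification so SCALE_NAME is expanded before '#' applies. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Illustration only: the names that ALL_SCALE_FUNCS(neon) declares. */
static const char *const declared_names[] = {
    STR(SCALE_NAME(4, 8, 15, neon)),
    STR(SCALE_NAME(X8, 8, 15, neon)),
};

/* Returns 1 if `name` matches one of the declared variants, 0 otherwise. */
static int is_declared(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof(declared_names) / sizeof(declared_names[0]); i++)
        if (strcmp(declared_names[i], name) == 0)
            return 1;
    return 0;
}
```

With this list, is_declared("ff_hscale8to15_8_neon") returns 0, while the 4 and X8 names are still declared. Unlike the quoted hunk, SCALE_FUNCS in this sketch carries no trailing semicolon, which avoids a stray empty declaration at file scope; that is a choice of the sketch, not something the review asked for.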