From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id A118B46A09 for ; Sat, 1 Jul 2023 21:40:19 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id ACD3868C17A; Sun, 2 Jul 2023 00:40:16 +0300 (EEST) Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 923D368BE69 for ; Sun, 2 Jul 2023 00:40:10 +0300 (EEST) Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21]) by mail8.parnet.fi with ESMTP id 361Le9wL012163-361Le9wM012163; Sun, 2 Jul 2023 00:40:09 +0300 Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187]) by mail9.parnet.fi (Postfix) with ESMTPS id AB6E9A146B; Sun, 2 Jul 2023 00:40:09 +0300 (EEST) Date: Sun, 2 Jul 2023 00:40:09 +0300 (EEST) From: =?ISO-8859-15?Q?Martin_Storsj=F6?= To: FFmpeg development discussions and patches In-Reply-To: <20230629175729.224383-9-jc@kynesim.co.uk> Message-ID: <8f4abf25-df6c-37b-f0b4-5ab85b5cc0d8@martin.st> References: <20230629175729.224383-1-jc@kynesim.co.uk> <20230629175729.224383-9-jc@kynesim.co.uk> MIME-Version: 1.0 X-FE-Policy-ID: 3:14:2:SYSTEM Subject: Re: [FFmpeg-devel] [PATCH 08/15] avfilter/vf_bwdif: Add neon for filter_edge X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: thomas.mundt@hr.de, John Cox Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: On Thu, 29 Jun 2023, John Cox wrote: > Signed-off-by: John Cox > --- > libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 ++++ > libavfilter/aarch64/vf_bwdif_neon.S | 104 ++++++++++++++++++++ > 2 files changed, 124 insertions(+) > > diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c > index 3ffaa07ab3..e75cf2f204 100644 > --- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c > +++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c > @@ -24,10 +24,29 @@ > #include "libavfilter/bwdif.h" > #include "libavutil/aarch64/cpu.h" > > +void ff_bwdif_filter_edge_neon(void *dst1, void *prev1, void *cur1, void *next1, > + int w, int prefs, int mrefs, int prefs2, int mrefs2, > + int parity, int clip_max, int spat); > + > void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs, > int prefs3, int mrefs3, int parity, int clip_max); > > > +static void filter_edge_helper(void *dst1, void *prev1, void *cur1, void *next1, > + int w, int prefs, int mrefs, int prefs2, int mrefs2, > + int parity, int clip_max, int spat) > +{ > + const int w0 = clip_max != 255 ? 0 : w & ~15; > + > + ff_bwdif_filter_edge_neon(dst1, prev1, cur1, next1, w0, prefs, mrefs, prefs2, mrefs2, > + parity, clip_max, spat); > + > + if (w0 < w) > + ff_bwdif_filter_edge_c((char *)dst1 + w0, (char *)prev1 + w0, (char *)cur1 + w0, (char *)next1 + w0, > + w - w0, prefs, mrefs, prefs2, mrefs2, > + parity, clip_max, spat); > +} > + > static void filter_intra_helper(void *dst1, void *cur1, int w, int prefs, int mrefs, > int prefs3, int mrefs3, int parity, int clip_max) > { > @@ -52,5 +71,6 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth) > return; > > s->filter_intra = filter_intra_helper; > + s->filter_edge = filter_edge_helper; > } > > diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S > index 6c5d1598f4..a33b235882 100644 > --- a/libavfilter/aarch64/vf_bwdif_neon.S > +++ b/libavfilter/aarch64/vf_bwdif_neon.S > @@ -128,6 +128,110 @@ coeffs: > .hword 5570, 3801, 1016, -3801 // hf[0] = v0.h[2], -hf[1] = v0.h[5] > .hword 5077, 981 // sp[0] = v0.h[6] > > +// ============================================================================ > +// > +// void ff_bwdif_filter_edge_neon( > +// void *dst1, // x0 > +// void *prev1, // x1 > +// void *cur1, // x2 > +// void *next1, // x3 > +// int w, // w4 > +// int prefs, // w5 > +// int mrefs, // w6 > +// int prefs2, // w7 > +// int mrefs2, // [sp, #0] > +// int parity, // [sp, #8] > +// int clip_max, // [sp, #16] unused > +// int spat); // [sp, #24] This doesn't hold for macOS targets (and the checkasm tests fail on that platform). On macOS, arguments that aren't passed in registers but on the stack, are tightly packed. So since parity is 32 bit and mrefs2 also was 32 bit, parity is available at [sp, #4]. Therefore, it's usually simplest for portability reasons, to pass any arguments after the first 8, as intptr_t or ptrdiff_t, as that makes them consistent across platforms. // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".