Date: Sun, 2 Jul 2023 23:40:04 +0300 (EEST)
From: Martin Storsjö
To: John Cox
Cc: thomas.mundt@hr.de, FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH 11/15] avfilter/vf_bwdif: Add neon for filter_line
Message-ID: <28219054-4555-f7e1-801d-1eb624f7c6c7@martin.st>
References: <20230629175729.224383-1-jc@kynesim.co.uk> <20230629175729.224383-12-jc@kynesim.co.uk>

On Sun, 2 Jul 2023, John Cox wrote:

> On Sun, 2 Jul 2023 00:44:10 +0300 (EEST), you wrote:
>
>> On Thu, 29 Jun 2023, John Cox wrote:
>>
>>> Signed-off-by: John Cox
>>> ---
>>>  libavfilter/aarch64/vf_bwdif_init_aarch64.c |  21 ++
>>>  libavfilter/aarch64/vf_bwdif_neon.S         | 215 ++++++++++++++++++++
>>>  2 files changed, 236 insertions(+)
>>>
>>> diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
>>> index e75cf2f204..21e67884ab 100644
>>> --- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
>>> +++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
>>> @@ -31,6 +31,26 @@ void ff_bwdif_filter_edge_neon(void *dst1, void *prev1, void *cur1, void *next1,
>>>  void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs,
>>>                                  int prefs3, int mrefs3, int parity, int clip_max);
>>>
>>> +void ff_bwdif_filter_line_neon(void *dst1, void *prev1, void *cur1, void *next1,
>>> +                               int w, int prefs, int mrefs, int prefs2, int mrefs2,
>>> +                               int prefs3, int mrefs3, int prefs4, int mrefs4,
>>> +                               int parity, int clip_max);
>>> +
>>> +
>>> +static void filter_line_helper(void *dst1, void *prev1, void *cur1, void *next1,
>>> +                               int w, int prefs, int mrefs, int prefs2, int mrefs2,
>>> +                               int prefs3, int mrefs3, int prefs4, int mrefs4,
>>> +                               int parity, int clip_max)
>>> +{
>>> +    const int w0 = clip_max != 255 ? 0 : w & ~15;
>>> +
>>> +    ff_bwdif_filter_line_neon(dst1, prev1, cur1, next1,
>>> +                              w0, prefs, mrefs, prefs2, mrefs2, prefs3, mrefs3, prefs4, mrefs4, parity, clip_max);
>>> +
>>> +    if (w0 < w)
>>> +        ff_bwdif_filter_line_c((char *)dst1 + w0, (char *)prev1 + w0, (char *)cur1 + w0, (char *)next1 + w0,
>>> +                               w - w0, prefs, mrefs, prefs2, mrefs2, prefs3, mrefs3, prefs4, mrefs4, parity, clip_max);
>>> +}
>>>
>>>  static void filter_edge_helper(void *dst1, void *prev1, void *cur1, void *next1,
>>>                                 int w, int prefs, int mrefs, int prefs2, int mrefs2,
>>> @@ -71,6 +91,7 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
>>>          return;
>>>
>>>      s->filter_intra = filter_intra_helper;
>>> +    s->filter_line  = filter_line_helper;
>>>      s->filter_edge  = filter_edge_helper;
>>>  }
>>>
>>> diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
>>> index a33b235882..675e97d966 100644
>>> --- a/libavfilter/aarch64/vf_bwdif_neon.S
>>> +++ b/libavfilter/aarch64/vf_bwdif_neon.S
>>> @@ -128,6 +128,221 @@ coeffs:
>>>          .hword          5570, 3801, 1016, -3801         // hf[0] = v0.h[2], -hf[1] = v0.h[5]
>>>          .hword          5077, 981                       // sp[0] = v0.h[6]
>>>
>>> +// ===========================================================================
>>> +//
>>> +// void filter_line(
>>> +//      void *dst1,     // x0
>>> +//      void *prev1,    // x1
>>> +//      void *cur1,     // x2
>>> +//      void *next1,    // x3
>>> +//      int w,          // w4
>>> +//      int prefs,      // w5
>>> +//      int mrefs,      // w6
>>> +//      int prefs2,     // w7
>>> +//      int mrefs2,     // [sp, #0]
>>> +//      int prefs3,     // [sp, #8]
>>> +//      int mrefs3,     // [sp, #16]
>>> +//      int prefs4,     // [sp, #24]
>>> +//      int mrefs4,     // [sp, #32]
>>> +//      int parity,     // [sp, #40]
>>> +//      int clip_max)   // [sp, #48]
>>> +
>>> +function ff_bwdif_filter_line_neon, export=1
>>> +        // Sanity check w
>>> +        cmp             w4, #0
>>> +        ble             99f
>>> +
>>> +        // Rearrange regs to be the same as line3 for ease of debug!
>>> +        mov             w10, w4                         // w10 = loop count
>>> +        mov             w9,  w6                         // w9  = mref
>>> +        mov             w12, w7                         // w12 = pref2
>>> +        mov             w11, w5                         // w11 = pref
>>> +        ldr             w8,  [sp, #0]                   // w8  = mref2
>>> +        ldr             w7,  [sp, #16]                  // w7  = mref3
>>> +        ldr             w6,  [sp, #32]                  // w6  = mref4
>>> +        ldr             w13, [sp, #8]                   // w13 = pref3
>>> +        ldr             w14, [sp, #24]                  // w14 = pref4
>>
>> Btw, remember that you can load two arguments from the stack at once with
>> ldp, e.g. "ldp x8, x13, [sp, #0]". If they're made intptr_t/ptrdiff_t, you
>> won't have an issue with garbage in the upper 32 bits either.
>
> Fair point - I was indeed worrying about garbage in the upper half (and
> this is not performance or size critical code).

Well, as long as you actually refer to the register as w8 rather than x8,
it shouldn't matter. Checkasm deliberately tries to put garbage in those
upper bits, so if the code passes checkasm, it should be fine.

// Martin
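
To illustrate the point about the upper 32 bits, here is a minimal C sketch.
It is not part of the patch, and the helper names (make_slot, load_pair,
read_as_w, read_as_x) are hypothetical. It assumes each int stack argument
sits in its own 8-byte slot, as the [sp, #0], [sp, #8], ... offsets in the
quoted comment suggest, and that the upper half of each slot may hold
arbitrary bits. It only models the register views in plain C; it does not
execute any AArch64 instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 8-byte stack slot: the int argument lives in
 * the low 32 bits, and the upper 32 bits carry caller-left garbage. */
static uint64_t make_slot(int32_t value, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | (uint32_t)value;
}

/* Models "ldp xA, xB, [sp, #off]": two adjacent 8-byte slots in one load. */
static void load_pair(const uint64_t *sp, uint64_t *xa, uint64_t *xb)
{
    *xa = sp[0];
    *xb = sp[1];
}

/* Models using the register as wN: only the low 32 bits are visible,
 * so the garbage in the upper half never reaches the result. */
static int32_t read_as_w(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    int32_t v;
    memcpy(&v, &lo, sizeof v);   /* reinterpret the low word as signed */
    return v;
}

/* Models using the register as xN: all 64 bits, garbage included. */
static uint64_t read_as_x(uint64_t x)
{
    return x;
}
```

For example, with stack slots built as make_slot(-1920, 0xDEADBEEFu) and
make_slot(5760, 0xCAFEF00Du), load_pair fetches both at once, read_as_w
recovers -1920 and 5760, and read_as_x on the first slot gives a value that
differs from a sign-extended -1920. In this model the garbage only matters
when the full 64-bit register is used, for instance as an address offset.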