From: John Cox
To: Martin Storsjö
Cc: thomas.mundt@hr.de, FFmpeg development discussions and patches
Date: Sun, 02 Jul 2023 11:50:52 +0100
Subject: Re: [FFmpeg-devel] [PATCH 08/15] avfilter/vf_bwdif: Add neon for filter_edge
In-Reply-To: <8f4abf25-df6c-37b-f0b4-5ab85b5cc0d8@martin.st>
References: <20230629175729.224383-1-jc@kynesim.co.uk> <20230629175729.224383-9-jc@kynesim.co.uk> <8f4abf25-df6c-37b-f0b4-5ab85b5cc0d8@martin.st>

On Sun, 2 Jul 2023 00:40:09 +0300 (EEST), you wrote:

>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Signed-off-by: John Cox
>> ---
>>  libavfilter/aarch64/vf_bwdif_init_aarch64.c |  20 ++++
>>  libavfilter/aarch64/vf_bwdif_neon.S         | 104 ++++++++++++++++++++
>>  2 files changed, 124 insertions(+)
>>
>> diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
>> index 3ffaa07ab3..e75cf2f204 100644
>> --- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
>> +++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
>> @@ -24,10 +24,29 @@
>>  #include "libavfilter/bwdif.h"
>>  #include "libavutil/aarch64/cpu.h"
>>
>> +void ff_bwdif_filter_edge_neon(void *dst1, void *prev1, void *cur1, void *next1,
>> +                               int w, int prefs, int mrefs, int prefs2, int mrefs2,
>> +                               int parity, int clip_max, int spat);
>> +
>>  void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs,
>>                                  int prefs3, int mrefs3, int parity, int clip_max);
>>
>>
>> +static void filter_edge_helper(void *dst1, void *prev1, void *cur1, void *next1,
>> +                               int w, int prefs, int mrefs, int prefs2, int mrefs2,
>> +                               int parity, int clip_max, int spat)
>> +{
>> +    const int w0 = clip_max != 255 ? 0 : w & ~15;
>> +
>> +    ff_bwdif_filter_edge_neon(dst1, prev1, cur1, next1, w0, prefs, mrefs, prefs2, mrefs2,
>> +                              parity, clip_max, spat);
>> +
>> +    if (w0 < w)
>> +        ff_bwdif_filter_edge_c((char *)dst1 + w0, (char *)prev1 + w0, (char *)cur1 + w0, (char *)next1 + w0,
>> +                               w - w0, prefs, mrefs, prefs2, mrefs2,
>> +                               parity, clip_max, spat);
>> +}
>> +
>>  static void filter_intra_helper(void *dst1, void *cur1, int w, int prefs, int mrefs,
>>                                  int prefs3, int mrefs3, int parity, int clip_max)
>>  {
>> @@ -52,5 +71,6 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
>>          return;
>>
>>      s->filter_intra = filter_intra_helper;
>> +    s->filter_edge  = filter_edge_helper;
>>  }
>>
>> diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
>> index 6c5d1598f4..a33b235882 100644
>> --- a/libavfilter/aarch64/vf_bwdif_neon.S
>> +++ b/libavfilter/aarch64/vf_bwdif_neon.S
>> @@ -128,6 +128,110 @@ coeffs:
>>          .hword          5570, 3801, 1016, -3801         // hf[0] = v0.h[2], -hf[1] = v0.h[5]
>>          .hword          5077, 981                       // sp[0] = v0.h[6]
>>
>> +// ============================================================================
>> +//
>> +// void ff_bwdif_filter_edge_neon(
>> +//      void *dst1,     // x0
>> +//      void *prev1,    // x1
>> +//      void *cur1,     // x2
>> +//      void *next1,    // x3
>> +//      int w,          // w4
>> +//      int prefs,      // w5
>> +//      int mrefs,      // w6
>> +//      int prefs2,     // w7
>> +//      int mrefs2,     // [sp, #0]
>> +//      int parity,     // [sp, #8]
>> +//      int clip_max,   // [sp, #16]  unused
>> +//      int spat);      // [sp, #24]
>
>This doesn't hold for macOS targets (and the checkasm tests fail on that
>platform).
>
>On macOS, arguments that aren't passed in registers but on the stack, are
>tightly packed. So since parity is 32 bit and mrefs2 also was 32 bit,
>parity is available at [sp, #4].
>
>Therefore, it's usually simplest for portability reasons, to pass any
>arguments after the first 8, as intptr_t or ptrdiff_t, as that makes them
>consistent across platforms.

Not my interface - this is already existing code.
What do you suggest I do? I'm happy either to change the interface or
fix my stack offsets if there is any clue that lets me detect this ABI.
As a personal preference I'd choose the latter.

I don't have easy access to a Mac. Is there any easy way of getting this
tested before resubmission?

Thanks

JC

>// Martin
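For reference, the offset difference described in the quoted review can be sketched in a few lines of C. This is only an illustration under stated assumptions, not FFmpeg code: stack_offsets() and STACK_SLOT_MIN are hypothetical names, the model only covers scalar arguments whose alignment equals their size, and it relies on compilers targeting Apple platforms predefining __APPLE__.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant, not from FFmpeg: the minimum stack slot size.
 * Standard AAPCS64 gives every stack-passed scalar at least an 8-byte slot;
 * Apple's arm64 convention packs them at their natural alignment.
 * Assumes the compiler predefines __APPLE__ on macOS/iOS targets. */
#if defined(__APPLE__)
#define STACK_SLOT_MIN 1
#else
#define STACK_SLOT_MIN 8
#endif

/* Hypothetical helper: for n stack-passed scalar arguments with the given
 * sizes (assumed equal to their alignment), write each argument's byte
 * offset from sp into out[]. */
static void stack_offsets(const size_t *sizes, size_t n,
                          size_t slot_min, size_t *out)
{
    size_t off = 0;

    assert(sizes != NULL && out != NULL && slot_min > 0);

    for (size_t i = 0; i < n; i++) {
        /* Each argument occupies max(size, slot_min) bytes. */
        size_t slot = sizes[i] > slot_min ? sizes[i] : slot_min;

        off = (off + slot - 1) / slot * slot;   /* round up to slot alignment */
        out[i] = off;
        off += slot;
    }
}
```

With four 4-byte ints (mrefs2, parity, clip_max, spat) and slot_min = 8 this gives offsets 0, 8, 16, 24, matching the comments in the patch; with slot_min = 1 it gives 0, 4, 8, 12, which puts parity at [sp, #4] as described for macOS. If the same four arguments are 8 bytes wide (intptr_t or ptrdiff_t on a 64-bit target), both settings give 0, 8, 16, 24, which is why widening them makes the layout consistent across platforms.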