From: Lynne
Date: Sun, 2 Jul 2023 16:02:22 +0200 (CEST)
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH v2 06/15] avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
In-Reply-To: <20230702123242.232484-7-jc@kynesim.co.uk>
References: <20230702123242.232484-1-jc@kynesim.co.uk> <20230702123242.232484-7-jc@kynesim.co.uk>

Jul 2, 2023, 14:34 by jc@kynesim.co.uk:

> Signed-off-by: John Cox
> ---
>  libavfilter/aarch64/vf_bwdif_neon.S | 73 +++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
> index 6a614f8d6e..48dc7bcd9d 100644
> --- a/libavfilter/aarch64/vf_bwdif_neon.S
> +++ b/libavfilter/aarch64/vf_bwdif_neon.S
> @@ -66,6 +66,79 @@
>          umlsl2          \a3\().4s, \s1\().8h, \k
>  .endm
>
> +// int b = m2s1 - m1;
> +// int f = p2s1 - p1;
> +// int dc = c0s1 - m1;
> +// int de = c0s1 - p1;
> +// int sp_max = FFMIN(p1 - c0s1, m1 - c0s1);
> +// sp_max = FFMIN(sp_max, FFMAX(-b,-f));
> +// int sp_min = FFMIN(c0s1 - p1, c0s1 - m1);
> +// sp_min = FFMIN(sp_min, FFMAX(b,f));
> +// diff = diff == 0 ? 0 : FFMAX3(diff, sp_min, sp_max);
> +.macro SPAT_CHECK diff, m2s1, m1, c0s1, p1, p2s1, t0, t1, t2, t3
> +        uqsub           \t0\().16b, \p1\().16b, \c0s1\().16b
> +        uqsub           \t2\().16b, \m1\().16b, \c0s1\().16b
> +        umin            \t2\().16b, \t0\().16b, \t2\().16b
> +
> +        uqsub           \t1\().16b, \m1\().16b, \m2s1\().16b
> +        uqsub           \t3\().16b, \p1\().16b, \p2s1\().16b
> +        umax            \t3\().16b, \t3\().16b, \t1\().16b
> +        umin            \t3\().16b, \t3\().16b, \t2\().16b
> +
> +        uqsub           \t0\().16b, \c0s1\().16b, \p1\().16b
> +        uqsub           \t2\().16b, \c0s1\().16b, \m1\().16b
> +        umin            \t2\().16b, \t0\().16b, \t2\().16b
> +
> +        uqsub           \t1\().16b, \m2s1\().16b, \m1\().16b
> +        uqsub           \t0\().16b, \p2s1\().16b, \p1\().16b
> +        umax            \t0\().16b, \t0\().16b, \t1\().16b
> +        umin            \t2\().16b, \t2\().16b, \t0\().16b
> +
> +        cmeq            \t1\().16b, \diff\().16b, #0
> +        umax            \diff\().16b, \diff\().16b, \t3\().16b
> +        umax            \diff\().16b, \diff\().16b, \t2\().16b
> +        bic             \diff\().16b, \diff\().16b, \t1\().16b
> +.endm
> +
> +// i0 = s0;
> +// if (i0 > d0 + diff0)
> +//     i0 = d0 + diff0;
> +// else if (i0 < d0 - diff0)
> +//     i0 = d0 - diff0;
> +//
> +// i0 = s0 is safe
> +.macro DIFF_CLIP i0, s0, d0, diff, t0, t1
> +        uqadd           \t0\().16b, \d0\().16b, \diff\().16b
> +        uqsub           \t1\().16b, \d0\().16b, \diff\().16b
> +        umin            \i0\().16b, \s0\().16b, \t0\().16b
> +        umax            \i0\().16b, \i0\().16b, \t1\().16b
> +.endm
> +
> +// i0 = FFABS(m1 - p1) > td0 ? i1 : i2;
> +// DIFF_CLIP
> +//
> +// i0 = i1 is safe
> +.macro INTERPOL i0, i1, i2, m1, d0, p1, td0, diff, t0, t1, t2
> +        uabd            \t0\().16b, \m1\().16b, \p1\().16b
> +        cmhi            \t0\().16b, \t0\().16b, \td0\().16b
> +        bsl             \t0\().16b, \i1\().16b, \i2\().16b
> +        DIFF_CLIP       \i0, \t0, \d0, \diff, \t1, \t2
> +.endm
> +
> +.macro PUSH_VREGS
> +        stp             d8,  d9,  [sp, #-64]!
> +        stp             d10, d11, [sp, #16]
> +        stp             d12, d13, [sp, #32]
> +        stp             d14, d15, [sp, #48]
> +.endm
> +
> +.macro POP_VREGS
> +        ldp             d14, d15, [sp, #48]
> +        ldp             d12, d13, [sp, #32]
> +        ldp             d10, d11, [sp, #16]
> +        ldp             d8,  d9,  [sp], #64
> +.endm
>

Could you squash? Adding empty files and then commit by commit filling them
up is pointless and makes it harder to review. Just export what you need in
one commit, and add everything else in another.

Also, keep in mind the final spatial clip at the end should be removable.
I discovered it makes the filter look quite a lot better. Currently, only
the Vulkan version does it, but we're looking into changing the C/asm
versions too, and you're the second one to rush into implementing asm for
it before we've had a chance to discuss it properly.
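For reference while reviewing, here is a rough per-byte C model of what the quoted SPAT_CHECK, DIFF_CLIP and INTERPOL macros compute in each of the 16 lanes, with the saturating uqadd/uqsub arithmetic made explicit. It is only a sketch: the helper names and the `final_clip` flag are hypothetical and not part of the patch or of FFmpeg, and treating DIFF_CLIP as the "final clip" that should be removable is one possible reading of the comment above, not something the thread confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane equivalents of the NEON instructions used by the macros. */
static inline uint8_t uqadd8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;      /* saturate at 255 */
}

static inline uint8_t uqsub8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;    /* saturate at 0 */
}

static inline uint8_t umin8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static inline uint8_t umax8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Model of SPAT_CHECK: negative differences are clamped to 0 by uqsub. */
static inline uint8_t spat_check(uint8_t diff, uint8_t m2s1, uint8_t m1,
                                 uint8_t c0s1, uint8_t p1, uint8_t p2s1)
{
    uint8_t sp_max = umin8(umin8(uqsub8(p1, c0s1), uqsub8(m1, c0s1)),
                           umax8(uqsub8(m1, m2s1), uqsub8(p1, p2s1)));
    uint8_t sp_min = umin8(umin8(uqsub8(c0s1, p1), uqsub8(c0s1, m1)),
                           umax8(uqsub8(m2s1, m1), uqsub8(p2s1, p1)));

    if (diff == 0)                          /* cmeq + bic in the macro */
        return 0;
    return umax8(umax8(diff, sp_max), sp_min);
}

/* Model of DIFF_CLIP: clamp s0 into [d0 - diff, d0 + diff], saturated. */
static inline uint8_t diff_clip(uint8_t s0, uint8_t d0, uint8_t diff)
{
    uint8_t hi = uqadd8(d0, diff);
    uint8_t lo = uqsub8(d0, diff);

    assert(lo <= hi);                       /* window is always well-formed */
    return umax8(umin8(s0, hi), lo);
}

/* Model of INTERPOL. final_clip is a hypothetical switch (not in the patch)
 * showing how the clip could be kept separable from the selection. */
static inline uint8_t interpol(uint8_t i1, uint8_t i2, uint8_t m1, uint8_t d0,
                               uint8_t p1, uint8_t td0, uint8_t diff,
                               int final_clip)
{
    uint8_t ad  = m1 > p1 ? (uint8_t)(m1 - p1) : (uint8_t)(p1 - m1); /* uabd */
    uint8_t sel = ad > td0 ? i1 : i2;                          /* cmhi + bsl */

    return final_clip ? diff_clip(sel, d0, diff) : sel;
}
```

In the asm, the same separation could presumably be had by keeping the select and the clip as distinct macros and invoking the clip conditionally, rather than folding DIFF_CLIP into INTERPOL; whether that matches what the Vulkan version does would need checking against that code.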