From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 2 Jul 2023 12:32:33 +0000
Message-Id: <20230702123242.232484-7-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230702123242.232484-1-jc@kynesim.co.uk>
References: <20230702123242.232484-1-jc@kynesim.co.uk>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 06/15] avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Signed-off-by: John Cox
---
 libavfilter/aarch64/vf_bwdif_neon.S | 73 +++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
index 6a614f8d6e..48dc7bcd9d 100644
--- a/libavfilter/aarch64/vf_bwdif_neon.S
+++ b/libavfilter/aarch64/vf_bwdif_neon.S
@@ -66,6 +66,79 @@
         umlsl2          \a3\().4s, \s1\().8h, \k
 .endm
 
+// int b = m2s1 - m1;
+// int f = p2s1 - p1;
+// int dc = c0s1 - m1;
+// int de = c0s1 - p1;
+// int sp_max = FFMIN(p1 - c0s1, m1 - c0s1);
+// sp_max = FFMIN(sp_max, FFMAX(-b,-f));
+// int sp_min = FFMIN(c0s1 - p1, c0s1 - m1);
+// sp_min = FFMIN(sp_min, FFMAX(b,f));
+// diff = diff == 0 ? 0 : FFMAX3(diff, sp_min, sp_max);
+.macro SPAT_CHECK diff, m2s1, m1, c0s1, p1, p2s1, t0, t1, t2, t3
+        uqsub           \t0\().16b, \p1\().16b, \c0s1\().16b
+        uqsub           \t2\().16b, \m1\().16b, \c0s1\().16b
+        umin            \t2\().16b, \t0\().16b, \t2\().16b
+
+        uqsub           \t1\().16b, \m1\().16b, \m2s1\().16b
+        uqsub           \t3\().16b, \p1\().16b, \p2s1\().16b
+        umax            \t3\().16b, \t3\().16b, \t1\().16b
+        umin            \t3\().16b, \t3\().16b, \t2\().16b
+
+        uqsub           \t0\().16b, \c0s1\().16b, \p1\().16b
+        uqsub           \t2\().16b, \c0s1\().16b, \m1\().16b
+        umin            \t2\().16b, \t0\().16b, \t2\().16b
+
+        uqsub           \t1\().16b, \m2s1\().16b, \m1\().16b
+        uqsub           \t0\().16b, \p2s1\().16b, \p1\().16b
+        umax            \t0\().16b, \t0\().16b, \t1\().16b
+        umin            \t2\().16b, \t2\().16b, \t0\().16b
+
+        cmeq            \t1\().16b, \diff\().16b, #0
+        umax            \diff\().16b, \diff\().16b, \t3\().16b
+        umax            \diff\().16b, \diff\().16b, \t2\().16b
+        bic             \diff\().16b, \diff\().16b, \t1\().16b
+.endm
+
+// i0 = s0;
+// if (i0 > d0 + diff0)
+//     i0 = d0 + diff0;
+// else if (i0 < d0 - diff0)
+//     i0 = d0 - diff0;
+//
+// i0 = s0 is safe
+.macro DIFF_CLIP i0, s0, d0, diff, t0, t1
+        uqadd           \t0\().16b, \d0\().16b, \diff\().16b
+        uqsub           \t1\().16b, \d0\().16b, \diff\().16b
+        umin            \i0\().16b, \s0\().16b, \t0\().16b
+        umax            \i0\().16b, \i0\().16b, \t1\().16b
+.endm
+
+// i0 = FFABS(m1 - p1) > td0 ? i1 : i2;
+// DIFF_CLIP
+//
+// i0 = i1 is safe
+.macro INTERPOL i0, i1, i2, m1, d0, p1, td0, diff, t0, t1, t2
+        uabd            \t0\().16b, \m1\().16b, \p1\().16b
+        cmhi            \t0\().16b, \t0\().16b, \td0\().16b
+        bsl             \t0\().16b, \i1\().16b, \i2\().16b
+        DIFF_CLIP       \i0, \t0, \d0, \diff, \t1, \t2
+.endm
+
+.macro PUSH_VREGS
+        stp             d8,  d9,  [sp, #-64]!
+        stp             d10, d11, [sp, #16]
+        stp             d12, d13, [sp, #32]
+        stp             d14, d15, [sp, #48]
+.endm
+
+.macro POP_VREGS
+        ldp             d14, d15, [sp, #48]
+        ldp             d12, d13, [sp, #32]
+        ldp             d10, d11, [sp, #16]
+        ldp             d8,  d9,  [sp], #64
+.endm
+
 // static const uint16_t coef_lf[2] = { 4309, 213 };
 // static const uint16_t coef_hf[3] = { 5570, 3801, 1016 };
 // static const uint16_t coef_sp[2] = { 5077, 981 };
-- 
2.39.2
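
For anyone who wants to check the arithmetic without an aarch64 target to hand, the three arithmetic macros can be modelled one byte lane at a time in plain C. This is only a sketch and not part of the patch: the helper names (uqsub8, uqadd8, spat_check_lane, diff_clip_lane, interpol_lane, bwdif_model_selftest) are made up for illustration, and it assumes each NEON instruction behaves like its unsigned 8-bit scalar equivalent (uqsub/uqadd saturate, cmhi is an unsigned greater-than, bsl selects by mask).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-ins for the per-lane NEON operations used by the macros. */
static uint8_t uqsub8(uint8_t a, uint8_t b) { return (uint8_t)(a > b ? a - b : 0); }
static uint8_t uqadd8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}
static uint8_t umin8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t umax8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* One lane of SPAT_CHECK. Negative differences saturate to 0, which is
 * harmless here because the result is only ever max'ed with diff >= 0. */
static uint8_t spat_check_lane(uint8_t diff, uint8_t m2s1, uint8_t m1,
                               uint8_t c0s1, uint8_t p1, uint8_t p2s1)
{
    /* sp_max = FFMIN(FFMIN(p1 - c0s1, m1 - c0s1), FFMAX(-b, -f)) */
    uint8_t sp_max = umin8(uqsub8(p1, c0s1), uqsub8(m1, c0s1));
    sp_max = umin8(sp_max, umax8(uqsub8(m1, m2s1), uqsub8(p1, p2s1)));

    /* sp_min = FFMIN(FFMIN(c0s1 - p1, c0s1 - m1), FFMAX(b, f)) */
    uint8_t sp_min = umin8(uqsub8(c0s1, p1), uqsub8(c0s1, m1));
    sp_min = umin8(sp_min, umax8(uqsub8(m2s1, m1), uqsub8(p2s1, p1)));

    /* cmeq + bic: lanes where diff was 0 on entry stay 0. */
    if (diff == 0)
        return 0;
    return umax8(umax8(diff, sp_max), sp_min);
}

/* One lane of DIFF_CLIP: clamp s0 into [d0 - diff, d0 + diff], saturating. */
static uint8_t diff_clip_lane(uint8_t s0, uint8_t d0, uint8_t diff)
{
    uint8_t hi = uqadd8(d0, diff);
    uint8_t lo = uqsub8(d0, diff);
    return umax8(umin8(s0, hi), lo);
}

/* One lane of INTERPOL: pick i1 where |m1 - p1| > td0, else i2, then clip. */
static uint8_t interpol_lane(uint8_t i1, uint8_t i2, uint8_t m1, uint8_t d0,
                             uint8_t p1, uint8_t td0, uint8_t diff)
{
    uint8_t absd = (uint8_t)(m1 > p1 ? m1 - p1 : p1 - m1); /* uabd */
    uint8_t sel  = absd > td0 ? i1 : i2;                   /* cmhi + bsl */
    return diff_clip_lane(sel, d0, diff);
}

/* Hypothetical self-check; call it from a test harness or main(). */
int bwdif_model_selftest(void)
{
    assert(spat_check_lane(5, 100, 120, 110, 130, 100) == 10);
    assert(spat_check_lane(0, 100, 120, 110, 130, 100) == 0);
    assert(diff_clip_lane(200, 100, 10) == 110);
    assert(diff_clip_lane(100, 250, 10) == 240);
    assert(interpol_lane(200, 50, 10, 100, 40, 20, 10) == 110);
    return 0;
}
```

A model like this could be compared lane by lane against the output of the assembly once the filter functions that use these macros land later in the series. PUSH_VREGS and POP_VREGS are not modelled: they only save and restore d8-d15, the low halves of v8-v15 that a callee must preserve under the AArch64 procedure call standard.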