From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id D29CA40244
	for <ffmpegdev@gitmailbox.com>; Wed, 20 Jul 2022 04:41:56 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C07A868B830;
	Wed, 20 Jul 2022 07:41:34 +0300 (EEST)
Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com
 [209.85.215.171])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E722168B82B
 for <ffmpeg-devel@ffmpeg.org>; Wed, 20 Jul 2022 07:41:25 +0300 (EEST)
Received: by mail-pg1-f171.google.com with SMTP id bh13so15366988pgb.4
 for <ffmpeg-devel@ffmpeg.org>; Tue, 19 Jul 2022 21:41:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=c4xeuuzjgkCyJeehhi155/UaxzBBbcPC2HabThBId/I=;
 b=mPTInKTpJeKEWZCUxtsOvlEIMBkFH6T0Kn+OITXaAe7hwlW21QyK0oiM8gxutllaQA
 KPqXymJmf4okA5s3LIirZv9VLKg0zfD/vl+Ck/BjO4lVKvVJ9FFfMa/BzMuta1MsR2Rd
 mBDv5KX+T8uK0R67mNjsj4TNO04i5FriVSKpO94e8+L4iSjaOXIAu6ZrYjdUNdV8rqkw
 yUdDWRFtn2t/C42VTrCkOJVCSKAuf92lT3bmzi6+ffENSAzdHPJnI0G3+SI95gh/6IjV
 5Nr0R8KOWfRwKWvFynRVGm4ZsW7qmuY95lRm8HbzuNp5XDmW9+sL+s5RIXy1jDMPry3z
 ylrA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=c4xeuuzjgkCyJeehhi155/UaxzBBbcPC2HabThBId/I=;
 b=NFXOW7tj8HGxFwVV8hHRJEzUCAQS1n8KFsefw/r/BbDEIJdlSwD/onkzyyKnDxnwSb
 XqIXFtycPLLXeC36KSC0HZa4As45CeaNR765z92Q/Qg6qWonvXbvLdDlFTWBv1ONsOdR
 Dz2jJt6QhA03+nH2ZBNYJTOa1pzFU7KQpZzgWwWYAk4LckNFxfqewABCW0t6lpU/nJ7m
 e/YVg4G5sPGgeeDed+3jOwETwTz6AbA4er0HEAxGqOL1tDP7tqG9P450OKZU6dHG8SKZ
 lg7KsOOUbRabIhZbiQW1jKpCJOt5jPe1KDHItHsMWDQ+to31a9PuvdyTyHzlH+Qwe1bF
 2hMw==
X-Gm-Message-State: AJIora9NjcuL/Va04IRy9ulGlNHTOFH6fWrFvhG9LTtiZBnaOxcFvuIZ
 X5O+McW2gp53WmXfkY/ioRLNbn0FMetKdw==
X-Google-Smtp-Source: AGRyM1tbtNjvZ/ML9UUziW1R23/PKHcYMXCHoTGY24ALFzRwjIXNHFqAOM/luRFnkTXCiGyIYCP6ww==
X-Received: by 2002:a63:9049:0:b0:412:b11b:c630 with SMTP id
 a70-20020a639049000000b00412b11bc630mr31785614pge.175.1658292083977; 
 Tue, 19 Jul 2022 21:41:23 -0700 (PDT)
Received: from localhost.localdomain
 (23-121-159-29.lightspeed.sntcca.sbcglobal.net. [23.121.159.29])
 by smtp.googlemail.com with ESMTPSA id
 f16-20020a635110000000b003fba1a97c49sm10855907pgb.61.2022.07.19.21.41.23
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 19 Jul 2022 21:41:23 -0700 (PDT)
From: Chris Phlipot <cphlipot0@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 19 Jul 2022 21:41:16 -0700
Message-Id: <20220720044117.1282961-4-cphlipot0@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com>
References: <20220720044117.1282961-1-cphlipot0@gmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 4/5] avfilter/vf_yadif: Process more pixels
 using filter_line
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Chris Phlipot <cphlipot0@gmail.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20220720044117.1282961-4-cphlipot0@gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

filter_line is generally vectorized, whereas filter_edges is implemented
in C. Currently we rely on filter_edges to process non-edge pixels in
cases where the width doesn't match the alignment. As a result, those
non-edge pixels go through the slow C implementation instead of the
faster SSE implementation.

It is generally faster to process 8 pixels with the slowest SSE2
vectorized implementation than it is to process 2 pixels with the
C implementation. Therefore, if filter_edges would need to process 2 or
more non-edge pixels, it is faster to process these non-edge pixels
with filter_line instead, even if filter_line processes more pixels
than necessary.

To address this, we use filter_line as long as we know that at least
2 pixels will be used in the final output, even if the rest of the
computed pixels are invalid. Any incorrect output pixels generated by
filter_line will be overwritten by the following call to filter_edges.
In addition, we avoid running filter_line if it would read or write
pixels outside the current slice.

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
---
 libavfilter/vf_yadif.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)
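
For readers following the arithmetic, here is a minimal standalone sketch
of the width rounding used by the patch below. The helper names
yadif_rounded_width and yadif_width_check are hypothetical and do not
exist in FFmpeg. The sketch assumes req_align is a power of two and
deliberately leaves out the slice bounds check.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: mirrors the
 * filter_width_rounded_up computation. req_align is assumed to be a
 * power of two, otherwise the mask trick does not work. */
static int yadif_rounded_width(int w, int req_align)
{
    int target = w - 3; /* same as filter_width_target in the patch */
    assert(req_align > 0 && (req_align & (req_align - 1)) == 0);
    /* Clear the low bits, then step up by one full alignment unit. */
    return (target & ~(req_align - 1)) + req_align;
}

/* Hypothetical helper: the first half of the patch's condition.
 * Returns nonzero when the rounded-up width exceeds the target by at
 * least 2. The check against the end of the slice is omitted here. */
static int yadif_width_check(int w, int req_align)
{
    return yadif_rounded_width(w, req_align) - (w - 3) >= 2;
}
```

With w = 20 and req_align = 8, the target is 17 and the rounded width is
24, so the check passes. With w = 26, the target is 23, the rounded width
is again 24, and the difference of 1 falls below the threshold, so the
original td->w - edge width would be kept.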

diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 54109566be..394c04a985 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -201,6 +201,8 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int slice_end   = (td->h * (jobnr+1)) / nb_jobs;
     int y;
     int edge = 3 + s->req_align / df - 1;
+    int filter_width_target = td->w - 3;
+    int filter_width_rounded_up = (filter_width_target & ~(s->req_align-1)) + s->req_align;
 
     /* filtering reads 3 pixels to the left/right; to avoid invalid reads,
      * we need to call the c variant which avoids this for border pixels
@@ -215,11 +217,28 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             int     mrefs = y ? -refs : refs;
             int    parity = td->parity ^ td->tff;
             int     mode  = y == 1 || y + 2 == td->h ? 2 : s->mode;
+
+            /* Adjust width and alignment to process extra pixels in filter_line
+             * using potentially vectorized code so long as it doesn't cause
+             * reads or writes outside of the current slice. filter_edge will
+             * correct any incorrect pixels written by filter_line in this
+             * scenario.
+             */
+            int filter_width;
+            int edge_alignment;
+            if (filter_width_rounded_up - filter_width_target >= 2
+                && y*refs + filter_width_rounded_up < slice_end * refs + refs - 3) {
+                filter_width = filter_width_rounded_up;
+                edge_alignment = 1;
+            } else {
+                filter_width = td->w - edge;
+                edge_alignment = s->req_align;
+            }
             s->filter_line(dst + pix_3, prev + pix_3, cur + pix_3,
-                           next + pix_3, td->w - edge,
+                           next + pix_3, filter_width,
                            prefs, mrefs, parity, mode);
             s->filter_edges(dst, prev, cur, next, td->w,
-                            prefs, mrefs, parity, mode, s->req_align);
+                            prefs, mrefs, parity, mode, edge_alignment);
         } else {
             memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]],
                    &s->cur->data[td->plane][y * refs], td->w * df);
-- 
2.25.1
