From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id B9A3C46C41
	for <ffmpegdev@gitmailbox.com>; Thu,  6 Jul 2023 09:30:10 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C442E68C6C2;
	Thu,  6 Jul 2023 12:30:07 +0300 (EEST)
Received: from mail-wm1-f53.google.com (mail-wm1-f53.google.com
 [209.85.128.53])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id AE76A68C62D
 for <ffmpeg-devel@ffmpeg.org>; Thu,  6 Jul 2023 12:30:01 +0300 (EEST)
Received: by mail-wm1-f53.google.com with SMTP id
 5b1f17b1804b1-3fbac8b01b3so14292335e9.1
 for <ffmpeg-devel@ffmpeg.org>; Thu, 06 Jul 2023 02:30:01 -0700 (PDT)
Received: from CTHALPA.outer.uphall.net
 (cpc1-cmbg20-2-0-cust759.5-4.cable.virginm.net. [86.21.218.248])
 by smtp.gmail.com with ESMTPSA id
 p16-20020adfe610000000b0031434c08bb7sm1349393wrm.105.2023.07.06.02.30.00
 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128);
 Thu, 06 Jul 2023 02:30:00 -0700 (PDT)
From: John Cox <jc@kynesim.co.uk>
To: =?utf-8?Q?Martin_Storsj=C3=B6?= <martin@martin.st>
Date: Thu, 06 Jul 2023 10:30:00 +0100
Message-ID: <fv0dail1p4icss6l4p1cllft2ft39shdr8@4ax.com>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
 <2fc449f-836b-43b-32c6-7b5c575d5ad9@martin.st>
In-Reply-To: <2fc449f-836b-43b-32c6-7b5c575d5ad9@martin.st>
User-Agent: ForteAgent/8.00.32.1272
MIME-Version: 1.0
Subject: Re: [FFmpeg-devel] [PATCH v4 0/7] avfilter/vf_bwdif: Add aarch64
 neon functions
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: thomas.mundt@hr.de, ffmpeg-devel@ffmpeg.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/fv0dail1p4icss6l4p1cllft2ft39shdr8@4ax.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Thu, 6 Jul 2023 00:19:50 +0300 (EEST), you wrote:

>On Tue, 4 Jul 2023, John Cox wrote:
>
>> Also adds a filter_line3 method which, on aarch64 NEON, yields an
>> approximately 30% speedup over two filter_line calls plus a memcpy.
>>
>> Differences from v3:
>> Remove a few lines of NEON from filter_line that should have been
>> removed when it was copied from filter_line3.
>>
>> Sorry about the two patch sets in quick succession, but I think I've
>> applied all the requested changes and I didn't want this mistake in the
>> final patchset. (The mistake was benign - it just wasted a few cycles.)
>>
>> John Cox (7):
>>  tests/checkasm: Add test for vf_bwdif filter_intra
>>  avfilter/vf_bwdif: Add neon for filter_intra
>>  tests/checkasm: Add test for vf_bwdif filter_edge
>>  avfilter/vf_bwdif: Add neon for filter_edge
>>  avfilter/vf_bwdif: Add neon for filter_line
>>  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
>>  avfilter/vf_bwdif: Add neon for filter_line3
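As a reminder of what the new method covers: going by the comparison above (two filter_line calls plus a memcpy), a filter_line3-style call produces three output lines at once, filtering line y, copying line y+1 from the field being kept, and filtering line y+2. The C sketch below shows only that shape. toy_filter_line and toy_filter_line3 are hypothetical names, the per-line body is a toy rounded average of the lines above and below rather than the bwdif filter, and nothing here explains where the ~30% NEON speedup comes from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-line filter: rounded average of the
 * lines above and below the one being reconstructed (NOT bwdif). */
static void toy_filter_line(uint8_t *dst, const uint8_t *cur, int w,
                            intptr_t prefs)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((cur[x - prefs] + cur[x + prefs] + 1) >> 1);
}

/* Shape of a filter_line3-style call: filter, copy, filter.
 * dst and cur point at destination/source line y; d_stride and s_stride
 * are the destination and source line sizes in bytes. */
static void toy_filter_line3(uint8_t *dst, intptr_t d_stride,
                             const uint8_t *cur, intptr_t s_stride, int w)
{
    toy_filter_line(dst, cur, w, s_stride);                 /* line y     */
    memcpy(dst + d_stride, cur + s_stride, (size_t)w);      /* line y + 1 */
    toy_filter_line(dst + 2 * d_stride, cur + 2 * s_stride, /* line y + 2 */
                    w, s_stride);
}
```

With source rows holding 0, 10, 20, 30, 40 and cur pointing at row 1, the three output lines come out as 10 (average of rows 0 and 2), 20 (copy of row 2) and 30 (average of rows 2 and 4).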
>
>I think this looks ok to me, so I'll go ahead and push it. The tests also
>pass on x86, msvc/aarch64, llvm-mingw/aarch64, macOS and Linux.
>
>Just a couple notes I didn't remember to mention before:
>
>- Regarding the int parameters on the stack; as long as you do have the C 
>wrapper functions, you don't strictly need to have the same function 
>signature for the NEON function as for the actual DSP function. So if 
>you'd have wanted to have a different signature for the NEON function 
>(changing it to intptr_t), that would have worked too. But I do see the benefit of
>keeping it identical to the DSP function interface.

If I were going to do that, what I'd actually do is pass only a single
prefs argument and no mrefs, which would cut the number of arguments
down to something that fits entirely in registers. (In the only current
use case, all the other stride arguments are multiples of prefs.)
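To make the register-count point concrete: under the standard AArch64 calling convention (AAPCS64) the first eight integer/pointer arguments go in x0-x7 and the rest go on the stack, so the 15-argument filter_line signature puts seven ints on the stack (and Apple's arm64 ABI packs stack arguments by their natural size, one reason int arguments on the stack need care). A C wrapper that keeps the DSP signature could forward to an 8-argument routine taking only prefs. This is a sketch under an assumption: that the strides follow the pattern mrefs = -prefs, prefsN = N * prefs, mrefsN = -prefsN. filter_line_reduced, filter_line_wrapper and demo_reduced are hypothetical names, and the body of filter_line_reduced is a toy placeholder for the NEON code, not bwdif.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced-signature routine standing in for the NEON code.
 * Eight integer/pointer arguments, so under AAPCS64 all arrive in x0-x7.
 * Toy body: rounded average of the lines above and below (NOT bwdif). */
static void filter_line_reduced(void *dst1, const void *prev1,
                                const void *cur1, const void *next1,
                                int w, intptr_t prefs, int parity,
                                int clip_max)
{
    uint8_t *dst = dst1;
    const uint8_t *cur = cur1;
    const intptr_t mrefs = -prefs; /* derived instead of passed */
    (void)prev1; (void)next1; (void)parity;
    for (int x = 0; x < w; x++) {
        int v = (cur[x + mrefs] + cur[x + prefs] + 1) >> 1;
        dst[x] = (uint8_t)(v > clip_max ? clip_max : v);
    }
}

/* C wrapper with the full 15-argument DSP-style signature. It checks the
 * assumed stride pattern (in debug builds) and forwards the reduced set. */
static void filter_line_wrapper(void *dst, void *prev, void *cur, void *next,
                                int w, int prefs, int mrefs,
                                int prefs2, int mrefs2, int prefs3, int mrefs3,
                                int prefs4, int mrefs4, int parity, int clip_max)
{
    assert(mrefs == -prefs && prefs2 == 2 * prefs && mrefs2 == -prefs2 &&
           prefs3 == 3 * prefs && mrefs3 == -prefs3 &&
           prefs4 == 4 * prefs && mrefs4 == -prefs4);
    filter_line_reduced(dst, prev, cur, next, w, prefs, parity, clip_max);
}

/* Usage: reconstruct a 4-pixel line between a line of 10s and a line of 21s. */
static int demo_reduced(void)
{
    enum { W = 4, STRIDE = 8 };
    uint8_t frame[3 * STRIDE] = {0};
    uint8_t out[W];
    int refs = STRIDE;
    for (int x = 0; x < W; x++) {
        frame[x] = 10;              /* line above */
        frame[2 * STRIDE + x] = 21; /* line below */
    }
    filter_line_wrapper(out, frame + STRIDE, frame + STRIDE, frame + STRIDE, W,
                        refs, -refs, 2 * refs, -2 * refs,
                        3 * refs, -3 * refs, 4 * refs, -4 * refs, 0, 255);
    return out[0]; /* (10 + 21 + 1) >> 1 = 16 */
}
```

The wrapper becomes the only place that knows the full interface, which is the trade-off Martin describes: a smaller, register-only asm signature versus keeping the asm identical to the DSP function type.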

>- The way of making the C function exported and calling it for the 
>tail is neat, but kinda unusual within ffmpeg. In most cases (except for 
>parts of swscale), we can just assume and rely on buffers being aligned 
>enough for the SIMD vector length of the current platform, and freely 
>overwrite a little into the padding at the end of the lines. Not sure if 
>this is the case here though.
>
>(If it is, it's easy enough to remove those bits and make the C functions 
>static again as a follow-up.)

Indeed - if you call the asm functions with a width that is not a
multiple of 16, they process the width rounded up to the next multiple
of 16, so you can just replace the helper function with the asm.
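The two ways of handling the tail can be compared with a small C model. Assumptions: BLOCK = 16 models one 128-bit register of 8-bit pixels; toy_op is a hypothetical per-pixel operation, not bwdif; and the overrun strategy is only valid if every buffer has at least BLOCK - 1 bytes of padding after the line, which is the property Martin describes for most of FFmpeg.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* 8-bit pixels per 128-bit NEON register */

/* Hypothetical per-pixel operation standing in for the real filter. */
static uint8_t toy_op(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Models an asm loop that only handles whole blocks: for a width that is
 * not a multiple of BLOCK it processes the rounded-up width. */
static void block_loop(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    for (int x = 0; x < w; x += BLOCK)
        for (int i = 0; i < BLOCK; i++)
            dst[x + i] = toy_op(a[x + i], b[x + i]);
}

/* Plain C loop, as used for the tail by an exported C helper. */
static void scalar_loop(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = toy_op(a[x], b[x]);
}

/* Strategy 1: whole blocks in "asm", remainder in C; never touches padding. */
static void with_c_tail(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    int w_main = w & ~(BLOCK - 1);
    block_loop(dst, a, b, w_main);
    scalar_loop(dst + w_main, a + w_main, b + w_main, w - w_main);
}

/* Strategy 2: call the block loop with the full width and let it overrun
 * into the padding (up to BLOCK - 1 extra bytes per buffer). */
static void into_padding(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    block_loop(dst, a, b, w);
}

/* Returns 1 if both strategies produce the same first w pixels (w <= 64). */
static int strategies_agree(int w)
{
    enum { MAXW = 64, PADDED = MAXW + BLOCK };
    uint8_t a[PADDED], b[PADDED], d1[PADDED], d2[PADDED];
    if (w < 0 || w > MAXW)
        return 0;
    for (int i = 0; i < PADDED; i++) {
        a[i] = (uint8_t)(i * 7);
        b[i] = (uint8_t)(i * 3 + 1);
    }
    with_c_tail(d1, a, b, w);
    into_padding(d2, a, b, w);
    return memcmp(d1, d2, (size_t)w) == 0;
}
```

Strategy 2 needs no extra C entry point, so the C function could become static again, at the cost of relying on the padding guarantee.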

>Also, checkasm coverage for >8bpp would be nice as mentioned, but if 
>someone wants to write asm for that, it should be doable to factorize the 
>new tests to run them for both 8 and 16 bpp.

There seemed no point writing the test unless I had some asm to back it
up!
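If asm for higher bit depths does turn up, the factorisation Martin suggests might look roughly like this. It is a hypothetical, self-contained harness, not the checkasm API: one check body takes the bit depth, stores samples as uint8_t for 8-bit and uint16_t above that, fills the inputs from a simple LCG masked to the depth, and compares a reference against a candidate. avg_ref and avg_new are toy stand-ins; avg_new uses the identity (a | b) - ((a ^ b) >> 1) == (a + b + 1) >> 1, which never needs a wider intermediate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sample access for 8-bit (1 byte) or 9..16-bit (2 byte) storage. */
static uint32_t load_sample(const void *buf, int i, int bps)
{
    return bps == 1 ? ((const uint8_t *)buf)[i] : ((const uint16_t *)buf)[i];
}

static void store_sample(void *buf, int i, int bps, uint32_t v)
{
    if (bps == 1)
        ((uint8_t *)buf)[i] = (uint8_t)v;
    else
        ((uint16_t *)buf)[i] = (uint16_t)v;
}

typedef void (*line_fn)(void *dst, const void *a, const void *b, int w,
                        int bit_depth);

/* Toy reference: rounded average computed with 32-bit headroom. */
static void avg_ref(void *dst, const void *a, const void *b, int w, int bit_depth)
{
    int bps = bit_depth > 8 ? 2 : 1;
    for (int i = 0; i < w; i++)
        store_sample(dst, i, bps,
                     (load_sample(a, i, bps) + load_sample(b, i, bps) + 1) >> 1);
}

/* Toy candidate: same result via (a | b) - ((a ^ b) >> 1). */
static void avg_new(void *dst, const void *a, const void *b, int w, int bit_depth)
{
    int bps = bit_depth > 8 ? 2 : 1;
    for (int i = 0; i < w; i++) {
        uint32_t x = load_sample(a, i, bps), y = load_sample(b, i, bps);
        store_sample(dst, i, bps, (x | y) - ((x ^ y) >> 1));
    }
}

/* Hypothetical depth-parameterised check: returns 0 if ref and new agree. */
static int check_depth(line_fn ref, line_fn new_fn, int bit_depth)
{
    enum { W = 64 };
    uint16_t a[W], b[W], d_ref[W], d_new[W]; /* big enough for either size */
    const int bps = bit_depth > 8 ? 2 : 1;
    const uint32_t mask = (1u << bit_depth) - 1;
    uint32_t seed = 1234u + (uint32_t)bit_depth;
    for (int i = 0; i < W; i++) {
        seed = seed * 1664525u + 1013904223u;
        store_sample(a, i, bps, (seed >> 8) & mask);
        seed = seed * 1664525u + 1013904223u;
        store_sample(b, i, bps, (seed >> 8) & mask);
    }
    memset(d_ref, 0, sizeof(d_ref));
    memset(d_new, 0, sizeof(d_new));
    ref(d_ref, a, b, W, bit_depth);
    new_fn(d_new, a, b, W, bit_depth);
    return memcmp(d_ref, d_new, (size_t)W * (size_t)bps) != 0;
}
```

The same check_depth call can then be made once per supported depth (for example 8, 10 and 16) without duplicating the test body.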

The asm for 9-14 bit should be fairly straightforward if it processes 8
pels (16 bytes) at a time; 15-16 bit requires a little more work due to
overflow. 16 pels at a time might be doable, but there's a non-trivial
chance of running out of registers.
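A back-of-the-envelope check of the lane and headroom arithmetic. One 128-bit NEON register holds 16 8-bit samples or 8 samples in 16-bit containers, so 16 pels at >8 bit means two registers per row, roughly doubling register pressure against the 32 vector registers. The 14/15-bit boundary is reproduced here under an assumption made only for illustration: that an intermediate such as the sum of two samples must fit in a signed 16-bit lane (signed because the filter also forms differences). The real kernel has other intermediates, so this sketches the headroom argument rather than analysing the actual code.

```c
#include <stdint.h>

/* Samples held by one 128-bit NEON register: 16 for 8-bit data,
 * 8 for 9..16-bit data stored in 16-bit containers. */
static int lanes_per_q(int bit_depth)
{
    return bit_depth <= 8 ? 16 : 8;
}

/* Illustrative assumption: an intermediate such as a + b has to fit in a
 * signed 16-bit lane. */
static int pair_sum_fits_s16(int bit_depth)
{
    int32_t max_sample = (1 << bit_depth) - 1;
    return 2 * max_sample <= INT16_MAX;
}

/* What happens without widening: a + b + 1 wraps modulo 2^16 in a lane. */
static uint16_t avg_u16_wrapping(uint16_t a, uint16_t b)
{
    uint16_t sum = (uint16_t)(a + b + 1); /* truncated like a 16-bit lane */
    return (uint16_t)(sum >> 1);
}
```

For 10-bit input, avg_u16_wrapping(1023, 1023) gives the correct 1023, but for 16-bit input avg_u16_wrapping(65535, 65535) gives 32767. A64 offers widening instructions such as UADDL, and URHADD computes a rounded average without the wrap; which of these the real kernel would need is not something this sketch decides.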

However, my use case is Kodi on Pi4, where 10-bit interlaced content has
never been seen, so this isn't going to happen immediately.

>That said, it looks ok enough to me so I'll push it.

Thanks

JC

>// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".