From: John Cox
To: Martin Storsjö
Cc: thomas.mundt@hr.de, ffmpeg-devel@ffmpeg.org
Date: Thu, 06 Jul 2023 10:30:00 +0100
Subject: Re: [FFmpeg-devel] [PATCH v4 0/7] avfilter/vf_bwdif: Add aarch64 neon functions

On Thu, 6 Jul 2023 00:19:50 +0300 (EEST), you wrote:

>On Tue, 4 Jul 2023, John Cox wrote:
>
>> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
>> speedup over 2xfilter_line and a memcpy
>>
>> Differences from v3:
>> Remove a few lines of
>> neon in filter_line that should have been removed
>> when copying from line3
>>
>> Sorry about the two patch sets in quick succession, but I think I've
>> applied all the requested changes and I didn't want this mistake in the
>> final patchset. (The mistake was benign - it just wasted a few cycles.)
>>
>> John Cox (7):
>>   tests/checkasm: Add test for vf_bwdif filter_intra
>>   avfilter/vf_bwdif: Add neon for filter_intra
>>   tests/checkasm: Add test for vf_bwdif filter_edge
>>   avfilter/vf_bwdif: Add neon for filter_edge
>>   avfilter/vf_bwdif: Add neon for filter_line
>>   avfilter/vf_bwdif: Add a filter_line3 method for optimisation
>>   avfilter/vf_bwdif: Add neon for filter_line3
>
>I think this looks ok to me, so I'll go ahead and push it. The tests pass
>on x86 too, msvc/aarch64, llvm-mingw/aarch64, macOS and linux.
>
>Just a couple notes I didn't remember to mention before:
>
>- Regarding the int parameters on the stack; as long as you do have the C
>wrapper functions, you don't strictly need to have the same function
>signature for the NEON function as for the actual DSP function. So if
>you'd have wanted to have a different signature for the NEON function
>(changing it to intptr_t), that'd have worked too. But I do see the benefit
>of keeping it identical to the DSP function interface.

If I was going to do that, what I'd actually do is pass only a single
prefs arg and no mrefs, which would cut the number of args down to
something that fits entirely in registers. (In the only current use case
all the other args are multiples of prefs.)

>- The way of making the C function exported and calling that for the
>tail is neat, but kinda unusual within ffmpeg. In most cases (except for
>parts of swscale), we can just assume and rely on buffers being aligned
>enough for the SIMD vector length of the current platform, and freely
>overwrite a little into the padding at the end of the lines. Not sure if
>this is the case here though.
>
>(If it is, it's easy enough to remove those bits and make the C functions
>static again as a follow-up.)

Indeed - if you call the asm functions with a width that is not a multiple
of 16 then you will get the rounded-up number processed, so you can just
replace the helper function with the asm.

>Also, checkasm coverage for >8bpp would be nice as mentioned, but if
>someone wants to write asm for that, it should be doable to factorize the
>new tests to run them for both 8 and 16 bpp.

There seemed no point writing the test unless I had some asm to back it
up! The asm for 9-14-bit should be fairly straightforward (15-16-bit
requires a little more work due to overflow) if processing 8 pels at a
time (16 bytes). 16 pels at a time might be doable, but there's a
non-trivial chance of running out of registers. However, my use case is
Kodi on Pi4, where 10-bit interlace has never been seen, so this isn't
going to happen immediately.

>That said, it looks ok enough to me so I'll push it.

Thanks

JC

>// Martin
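
To illustrate the rounding point above, here is a minimal sketch. It is not the actual FFmpeg code: `round_up_block` and `copy_line_blocks` are hypothetical names, and a plain copy stands in for the asm kernel. It only shows that a kernel working in whole blocks of 16 pels touches the width rounded up to a multiple of 16, so the caller has to guarantee that much room in the line buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16  /* pels per iteration, matching the multiple-of-16 remark above */

/* Round a width up to the next multiple of BLOCK. */
static size_t round_up_block(size_t w)
{
    return (w + BLOCK - 1) / BLOCK * BLOCK;
}

/*
 * Hypothetical stand-in for an asm kernel: it always processes whole
 * blocks, so it reads and writes round_up_block(w) pels, not w.
 * `capacity` is the usable size of both buffers including padding.
 * Returns the number of pels actually written.
 */
static size_t copy_line_blocks(uint8_t *dst, const uint8_t *src,
                               size_t w, size_t capacity)
{
    size_t n = round_up_block(w);

    /* The caller must provide enough padding for the overrun. */
    assert(n <= capacity);

    for (size_t x = 0; x < n; x += BLOCK)
        memcpy(dst + x, src + x, BLOCK);

    return n;
}
```

With a width of 17 the sketch writes 32 pels, so dropping the C tail helper is only safe if every line has at least 15 pels of writable padding after it.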