From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id E86A541A73 for ; Sat, 19 Mar 2022 23:06:14 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D8CC468B0A2; Sun, 20 Mar 2022 01:06:11 +0200 (EET) Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 348D968AA3E for ; Sun, 20 Mar 2022 01:06:05 +0200 (EET) Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21]) by mail8.parnet.fi with ESMTP id 22JN63Bt019648-22JN63Bu019648; Sun, 20 Mar 2022 01:06:03 +0200 Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187]) by mail9.parnet.fi (Postfix) with ESMTPS id 8286CA1439; Sun, 20 Mar 2022 01:06:03 +0200 (EET) Date: Sun, 20 Mar 2022 01:06:03 +0200 (EET) From: =?ISO-8859-15?Q?Martin_Storsj=F6?= To: FFmpeg development discussions and patches In-Reply-To: <20220317185819.466470-1-bavison@riscosopen.org> Message-ID: References: <20220317185819.466470-1-bavison@riscosopen.org> MIME-Version: 1.0 X-FE-Policy-ID: 3:14:2:SYSTEM Subject: Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Ben Avison Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: Hi Ben, On Thu, 17 Mar 2022, Ben Avison wrote: > The VC1 decoder was missing lots of important fast paths for Arm, especially > for 64-bit Arm. This submission fills in implementations for all functions > where a fast path already existed and the fallback C implementation was > taking 1% or more of the runtime, and adds a new fast path to permit > vc1_unescape_buffer() to be overridden. > > I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4) > using `ffmpeg -i -f null -` for a couple of example streams: > > Architecture: AArch32 AArch32 AArch64 AArch64 > Stream: 1 2 1 2 > Before speed: 1.22x 0.82x 1.00x 0.67x > After speed: 1.31x 0.98x 1.39x 1.06x > Improvement: 7.4% 20% 39% 58% > > `make fate` passes on both AArch32 and AArch64. Thanks for the patches! I have looked briefly at the patches (I haven't started reading the implementation in detail yet though). As you are writing assembly for these functions, I would very much appreciate if you could add checkasm tests for all the functions you're implementing. I see that there exists a test for the blockdsp functions, but all the other ones are missing a test. I try to request such tests for all new assembly. Such a test allows testing all interesting cornercases of the DSP functions with one concise test, instead of having to run the full fate testsuite. It also allows catching a number of other possible lingering issues, like using the full 64 bit register when the argument only set 32 bits where the upper bits are undefined, or missing to restore callee saved registers, etc. It also allows for easy benchmarking of the functions on their own, which is very useful for tuning of the implementation. And it finally allows easily checking that the assembly works correctly when built with a different toolchain for a different platform - without needing to run the full decoding tests. Especially as you've been implementing the functions, you're probably more familiar with the expectations and behaviours (and potential cornercases that are worth testing) of the functions than most other developers in the community at the moment, which is good for writing useful testcases. There's plenty of existing examples of such tests - the h264dsp, vp8dsp and vp9dsp cases might be relevant. The other main issue I'd like to request is to indent the assembly similarly to the rest of the existing assembly. For the 32 bit assembly, your patches do match the surrounding code, but for the 64 bit assembly, your patches align the operands column differently than the rest. (I think your code aligns the operands with 16 chars to the left of the operands, while our code aligns it with 24 chars to the left, both in 32 and 64 bit arm assembly.) Finally, the 32 bit assembly fails to build for me both with (recent) clang and old binutils, with errors like these: src/libavcodec/arm/vc1dsp_neon.S: Assembler messages: src/libavcodec/arm/vc1dsp_neon.S:1579: Error: bad type for scalar -- `vmov r0,d4[1]' src/libavcodec/arm/vc1dsp_neon.S:1582: Error: bad type for scalar -- `vmov r2,d5[1]' src/libavcodec/arm/vc1dsp_neon.S:1592: Error: bad type for scalar -- `vmov r2,d8[1]' src/libavcodec/arm/vc1dsp_neon.S:1595: Error: bad type for scalar -- `vmov r12,d9[1]' Qualifying the "vmov" into "vmov.32" seems to fix it. // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".