From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id 3424F427AE
	for <ffmpegdev@gitmailbox.com>; Wed, 30 Mar 2022 14:01:37 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9EB7868B1FB;
	Wed, 30 Mar 2022 17:01:35 +0300 (EEST)
Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 6D16C68A927
 for <ffmpeg-devel@ffmpeg.org>; Wed, 30 Mar 2022 17:01:29 +0300 (EEST)
Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21])
 by mail8.parnet.fi  with ESMTP id 22UE1S2o029528-22UE1S2p029528;
 Wed, 30 Mar 2022 17:01:28 +0300
Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187])
 by mail9.parnet.fi (Postfix) with ESMTPS id 8D90CA1430;
 Wed, 30 Mar 2022 17:01:28 +0300 (EEST)
Date: Wed, 30 Mar 2022 17:01:27 +0300 (EEST)
From: Martin Storsjö <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
In-Reply-To: <11a47aa-b2d8-ad46-eb34-d3a7c7cd971@martin.st>
Message-ID: <818fe3a9-a0ed-f04a-9fc8-f7e6b42bf3ee@martin.st>
References: <20220317185819.466470-1-bavison@riscosopen.org>
 <20220325185257.513933-1-bavison@riscosopen.org>
 <20220325185257.513933-8-bavison@riscosopen.org>
 <11a47aa-b2d8-ad46-eb34-d3a7c7cd971@martin.st>
MIME-Version: 1.0
X-FE-Policy-ID: 3:14:2:SYSTEM
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON
 inverse transform fast paths
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Ben Avison <bavison@riscosopen.org>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-15"; Format="flowed"
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/818fe3a9-a0ed-f04a-9fc8-f7e6b42bf3ee@martin.st/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Wed, 30 Mar 2022, Martin Storsjö wrote:

> On Fri, 25 Mar 2022, Ben Avison wrote:
>
>> checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
>>
>> vc1dsp.vc1_inv_trans_4x4_c: 158.2
>> vc1dsp.vc1_inv_trans_4x4_neon: 65.7
>> vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5
>> vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5
>> vc1dsp.vc1_inv_trans_4x8_c: 335.2
>> vc1dsp.vc1_inv_trans_4x8_neon: 106.2
>> vc1dsp.vc1_inv_trans_4x8_dc_c: 151.2
>> vc1dsp.vc1_inv_trans_4x8_dc_neon: 25.5
>> vc1dsp.vc1_inv_trans_8x4_c: 365.7
>> vc1dsp.vc1_inv_trans_8x4_neon: 97.2
>> vc1dsp.vc1_inv_trans_8x4_dc_c: 139.7
>> vc1dsp.vc1_inv_trans_8x4_dc_neon: 16.5
>> vc1dsp.vc1_inv_trans_8x8_c: 547.7
>> vc1dsp.vc1_inv_trans_8x8_neon: 137.0
>> vc1dsp.vc1_inv_trans_8x8_dc_c: 268.2
>> vc1dsp.vc1_inv_trans_8x8_dc_neon: 30.5
>>
>> Signed-off-by: Ben Avison <bavison@riscosopen.org>
>> ---
>> libavcodec/aarch64/vc1dsp_init_aarch64.c |  19 +
>> libavcodec/aarch64/vc1dsp_neon.S         | 678 +++++++++++++++++++++++
>> 2 files changed, 697 insertions(+)
>
> Looks generally reasonable. Is it possible to factor out the individual
> transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and
> 4x4 functions) without too much loss? The downshift, which differs between
> the two, could either be left outside of the macro, or the downshift
> amount could be made a macro parameter.
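
A rough sketch of the factoring suggested above, written in C rather than
NEON assembly to keep it short. The helper names (inv_trans_4pt,
inv_trans_4x4) are hypothetical, and the coefficients and rounding values
are meant to mirror the 4x4 C reference in libavcodec/vc1dsp.c as I recall
it; treat them as illustrative and check them against the source. The point
is only that one 1-D pass, with the rounding constant and downshift passed
in, can serve both the row and the column stage:

```c
/* Hypothetical helper: one 4-point inverse transform pass.
 * The rounding constant and downshift are parameters, so the same code
 * serves both the first (row) and second (column) stage.
 * Assumes >> on negative ints is an arithmetic shift, as FFmpeg does. */
static void inv_trans_4pt(const int *src, int sstride,
                          int *dst, int dstride,
                          int rnd, int shift)
{
    int t1 = 17 * (src[0] + src[2 * sstride]) + rnd;
    int t2 = 17 * (src[0] - src[2 * sstride]) + rnd;
    int t3 = 22 * src[1 * sstride] + 10 * src[3 * sstride];
    int t4 = 22 * src[3 * sstride] - 10 * src[1 * sstride];

    dst[0 * dstride] = (t1 + t3) >> shift;
    dst[1 * dstride] = (t2 - t4) >> shift;
    dst[2 * dstride] = (t2 + t4) >> shift;
    dst[3 * dstride] = (t1 - t3) >> shift;
}

/* Hypothetical 4x4 wrapper: invokes the same pass twice per block,
 * differing only in stride, rounding constant and downshift. */
static void inv_trans_4x4(int block[16])
{
    int tmp[16];

    for (int i = 0; i < 4; i++)      /* row pass: round 4, shift 3 */
        inv_trans_4pt(block + 4 * i, 1, tmp + 4 * i, 1, 4, 3);
    for (int i = 0; i < 4; i++)      /* column pass: round 64, shift 7 */
        inv_trans_4pt(tmp + i, 4, block + i, 4, 64, 7);
}
```

In assembly the equivalent would be a macro taking the shift amount as an
argument, or a macro that stops before the shift and leaves it to the
caller.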

Another aspect I forgot to mention: we already have arm assembly for the
idct. In some cases, there's value in keeping the implementations similar
if possible and relevant. But your implementation seems quite
straightforward, and seems to get better benchmark numbers on the same
cores, so I guess it's fine to diverge and add a new from-scratch
implementation here.

// Martin