From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id CE31342630
	for <ffmpegdev@gitmailbox.com>; Mon, 21 Mar 2022 15:51:14 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BA73168AF1E;
	Mon, 21 Mar 2022 17:51:11 +0200 (EET)
Received: from outmail148111.authsmtp.net (outmail148111.authsmtp.net
 [62.13.148.111])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 44DE468A351
 for <ffmpeg-devel@ffmpeg.org>; Mon, 21 Mar 2022 17:51:05 +0200 (EET)
Received: from punt21.authsmtp.com (punt21.authsmtp.com [62.13.128.151])
 by punt15.authsmtp.com. (8.15.2/8.15.2) with ESMTP id 22LFp3ME053027
 for <ffmpeg-devel@ffmpeg.org>; Mon, 21 Mar 2022 15:51:03 GMT
 (envelope-from bavison@riscosopen.org)
Received: from mail-c237.authsmtp.com (mail-c237.authsmtp.com [62.13.128.237])
 by punt21.authsmtp.com. (8.15.2/8.15.2) with ESMTP id 22LFp3XO026448; 
 Mon, 21 Mar 2022 15:51:03 GMT (envelope-from bavison@riscosopen.org)
Received: from [192.168.0.212] (237.63.9.51.dyn.plus.net [51.9.63.237])
 (authenticated bits=0)
 by mail.authsmtp.com (8.15.2/8.15.2) with ESMTPSA id 22LFp1Kq066074
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO);
 Mon, 21 Mar 2022 15:51:02 GMT (envelope-from bavison@riscosopen.org)
Message-ID: <0d906656-af31-412d-96aa-6c0ab1857714@riscosopen.org>
Date: Mon, 21 Mar 2022 15:51:01 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.5.0
Content-Language: en-GB
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>,
 Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
References: <20220317185819.466470-1-bavison@riscosopen.org>
 <20220317185819.466470-7-bavison@riscosopen.org>
 <AS1PR01MB9564305644E9C9578DA3F0E08F139@AS1PR01MB9564.eurprd01.prod.exchangelabs.com>
From: Ben Avison <bavison@riscosopen.org>
Organization: RISC OS Open Ltd
In-Reply-To: <AS1PR01MB9564305644E9C9578DA3F0E08F139@AS1PR01MB9564.eurprd01.prod.exchangelabs.com>
Subject: Re: [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for
 unescaping bitstream buffer
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/0d906656-af31-412d-96aa-6c0ab1857714@riscosopen.org/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On 18/03/2022 19:10, Andreas Rheinhardt wrote:
> Ben Avison:
>> +static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst)
>> +{
>> +    /* Dealing with starting and stopping, and removing escape bytes, are
>> +     * comparatively less time-sensitive, so are more clearly expressed using
>> +     * a C wrapper around the assembly inner loop. Note that we assume a
>> +     * little-endian machine that supports unaligned loads. */
> 
> You should nevertheless use AV_RL32 for your unaligned LE loads

Thanks, I wasn't aware of that macro. I'll switch the loads over to it.
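
For anyone reading outside the FFmpeg tree: AV_RL32() (libavutil/intreadwrite.h) reads a 32-bit little-endian value without relying on the host's byte order or on the address being aligned. The sketch below is a portable stand-in that shows what such a load has to do. read_le32 is a hypothetical name used only for illustration; inside libavcodec the macro itself is the thing to use.

```c
#include <stdint.h>

/* Hypothetical stand-in for AV_RL32(): read a 32-bit little-endian value
 * from a possibly unaligned address. Assembling the value from individual
 * bytes gives the same result on big- and little-endian hosts, and avoids
 * the undefined behaviour of dereferencing a misaligned uint32_t pointer. */
static inline uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

With `const uint8_t buf[] = {0xAA, 0x78, 0x56, 0x34, 0x12};`, `read_le32(buf + 1)` yields 0x12345678 whatever the host byte order and even though buf + 1 is misaligned for a 32-bit access.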

> 1. You should add some benchmarks to the commit message.

Do you mean for each commit, or just this one? Are there any 
particular standard files you'd expect to see benchmarked, or will the 
ones I used in the cover letter do? (Those were just snippets from 
problematic Blu-ray rips, which means I don't have the rights to 
redistribute them.) I believe there should be conformance bitstreams for 
VC-1 somewhere, but I wasn't able to locate them.

During development, I wrote a simple benchmark for this particular 
patch. It measures the throughput of unescaping random data that never 
contains the escape sequence. I've just pushed it here if anyone's 
interested:

https://github.com/bavison/test-unescape
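
For readers who don't want to open the repository, a harness of this kind can be sketched as follows. This is not the code in test-unescape; fill_without_escapes, copy_only and measure_mbps are hypothetical names, and the input is generated as described above: random bytes with every 00 00 03 run broken up so no escape sequence remains.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef int (*unescape_fn)(const uint8_t *src, int size, uint8_t *dst);

/* Fill buf with pseudo-random bytes, then break up every 00 00 03 run so the
 * buffer contains no escape sequence at all. Overwriting the 0x03 with a
 * nonzero value cannot create a new 00 00 pair, so one pass suffices. */
static void fill_without_escapes(uint8_t *buf, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)rand();
    for (size_t i = 2; i < n; i++)
        if (buf[i] == 3 && buf[i - 1] == 0 && buf[i - 2] == 0)
            buf[i] = 4;
}

/* Hypothetical baseline: a plain copy, i.e. an upper bound for any unescaper. */
static int copy_only(const uint8_t *src, int size, uint8_t *dst)
{
    memcpy(dst, src, (size_t)size);
    return size;
}

/* Call fn `reps` times over the same input and return input throughput in
 * MB/s (10^6 bytes/s). clock() measures processor time, which is adequate
 * for a single-threaded, CPU-bound loop; choose reps large enough that the
 * elapsed time is well above clock()'s resolution. */
static double measure_mbps(unescape_fn fn, const uint8_t *src, int size,
                           uint8_t *dst, int reps)
{
    volatile int sink = 0;               /* keeps the calls observable */
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink += fn(src, size, dst);
    clock_t t1 = clock();
    (void)sink;
    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return (double)size * reps / secs / 1e6;
}
```

Substituting each candidate unescaper for copy_only gives comparable figures, and the plain-copy result is a rough ceiling for judging how close each version gets to pure copying speed.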

The compile-time define VERSION there takes a few different values:
1: the original C implementation of vc1_unescape_buffer()
2: an early prototype version I wrote that uses unaligned 32-bit loads, 
again in pure C
3: the NEON assembly versions
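
To make the difference between versions 1 and 2 concrete, here is a minimal sketch of both ideas in portable C. It is not the code from the repository: the function names are hypothetical, and the escape rule assumed is the one the original C routine tests (a 0x03 after two 0x00 bytes, followed by a byte no greater than 0x03). The word-at-a-time variant uses memcpy for its unaligned load, which compilers typically lower to a single load on targets that allow it.

```c
#include <stdint.h>
#include <string.h>

/* Assumed VC-1 rule: a 0x03 that follows two 0x00 bytes and is followed by
 * a byte <= 0x03 is an escape byte and is dropped. The byte after an escape
 * cannot itself be an escape (that would need the escape byte to be 0x00),
 * so testing every position against the raw input is equivalent to a
 * sequential scan that skips it. */
static int is_escape(const uint8_t *src, int size, int i)
{
    return i >= 2 && i < size - 1 && src[i] == 3 &&
           src[i - 1] == 0 && src[i - 2] == 0 && src[i + 1] < 4;
}

/* Version-1 style: one byte at a time. Returns bytes written to dst. */
static int unescape_bytewise(const uint8_t *src, int size, uint8_t *dst)
{
    int dsize = 0;
    for (int i = 0; i < size; i++)
        if (!is_escape(src, size, i))
            dst[dsize++] = src[i];
    return dsize;
}

/* Nonzero iff some byte of v is zero (the classic "haszero" bit trick).
 * Byte order does not affect the answer, so a native-endian load is fine. */
static uint32_t has_zero_byte(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Version-2 style: an escape at q needs src[q - 1] == 0, so when the four
 * bytes src[i - 1 .. i + 2] are all nonzero, positions i .. i + 3 hold no
 * escape and can be copied as one 32-bit block. */
static int unescape_wordwise(const uint8_t *src, int size, uint8_t *dst)
{
    int i = 0, dsize = 0;
    while (i < size) {
        if (i >= 1 && i + 4 <= size) {
            uint32_t v;
            memcpy(&v, src + i - 1, 4);     /* unaligned-safe load */
            if (!has_zero_byte(v)) {
                memcpy(dst + dsize, src + i, 4);
                dsize += 4;
                i += 4;
                continue;
            }
        }
        if (!is_escape(src, size, i))       /* slow path: one byte */
            dst[dsize++] = src[i];
        i++;
    }
    return dsize;
}
```

Both functions return the unescaped length; for the input {1, 0, 0, 3, 1, 2} each produces {1, 0, 0, 1, 2}. On random data most 4-byte windows contain no zero byte, so the word path handles the bulk of the input.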

Typical throughputs it reports are:
             AArch32    AArch64
version 1   210 MB/s   292 MB/s
version 2   461 MB/s   435 MB/s
version 3  1294 MB/s  1554 MB/s

> 2. The unescaping process for VC1 is basically the same as for H.264 and
> HEVC* and for those we already have better optimized code in
> libavcodec/h2645_parse.c. Can you check the performance of this code
> here against (re)using the code from h2645_parse.c?

I've hacked that code around a bit to match the calling conventions of 
vc1_unescape_buffer(), though I haven't adapted it for the slightly 
different rules you noted for VC-1 as opposed to H.264/H.265. It should 
still give some indication of the performance that could be expected, 
but I didn't take the time to fully understand everything it does, so 
please do say if I've messed something up.

This can be selected by #defining VERSION 4:

             AArch32    AArch64
version 4   737 MB/s  1286 MB/s
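
For reference, dividing the figures in the two tables gives these approximate speedups (plain arithmetic on the numbers quoted above, not additional measurements):

```latex
\begin{aligned}
&\text{v3 vs v1:} && 1294/210 \approx 6.2\times \text{ (AArch32)}, && 1554/292 \approx 5.3\times \text{ (AArch64)}\\
&\text{v4 vs v1:} && 737/210 \approx 3.5\times \text{ (AArch32)}, && 1286/292 \approx 4.4\times \text{ (AArch64)}\\
&\text{v3 vs v4:} && 1294/737 \approx 1.76\times \text{ (AArch32)}, && 1554/1286 \approx 1.21\times \text{ (AArch64)}
\end{aligned}
```

The last row quantifies the NEON versions' remaining edge over version 4, which is noticeably larger on AArch32.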

This suggests it's much better than the original C, but my NEON versions 
still have the edge, especially on AArch32. The NEON code is very much a 
brute-force check, but it effectively does the testing in parallel with 
the memcpy: each byte is loaded only once.
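
The contrast here is between a single pass, where each byte is loaded once and both tested and stored, and a structure that first scans for a possible escape and then bulk-copies the clean prefix, so that prefix is read twice. A minimal sketch of the two-pass shape, written for illustration only: unescape_two_pass is a hypothetical name, it uses the same VC-1 rule assumed earlier (0x03 after two 0x00 bytes, followed by a byte no greater than 0x03), and it is not the actual h2645_parse.c code, which uses its own word test and also handles H.264/HEVC start codes.

```c
#include <stdint.h>
#include <string.h>

/* Two-pass sketch: (1) scan for the first zero byte, four bytes at a time;
 * (2) memcpy the prefix that cannot contain an escape, then finish byte by
 * byte. Returns the number of bytes written to dst. */
static int unescape_two_pass(const uint8_t *src, int size, uint8_t *dst)
{
    int z = 0;
    /* Pass 1: skip whole words that contain no zero byte ("haszero" test). */
    while (z + 4 <= size) {
        uint32_t v;
        memcpy(&v, src + z, 4);
        if ((v - 0x01010101u) & ~v & 0x80808080u)
            break;
        z += 4;
    }
    while (z < size && src[z] != 0)
        z++;
    /* An escape needs two zeros before it, so none can occur before z + 2.
     * This prefix has now been read once by the scan and is read again here. */
    int clean = z + 2 < size ? z + 2 : size;
    memcpy(dst, src, (size_t)clean);
    int dsize = clean;
    /* Pass 2: byte at a time; i >= 2 always holds when this loop runs. */
    for (int i = clean; i < size; i++) {
        if (i < size - 1 && src[i] == 3 && src[i - 1] == 0 &&
            src[i - 2] == 0 && src[i + 1] < 4)
            continue;                       /* drop the escape byte */
        dst[dsize++] = src[i];
    }
    return dsize;
}
```

On input with no zero bytes the scan runs to the end and the whole buffer is then copied, so every byte is loaded twice; a fused test-and-store loop avoids that second read.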

Ben
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".