From: Ben Avison <bavison@riscosopen.org>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>,
    Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Subject: Re: [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for unescaping bitstream buffer
Date: Mon, 21 Mar 2022 15:51:01 +0000
Message-ID: <0d906656-af31-412d-96aa-6c0ab1857714@riscosopen.org>
In-Reply-To: <AS1PR01MB9564305644E9C9578DA3F0E08F139@AS1PR01MB9564.eurprd01.prod.exchangelabs.com>

On 18/03/2022 19:10, Andreas Rheinhardt wrote:
> Ben Avison:
>> +static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst)
>> +{
>> +    /* Dealing with starting and stopping, and removing escape bytes, are
>> +     * comparatively less time-sensitive, so are more clearly expressed using
>> +     * a C wrapper around the assembly inner loop. Note that we assume a
>> +     * little-endian machine that supports unaligned loads. */
>
> You should nevertheless use AV_RL32 for your unaligned LE loads

Thanks - I wasn't aware of that. I'll add it in.

> 1. You should add some benchmarks to the commit message.

Do you mean for each commit, or this one in particular? Are there any
particular standard files you'd expect to see benchmarked, or will the
ones I used in the cover-letter do? (Those were just snippets from
problematic BluRay rips, but that does mean I don't have the rights to
redistribute them.) I believe there should be conformance bitstreams for
VC-1 somewhere, but I wasn't able to locate them.

During development, I wrote a simple benchmarker for this particular
patch, which measures the throughput of processing random data (which
doesn't contain the escape sequence at any point).
I've just pushed it here if anyone's interested:

https://github.com/bavison/test-unescape

The compile-time define VERSION there takes a few different values:

1: the original C implementation of vc1_unescape_buffer()
2: an early prototype version I wrote that uses unaligned 32-bit loads,
   again in pure C
3: the NEON assembly versions

The sort of speeds this measures are:

             AArch32      AArch64
version 1    210 MB/s     292 MB/s
version 2    461 MB/s     435 MB/s
version 3   1294 MB/s    1554 MB/s

> 2. The unescaping process for VC1 is basically the same as for H.264 and
> HEVC* and for those we already have better optimized code in
> libavcodec/h2645_parse.c. Can you check the performance of this code
> here against (re)using the code from h2645_parse.c?

I've hacked that around a bit to match the calling conditions of
vc1_unescape_buffer(), though not adapted it for the slightly different
rules you noted for VC-1 as opposed to H.264/265. Hopefully it should
still give some indication of the approximate performance that could be
expected, but I didn't take time to fully understand everything it was
doing, so do please say if I've messed something up. This can be selected
by #defining VERSION 4:

             AArch32      AArch64
version 4    737 MB/s    1286 MB/s

This suggests it's much better than the original C, but my NEON versions
still have the edge, especially on AArch32. The NEON code is very much a
brute force check, but it's effectively able to do the testing in
parallel with the memcpy - each byte only gets loaded once.

Ben
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
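For readers following the thread, the idea of testing for the escape
sequence with one little-endian 32-bit load (as in the "version 2"
prototype and the AV_RL32 suggestion) can be sketched in plain C. This is
only an illustration, not code taken from the patch or from
test-unescape: rl32_sketch() is a hypothetical stand-in for AV_RL32, and
unescape_sketch() assumes the VC-1 rule is that the 0x03 byte is dropped
from any 00 00 03 sequence followed by a byte in the range 0..3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AV_RL32: assemble a little-endian
 * 32-bit value byte by byte, so it works for unaligned pointers and on
 * hosts of either endianness. */
static uint32_t rl32_sketch(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy src to dst, dropping the 0x03 from each 00 00 03 0x (x < 4)
 * sequence (assumed rule, see above). dst needs room for size bytes.
 * Returns the number of bytes written. */
static size_t unescape_sketch(const uint8_t *src, size_t size, uint8_t *dst)
{
    uint8_t *out = dst;

    while (size >= 4) {
        /* Clearing the low two bits of the fourth byte folds the values
         * 0..3 onto 0, so one compare covers all four escaped cases. */
        if ((rl32_sketch(src) & ~UINT32_C(0x03000000)) == UINT32_C(0x00030000)) {
            *out++ = 0;
            *out++ = 0;
            src  += 3;   /* skip 00 00 03; the byte after it is copied later */
            size -= 3;
        } else {
            *out++ = *src++;
            size--;
        }
    }
    /* Fewer than 4 bytes left: no complete escape sequence can start here. */
    while (size--)
        *out++ = *src++;

    return (size_t)(out - dst);
}
```

A sketch like this still examines one starting position per iteration, so
it shows the detection test rather than the throughput of any of the
benchmarked versions.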