Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
From: "Martin Storsjö" <martin@martin.st>
To: Ben Avison <bavison@riscosopen.org>
Cc: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths
Date: Fri, 1 Apr 2022 00:32:22 +0300 (EEST)
Message-ID: <a98fdc73-b3c2-e087-1f33-a7ed12a0a779@martin.st> (raw)
In-Reply-To: <6b959f99-b3bd-ca2a-08fb-d43dddeb3a2b@riscosopen.org>

On Thu, 31 Mar 2022, Ben Avison wrote:

> On 30/03/2022 14:49, Martin Storsjö wrote:
>> Looks generally reasonable. Is it possible to factorize out the individual 
>> transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 
>> 4x4 functions) without too much loss?
>
> There is a close analogy here with the vertical/horizontal deblocking 
> filters, because while there are similarities between the two matrix 
> multiplications within a transform, one of them follows a series of loads and 
> the other follows a matrix transposition.
>
> If you look for example at ff_vc1_inv_trans_8x8_neon, you'll see I was able 
> to do a fair amount of overlap between sections of the function - 
> particularly between the transpose and the second matrix multiplication, but 
> to a lesser extent between the loads and the first matrix multiplication and 
> between the second multiplication and the stores. This sort of overlapping is 
> tricky to maintain when using macros. Also, it means the the order of 
> operations within each matrix multiply ended up quite different.
>
> At first sight, you might think that the multiplies from the 8x8 function 
> (which you might also view as kind of 8-tap filter) would be re-usable for 
> the size-8 multiplies in the 8x4 or 4x8 function. Yes, the instructions are 
> similar, save for using .4h elements rather than .8h elements, but that has 
> significant impacts on scheduling. For example, the Cortex-A72, which is my 
> primary target, can only do NEON bit-shifts in one pipeline at once, 
> irrespective of whether the vectors are 64-bit or 128-bit long, while other 
> instructions don't have such restrictions.
>
> So while in theory you could factor some of this code out more, I suspect any 
> attempt to do so would have a detrimental effect on performance.

Ok, fair enough. Yes, it's always a trade off between code simplicity and 
getting the optimal interleaving. As you've spent the effort on making it 
efficient with respect to that, let's go with that then!

(FWIW, for future endeavours, having the checkasm tests in place while 
developing/tuning the implementation does allow getting good empirical 
data on how much you gain from different alternative scheduling choices. I 
usually don't follow the optimization guides for any specific core, but 
track the benchmark numbers for a couple different cores and try to pick a 
scheduling that is a decent compromise for all of them.)

Also, for future work - if you have checkasm tests in place while working 
on the assembly, I usually amend the test with debug printouts that 
visualize the output of the reference and the tested function, and a map 
showing which elements differ - which makes tracking down issues a whole 
lot easier. I don't think any of the checkasm tests in ffmpeg have such 
printouts though, but within e.g. the dav1d project, the checkasm tool is 
extended with helpers for comparing and printing such debug aids.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

  reply	other threads:[~2022-03-31 21:32 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 18:58 [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 1/6] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 2/6] avcodec/vc1: Arm 32-bit " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 3/6] avcodec/vc1: Arm 64-bit NEON inverse transform " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 4/6] avcodec/idctdsp: Arm 64-bit NEON block add and clamp " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 5/6] avcodec/blockdsp: Arm 64-bit NEON block clear " Ben Avison
2022-03-17 18:58 ` [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for unescaping bitstream buffer Ben Avison
2022-03-18 19:10   ` Andreas Rheinhardt
2022-03-21 15:51     ` Ben Avison
2022-03-21 20:44       ` Martin Storsjö
2022-03-19 23:06 ` [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations Martin Storsjö
2022-03-19 23:07   ` Martin Storsjö
2022-03-21 17:37   ` Ben Avison
2022-03-21 22:29     ` Martin Storsjö
2022-03-25 18:52 ` [FFmpeg-devel] [PATCH v2 00/10] " Ben Avison
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests Ben Avison
2022-03-25 22:53     ` Martin Storsjö
2022-03-28 18:28       ` Ben Avison
2022-03-29 11:47         ` Martin Storsjö
2022-03-29 12:24     ` Martin Storsjö
2022-03-29 12:43     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 02/10] checkasm: Add vc1dsp inverse transform tests Ben Avison
2022-03-29 12:41     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests Ben Avison
2022-03-29 13:13     ` Martin Storsjö
2022-03-29 19:56       ` Martin Storsjö
2022-03-29 20:22       ` Ben Avison
2022-03-29 20:30         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer Ben Avison
2022-03-29 20:37     ` Martin Storsjö
2022-03-31 13:58       ` Ben Avison
2022-03-31 14:07         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths Ben Avison
2022-03-30 12:35     ` Martin Storsjö
2022-03-31 15:15       ` Ben Avison
2022-03-31 21:21         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit " Ben Avison
2022-03-25 19:27     ` Lynne
2022-03-25 19:49       ` Martin Storsjö
2022-03-25 19:55         ` Lynne
2022-03-30 12:37     ` Martin Storsjö
2022-03-30 13:03     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform " Ben Avison
2022-03-30 13:49     ` Martin Storsjö
2022-03-30 14:01       ` Martin Storsjö
2022-03-31 15:37       ` Ben Avison
2022-03-31 21:32         ` Martin Storsjö [this message]
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp " Ben Avison
2022-03-30 14:14     ` Martin Storsjö
2022-03-31 16:47       ` Ben Avison
2022-03-31 21:42         ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 09/10] avcodec/vc1: Arm 64-bit NEON unescape fast path Ben Avison
2022-03-30 14:35     ` Martin Storsjö
2022-03-25 18:52   ` [FFmpeg-devel] [PATCH 10/10] avcodec/vc1: Arm 32-bit " Ben Avison
2022-03-30 14:35     ` Martin Storsjö

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a98fdc73-b3c2-e087-1f33-a7ed12a0a779@martin.st \
    --to=martin@martin.st \
    --cc=bavison@riscosopen.org \
    --cc=ffmpeg-devel@ffmpeg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git