Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] lavu/tx: implement aarch64 NEON SIMD
Date: Thu, 25 Aug 2022 13:51:57 +0300 (EEST)
Message-ID: <8854c096-5855-d110-29a0-dace5d887bf0@martin.st>
In-Reply-To: <N9PbCTK--3-2@lynne.ee>

On Sun, 14 Aug 2022, Lynne wrote:

> The fastest fast Fourier transform in not just the west, but the world,
> now for the most popular toy ISA.
>
> On a high level, it follows the design of the AVX2 version closely,
> with the exception that the input is slightly less permuted as we don't have
> to do lane switching with the input on double 4pt and 8pt.
>
> On a low level, the lack of subadd/addsub instructions REALLY penalizes
> any attempt at writing an FFT. That single register matters a _lot_,
> and reloading it simply takes unacceptably long.
> In x86 land, vendors would've noticed developers need this.
> In ARM land, you get a badly designed complex multiplication instruction
> we cannot use, that's not present on 95% of devices. Because only
> compilers matter, right?
>
> There's still room for improvement. I think using stp
> instead of st1 may help in a few places, some reordering
> may help performance in the recombination macro,
> and there are other TODOs I've left marked in the code.
> There are also a few places where the limited range on
> immediates in adds may be worked around.
>
> All timings below are in cycles:
> A53:
> Length |       C | New (lavu) | Old (lavc) |    FFTW
> -------|---------|------------|------------|--------
>      4 |     842 |        420 |       1210 |    1460
>      8 |    1538 |       1020 |       1850 |    2520
>     16 |    3717 |       1900 |       3700 |    3990
>     32 |    9156 |       4070 |       8289 |    8860
>     64 |   21160 |       9931 |      18600 |   19625
>    128 |   49180 |      23278 |      41922 |   41922
>    256 |  112073 |      53876 |      93202 |  101092
>    512 |  252864 |     122884 |     205897 |  207868
>   1024 |  560512 |     278322 |     458071 |  453053
>   2048 | 1295402 |     775835 |    1038205 | 1020265
>   4096 | 3281263 |    2021221 |    2409718 | 2577554
>   8192 | 8577845 |    4780526 |    5673041 | 6802722
>
> Apple M1
> New  - Total for len 512 reps 2097152 = 1.459141 s
> Old  - Total for len 512 reps 2097152 = 2.251344 s
> FFTW - Total for len 512 reps 2097152 = 1.868429 s
>
> New  - Total for len 1024 reps 4194304 = 6.490080 s
> Old  - Total for len 1024 reps 4194304 = 9.604949 s
> FFTW - Total for len 1024 reps 4194304 = 7.889281 s
>
> New  - Total for len 16384 reps 262144 = 10.374001 s
> Old  - Total for len 16384 reps 262144 = 15.266713 s
> FFTW - Total for len 16384 reps 262144 = 12.341745 s
>
> New  - Total for len 65536 reps 8192 = 1.769812 s
> Old  - Total for len 65536 reps 8192 = 4.209413 s
> FFTW - Total for len 65536 reps 8192 = 3.012365 s
>
> New  - Total for len 131072 reps 4096 = 1.942836 s
> Old  - Segfaults
> FFTW - Total for len 131072 reps 4096 = 3.713713 s
>
> Patch attached.

I've had a look at this now.

I don't have much to comment on regarding the core implementation itself 
or its performance (I didn't try to read and follow it from that 
perspective).

Wrt non-functional aspects, the patch needs a couple of fixes to build with 
other assemblers (binutils, and MS armasm64.exe). I've also made a couple 
of minor fixes: instead of using a series of mov+add+add for loading a large 
constant, use the ldr= pseudo instruction, which is made exactly for 
loading odd constants, and avoid unnecessary \() operators after macro 
arguments.

See https://github.com/mstorsjo/ffmpeg/commits/aarch64-fft for my 
incremental fixes on top. At least the first three are needed to fix 
assembling with the other tools, but all of them up to the WIP one (for 
removing prefetching) are probably worth including; feel free to squash 
them into your patch.

Style-wise, it looks mostly reasonable; some parts use a slightly 
nonstandard style (spaces within {} for loads/stores, and some operand 
columns right-adjusted instead of left-adjusted), but it's probably 
acceptable as is.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

Thread overview: 5+ messages
2022-08-14  4:31 Lynne
2022-08-16 11:07 ` Anton Khirnov
2022-08-16 16:33   ` Paul B Mahol
2022-08-17  8:18     ` Anton Khirnov
2022-08-25 10:51 ` Martin Storsjö [this message]
