* [FFmpeg-devel] [PATCH v3 0/5] avcodec/ac3: Add aarch64 NEON DSP
From: Geoff Hill @ 2024-04-03 6:43 UTC
To: ffmpeg-devel
Here's v3 to push the AC-3 ARMv8 NEON experiment a step further.

This version implements 5 of the AC-3 encoder DSP functions,
and adds checkasm tests where missing.

I've tested that the checkasm tests pass on aarch64 and x86.

On AWS Graviton2 (t4g.medium), GCC 12.3:
$ tests/checkasm/checkasm --bench --verbose --test=ac3dsp
...
NEON:
- ac3dsp.ac3_exponent_min [OK]
- ac3dsp.ac3_extract_exponents [OK]
- ac3dsp.float_to_fixed24 [OK]
- ac3dsp.ac3_sum_square_butterfly_int32 [OK]
- ac3dsp.ac3_sum_square_butterfly_float [OK]
checkasm: all 20 tests passed
ac3_exponent_min_reuse0_c: 9.0
ac3_exponent_min_reuse0_neon: 9.7
ac3_exponent_min_reuse1_c: 1037.5
ac3_exponent_min_reuse1_neon: 54.0
ac3_exponent_min_reuse2_c: 1820.7
ac3_exponent_min_reuse2_neon: 135.2
ac3_exponent_min_reuse3_c: 2080.5
ac3_exponent_min_reuse3_neon: 167.7
ac3_exponent_min_reuse4_c: 2493.2
ac3_exponent_min_reuse4_neon: 200.0
ac3_exponent_min_reuse5_c: 2970.0
ac3_exponent_min_reuse5_neon: 231.7
ac3_extract_exponents_n512_c: 1717.5
ac3_extract_exponents_n512_neon: 506.7
ac3_extract_exponents_n768_c: 2562.7
ac3_extract_exponents_n768_neon: 769.7
ac3_extract_exponents_n1024_c: 3389.2
ac3_extract_exponents_n1024_neon: 1019.0
ac3_extract_exponents_n1280_c: 4210.7
ac3_extract_exponents_n1280_neon: 1267.5
ac3_extract_exponents_n1536_c: 5071.5
ac3_extract_exponents_n1536_neon: 1522.0
ac3_extract_exponents_n1792_c: 5896.5
ac3_extract_exponents_n1792_neon: 1784.0
ac3_extract_exponents_n2048_c: 6779.2
ac3_extract_exponents_n2048_neon: 2051.0
ac3_extract_exponents_n2304_c: 7559.5
ac3_extract_exponents_n2304_neon: 2290.0
ac3_extract_exponents_n2560_c: 8397.2
ac3_extract_exponents_n2560_neon: 2552.5
ac3_extract_exponents_n2816_c: 9224.2
ac3_extract_exponents_n2816_neon: 2797.7
ac3_extract_exponents_n3072_c: 10026.2
ac3_extract_exponents_n3072_neon: 3047.7
ac3_sum_square_bufferfly_float_c: 1605.7
ac3_sum_square_bufferfly_float_neon: 365.7
ac3_sum_square_bufferfly_int32_c: 965.5
ac3_sum_square_bufferfly_int32_neon: 486.2
float_to_fixed24_c: 2453.7
float_to_fixed24_neon: 516.2
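
For a rough sense of the gains, the C-to-NEON ratios can be worked out from the figures above. The following is only a minimal sketch using a handful of values copied from the listing; the names `timings` and `speedup` are illustrative and are not part of checkasm.

```python
# (C, NEON) benchmark figures copied from the checkasm output above.
# The names `timings` and `speedup` are illustrative, not part of checkasm.
timings = {
    "ac3_exponent_min_reuse1": (1037.5, 54.0),
    "ac3_exponent_min_reuse5": (2970.0, 231.7),
    "ac3_extract_exponents_n3072": (10026.2, 3047.7),
    "ac3_sum_square_bufferfly_float": (1605.7, 365.7),
    "ac3_sum_square_bufferfly_int32": (965.5, 486.2),
    "float_to_fixed24": (2453.7, 516.2),
}


def speedup(c_time, neon_time):
    """Return how many times faster the NEON version is than the C version."""
    return c_time / neon_time


for name, (c_time, neon_time) in timings.items():
    print(f"{name}: {speedup(c_time, neon_time):.1f}x")
```

The same ratio can be taken for any other pair of `_c` and `_neon` lines in the listing.
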
Geoff Hill (5):
  avcodec/ac3: Implement float_to_fixed24 for aarch64 NEON
  avcodec/ac3: Implement ac3_exponent_min for aarch64 NEON
  avcodec/ac3: Implement ac3_extract_exponents for aarch64 NEON
  avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON
  avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON

 libavcodec/aarch64/Makefile              |   2 +
 libavcodec/aarch64/ac3dsp_init_aarch64.c |  50 +++++++++
 libavcodec/aarch64/ac3dsp_neon.S         | 125 ++++++++++++++++++++++
 libavcodec/ac3dsp.c                      |   4 +-
 libavcodec/ac3dsp.h                      |   3 +-
 tests/checkasm/ac3dsp.c                  | 130 +++++++++++++++++++++++
 6 files changed, 312 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/aarch64/ac3dsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/ac3dsp_neon.S
--
2.44.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] [PATCH v3 0/5] avcodec/ac3: Add aarch64 NEON DSP
From: Martin Storsjö @ 2024-04-04 12:57 UTC
To: FFmpeg development discussions and patches
On Tue, 2 Apr 2024, Geoff Hill wrote:
> Here's v3 to push the AC-3 ARMv8 NEON experiment a step further.
>
> This version implements 5 of the AC-3 encoder DSP functions,
> and adds checkasm tests where missing.
>
> I've tested that the checkasm tests pass on aarch64 and x86.

Thanks, I've tested that checkasm also passes on 32 bit arm (where we also
do have an ac3dsp implementation).

Overall the patches look mostly fine.

Are these implementations based on the existing 32 bit arm ones? The code
is quite similar (although there aren't very many different ways to
implement things, so this could be a coincidence). If they are based on the
existing code, it would be good to retain the copyright statement from
that file.

These functions have a different indentation than the rest of
essentially all our aarch64 assembly (the code you're adding is aligned in
two different ways) - please check other files (e.g. vp8dsp_neon.S) for
an example. The instructions should be aligned to 8 leading spaces, and
operands to 24 leading characters.
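
To illustrate the alignment rule, here is a small sketch of a formatter that places the mnemonic after 8 spaces and starts the operands at character offset 24. The helper `format_asm_line` and the sample instructions are hypothetical, chosen only for illustration; they are not taken from the patch or from any FFmpeg tooling.

```python
def format_asm_line(mnemonic, operands, insn_col=8, operand_col=24):
    """Align an instruction: mnemonic after `insn_col` spaces,
    operands starting at character offset `operand_col`."""
    line = " " * insn_col + mnemonic
    # Keep at least one space if the mnemonic runs past the operand column.
    padding = max(operand_col - len(line), 1)
    return line + " " * padding + operands


# Hypothetical sample instructions, not copied from the patch.
samples = [
    ("ld1", "{v0.4s}, [x1], #16"),
    ("fcvtzs", "v0.4s, v0.4s, #24"),
    ("subs", "w2, w2, #4"),
]

for mnemonic, operands in samples:
    print(format_asm_line(mnemonic, operands))
```

Every line printed this way has its mnemonic and its operands starting in the same two columns, which is the layout the existing aarch64 files are described as using.
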
Other than those generic points, I have two comments on the patches
themselves.
// Martin