From: "Martin Storsjö" <martin@martin.st>
To: Krzysztof Pyrkosz via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Cc: Krzysztof Pyrkosz <ffmpeg@szaka.eu>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_extract_exponents
Date: Sun, 2 Mar 2025 00:59:20 +0200 (EET)
Message-ID: <39fc9b98-6e97-a659-32f3-5060a32f60d6@martin.st>
In-Reply-To: <20250228212148.11560-2-ffmpeg@szaka.eu>
On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> Before and after:
>
> A78
> ac3_extract_exponents_n512_neon: 503.2 ( 3.36x)
> ac3_extract_exponents_n3072_neon: 2986.2 ( 3.35x)
>
> ac3_extract_exponents_n512_neon: 211.2 ( 8.02x)
> ac3_extract_exponents_n3072_neon: 1251.5 ( 8.00x)
>
> A72
> ac3_extract_exponents_n512_neon: 964.7 ( 2.39x)
> ac3_extract_exponents_n3072_neon: 5434.5 ( 2.47x)
>
> ac3_extract_exponents_n512_neon: 465.6 ( 4.87x)
> ac3_extract_exponents_n3072_neon: 2696.3 ( 4.97x)
> ---
> This version handles 16 ints in one go and consolidates separate
> extractions and writes into one. I assume the length of the input is a
> multiple of 16 (there are no constraints defined in the template file),
> but the tests are passing.
I have no clue whether this is OK or not (it may be good to check whether
other assembly implementations, e.g. on x86, make the same assumption).
Codewise, the patch looks good, thanks!
This description of the patch, what it does and the assumptions it makes,
is probably nice to keep in the final commit as well, so it could be
included above "---" too.
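
To make the length assumption concrete, here is a minimal C sketch. It is
not the patch itself and not FFmpeg's code verbatim: the scalar loop is
modeled on my understanding of the C reference (exponent 24 for a zero
coefficient, otherwise 23 minus the index of the highest set bit of the
magnitude), and the names log2_floor, extract_exponents_scalar and
extract_exponents_x16 are hypothetical. The second function mimics a kernel
that consumes 16 coefficients per iteration with no tail loop, so it is only
valid when nb_coefs is a multiple of 16:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Index of the highest set bit; v must be non-zero. */
static int log2_floor(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Scalar model (assumed semantics): coefficients are 24-bit magnitudes,
 * zero maps to exponent 24. */
static void extract_exponents_scalar(uint8_t *exp, const int32_t *coef,
                                     int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        uint32_t v = (uint32_t)abs(coef[i]);
        exp[i] = v ? (uint8_t)(23 - log2_floor(v)) : 24;
    }
}

/* Blocked variant: 16 coefficients per iteration and no tail handling,
 * so a length that is not a multiple of 16 is rejected outright here. */
static void extract_exponents_x16(uint8_t *exp, const int32_t *coef,
                                  int nb_coefs)
{
    assert(nb_coefs % 16 == 0);
    for (int i = 0; i < nb_coefs; i += 16)
        extract_exponents_scalar(exp + i, coef + i, 16);
}
```

Both checkasm sizes quoted above (512 and 3072) are multiples of 16, so
passing tests alone would not exercise any other length.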
// Martin