From: "Martin Storsjö" <martin@martin.st>
To: Krzysztof Pyrkosz via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Cc: Krzysztof Pyrkosz <ffmpeg@szaka.eu>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}
Date: Sun, 2 Mar 2025 00:21:23 +0200 (EET)
Message-ID: <ab5b78f9-20ae-a791-91c6-bd2647bbd2c@martin.st>
In-Reply-To: <20250219174010.3911-2-ffmpeg@szaka.eu>

On Wed, 19 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> ---

As you've noticed in later patches, most of this commentary _is_ valuable
to keep in the commit message, so I'd keep most of it, including the
performance diff, in the commit message (i.e. above the ---).

> This patch replaces integer widening with halving addition, and
> multi-step "emulated" rounding shift with a single asm instruction doing
> exactly that. This pattern repeats in other functions in this file, I
> fixed some in the succeeding patch. There's a lot of performance to be
> gained there.
>
> I didn't modify the existing function because it adds a few extra steps
> solely for the shared w_avg implementation (every cycle matters), but also
> because I find this linear version easier to digest and understand.

That's probably reasonable - but if the avg codepath in vvc_avg is unused
now, we should remove it. That makes the change clearer, because the removed
old codepath and the newly added one then appear together in the same patch.

> Besides, I noticed that removing smin and smax instructions used for
> clamping the values for 10 and 12 bit_depth instantiations does not
> affect the checkasm result, but it breaks FATE.

It would probably be good if we could improve the checkasm test to hit
those cases too, but that's of course a separate question.
>
> Benchmarks before and after:
> A78
> avg_8_2x2_neon:         21.0 ( 1.55x)
> avg_8_4x4_neon:         25.8 ( 3.05x)
> avg_8_8x8_neon:         45.0 ( 5.86x)
> avg_8_16x16_neon:      178.5 ( 5.49x)
> avg_8_32x32_neon:      709.2 ( 6.20x)
> avg_8_64x64_neon:     2686.2 ( 6.12x)
> avg_8_128x128_neon:  10734.2 ( 5.88x)
> avg_10_2x2_neon:        19.0 ( 1.75x)
> avg_10_4x4_neon:        28.2 ( 2.76x)
> avg_10_8x8_neon:        44.0 ( 5.82x)
> avg_10_16x16_neon:     179.5 ( 4.81x)
> avg_10_32x32_neon:     680.8 ( 5.58x)
> avg_10_64x64_neon:    2536.8 ( 5.40x)
> avg_10_128x128_neon: 10079.0 ( 5.22x)
> avg_12_2x2_neon:        20.8 ( 1.59x)
> avg_12_4x4_neon:        25.2 ( 3.09x)
> avg_12_8x8_neon:        44.0 ( 5.79x)
> avg_12_16x16_neon:     182.2 ( 4.80x)
> avg_12_32x32_neon:     696.2 ( 5.46x)
> avg_12_64x64_neon:    2548.2 ( 5.38x)
> avg_12_128x128_neon: 10133.8 ( 5.19x)
>
> avg_8_2x2_neon:         16.5 ( 1.98x)
> avg_8_4x4_neon:         26.2 ( 2.93x)
> avg_8_8x8_neon:         31.8 ( 8.55x)
> avg_8_16x16_neon:       82.0 (12.02x)
> avg_8_32x32_neon:      310.2 (14.12x)
> avg_8_64x64_neon:      897.8 (18.26x)
> avg_8_128x128_neon:   3608.5 (17.37x)
> avg_10_2x2_neon:        19.5 ( 1.69x)
> avg_10_4x4_neon:        28.0 ( 2.79x)
> avg_10_8x8_neon:        34.8 ( 7.32x)
> avg_10_16x16_neon:     119.8 ( 7.35x)
> avg_10_32x32_neon:     444.2 ( 8.51x)
> avg_10_64x64_neon:    1711.8 ( 8.00x)
> avg_10_128x128_neon:  7065.2 ( 7.43x)
> avg_12_2x2_neon:        19.5 ( 1.71x)
> avg_12_4x4_neon:        24.2 ( 3.22x)
> avg_12_8x8_neon:        33.8 ( 7.57x)
> avg_12_16x16_neon:     120.2 ( 7.33x)
> avg_12_32x32_neon:     442.5 ( 8.53x)
> avg_12_64x64_neon:    1706.2 ( 8.02x)
> avg_12_128x128_neon:  7010.0 ( 7.46x)
>
> A72
> avg_8_2x2_neon:         30.2 ( 1.48x)
> avg_8_4x4_neon:         40.0 ( 3.10x)
> avg_8_8x8_neon:         91.0 ( 4.14x)
> avg_8_16x16_neon:      340.4 ( 3.92x)
> avg_8_32x32_neon:     1220.7 ( 4.67x)
> avg_8_64x64_neon:     5823.4 ( 3.88x)
> avg_8_128x128_neon:  17430.5 ( 4.73x)
> avg_10_2x2_neon:        34.0 ( 1.66x)
> avg_10_4x4_neon:        45.2 ( 2.73x)
> avg_10_8x8_neon:        97.5 ( 3.87x)
> avg_10_16x16_neon:     317.7 ( 3.90x)
> avg_10_32x32_neon:    1376.2 ( 4.21x)
> avg_10_64x64_neon:    5228.1 ( 3.71x)
> avg_10_128x128_neon: 16722.2 ( 4.17x)
> avg_12_2x2_neon:        31.7 ( 1.76x)
> avg_12_4x4_neon:        36.0 ( 3.44x)
> avg_12_8x8_neon:        91.7 ( 4.10x)
> avg_12_16x16_neon:     297.2 ( 4.13x)
> avg_12_32x32_neon:    1400.5 ( 4.14x)
> avg_12_64x64_neon:    5379.1 ( 3.51x)
> avg_12_128x128_neon: 16715.7 ( 4.17x)
>
> avg_8_2x2_neon:         33.7 ( 1.72x)
> avg_8_4x4_neon:         45.5 ( 2.84x)
> avg_8_8x8_neon:         65.0 ( 5.98x)
> avg_8_16x16_neon:      171.0 ( 7.81x)
> avg_8_32x32_neon:      558.2 (10.05x)
> avg_8_64x64_neon:     2006.5 (10.61x)
> avg_8_128x128_neon:   9158.7 ( 8.96x)
> avg_10_2x2_neon:        38.0 ( 1.92x)
> avg_10_4x4_neon:        53.2 ( 2.69x)
> avg_10_8x8_neon:        95.2 ( 4.08x)
> avg_10_16x16_neon:     243.0 ( 5.02x)
> avg_10_32x32_neon:     891.7 ( 5.64x)
> avg_10_64x64_neon:    3357.7 ( 5.60x)
> avg_10_128x128_neon: 12411.7 ( 5.56x)
> avg_12_2x2_neon:        34.7 ( 1.97x)
> avg_12_4x4_neon:        53.2 ( 2.68x)
> avg_12_8x8_neon:        91.7 ( 4.22x)
> avg_12_16x16_neon:     239.0 ( 5.08x)
> avg_12_32x32_neon:     895.7 ( 5.62x)
> avg_12_64x64_neon:    3317.5 ( 5.67x)
> avg_12_128x128_neon: 12358.5 ( 5.58x)
>
>
> A53
> avg_8_2x2_neon:         58.3 ( 1.41x)
> avg_8_4x4_neon:        101.8 ( 2.21x)
> avg_8_8x8_neon:        178.6 ( 4.53x)
> avg_8_16x16_neon:      569.5 ( 5.01x)
> avg_8_32x32_neon:     1962.5 ( 5.50x)
> avg_8_64x64_neon:     8327.8 ( 5.18x)
> avg_8_128x128_neon:  31631.3 ( 5.34x)
> avg_10_2x2_neon:        54.5 ( 1.56x)
> avg_10_4x4_neon:        88.8 ( 2.53x)
> avg_10_8x8_neon:       163.6 ( 4.97x)
> avg_10_16x16_neon:     550.5 ( 5.16x)
> avg_10_32x32_neon:    1942.5 ( 5.64x)
> avg_10_64x64_neon:    8783.5 ( 4.98x)
> avg_10_128x128_neon: 32617.0 ( 5.25x)
> avg_12_2x2_neon:        53.3 ( 1.66x)
> avg_12_4x4_neon:        86.8 ( 2.61x)
> avg_12_8x8_neon:       156.6 ( 5.12x)
> avg_12_16x16_neon:     541.3 ( 5.25x)
> avg_12_32x32_neon:    1955.3 ( 5.59x)
> avg_12_64x64_neon:    8686.0 ( 5.06x)
> avg_12_128x128_neon: 32487.5 ( 5.25x)
>
> avg_8_2x2_neon:         39.5 ( 1.96x)
> avg_8_4x4_neon:         65.3 ( 3.41x)
> avg_8_8x8_neon:        168.8 ( 4.79x)
> avg_8_16x16_neon:      348.0 ( 8.20x)
> avg_8_32x32_neon:     1207.5 ( 8.98x)
> avg_8_64x64_neon:     6032.3 ( 7.17x)
> avg_8_128x128_neon:  22008.5 ( 7.69x)
> avg_10_2x2_neon:        55.5 ( 1.52x)
> avg_10_4x4_neon:        73.8 ( 3.08x)
> avg_10_8x8_neon:       157.8 ( 5.12x)
> avg_10_16x16_neon:     445.0 ( 6.43x)
> avg_10_32x32_neon:    1587.3 ( 6.87x)
> avg_10_64x64_neon:    7738.0 ( 5.68x)
> avg_10_128x128_neon: 27813.8 ( 6.14x)
> avg_12_2x2_neon:        48.3 ( 1.80x)
> avg_12_4x4_neon:        77.0 ( 2.95x)
> avg_12_8x8_neon:       161.5 ( 4.98x)
> avg_12_16x16_neon:     433.5 ( 6.59x)
> avg_12_32x32_neon:    1622.0 ( 6.75x)
> avg_12_64x64_neon:    7844.5 ( 5.60x)
> avg_12_128x128_neon: 26999.5 ( 6.34x)
>
> Krzysztof
>
> libavcodec/aarch64/vvc/inter.S | 124 ++++++++++++++++++++++++++++++++-
> 1 file changed, 121 insertions(+), 3 deletions(-)

Overall the change looks reasonable to me, thanks, but remove the now
unused parts and update the patch to include the valuable comments and
benchmarks above the "---" bit.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".