From: "Martin Storsjö" <martin@martin.st>
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu <Logan.Lyu@myais.com.cn>, "J . Dekker" <jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions
Date: Mon, 25 Mar 2024 17:02:22 +0200
Message-ID: <20240325150243.59058-1-martin@martin.st>

Hi,

For some time now, we have had pretty complete AArch64 NEON coverage
for the HEVC decoder. However, some of these functions require the
I8MM instruction set extension, and many of them (but not all) lack
a plain NEON version.

This patchset adds a plain NEON version of every function for which
we have an I8MM version.

For context: the I8MM instruction set extension is a mandatory part
of Armv8.6-A. For example, Apple M2 and AWS Graviton 3 have it, but
Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402 fps to
649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation: two
functions relied on the contents of the stack, below the stack
pointer, being untouched within a function. If a signal gets
delivered, those parts of the stack could be clobbered.

// Martin

Martin Storsjö (21):
  aarch64: hevc: Reorder a misplaced function init line
  aarch64: hevc: Don't iterate with sp in
    ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
  aarch64: hevc: Merge consecutive stores in
    put_hevc_\type\()_h16_8_neon
  aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal
    looping
  aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
  aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
  aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
  aarch64: hevc: Split the epel_*_hv functions into two parts
  aarch64: hevc: Reorder epel_hv functions to prepare for templating
  aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
  aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
  aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
  aarch64: hevc: Split the qpel_*_hv functions into two parts
  aarch64: hevc: Deduplicate the
    hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
  aarch64: hevc: Reorder qpel_hv functions to prepare for templating
  aarch64: hevc: Produce plain neon versions of qpel_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
  aarch64: hevc: Produce plain neon versions of qpel_bi_hv

 libavcodec/aarch64/hevcdsp_epel_neon.S    | 1529 +++++++++++------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |   96 +-
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 1804 +++++++++++++--------
 3 files changed, 2291 insertions(+), 1138 deletions(-)

-- 
2.39.3 (Apple Git-146)

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
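
As a rough illustration of why both variants matter, the sketch below
shows how a decoder might choose between a C fallback, a plain NEON
routine, and an I8MM routine at init time. Every name in it
(CPU_FLAG_NEON, CPU_FLAG_I8MM, DSPContext, dsp_init, selected_impl,
the epel_h_* stubs) is hypothetical and is not taken from the patches
or from FFmpeg's actual init code; the stubs only report which variant
was picked.

```c
/* Hypothetical CPU feature bits, for illustration only. */
enum {
    CPU_FLAG_NEON = 1 << 0,
    CPU_FLAG_I8MM = 1 << 1,
};

/* Which implementation a stub stands for. */
enum Impl { IMPL_C, IMPL_NEON, IMPL_NEON_I8MM };

/* Hypothetical signature; a real routine would take pixel pointers,
 * strides, block sizes and filter offsets instead. */
typedef enum Impl (*pel_fn)(void);

static enum Impl epel_h_c(void)         { return IMPL_C; }
static enum Impl epel_h_neon(void)      { return IMPL_NEON; }
static enum Impl epel_h_neon_i8mm(void) { return IMPL_NEON_I8MM; }

typedef struct DSPContext {
    pel_fn epel_h;
} DSPContext;

/* Start from the portable C version, then override it with the best
 * variant the CPU supports. Later assignments win, so the I8MM
 * version replaces the plain NEON one when both are available. */
static void dsp_init(DSPContext *c, int cpu_flags)
{
    c->epel_h = epel_h_c;

    if (cpu_flags & CPU_FLAG_NEON) {
        c->epel_h = epel_h_neon;

        if (cpu_flags & CPU_FLAG_I8MM)
            c->epel_h = epel_h_neon_i8mm;
    }
}

/* Helper for experimenting: which variant would run for these flags? */
enum Impl selected_impl(int cpu_flags)
{
    DSPContext c;
    dsp_init(&c, cpu_flags);
    return c.epel_h();
}
```

With flags modelling an Apple M1 (NEON without I8MM),
selected_impl(CPU_FLAG_NEON) yields IMPL_NEON. If the plain NEON
assignment were missing, as was the case for some functions before
this patchset, the same call would presumably end up on the C code.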