From: "Martin Storsjö" <martin@martin.st>
To: "Logan.Lyu" <Logan.Lyu@myais.com.cn>
Cc: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv
Date: Sun, 2 Jul 2023 00:28:17 +0300 (EEST)
Message-ID: <41f13b-1e5b-78c7-8c3a-6b21e0fee096@martin.st> (raw)
In-Reply-To: <08a60626-b5f4-935b-5a56-2e001b59b8d7@myais.com.cn>
On Sun, 18 Jun 2023, Logan.Lyu wrote:
> Hi, Martin,
>
> I modified it according to your comments. Please review again.
> From 47b7f7af634add7680b56a216fff7dbe1f08cd11 Mon Sep 17 00:00:00 2001
> From: Logan Lyu <Logan.Lyu@myais.com.cn>
> Date: Sun, 28 May 2023 10:35:43 +0800
> Subject: [PATCH 5/5] lavc/aarch64: new optimization for 8-bit
> hevc_epel_uni_w_hv
>
> Signed-off-by: Logan Lyu <Logan.Lyu@myais.com.cn>
> ---
> libavcodec/aarch64/hevcdsp_epel_neon.S | 694 ++++++++++++++++++++++
> libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +
> 2 files changed, 700 insertions(+)
>
> diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S
> index 8b6f396a0b..355679af29 100644
> --- a/libavcodec/aarch64/hevcdsp_epel_neon.S
> +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S
> @@ -717,6 +717,700 @@ function ff_hevc_put_hevc_epel_uni_w_h64_8_neon_i8mm, export=1
> ret
> endfunc
>
> +.macro epel_uni_w_hv_start
> + mov x15, x5 //denom
> + mov x16, x6 //wx
> + mov x17, x7 //ox
> + add w15, w15, #6 //shift = denom+6
> +
> +
> + ldp x5, x6, [sp]
> + ldr x7, [sp, #16]
> +
> + stp q12, q13, [sp, #-128]!
> + stp q14, q15, [sp, #32]
> + stp q8, q9, [sp, #64]
> + stp q10, q11, [sp, #96]
Only need to back up 64 bytes, by backing up d8-d15. Also, the order
is quite weird here, why not keep them in e.g. linear order?
> +function ff_hevc_put_hevc_epel_uni_w_hv4_8_neon_i8mm, export=1
> + epel_uni_w_hv_start
> + sxtw x4, w4
> +
> + add x10, x4, #3
> + lsl x10, x10, #7
> + sub sp, sp, x10 // tmp_array
> + stp xzr, x30, [sp, #-48]!
As mentioned already in the previous review - why do you back up and
restore xzr here? That's not necessary. Yes, you should keep the stack
16 byte aligned, but you can just leave an empty slot, and just do
"str x30, [sp, #-48]!" here, and vice versa with "ldr" instead of ldp
when restoring.
The same goes in all functions here.
> +2:
> + ldp q14, q15, [sp, #32]
> + ldp q8, q9, [sp, #64]
> + ldp q10, q11, [sp, #96]
> + ldp q12, q13, [sp], #128
Only need d8-d15, and weird register order here, and elsewhere.
> +function ff_hevc_put_hevc_epel_uni_w_hv24_8_neon_i8mm, export=1
> + epel_uni_w_hv_start
> + sxtw x4, w4
FWIW, it's unusual to need an explicit sxtw instruction, but I guess
if you use it in the form "add x10, x4, #3" it might be needed.
> +function ff_hevc_put_hevc_epel_uni_w_hv32_8_neon_i8mm, export=1
> + ldp x15, x16, [sp]
> + stp x0, x30, [sp, #-16]!
> + stp x1, x2, [sp, #-16]!
> + stp x3, x4, [sp, #-16]!
> + stp x5, x6, [sp, #-16]!
Don't do consecutive stack pointer updates like this, but merge it
into one large stack decrement followed by positive offsets, like in
all the other cases of stp/ldp.
> + mov x17, #16
> + stp x17, x7, [sp, #-16]!
> + stp x15, x16, [sp, #-16]!
> + bl X(ff_hevc_put_hevc_epel_uni_w_hv16_8_neon_i8mm)
> + ldp x15, x16, [sp], #16
> + ldp x17, x7, [sp], #16
> + ldp x5, x6, [sp], #16
> + ldp x3, x4, [sp], #16
> + ldp x1, x2, [sp], #16
> + ldr x0, [sp]
> + add x0, x0, #16
> + add x2, x2, #16
> + mov x17, #16
> + stp x17, xzr, [sp, #-16]!
> + stp x15, x16, [sp, #-16]!
Don't do multiple stack decrements, don't needlessly store xzr here.
The same goes for all the other functions in this patch.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2023-07-01 21:28 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-04 4:17 [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels Logan.Lyu
2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h Logan.Lyu
2023-06-12 7:59 ` Martin Storsjö
2023-06-18 8:21 ` Logan.Lyu
2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v Logan.Lyu
2023-06-12 8:09 ` Martin Storsjö
2023-06-12 9:08 ` Martin Storsjö
2023-06-18 8:22 ` Logan.Lyu
2023-07-01 21:21 ` Martin Storsjö
2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 4/5] lavc/aarch64: new optimization for 8-bit hevc_epel_h Logan.Lyu
2023-06-12 8:12 ` Martin Storsjö
2023-06-18 8:23 ` Logan.Lyu
2023-06-18 8:26 ` Logan.Lyu
2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv Logan.Lyu
2023-06-12 8:19 ` Martin Storsjö
2023-06-18 8:25 ` Logan.Lyu
2023-07-01 21:28 ` Martin Storsjö [this message]
2023-07-13 14:54 ` Logan.Lyu
2023-07-14 9:28 ` Martin Storsjö
2023-06-12 7:47 ` [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels Martin Storsjö
2023-06-18 8:29 ` Logan.Lyu
2023-07-01 21:16 ` Martin Storsjö
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=41f13b-1e5b-78c7-8c3a-6b21e0fee096@martin.st \
--to=martin@martin.st \
--cc=Logan.Lyu@myais.com.cn \
--cc=ffmpeg-devel@ffmpeg.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git