From: "Martin Storsjö" <martin@martin.st> To: "Logan.Lyu" <Logan.Lyu@myais.com.cn> Cc: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Subject: Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv Date: Sun, 2 Jul 2023 00:28:17 +0300 (EEST) Message-ID: <41f13b-1e5b-78c7-8c3a-6b21e0fee096@martin.st> (raw) In-Reply-To: <08a60626-b5f4-935b-5a56-2e001b59b8d7@myais.com.cn> On Sun, 18 Jun 2023, Logan.Lyu wrote: > Hi, Martin, > > I modified it according to your comments. Please review again. > From 47b7f7af634add7680b56a216fff7dbe1f08cd11 Mon Sep 17 00:00:00 2001 > From: Logan Lyu <Logan.Lyu@myais.com.cn> > Date: Sun, 28 May 2023 10:35:43 +0800 > Subject: [PATCH 5/5] lavc/aarch64: new optimization for 8-bit > hevc_epel_uni_w_hv > > Signed-off-by: Logan Lyu <Logan.Lyu@myais.com.cn> > --- > libavcodec/aarch64/hevcdsp_epel_neon.S | 694 ++++++++++++++++++++++ > libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 + > 2 files changed, 700 insertions(+) > > diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S > index 8b6f396a0b..355679af29 100644 > --- a/libavcodec/aarch64/hevcdsp_epel_neon.S > +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S > @@ -717,6 +717,700 @@ function ff_hevc_put_hevc_epel_uni_w_h64_8_neon_i8mm, export=1 > ret > endfunc > > +.macro epel_uni_w_hv_start > + mov x15, x5 //denom > + mov x16, x6 //wx > + mov x17, x7 //ox > + add w15, w15, #6 //shift = denom+6 > + > + > + ldp x5, x6, [sp] > + ldr x7, [sp, #16] > + > + stp q12, q13, [sp, #-128]! > + stp q14, q15, [sp, #32] > + stp q8, q9, [sp, #64] > + stp q10, q11, [sp, #96] Only need to back up 64 bytes, by backing up d8-d15. Also, the order is quite weird here, why not keep them in e.g. linear order? > +function ff_hevc_put_hevc_epel_uni_w_hv4_8_neon_i8mm, export=1 > + epel_uni_w_hv_start > + sxtw x4, w4 > + > + add x10, x4, #3 > + lsl x10, x10, #7 > + sub sp, sp, x10 // tmp_array > + stp xzr, x30, [sp, #-48]! As mentioned already in the previous review - why do you back up and restore xzr here? That's not necessary. Yes, you should keep the stack 16 byte aligned, but you can just leave an empty slot, and just do "str x30, [sp, #-48]!" here, and vice versa with "ldr" instead of ldp when restoring. The same goes in all functions here. > +2: > + ldp q14, q15, [sp, #32] > + ldp q8, q9, [sp, #64] > + ldp q10, q11, [sp, #96] > + ldp q12, q13, [sp], #128 Only need d8-d15, and weird register order here, and elsewhere. > +function ff_hevc_put_hevc_epel_uni_w_hv24_8_neon_i8mm, export=1 > + epel_uni_w_hv_start > + sxtw x4, w4 FWIW, it's unusual to need an explicit sxtw instruction, but I guess if you use it in the form "add x10, x4, #3" it might be needed. > +function ff_hevc_put_hevc_epel_uni_w_hv32_8_neon_i8mm, export=1 > + ldp x15, x16, [sp] > + stp x0, x30, [sp, #-16]! > + stp x1, x2, [sp, #-16]! > + stp x3, x4, [sp, #-16]! > + stp x5, x6, [sp, #-16]! Don't do consecutive stack pointer updates like this, but merge it into one large stack decrement followed by positive offsets, like in all the other cases of stp/ldp. > + mov x17, #16 > + stp x17, x7, [sp, #-16]! > + stp x15, x16, [sp, #-16]! > + bl X(ff_hevc_put_hevc_epel_uni_w_hv16_8_neon_i8mm) > + ldp x15, x16, [sp], #16 > + ldp x17, x7, [sp], #16 > + ldp x5, x6, [sp], #16 > + ldp x3, x4, [sp], #16 > + ldp x1, x2, [sp], #16 > + ldr x0, [sp] > + add x0, x0, #16 > + add x2, x2, #16 > + mov x17, #16 > + stp x17, xzr, [sp, #-16]! > + stp x15, x16, [sp, #-16]! Don't do multiple stack decrements, don't needlessly store xzr here. The same goes for all the other functions in this patch. // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2023-07-01 21:28 UTC|newest] Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-06-04 4:17 [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels Logan.Lyu 2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h Logan.Lyu 2023-06-12 7:59 ` Martin Storsjö 2023-06-18 8:21 ` Logan.Lyu 2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v Logan.Lyu 2023-06-12 8:09 ` Martin Storsjö 2023-06-12 9:08 ` Martin Storsjö 2023-06-18 8:22 ` Logan.Lyu 2023-07-01 21:21 ` Martin Storsjö 2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 4/5] lavc/aarch64: new optimization for 8-bit hevc_epel_h Logan.Lyu 2023-06-12 8:12 ` Martin Storsjö 2023-06-18 8:23 ` Logan.Lyu 2023-06-18 8:26 ` Logan.Lyu 2023-06-04 4:17 ` [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv Logan.Lyu 2023-06-12 8:19 ` Martin Storsjö 2023-06-18 8:25 ` Logan.Lyu 2023-07-01 21:28 ` Martin Storsjö [this message] 2023-07-13 14:54 ` Logan.Lyu 2023-07-14 9:28 ` Martin Storsjö 2023-06-12 7:47 ` [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels Martin Storsjö 2023-06-18 8:29 ` Logan.Lyu 2023-07-01 21:16 ` Martin Storsjö
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=41f13b-1e5b-78c7-8c3a-6b21e0fee096@martin.st \ --to=martin@martin.st \ --cc=Logan.Lyu@myais.com.cn \ --cc=ffmpeg-devel@ffmpeg.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git