From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
    by master.gitmailbox.com (Postfix) with ESMTP id 9135444F6A
    for ; Mon, 12 Jun 2023 08:00:03 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
    by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E38CE68C33D;
    Mon, 12 Jun 2023 11:00:00 +0300 (EEST)
Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134])
    by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C7D5C68C165
    for ; Mon, 12 Jun 2023 10:59:53 +0300 (EEST)
Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21])
    by mail8.parnet.fi with ESMTP id 35C7xrGk019894-35C7xrGl019894;
    Mon, 12 Jun 2023 10:59:53 +0300
Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187])
    by mail9.parnet.fi (Postfix) with ESMTPS id 271E1A145F;
    Mon, 12 Jun 2023 10:59:50 +0300 (EEST)
Date: Mon, 12 Jun 2023 10:59:50 +0300 (EEST)
From: =?ISO-8859-15?Q?Martin_Storsj=F6?=
To: FFmpeg development discussions and patches
In-Reply-To: <20230604041756.5196-2-Logan.Lyu@myais.com.cn>
Message-ID: <4dd1f09f-188b-d9ff-c8ad-4950bab5b661@martin.st>
References: <20230604041756.5196-1-Logan.Lyu@myais.com.cn>
    <20230604041756.5196-2-Logan.Lyu@myais.com.cn>
MIME-Version: 1.0
X-FE-Policy-ID: 3:14:2:SYSTEM
Subject: Re: [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: new optimization for
    8-bit hevc_epel_uni_w_h
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: Logan Lyu
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
Archived-At:
List-Archive:
List-Post:

On Sun, 4 Jun 2023, Logan.Lyu@myais.com.cn wrote:

> From: Logan Lyu
>
> Signed-off-by: Logan Lyu
> ---
>  libavcodec/aarch64/Makefile               |   1 +
>  libavcodec/aarch64/hevcdsp_epel_neon.S    | 378 ++++++++++++++++++++++
>  libavcodec/aarch64/hevcdsp_init_aarch64.c |   7 +-
>  3 files changed, 385 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/aarch64/hevcdsp_epel_neon.S
>
> diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
> index 216191640c..cb428b49e0 100644
> --- a/libavcodec/aarch64/Makefile
> +++ b/libavcodec/aarch64/Makefile
> @@ -69,4 +69,5 @@ NEON-OBJS-$(CONFIG_HEVC_DECODER) += aarch64/hevcdsp_deblock_neon.o \
>                                      aarch64/hevcdsp_idct_neon.o \
>                                      aarch64/hevcdsp_init_aarch64.o \
>                                      aarch64/hevcdsp_qpel_neon.o \
> +                                    aarch64/hevcdsp_epel_neon.o \
>                                      aarch64/hevcdsp_sao_neon.o
> diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S
> new file mode 100644
> index 0000000000..fe494dd843
> --- /dev/null
> +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S
> @@ -0,0 +1,378 @@
> +/* -*-arm64-*-
> + * vim: syntax=arm64asm
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/aarch64/asm.S"
> +#define MAX_PB_SIZE 64
> +
> +const epel_filters, align=4
> +        .byte  0,  0,  0,  0
> +        .byte -2, 58, 10, -2
> +        .byte -4, 54, 16, -2
> +        .byte -6, 46, 28, -4
> +        .byte -4, 36, 36, -4
> +        .byte -4, 28, 46, -6
> +        .byte -2, 16, 54, -4
> +        .byte -2, 10, 58, -2
> +endconst
> +
> +#if HAVE_I8MM
> +.macro EPEL_UNI_W_H_HEADER
> +        ldr             x12, [sp]
> +        sub             x2,  x2,  #1
> +        movrel          x9,  epel_filters
> +        add             x9,  x9,  x12, lsl #2
> +        ldr             w11, [x9]
> +        dup             v28.4s, w11

Why not just do "ld1r {v28.4s}, [x9]" here instead, avoiding the
indirection via GPRs?

Other than that, I think this mostly looks reasonable.

Btw, for any assembly patches like these, it would be appreciated if you
can provide benchmarks from checkasm, e.g. "checkasm --test=hevc_pel
--bench=put_hevc" (or maybe just "--bench") and extract the relevant lines
for the functions that you've added/modified, and mention what system
you've benchmarked it on. You get the most useful benchmarks for
micro-tuning if you can enable userspace access to the timing registers
and configure with --disable-linux-perf.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
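
For readers less used to reading NEON assembly, the ld1r suggestion above can be sketched in C with NEON intrinsics. This is only an illustration under stated assumptions, not code from the patch: the names epel_row, splat_via_gpr and splat_via_ld1r are made up here, the table is wrapped in a union so the four taps can be read as one 32-bit word, and a plain C fallback is included so the sketch also builds on non-AArch64 hosts. vdupq_n_s32 corresponds to "dup v.4s, w" and vld1q_dup_s32 to "ld1r {v.4s}, [x]".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper type: four signed 8-bit taps viewed as one 32-bit word. */
typedef union { int8_t tap[4]; int32_t word; } epel_row;

/* Same values as the epel_filters table in the quoted patch. */
static const epel_row epel_filters[8] = {
    {{  0,  0,  0,  0 }}, {{ -2, 58, 10, -2 }},
    {{ -4, 54, 16, -2 }}, {{ -6, 46, 28, -4 }},
    {{ -4, 36, 36, -4 }}, {{ -4, 28, 46, -6 }},
    {{ -2, 16, 54, -4 }}, {{ -2, 10, 58, -2 }},
};

/* Two-step form, mirroring "ldr w11, [x9]" followed by "dup v28.4s, w11". */
static void splat_via_gpr(int mx, int32_t out[4])
{
    assert(mx >= 0 && mx < 8);
    int32_t w = epel_filters[mx].word;      /* scalar load into a GPR */
#if defined(__aarch64__)
    vst1q_s32(out, vdupq_n_s32(w));         /* broadcast the GPR to 4 lanes */
#else
    for (int i = 0; i < 4; i++)
        out[i] = w;                         /* portable stand-in */
#endif
}

/* One-step form, mirroring "ld1r {v28.4s}, [x9]". */
static void splat_via_ld1r(int mx, int32_t out[4])
{
    assert(mx >= 0 && mx < 8);
#if defined(__aarch64__)
    /* Load one 32-bit element from memory and replicate it to all lanes. */
    vst1q_s32(out, vld1q_dup_s32(&epel_filters[mx].word));
#else
    for (int i = 0; i < 4; i++)
        out[i] = epel_filters[mx].word;     /* portable stand-in */
#endif
}
```

Both helpers produce the same four lanes; the only difference is whether the value passes through a general-purpose register first. A C compiler is free to emit the same instruction for both, so this shows the semantics only and says nothing about timing, which is what the checkasm numbers requested above would cover.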