From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id E6265444AF
	for <ffmpegdev@gitmailbox.com>; Mon, 12 Jun 2023 07:48:12 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E06A868BF75;
	Mon, 12 Jun 2023 10:48:08 +0300 (EEST)
Received: from mail8.parnet.fi (mail8.parnet.fi [77.234.108.134])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id BB6B868C1EC
 for <ffmpeg-devel@ffmpeg.org>; Mon, 12 Jun 2023 10:48:01 +0300 (EEST)
Received: from mail9.parnet.fi (mail9.parnet.fi [77.234.108.21])
 by mail8.parnet.fi  with ESMTP id 35C7lxMH019121-35C7lxMI019121;
 Mon, 12 Jun 2023 10:48:00 +0300
Received: from foo.martin.st (host-97-187.parnet.fi [77.234.97.187])
 by mail9.parnet.fi (Postfix) with ESMTPS id 8DA7AA145F;
 Mon, 12 Jun 2023 10:47:57 +0300 (EEST)
Date: Mon, 12 Jun 2023 10:47:54 +0300 (EEST)
From: =?ISO-8859-15?Q?Martin_Storsj=F6?= <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
In-Reply-To: <20230604041756.5196-1-Logan.Lyu@myais.com.cn>
Message-ID: <e73dd44a-88d-6c66-a642-f964d3f45729@martin.st>
References: <20230604041756.5196-1-Logan.Lyu@myais.com.cn>
MIME-Version: 1.0
X-FE-Policy-ID: 3:14:2:SYSTEM
Subject: Re: [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for
 8-bit hevc_pel_uni_pixels
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Logan Lyu <Logan.Lyu@myais.com.cn>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/e73dd44a-88d-6c66-a642-f964d3f45729@martin.st/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Sun, 4 Jun 2023, Logan.Lyu@myais.com.cn wrote:

> From: Logan Lyu <Logan.Lyu@myais.com.cn>
>
> Signed-off-by: Logan Lyu <Logan.Lyu@myais.com.cn>
> ---
> libavcodec/aarch64/hevcdsp_init_aarch64.c |   5 ++
> libavcodec/aarch64/hevcdsp_qpel_neon.S    | 104 ++++++++++++++++++++++
> 2 files changed, 109 insertions(+)
>
> diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
> index 483a9d5253..5a1d520eec 100644
> --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
> +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
> @@ -152,6 +152,9 @@ void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, co
>     void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \
>     void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \
>
> +NEON8_FNPROTO(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
> +        const uint8_t *_src, ptrdiff_t _srcstride,
> +        int height, intptr_t mx, intptr_t my, int width),);
>
> NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
>         const uint8_t *_src, ptrdiff_t _srcstride,
> @@ -263,6 +266,8 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
>         c->put_hevc_qpel_bi[8][0][1]   =
>         c->put_hevc_qpel_bi[9][0][1]   = ff_hevc_put_hevc_qpel_bi_h16_8_neon;
>
> +        NEON8_FNASSIGN(c->put_hevc_epel_uni, 0, 0, pel_uni_pixels,);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 0, 0, pel_uni_pixels,);
>         NEON8_FNASSIGN(c->put_hevc_epel_uni_w, 0, 0, pel_uni_w_pixels,);
>         NEON8_FNASSIGN(c->put_hevc_qpel_uni_w, 0, 0, pel_uni_w_pixels,);
>         NEON8_FNASSIGN_PARTIAL_4(c->put_hevc_qpel_uni_w, 1, 0, qpel_uni_w_v,);
> diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
> index ed659cfe9b..6ca05b7201 100644
> --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
> +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
> @@ -490,6 +490,110 @@ put_hevc qpel
> put_hevc qpel_uni
> put_hevc qpel_bi
>
> +function ff_hevc_put_hevc_pel_uni_pixels4_8_neon, export=1
> +1:
> +        ldr             s0, [x2]
> +        ldr             s1, [x2, x3]
> +        add             x2, x2, x3, lsl #1
> +        str             s0, [x0]
> +        str             s1, [x0, x1]
> +        add             x0, x0, x1, lsl #1
> +        subs            w4, w4, #2
> +        b.hi            1b
> +        ret
> +endfunc

In a loop like this, I would recommend moving the "subs" instruction
further away from the branch that depends on it. On cores with in-order
execution this matters a fair bit, while it probably doesn't on cores
with out-of-order execution. Here, the ideal location is probably right
after the two loads at the start, where the "subs" can execute while the
loads are still in flight. The same goes for all the other functions in
this patch.
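For illustration, the 4-pixel-wide function could look roughly like this
with the "subs" moved up. This is an untested sketch that reuses the
function/endfunc macros the patch already relies on. Only the position of
the "subs" changes. That is safe because neither "add" nor "str" updates
the condition flags, so "b.hi" still sees the result of the "subs".

```asm
function ff_hevc_put_hevc_pel_uni_pixels4_8_neon, export=1
1:
        ldr             s0, [x2]                // load row 0 of src
        ldr             s1, [x2, x3]            // load row 1 of src
        subs            w4, w4, #2              // height -= 2, issued while the loads are pending
        add             x2, x2, x3, lsl #1      // src += 2 * srcstride
        str             s0, [x0]                // store row 0 of dst
        str             s1, [x0, x1]            // store row 1 of dst
        add             x0, x0, x1, lsl #1      // dst += 2 * dststride
        b.hi            1b                      // flags still come from the subs above
        ret
endfunc
```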

Other than that, this looks ok.

// Martin
