From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 24 Feb 2023 15:43:13 +0800
Subject: [FFmpeg-devel] [PATCH] libavcodec/hevc: add hevc idct4x4 neon of aarch64 after fixed
Cc: xufuji456 <839789740@qq.com>

---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 40 +++++++++++++++++++++++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  4 +++
 2 files changed, 44 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 124c50998a..f5135160b6 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -245,6 +245,43 @@ function hevc_add_residual_32x32_16_neon, export=0
         ret
 endfunc
 
+.macro tr_4x4 in0, in1, in2, in3, out0, out1, out2, out3, shift
+        sshll           v20.4s, \in0, #6
+        sshll           v21.4s, \in0, #6
+        smull           v22.4s, \in1, v4.h[1]
+        smull           v23.4s, \in1, v4.h[3]
+        smlal           v20.4s, \in2, v4.h[0] //e0
+        smlsl           v21.4s, \in2, v4.h[0] //e1
+        smlal           v22.4s, \in3, v4.h[3] //o0
+        smlsl           v23.4s, \in3, v4.h[1] //o1
+
+        add             v24.4s, v20.4s, v22.4s
+        sub             v20.4s, v20.4s, v22.4s
+        add             v22.4s, v21.4s, v23.4s
+        sub             v21.4s, v21.4s, v23.4s
+        sqrshrn         \out0, v24.4s, #\shift
+        sqrshrn         \out3, v20.4s, #\shift
+        sqrshrn         \out1, v22.4s, #\shift
+        sqrshrn         \out2, v21.4s, #\shift
+.endm
+
+.macro idct_4x4 bitdepth
+function ff_hevc_idct_4x4_\bitdepth\()_neon, export=1
+        ld1             {v0.4h-v3.4h}, [x0]
+
+        movrel          x1, trans
+        ld1             {v4.4h}, [x1]
+
+        tr_4x4          v0.4h, v1.4h, v2.4h, v3.4h, v16.4h, v17.4h, v18.4h, v19.4h, 7
+        transpose_4x8H  v16, v17, v18, v19, v26, v27, v28, v29
+
+        tr_4x4          v16.4h, v17.4h, v18.4h, v19.4h, v0.4h, v1.4h, v2.4h, v3.4h, 20 - \bitdepth
+        transpose_4x8H  v0, v1, v2, v3, v26, v27, v28, v29
+        st1             {v0.4h-v3.4h}, [x0]
+        ret
+endfunc
+.endm
+
 .macro sum_sub out, in, c, op, p
   .ifc \op, +
         smlal\p         \out, \in, \c
@@ -578,6 +615,9 @@ function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
 endfunc
 .endm
 
+idct_4x4 8
+idct_4x4 10
+
 idct_8x8 8
 idct_8x8 10
 
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 88a797f393..1deefca0a2 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -49,6 +49,8 @@ void ff_hevc_add_residual_32x32_10_neon(uint8_t *_dst, const int16_t *coeffs,
                                          ptrdiff_t stride);
 void ff_hevc_add_residual_32x32_12_neon(uint8_t *_dst, const int16_t *coeffs,
                                          ptrdiff_t stride);
+void ff_hevc_idct_4x4_8_neon(int16_t *coeffs, int col_limit);
+void ff_hevc_idct_4x4_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_8_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_8_neon(int16_t *coeffs, int col_limit);
@@ -119,6 +121,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[1]             = ff_hevc_add_residual_8x8_8_neon;
         c->add_residual[2]             = ff_hevc_add_residual_16x16_8_neon;
         c->add_residual[3]             = ff_hevc_add_residual_32x32_8_neon;
+        c->idct[0]                     = ff_hevc_idct_4x4_8_neon;
         c->idct[1]                     = ff_hevc_idct_8x8_8_neon;
         c->idct[2]                     = ff_hevc_idct_16x16_8_neon;
         c->idct_dc[0]                  = ff_hevc_idct_4x4_dc_8_neon;
@@ -168,6 +171,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[1]             = ff_hevc_add_residual_8x8_10_neon;
         c->add_residual[2]             = ff_hevc_add_residual_16x16_10_neon;
         c->add_residual[3]             = ff_hevc_add_residual_32x32_10_neon;
+        c->idct[0]                     = ff_hevc_idct_4x4_10_neon;
         c->idct[1]                     = ff_hevc_idct_8x8_10_neon;
         c->idct[2]                     = ff_hevc_idct_16x16_10_neon;
         c->idct_dc[0]                  = ff_hevc_idct_4x4_dc_10_neon;
-- 
2.32.0 (Apple Git-132)
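The scalar C model below may help when reviewing the patch. It shows what the new tr_4x4 / idct_4x4 code is expected to compute. It is a sketch written for this note, not code from FFmpeg or from the patch. The names sat_rshift, tr_4x4_ref, transpose4 and idct_4x4_ref are hypothetical. The model makes three assumptions:

- v4.h[0], v4.h[1] and v4.h[3] hold 64, 83 and 36 from the existing trans table. This is consistent with how the macro pairs them in the //e0..//o1 terms.
- Each 16-bit lane of v0..v3 is one column of the row-major 4x4 block.
- SQRSHRN behaves as a rounding right shift followed by saturation to int16_t.

```c
#include <stdint.h>

/* Models SQRSHRN (32 -> 16 bits): add the rounding constant, shift right,
 * saturate to int16_t. Assumes >> on a negative int64_t is an arithmetic
 * shift, as it is on mainstream compilers. */
static int16_t sat_rshift(int32_t x, int shift)
{
    int64_t r = ((int64_t)x + ((int64_t)1 << (shift - 1))) >> shift;
    if (r > INT16_MAX)
        return INT16_MAX;
    if (r < INT16_MIN)
        return INT16_MIN;
    return (int16_t)r;
}

/* One 4-point pass over four lanes, mirroring tr_4x4: in[k][j] is input
 * coefficient k of lane j (row k of the block, like v0..v3), out[k][j] is
 * output sample k of lane j. 64/83/36 stand in for v4.h[0]/[1]/[3]. */
static void tr_4x4_ref(int16_t in[4][4], int16_t out[4][4], int shift)
{
    for (int j = 0; j < 4; j++) {
        int32_t e0 = 64 * in[0][j] + 64 * in[2][j];
        int32_t e1 = 64 * in[0][j] - 64 * in[2][j];
        int32_t o0 = 83 * in[1][j] + 36 * in[3][j];
        int32_t o1 = 36 * in[1][j] - 83 * in[3][j];

        out[0][j] = sat_rshift(e0 + o0, shift);
        out[1][j] = sat_rshift(e1 + o1, shift);
        out[2][j] = sat_rshift(e1 - o1, shift);
        out[3][j] = sat_rshift(e0 - o0, shift);
    }
}

/* In-place 4x4 transpose, standing in for transpose_4x8H on the low halves. */
static void transpose4(int16_t m[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            int16_t t = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = t;
        }
}

/* Whole function: column pass with shift 7, transpose, row pass with
 * shift 20 - bit_depth, transpose back. coeffs is row-major, in place. */
static void idct_4x4_ref(int16_t coeffs[16], int bit_depth)
{
    int16_t a[4][4], b[4][4];

    for (int i = 0; i < 16; i++)
        a[i / 4][i % 4] = coeffs[i];

    tr_4x4_ref(a, b, 7);
    transpose4(b);
    tr_4x4_ref(b, a, 20 - bit_depth);
    transpose4(a);

    for (int i = 0; i < 16; i++)
        coeffs[i] = a[i / 4][i % 4];
}
```

With these shifts, a DC-only block {64, 0, ..., 0} at 8-bit comes out as sixteen 1s. One way to check that the NEON version is bit-exact is to feed ff_hevc_idct_4x4_8_neon / ff_hevc_idct_4x4_10_neon and this model the same random blocks, for example from a small harness or FFmpeg's checkasm, and compare the outputs.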