From mboxrd@z Thu Jan  1 00:00:00 1970
From: jinbo
To: ffmpeg-devel@ffmpeg.org
Cc: jinbo
Date: Thu, 28 Dec 2023 16:21:00 +0800
Message-Id: <20231228082105.31311-2-jinbo@loongson.cn>
In-Reply-To: <20231228082105.31311-1-jinbo@loongson.cn>
References: <20231228082105.31311-1-jinbo@loongson.cn>
X-Mailer: git-send-email 2.20.1
Subject: [FFmpeg-devel] [PATCH v3 2/7] avcodec/hevc: Add add_residual_4/8/16/32 asm opt

After this patch, the performance of decoding H265 4K 30FPS 30Mbps
on 3A6000 with 8 threads improves by 2fps (45fps-->47fps).
---
 libavcodec/loongarch/Makefile                 |   3 +-
 libavcodec/loongarch/hevc_add_res.S           | 162 ++++++++++++++++++
 libavcodec/loongarch/hevcdsp_init_loongarch.c |   5 +
 libavcodec/loongarch/hevcdsp_lsx.h            |   5 +
 4 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/loongarch/hevc_add_res.S

diff --git a/libavcodec/loongarch/Makefile b/libavcodec/loongarch/Makefile
index 06cfab5c20..07ea97f803 100644
--- a/libavcodec/loongarch/Makefile
+++ b/libavcodec/loongarch/Makefile
@@ -27,7 +27,8 @@ LSX-OBJS-$(CONFIG_HEVC_DECODER)       += loongarch/hevcdsp_lsx.o \
                                          loongarch/hevc_lpf_sao_lsx.o \
                                          loongarch/hevc_mc_bi_lsx.o \
                                          loongarch/hevc_mc_uni_lsx.o \
-                                         loongarch/hevc_mc_uniw_lsx.o
+                                         loongarch/hevc_mc_uniw_lsx.o \
+                                         loongarch/hevc_add_res.o
 LSX-OBJS-$(CONFIG_H264DSP)            += loongarch/h264idct.o \
                                          loongarch/h264idct_loongarch.o \
                                          loongarch/h264dsp.o
diff --git a/libavcodec/loongarch/hevc_add_res.S b/libavcodec/loongarch/hevc_add_res.S
new file mode 100644
index 0000000000..dd2d820af8
--- /dev/null
+++ b/libavcodec/loongarch/hevc_add_res.S
@@ -0,0 +1,162 @@
+/*
+ * Loongson LSX optimized add_residual functions for HEVC decoding
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ * Contributed by jinbo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "loongson_asm.S"
+
+/*
+ * void ff_hevc_add_residual4x4_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+.macro ADD_RES_LSX_4x4_8
+    vldrepl.w       vr0, a0, 0
+    add.d           t0, a0, a2
+    vldrepl.w       vr1, t0, 0
+    vld             vr2, a1, 0
+
+    vilvl.w         vr1, vr1, vr0
+    vsllwil.hu.bu   vr1, vr1, 0
+    vadd.h          vr1, vr1, vr2
+    vssrani.bu.h    vr1, vr1, 0
+
+    vstelm.w        vr1, a0, 0, 0
+    vstelm.w        vr1, t0, 0, 1
+.endm
+
+function ff_hevc_add_residual4x4_8_lsx
+    ADD_RES_LSX_4x4_8
+    alsl.d          a0, a2, a0, 1
+    addi.d          a1, a1, 16
+    ADD_RES_LSX_4x4_8
+endfunc
+
+/*
+ * void ff_hevc_add_residual8x8_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+.macro ADD_RES_LSX_8x8_8
+    vldrepl.d       vr0, a0, 0
+    add.d           t0, a0, a2
+    vldrepl.d       vr1, t0, 0
+    add.d           t1, t0, a2
+    vldrepl.d       vr2, t1, 0
+    add.d           t2, t1, a2
+    vldrepl.d       vr3, t2, 0
+
+    vld             vr4, a1, 0
+    addi.d          t3, zero, 16
+    vldx            vr5, a1, t3
+    addi.d          t4, a1, 32
+    vld             vr6, t4, 0
+    vldx            vr7, t4, t3
+
+    vsllwil.hu.bu   vr0, vr0, 0
+    vsllwil.hu.bu   vr1, vr1, 0
+    vsllwil.hu.bu   vr2, vr2, 0
+    vsllwil.hu.bu   vr3, vr3, 0
+    vadd.h          vr0, vr0, vr4
+    vadd.h          vr1, vr1, vr5
+    vadd.h          vr2, vr2, vr6
+    vadd.h          vr3, vr3, vr7
+    vssrani.bu.h    vr1, vr0, 0
+    vssrani.bu.h    vr3, vr2, 0
+
+    vstelm.d        vr1, a0, 0, 0
+    vstelm.d        vr1, t0, 0, 1
+    vstelm.d        vr3, t1, 0, 0
+    vstelm.d        vr3, t2, 0, 1
+.endm
+
+function ff_hevc_add_residual8x8_8_lsx
+    ADD_RES_LSX_8x8_8
+    alsl.d          a0, a2, a0, 2
+    addi.d          a1, a1, 64
+    ADD_RES_LSX_8x8_8
+endfunc
+
+/*
+ * void ff_hevc_add_residual16x16_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+function ff_hevc_add_residual16x16_8_lsx
+.rept 8
+    vld             vr0, a0, 0
+    vldx            vr2, a0, a2
+
+    vld             vr4, a1, 0
+    addi.d          t0, zero, 16
+    vldx            vr5, a1, t0
+    addi.d          t1, a1, 32
+    vld             vr6, t1, 0
+    vldx            vr7, t1, t0
+
+    vexth.hu.bu     vr1, vr0
+    vsllwil.hu.bu   vr0, vr0, 0
+    vexth.hu.bu     vr3, vr2
+    vsllwil.hu.bu   vr2, vr2, 0
+    vadd.h          vr0, vr0, vr4
+    vadd.h          vr1, vr1, vr5
+    vadd.h          vr2, vr2, vr6
+    vadd.h          vr3, vr3, vr7
+
+    vssrani.bu.h    vr1, vr0, 0
+    vssrani.bu.h    vr3, vr2, 0
+
+    vst             vr1, a0, 0
+    vstx            vr3, a0, a2
+
+    alsl.d          a0, a2, a0, 1
+    addi.d          a1, a1, 64
+.endr
+endfunc
+
+/*
+ * void ff_hevc_add_residual32x32_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+function ff_hevc_add_residual32x32_8_lsx
+.rept 32
+    vld             vr0, a0, 0
+    addi.w          t0, zero, 16
+    vldx            vr2, a0, t0
+
+    vld             vr4, a1, 0
+    vldx            vr5, a1, t0
+    addi.d          t1, a1, 32
+    vld             vr6, t1, 0
+    vldx            vr7, t1, t0
+
+    vexth.hu.bu     vr1, vr0
+    vsllwil.hu.bu   vr0, vr0, 0
+    vexth.hu.bu     vr3, vr2
+    vsllwil.hu.bu   vr2, vr2, 0
+    vadd.h          vr0, vr0, vr4
+    vadd.h          vr1, vr1, vr5
+    vadd.h          vr2, vr2, vr6
+    vadd.h          vr3, vr3, vr7
+
+    vssrani.bu.h    vr1, vr0, 0
+    vssrani.bu.h    vr3, vr2, 0
+
+    vst             vr1, a0, 0
+    vstx            vr3, a0, t0
+
+    add.d           a0, a0, a2
+    addi.d          a1, a1, 64
+.endr
+endfunc
diff --git a/libavcodec/loongarch/hevcdsp_init_loongarch.c b/libavcodec/loongarch/hevcdsp_init_loongarch.c
index 5a96f3a4c9..a8f753dc86 100644
--- a/libavcodec/loongarch/hevcdsp_init_loongarch.c
+++ b/libavcodec/loongarch/hevcdsp_init_loongarch.c
@@ -189,6 +189,11 @@ void ff_hevc_dsp_init_loongarch(HEVCDSPContext *c, const int bit_depth)
             c->idct[1] = ff_hevc_idct_8x8_lsx;
             c->idct[2] = ff_hevc_idct_16x16_lsx;
             c->idct[3] = ff_hevc_idct_32x32_lsx;
+
+            c->add_residual[0] = ff_hevc_add_residual4x4_8_lsx;
+            c->add_residual[1] = ff_hevc_add_residual8x8_8_lsx;
+            c->add_residual[2] = ff_hevc_add_residual16x16_8_lsx;
+            c->add_residual[3] = ff_hevc_add_residual32x32_8_lsx;
         }
     }
 }
diff --git a/libavcodec/loongarch/hevcdsp_lsx.h b/libavcodec/loongarch/hevcdsp_lsx.h
index 0d54196caf..ac509984fd 100644
--- a/libavcodec/loongarch/hevcdsp_lsx.h
+++ b/libavcodec/loongarch/hevcdsp_lsx.h
@@ -227,4 +227,9 @@ void ff_hevc_idct_8x8_lsx(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_lsx(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_32x32_lsx(int16_t *coeffs, int col_limit);
 
+void ff_hevc_add_residual4x4_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual8x8_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual16x16_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual32x32_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+
 #endif // #ifndef AVCODEC_LOONGARCH_HEVCDSP_LSX_H
-- 
2.20.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
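
For readers checking the vector code, the operation the four LSX functions perform on 8-bit pixels can be sketched in scalar C as below. This is only an illustrative reference, not code from the patch or from FFmpeg: the names clip_u8 and add_residual_8_ref are hypothetical, and the block size is passed as a parameter rather than baked into four separate functions. It assumes res holds size*size coefficients stored row by row with no padding, which matches how the assembly advances a1. The assembly adds in 16-bit lanes (vadd.h) before saturating, so the two should agree whenever dst + res fits in int16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range 0..255 (hypothetical helper). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference: add a size x size block of 16-bit
 * residuals to 8-bit pixels in place, saturating each result.
 * dst advances by stride bytes per row; res is tightly packed.
 */
static void add_residual_8_ref(uint8_t *dst, const int16_t *res,
                               ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;
        res += size;
    }
}
```

A comparison against ff_hevc_add_residual4x4_8_lsx and the other variants would call this with size set to 4, 8, 16 or 32 on a copy of the same dst buffer and compare the results byte for byte.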