From: Zhanheng Yang
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 22 Jan 2026 12:23:52 +0800
Message-ID: <20260122042357.1438-1-zhanheng.yang@linux.alibaba.com>
Subject: [FFmpeg-devel] [PATCH 1/6] libavcodec/riscv: add RVV optimized for qpel_h in HEVC.

Benchmarked on an A210 C908 core (VLEN 128):

put_hevc_qpel_h4_8_c:                  275.4 ( 1.00x)
put_hevc_qpel_h4_8_rvv_i32:            142.9 ( 1.93x)
put_hevc_qpel_h6_8_c:                  595.2 ( 1.00x)
put_hevc_qpel_h6_8_rvv_i32:            209.7 ( 2.84x)
put_hevc_qpel_h8_8_c:                 1044.0 ( 1.00x)
put_hevc_qpel_h8_8_rvv_i32:            287.2 ( 3.63x)
put_hevc_qpel_h12_8_c:                2371.0 ( 1.00x)
put_hevc_qpel_h12_8_rvv_i32:           419.5 ( 5.65x)
put_hevc_qpel_h16_8_c:                4187.2 ( 1.00x)
put_hevc_qpel_h16_8_rvv_i32:           530.8 ( 7.89x)
put_hevc_qpel_h24_8_c:                9276.4 ( 1.00x)
put_hevc_qpel_h24_8_rvv_i32:          1509.6 ( 6.15x)
put_hevc_qpel_h32_8_c:               16417.8 ( 1.00x)
put_hevc_qpel_h32_8_rvv_i32:          1984.3 ( 8.27x)
put_hevc_qpel_h48_8_c:               36812.8 ( 1.00x)
put_hevc_qpel_h48_8_rvv_i32:          4390.6 ( 8.38x)
put_hevc_qpel_h64_8_c:               65296.8 ( 1.00x)
put_hevc_qpel_h64_8_rvv_i32:          7745.0 ( 8.43x)
put_hevc_qpel_uni_h4_8_c:              374.8 ( 1.00x)
put_hevc_qpel_uni_h4_8_rvv_i32:        162.9 ( 2.30x)
put_hevc_qpel_uni_h6_8_c:              818.6 ( 1.00x)
put_hevc_qpel_uni_h6_8_rvv_i32:        236.3 ( 3.46x)
put_hevc_qpel_uni_h8_8_c:             1504.3 ( 1.00x)
put_hevc_qpel_uni_h8_8_rvv_i32:        309.3 ( 4.86x)
put_hevc_qpel_uni_h12_8_c:            3239.2 ( 1.00x)
put_hevc_qpel_uni_h12_8_rvv_i32:       448.0 ( 7.23x)
put_hevc_qpel_uni_h16_8_c:            5702.9 ( 1.00x)
put_hevc_qpel_uni_h16_8_rvv_i32:       589.3 ( 9.68x)
put_hevc_qpel_uni_h24_8_c:           12741.4 ( 1.00x)
put_hevc_qpel_uni_h24_8_rvv_i32:      1650.3 ( 7.72x)
put_hevc_qpel_uni_h32_8_c:           22531.3 ( 1.00x)
put_hevc_qpel_uni_h32_8_rvv_i32:      2189.1 (10.29x)
put_hevc_qpel_uni_h48_8_c:           50647.0 ( 1.00x)
put_hevc_qpel_uni_h48_8_rvv_i32:      4817.0 (10.51x)
put_hevc_qpel_uni_h64_8_c:           89742.9 ( 1.00x)
put_hevc_qpel_uni_h64_8_rvv_i32:      8497.9 (10.56x)
put_hevc_qpel_uni_hv4_8_c:             920.4 ( 1.00x)
put_hevc_qpel_uni_hv4_8_rvv_i32:       532.1 ( 1.73x)
put_hevc_qpel_uni_hv6_8_c:            1753.0 ( 1.00x)
put_hevc_qpel_uni_hv6_8_rvv_i32:       691.0 ( 2.54x)
put_hevc_qpel_uni_hv8_8_c:            2872.7 ( 1.00x)
put_hevc_qpel_uni_hv8_8_rvv_i32:       836.9 ( 3.43x)
put_hevc_qpel_uni_hv12_8_c:           5828.4 ( 1.00x)
put_hevc_qpel_uni_hv12_8_rvv_i32:     1141.2 ( 5.11x)
put_hevc_qpel_uni_hv16_8_c:           9906.7 ( 1.00x)
put_hevc_qpel_uni_hv16_8_rvv_i32:     1452.5 ( 6.82x)
put_hevc_qpel_uni_hv24_8_c:          20871.3 ( 1.00x)
put_hevc_qpel_uni_hv24_8_rvv_i32:     4094.0 ( 5.10x)
put_hevc_qpel_uni_hv32_8_c:          36123.3 ( 1.00x)
put_hevc_qpel_uni_hv32_8_rvv_i32:     5310.5 ( 6.80x)
put_hevc_qpel_uni_hv48_8_c:          79016.0 ( 1.00x)
put_hevc_qpel_uni_hv48_8_rvv_i32:    11591.2 ( 6.82x)
put_hevc_qpel_uni_hv64_8_c:         138779.8 ( 1.00x)
put_hevc_qpel_uni_hv64_8_rvv_i32:    20321.1 ( 6.83x)
put_hevc_qpel_uni_w_h4_8_c:            412.1 ( 1.00x)
put_hevc_qpel_uni_w_h4_8_rvv_i32:      237.3 ( 1.74x)
put_hevc_qpel_uni_w_h6_8_c:            895.9 ( 1.00x)
put_hevc_qpel_uni_w_h6_8_rvv_i32:      345.6 ( 2.59x)
put_hevc_qpel_uni_w_h8_8_c:           1625.4 ( 1.00x)
put_hevc_qpel_uni_w_h8_8_rvv_i32:      452.4 ( 3.59x)
put_hevc_qpel_uni_w_h12_8_c:          3541.2 ( 1.00x)
put_hevc_qpel_uni_w_h12_8_rvv_i32:     663.6 ( 5.34x)
put_hevc_qpel_uni_w_h16_8_c:          6290.3 ( 1.00x)
put_hevc_qpel_uni_w_h16_8_rvv_i32:     875.7 ( 7.18x)
put_hevc_qpel_uni_w_h24_8_c:         13994.9 ( 1.00x)
put_hevc_qpel_uni_w_h24_8_rvv_i32:    2475.0 ( 5.65x)
put_hevc_qpel_uni_w_h32_8_c:         24852.3 ( 1.00x)
put_hevc_qpel_uni_w_h32_8_rvv_i32:    3291.2 ( 7.55x)
put_hevc_qpel_uni_w_h48_8_c:         55595.5 ( 1.00x)
put_hevc_qpel_uni_w_h48_8_rvv_i32:    7297.4 ( 7.62x)
put_hevc_qpel_uni_w_h64_8_c:         98628.2 ( 1.00x)
put_hevc_qpel_uni_w_h64_8_rvv_i32:   12883.2 ( 7.66x)
put_hevc_qpel_bi_h4_8_c:               392.6 ( 1.00x)
put_hevc_qpel_bi_h4_8_rvv_i32:         186.1 ( 2.11x)
put_hevc_qpel_bi_h6_8_c:               842.3 ( 1.00x)
put_hevc_qpel_bi_h6_8_rvv_i32:         267.8 ( 3.15x)
put_hevc_qpel_bi_h8_8_c:              1546.4 ( 1.00x)
put_hevc_qpel_bi_h8_8_rvv_i32:         353.7 ( 4.37x)
put_hevc_qpel_bi_h12_8_c:             3317.2 ( 1.00x)
put_hevc_qpel_bi_h12_8_rvv_i32:        515.1 ( 6.44x)
put_hevc_qpel_bi_h16_8_c:             5848.3 ( 1.00x)
put_hevc_qpel_bi_h16_8_rvv_i32:        680.9 ( 8.59x)
put_hevc_qpel_bi_h24_8_c:            13032.6 ( 1.00x)
put_hevc_qpel_bi_h24_8_rvv_i32:       1880.8 ( 6.93x)
put_hevc_qpel_bi_h32_8_c:            23021.1 ( 1.00x)
put_hevc_qpel_bi_h32_8_rvv_i32:       2498.5 ( 9.21x)
put_hevc_qpel_bi_h48_8_c:            51655.9 ( 1.00x)
put_hevc_qpel_bi_h48_8_rvv_i32:       5486.3 ( 9.42x)
put_hevc_qpel_bi_h64_8_c:            91738.7 ( 1.00x)
put_hevc_qpel_bi_h64_8_rvv_i32:       9735.0 ( 9.42x)

Signed-off-by: Zhanheng Yang
---
 libavcodec/riscv/Makefile            |   3 +-
 libavcodec/riscv/h26x/h2656dsp.h     |  12 ++
 libavcodec/riscv/h26x/hevcqpel_rvv.S | 309 +++++++++++++++++++++++++++
 libavcodec/riscv/hevcdsp_init.c      |  55 +++--
 4 files changed, 364 insertions(+), 15 deletions(-)
 create mode 100644 libavcodec/riscv/h26x/hevcqpel_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 2c53334923..414790ae0c 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -36,7 +36,8 @@ RVV-OBJS-$(CONFIG_H264DSP) += riscv/h264addpx_rvv.o riscv/h264dsp_rvv.o \
 OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_init.o
 RVV-OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_rvv.o
 OBJS-$(CONFIG_HEVC_DECODER) += riscv/hevcdsp_init.o
-RVV-OBJS-$(CONFIG_HEVC_DECODER) += riscv/h26x/h2656_inter_rvv.o
+RVV-OBJS-$(CONFIG_HEVC_DECODER) += riscv/h26x/h2656_inter_rvv.o \
+                                   riscv/h26x/hevcqpel_rvv.o
 OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_init.o
 RVV-OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_rvv.o
 OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
index 6d2ac55556..028b9ffbfd 100644
--- a/libavcodec/riscv/h26x/h2656dsp.h
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
  *
  * This file is part of FFmpeg.
  *
@@ -24,4 +25,15 @@
 
 void ff_h2656_put_pixels_8_rvv_256(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width);
 void ff_h2656_put_pixels_8_rvv_128(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_h_8_m1_rvv(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_uni_w_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride,
+        const uint8_t *_src, ptrdiff_t _srcstride,
+        int height, int denom, int wx, int ox,
+        intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_qpel_bi_h_8_m1_rvv(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, const int16_t *src2, int height,
+        intptr_t mx, intptr_t my, int width);
 #endif
diff --git a/libavcodec/riscv/h26x/hevcqpel_rvv.S b/libavcodec/riscv/h26x/hevcqpel_rvv.S
new file mode 100644
index 0000000000..52d7acac33
--- /dev/null
+++ b/libavcodec/riscv/h26x/hevcqpel_rvv.S
@@ -0,0 +1,309 @@
+/*
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+.section .rodata
+.align 2
+qpel_filters:
+        .byte  0, 0,   0,  0,  0,   0, 0,  0
+        .byte -1, 4, -10, 58, 17,  -5, 1,  0
+        .byte -1, 4, -11, 40, 40, -11, 4, -1
+        .byte  0, 1,  -5, 17, 58, -10, 4, -1
+
+.text
+#include "libavutil/riscv/asm.S"
+#define HEVC_MAX_PB_SIZE 64
+
+.macro lx rd, addr
+#if (__riscv_xlen == 32)
+        lw          \rd, \addr
+#elif (__riscv_xlen == 64)
+        ld          \rd, \addr
+#else
+        lq          \rd, \addr
+#endif
+.endm
+
+.macro sx rd, addr
+#if (__riscv_xlen == 32)
+        sw          \rd, \addr
+#elif (__riscv_xlen == 64)
+        sd          \rd, \addr
+#else
+        sq          \rd, \addr
+#endif
+.endm
+
+/* clobbers t0, t1 */
+.macro load_filter m
+        la          t0, qpel_filters
+        slli        t1, \m, 3
+        add         t0, t0, t1
+        lb          s1, 0(t0)
+        lb          s2, 1(t0)
+        lb          s3, 2(t0)
+        lb          s4, 3(t0)
+        lb          s5, 4(t0)
+        lb          s6, 5(t0)
+        lb          s7, 6(t0)
+        lb          s8, 7(t0)
+.endm
+
+/* output is unclipped; clobbers t4 */
+.macro filter_h vdst, vsrc0, vsrc1, vsrc2, vsrc3, vsrc4, vsrc5, vsrc6, vsrc7, src
+        addi        t4, \src, -3
+        vle8.v      \vsrc0, (t4)
+        addi        t4, \src, -2
+        vmv.v.x     \vsrc3, s1
+        vwmulsu.vv  \vdst, \vsrc3, \vsrc0
+        vle8.v      \vsrc1, (t4)
+        addi        t4, \src, -1
+        vle8.v      \vsrc2, (t4)
+        vle8.v      \vsrc3, (\src)
+        addi        t4, \src, 1
+        vle8.v      \vsrc4, (t4)
+        addi        t4, \src, 2
+        vle8.v      \vsrc5, (t4)
+        addi        t4, \src, 3
+        vle8.v      \vsrc6, (t4)
+        addi        t4, \src, 4
+        vle8.v      \vsrc7, (t4)
+
+        vwmaccsu.vx \vdst, s2, \vsrc1
+        vwmaccsu.vx \vdst, s3, \vsrc2
+        vwmaccsu.vx \vdst, s4, \vsrc3
+        vwmaccsu.vx \vdst, s5, \vsrc4
+        vwmaccsu.vx \vdst, s6, \vsrc5
+        vwmaccsu.vx \vdst, s7, \vsrc6
+        vwmaccsu.vx \vdst, s8, \vsrc7
+.endm
+
+.macro vreg
+
+.endm
+
+.macro hevc_qpel_h lmul, lmul2, lmul4
+func ff_hevc_put_qpel_h_8_\lmul\()_rvv, zve32x
+        addi        sp, sp, -64
+        sx          s1, 0(sp)
+        sx          s2, 8(sp)
+        sx          s3, 16(sp)
+        sx          s4, 24(sp)
+        sx          s5, 32(sp)
+        sx          s6, 40(sp)
+        sx          s7, 48(sp)
+        sx          s8, 56(sp)
+        load_filter a4
+        mv          t3, a6
+        li          t1, 0 # offset
+
+1:
+        vsetvli     t6, t3, e8, \lmul, ta, ma
+        add         t2, a1, t1
+        filter_h    v0, v16, v18, v20, v22, v24, v26, v28, v30, t2
+        vsetvli     zero, zero, e16, \lmul2, ta, ma
+        slli        t2, t1, 1
+        add         t2, a0, t2
+        vse16.v     v0, (t2)
+        sub         t3, t3, t6
+        add         t1, t1, t6
+        bgt         t3, zero, 1b
+        addi        a3, a3, -1
+        mv          t3, a6
+        add         a1, a1, a2
+        addi        a0, a0, 2*HEVC_MAX_PB_SIZE
+        li          t1, 0
+        bgt         a3, zero, 1b
+
+        lx          s1, 0(sp)
+        lx          s2, 8(sp)
+        lx          s3, 16(sp)
+        lx          s4, 24(sp)
+        lx          s5, 32(sp)
+        lx          s6, 40(sp)
+        lx          s7, 48(sp)
+        lx          s8, 56(sp)
+        addi        sp, sp, 64
+        ret
+endfunc
+
+func ff_hevc_put_qpel_uni_h_8_\lmul\()_rvv, zve32x
+        csrwi       vxrm, 0
+        addi        sp, sp, -64
+        sx          s1, 0(sp)
+        sx          s2, 8(sp)
+        sx          s3, 16(sp)
+        sx          s4, 24(sp)
+        sx          s5, 32(sp)
+        sx          s6, 40(sp)
+        sx          s7, 48(sp)
+        sx          s8, 56(sp)
+        load_filter a5
+        mv          t3, a7
+        li          t1, 0 # offset
+
+1:
+        vsetvli     t6, t3, e8, \lmul, ta, ma
+        add         t2, a2, t1
+        filter_h    v0, v16, v18, v20, v22, v24, v26, v28, v30, t2
+        vsetvli     zero, zero, e16, \lmul2, ta, ma
+        vmax.vx     v0, v0, zero
+        vsetvli     zero, zero, e8, \lmul, ta, ma
+        vnclipu.wi  v0, v0, 6
+        add         t2, a0, t1
+        vse8.v      v0, (t2)
+        sub         t3, t3, t6
+        add         t1, t1, t6
+        bgt         t3, zero, 1b
+        addi        a4, a4, -1
+        mv          t3, a7
+        add         a2, a2, a3
+        add         a0, a0, a1
+        li          t1, 0
+        bgt         a4, zero, 1b
+
+        lx          s1, 0(sp)
+        lx          s2, 8(sp)
+        lx          s3, 16(sp)
+        lx          s4, 24(sp)
+        lx          s5, 32(sp)
+        lx          s6, 40(sp)
+        lx          s7, 48(sp)
+        lx          s8, 56(sp)
+        addi        sp, sp, 64
+        ret
+endfunc
+
+func ff_hevc_put_qpel_uni_w_h_8_\lmul\()_rvv, zve32x
+        csrwi       vxrm, 0
+        lx          t2, 0(sp) # mx
+        addi        a5, a5, 6 # shift
+#if (__riscv_xlen == 32)
+        lw          t3, 8(sp) # width
+#elif (__riscv_xlen == 64)
+        lw          t3, 16(sp)
+#endif
+        addi        sp, sp, -64
+        sx          s1, 0(sp)
+        sx          s2, 8(sp)
+        sx          s3, 16(sp)
+        sx          s4, 24(sp)
+        sx          s5, 32(sp)
+        sx          s6, 40(sp)
+        sx          s7, 48(sp)
+        sx          s8, 56(sp)
+        load_filter t2
+        li          t2, 0 # offset
+
+1:
+        vsetvli     t6, t3, e8, \lmul, ta, ma
+        add         t1, a2, t2
+        filter_h    v8, v16, v18, v20, v22, v24, v26, v28, v30, t1
+        vsetvli     zero, zero, e16, \lmul2, ta, ma
+        vwmul.vx    v0, v8, a6
+        vsetvli     zero, zero, e32, \lmul4, ta, ma
+        vssra.vx    v0, v0, a5
+        vsadd.vx    v0, v0, a7
+        vmax.vx     v0, v0, zero
+        vsetvli     zero, zero, e16, \lmul2, ta, ma
+        vnclip.wi   v0, v0, 0
+        vsetvli     zero, zero, e8, \lmul, ta, ma
+        vnclipu.wi  v0, v0, 0
+        add         t1, a0, t2
+        vse8.v      v0, (t1)
+        sub         t3, t3, t6
+        add         t2, t2, t6
+        bgt         t3, zero, 1b
+        addi        a4, a4, -1
+#if (__riscv_xlen == 32)
+        lw          t3, 72(sp)
+#elif (__riscv_xlen == 64)
+        ld          t3, 80(sp)
+#endif
+        add         a2, a2, a3
+        add         a0, a0, a1
+        li          t2, 0
+        bgt         a4, zero, 1b
+
+        lx          s1, 0(sp)
+        lx          s2, 8(sp)
+        lx          s3, 16(sp)
+        lx          s4, 24(sp)
+        lx          s5, 32(sp)
+        lx          s6, 40(sp)
+        lx          s7, 48(sp)
+        lx          s8, 56(sp)
+        addi        sp, sp, 64
+        ret
+endfunc
+
+func ff_hevc_put_qpel_bi_h_8_\lmul\()_rvv, zve32x
+        csrwi       vxrm, 0
+        lw          t3, 0(sp) # width
+        addi        sp, sp, -64
+        sx          s1, 0(sp)
+        sx          s2, 8(sp)
+        sx          s3, 16(sp)
+        sx          s4, 24(sp)
+        sx          s5, 32(sp)
+        sx          s6, 40(sp)
+        sx          s7, 48(sp)
+        sx          s8, 56(sp)
+        load_filter a6
+        li          t1, 0 # offset
+
+1:
+        vsetvli     t6, t3, e16, \lmul2, ta, ma
+        slli        t2, t1, 1
+        add         t2, a4, t2
+        vle16.v     v12, (t2)
+        vsetvli     zero, zero, e8, \lmul, ta, ma
+        add         t2, a2, t1
+        filter_h    v0, v16, v18, v20, v22, v24, v26, v28, v30, t2
+        vsetvli     zero, zero, e16, \lmul2, ta, ma
+        vsadd.vv    v0, v0, v12
+        vmax.vx     v0, v0, zero
+        vsetvli     zero, zero, e8, \lmul, ta, ma
+        vnclipu.wi  v0, v0, 7
+        add         t2, a0, t1
+        vse8.v      v0, (t2)
+        sub         t3, t3, t6
+        add         t1, t1, t6
+        bgt         t3, zero, 1b
+        addi        a5, a5, -1
+        lw          t3, 64(sp)
+        add         a2, a2, a3
+        add         a0, a0, a1
+        addi        a4, a4, 2*HEVC_MAX_PB_SIZE
+        li          t1, 0
+        bgt         a5, zero, 1b
+
+        lx          s1, 0(sp)
+        lx          s2, 8(sp)
+        lx          s3, 16(sp)
+        lx          s4, 24(sp)
+        lx          s5, 32(sp)
+        lx          s6, 40(sp)
+        lx          s7, 48(sp)
+        lx          s8, 56(sp)
+        addi        sp, sp, 64
+        ret
+endfunc
+.endm
+
+hevc_qpel_h m1, m2, m4
diff --git a/libavcodec/riscv/hevcdsp_init.c b/libavcodec/riscv/hevcdsp_init.c
index 70bc8ebea7..59333740de 100644
--- a/libavcodec/riscv/hevcdsp_init.c
+++ b/libavcodec/riscv/hevcdsp_init.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ * Copyright (C) 2026 Alibaba Group Holding Limited.
  *
  * This file is part of FFmpeg.
  *
@@ -34,30 +35,56 @@
         member[7][v][h] = ff_h2656_put_pixels_##8_##ext; \
         member[9][v][h] = ff_h2656_put_pixels_##8_##ext;
 
+#define RVV_FNASSIGN_PEL(member, v, h, fn) \
+        member[1][v][h] = fn; \
+        member[2][v][h] = fn; \
+        member[3][v][h] = fn; \
+        member[4][v][h] = fn; \
+        member[5][v][h] = fn; \
+        member[6][v][h] = fn; \
+        member[7][v][h] = fn; \
+        member[8][v][h] = fn; \
+        member[9][v][h] = fn;
+
 void ff_hevc_dsp_init_riscv(HEVCDSPContext *c, const int bit_depth)
 {
 #if HAVE_RVV
     const int flags = av_get_cpu_flags();
     int vlenb;
 
-    if (!(flags & AV_CPU_FLAG_RVV_I32) || !(flags & AV_CPU_FLAG_RVB))
-        return;
+    if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB)) {
+        vlenb = ff_get_rv_vlenb();
+        if (vlenb >= 32) {
+            switch (bit_depth) {
+            case 8:
+                RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_256);
+                RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_256);
 
-    vlenb = ff_get_rv_vlenb();
-    if (vlenb >= 32) {
-        switch (bit_depth) {
-        case 8:
-            RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_256);
-            RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_256);
-            break;
-        default:
-            break;
+                break;
+            default:
+                break;
+            }
+        } else if (vlenb >= 16) {
+            switch (bit_depth) {
+            case 8:
+                RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_128);
+                RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_128);
+
+                break;
+            default:
+                break;
+            }
         }
-    } else if (vlenb >= 16) {
+    }
+
+    if ((flags & AV_CPU_FLAG_RVV_I32)) {
         switch (bit_depth) {
         case 8:
-            RVV_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels, rvv_128);
-            RVV_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels, rvv_128);
+            RVV_FNASSIGN_PEL(c->put_hevc_qpel, 0, 1, ff_hevc_put_qpel_h_8_m1_rvv);
+            RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni, 0, 1, ff_hevc_put_qpel_uni_h_8_m1_rvv);
+            RVV_FNASSIGN_PEL(c->put_hevc_qpel_uni_w, 0, 1, ff_hevc_put_qpel_uni_w_h_8_m1_rvv);
+            RVV_FNASSIGN_PEL(c->put_hevc_qpel_bi, 0, 1, ff_hevc_put_qpel_bi_h_8_m1_rvv);
+
             break;
         default:
             break;
-- 
2.25.1
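For review, a scalar C model of what the four kernels in hevcqpel_rvv.S compute for 8-bit input can make the vector code easier to check by hand. This is a sketch under stated assumptions, not FFmpeg's implementation: the `ref_*` functions and `REF_MAX_PB_SIZE` are hypothetical names, the argument lists are simplified (the unused `my` is dropped), and the rounding steps are written to follow the assembly (clamp at 0 before `vnclipu.wi` with `vxrm` = 0, a rounding `vssra.vx` in the weighted path) rather than copied from FFmpeg's C templates. The 16-bit saturation of `vsadd.vv` in the bi path is not modeled; since the result is clipped to 0..255 afterwards, it should not change the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for HEVC_MAX_PB_SIZE: row pitch, in int16_t elements,
 * of the intermediate buffers (dst of qpel_h, src2 of bi_h). */
#define REF_MAX_PB_SIZE 64

/* Same coefficients as qpel_filters in hevcqpel_rvv.S; row 0 is unused. */
static const int8_t ref_qpel_filters[4][8] = {
    {  0, 0,   0,  0,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

static uint8_t ref_clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* filter_h: unclipped 8-tap sum over src[x - 3] .. src[x + 4]. */
static int ref_filter_h(const uint8_t *src, int x, int mx)
{
    assert(mx >= 1 && mx <= 3);
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += ref_qpel_filters[mx][k] * src[x + k - 3];
    return sum;
}

/* qpel_h: store the raw 16-bit sums. */
static void ref_qpel_h(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride,
                       int height, int mx, int width)
{
    for (int y = 0; y < height; y++, src += srcstride, dst += REF_MAX_PB_SIZE)
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)ref_filter_h(src, x, mx);
}

/* uni_h: clamp at 0 (vmax.vx), then (v + 32) >> 6 saturated to 8 bits,
 * i.e. vnclipu.wi by 6 with vxrm = 0 (round-to-nearest-up). */
static void ref_qpel_uni_h(uint8_t *dst, ptrdiff_t dststride, const uint8_t *src,
                           ptrdiff_t srcstride, int height, int mx, int width)
{
    for (int y = 0; y < height; y++, src += srcstride, dst += dststride)
        for (int x = 0; x < width; x++) {
            int v = ref_filter_h(src, x, mx);
            if (v < 0)
                v = 0;
            dst[x] = ref_clip_u8((v + 32) >> 6);
        }
}

/* uni_w_h: sum * wx, rounding right shift by denom + 6 (vssra.vx), + ox
 * (vsadd.vx), then clip. Assumes >> on a negative int is arithmetic,
 * as it is with GCC and Clang. */
static void ref_qpel_uni_w_h(uint8_t *dst, ptrdiff_t dststride, const uint8_t *src,
                             ptrdiff_t srcstride, int height, int denom, int wx,
                             int ox, int mx, int width)
{
    const int shift = denom + 6;
    for (int y = 0; y < height; y++, src += srcstride, dst += dststride)
        for (int x = 0; x < width; x++) {
            int32_t p = ref_filter_h(src, x, mx) * wx;
            dst[x] = ref_clip_u8(((p + (1 << (shift - 1))) >> shift) + ox);
        }
}

/* bi_h: add the int16 src2 sample, clamp at 0, then (v + 64) >> 7 saturated. */
static void ref_qpel_bi_h(uint8_t *dst, ptrdiff_t dststride, const uint8_t *src,
                          ptrdiff_t srcstride, const int16_t *src2, int height,
                          int mx, int width)
{
    for (int y = 0; y < height; y++, src += srcstride, dst += dststride,
                                src2 += REF_MAX_PB_SIZE)
        for (int x = 0; x < width; x++) {
            int v = ref_filter_h(src, x, mx) + src2[x];
            if (v < 0)
                v = 0;
            dst[x] = ref_clip_u8((v + 64) >> 7);
        }
}
```

Each filter row sums to 64, so a flat block of value 100 gives a sum of 6400 per sample; the uni path (and the bi path with `src2` also set to 6400) then returns 100, which makes a quick sanity check against checkasm output.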