From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 14 Dec 2025 16:00:15 -0000
Message-ID: <176572801559.60.8407290883988227996@2cb04c0e5124>
Message-ID-Hash: FDOPLZDTGGNS4QY2RCX7KNB62HWHJU7O
X-Message-ID-Hash: FDOPLZDTGGNS4QY2RCX7KNB62HWHJU7O
X-MailFrom: code@ffmpeg.org
X-Mailman-Rule-Hits: nonmember-moderation
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; header-match-ffmpeg-devel.ffmpeg.org-0; header-match-ffmpeg-devel.ffmpeg.org-1; header-match-ffmpeg-devel.ffmpeg.org-2; header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] aarch64/vvc: SME optimisations of put_luma_h(64x64,128x128) functions for 8-bit (PR #21194)
List-Id: FFmpeg development discussions and patches
From: "george.zaguri via ffmpeg-devel"
Cc: "george.zaguri"
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #21194 opened by george.zaguri
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21194
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21194.patch

Apple M4:
put_luma_h_8_64x64_c:        644.5 ( 1.00x)
put_luma_h_8_64x64_neon:     520.3 ( 1.24x)
put_luma_h_8_64x64_i8mm:     440.9 ( 1.46x)
put_luma_h_8_64x64_sme:      405.7 ( 1.59x)
put_luma_h_8_128x128_c:     2340.3 ( 1.00x)
put_luma_h_8_128x128_neon:  2078.7 ( 1.13x)
put_luma_h_8_128x128_i8mm:  1711.9 ( 1.37x)
put_luma_h_8_128x128_sme:   1604.5 ( 1.46x)


>>From 151199038279cbe8b7100ce2c41a73791f71bd45 Mon Sep 17 00:00:00 2001
From: Georgii Zagoruiko
Date: Sun, 14 Dec 2025 15:58:39 +0000
Subject: [PATCH] aarch64/vvc: SME optimisations of put_luma_h(64x64,128x128)
 functions for 8-bit

Apple M4:
put_luma_h_8_64x64_c:        644.5 ( 1.00x)
put_luma_h_8_64x64_neon:     520.3 ( 1.24x)
put_luma_h_8_64x64_i8mm:     440.9 ( 1.46x)
put_luma_h_8_64x64_sme:      405.7 ( 1.59x)
put_luma_h_8_128x128_c:     2340.3 ( 1.00x)
put_luma_h_8_128x128_neon:  2078.7 ( 1.13x)
put_luma_h_8_128x128_i8mm:  1711.9 ( 1.37x)
put_luma_h_8_128x128_sme:   1604.5 ( 1.46x)
---
 libavcodec/aarch64/vvc/Makefile    |   1 +
 libavcodec/aarch64/vvc/dsp_init.c  |   6 ++
 libavcodec/aarch64/vvc/inter_sme.S | 132 +++++++++++++++++++++++++++++
 3 files changed, 139 insertions(+)
 create mode 100644 libavcodec/aarch64/vvc/inter_sme.S

diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index ed80338969..56282478a7 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -8,3 +8,4 @@ NEON-OBJS-$(CONFIG_VVC_DECODER)         += aarch64/vvc/alf.o \
                                            aarch64/h26x/epel_neon.o \
                                            aarch64/h26x/qpel_neon.o \
                                            aarch64/h26x/sao_neon.o
+SME-OBJS-$(CONFIG_VVC_DECODER)          += aarch64/vvc/inter_sme.o
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index aa75d22b78..d86e431215 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -42,6 +42,8 @@ void ff_vvc_put_luma_h16_12_neon(int16_t *dst, const uint8_t *_src, const ptrdif
     const int height, const int8_t *hf, const int8_t *vf, const int width);
 void ff_vvc_put_luma_h_x16_12_neon(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
     const int height, const int8_t *hf, const int8_t *vf, const int width);
+void ff_vvc_put_luma_h_8_sme(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+    const int height, const int8_t *hf, const int8_t *vf, const int width);
 
 void ff_alf_classify_sum_neon(int *sum0, int *sum1, int16_t *grad, uint32_t gshift, uint32_t steps);
 
@@ -251,6 +253,10 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put[1][5][1][1] = ff_vvc_put_epel_hv64_8_neon_i8mm;
             c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon_i8mm;
         }
+        if (have_sme(cpu_flags)) {
+            c->inter.put[0][5][0][1] =
+            c->inter.put[0][6][0][1] = ff_vvc_put_luma_h_8_sme;
+        }
     } else if (bd == 10) {
         c->inter.avg = ff_vvc_avg_10_neon;
         c->inter.w_avg = vvc_w_avg_10;
diff --git a/libavcodec/aarch64/vvc/inter_sme.S b/libavcodec/aarch64/vvc/inter_sme.S
new file mode 100644
index 0000000000..d3592518cb
--- /dev/null
+++ b/libavcodec/aarch64/vvc/inter_sme.S
@@ -0,0 +1,132 @@
+/*
+ * Copyright (c) 2025 Georgii Zagoruiko
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+#define VVC_MAX_PB_SIZE 128
+
+#if HAVE_SME
+ENABLE_SME
+
+function ff_vvc_put_luma_h_8_sme, export=1
+        // dst          .req x0
+        // _src         .req x1
+        // _src_stride  .req x2
+        // height       .req w3
+        // hf           .req x4
+        // vf           .req x5
+        // width        .req w6
+        smstart
+        cntb            x8
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+        mov             w13, #0
+        mov             w14, #1
+        mov             w15, #2
+        mov             w16, #3
+        ptrue           p0.b, VL8
+        ptrue           p1.s
+        ld1b            z30.b, p0/Z, [x4]
+        eor             z0.b, z0.b, z0.b
+        mov             z31.d, z30.d
+        sub             x1, x1, #3
+        ext             z31.b, z31.b, z0.b, #4
+.Loop_H:
+        cmp             w6, w8
+        csel            w11, w6, w8, ls
+        whilelo         p0.b, xzr, x6
+        mov             w10, w3
+        asr             w12, w11, #1
+        whilelo         p2.h, xzr, x12
+.Loop_W:
+        ld1b            z0.b, p0/z, [x1]
+        ld1b            z1.b, p0/z, [x1, x14]
+        ld1b            z2.b, p0/z, [x1, x15]
+        ld1b            z3.b, p0/z, [x1, x16]
+        add             x1, x1, #4
+        ld1b            z4.b, p0/z, [x1]
+        ld1b            z5.b, p0/z, [x1, x14]
+        ld1b            z6.b, p0/z, [x1, x15]
+        ld1b            z7.b, p0/z, [x1, x16]
+        sub             x1, x1, #4
+        usmopa          za0.s, p0/m, p0/m, z0.b, z30.b
+        usmopa          za1.s, p0/m, p0/m, z1.b, z30.b
+        usmopa          za2.s, p0/m, p0/m, z2.b, z30.b
+        usmopa          za3.s, p0/m, p0/m, z3.b, z30.b
+        usmopa          za0.s, p0/m, p0/m, z4.b, z31.b
+        usmopa          za1.s, p0/m, p0/m, z5.b, z31.b
+        usmopa          za2.s, p0/m, p0/m, z6.b, z31.b
+        usmopa          za3.s, p0/m, p0/m, z7.b, z31.b
+        mova            z22.s, p1/m, za0v.s[w13, 0]
+        mova            z24.s, p1/m, za1v.s[w13, 0]
+        mova            z26.s, p1/m, za2v.s[w13, 0]
+        mova            z28.s, p1/m, za3v.s[w13, 0]
+        add             x1, x1, x2
+        zero            {za}
+        ld1b            z0.b, p0/z, [x1]
+        ld1b            z1.b, p0/z, [x1, x14]
+        ld1b            z2.b, p0/z, [x1, x15]
+        ld1b            z3.b, p0/z, [x1, x16]
+        add             x1, x1, #4
+        ld1b            z4.b, p0/z, [x1]
+        ld1b            z5.b, p0/z, [x1, x14]
+        ld1b            z6.b, p0/z, [x1, x15]
+        ld1b            z7.b, p0/z, [x1, x16]
+        sub             x1, x1, #4
+        sqxtnb          z21.h, z22.s
+        sqxtnb          z22.h, z24.s
+        sqxtnt          z21.h, z26.s
+        sqxtnt          z22.h, z28.s
+        st2h            {z21.h-z22.h}, p2, [x0]
+        add             x1, x1, x2
+        add             x0, x0, x9
+
+        usmopa          za0.s, p0/m, p0/m, z0.b, z30.b
+        usmopa          za1.s, p0/m, p0/m, z1.b, z30.b
+        usmopa          za2.s, p0/m, p0/m, z2.b, z30.b
+        usmopa          za3.s, p0/m, p0/m, z3.b, z30.b
+        usmopa          za0.s, p0/m, p0/m, z4.b, z31.b
+        usmopa          za1.s, p0/m, p0/m, z5.b, z31.b
+        usmopa          za2.s, p0/m, p0/m, z6.b, z31.b
+        usmopa          za3.s, p0/m, p0/m, z7.b, z31.b
+        mova            z22.s, p1/m, za0v.s[w13, 0]
+        mova            z24.s, p1/m, za1v.s[w13, 0]
+        mova            z26.s, p1/m, za2v.s[w13, 0]
+        mova            z28.s, p1/m, za3v.s[w13, 0]
+        sqxtnb          z21.h, z22.s
+        sqxtnb          z22.h, z24.s
+        sqxtnt          z21.h, z26.s
+        sqxtnt          z22.h, z28.s
+        zero            {za}
+        st2h            {z21.h-z22.h}, p2, [x0]
+        subs            w10, w10, #2
+        add             x0, x0, x9
+        b.gt            .Loop_W
+        msub            x0, x3, x9, x0
+        msub            x1, x3, x2, x1
+        add             x0, x0, x11, lsl #1
+        subs            w6, w6, w11
+        add             x1, x1, x11
+        b.gt            .Loop_H
+        smstop
+        ret
+endfunc
+
+DISABLE_SME
+#endif
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
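
For readers comparing the SME routine against plain C, the following is a minimal scalar sketch of what an 8-bit horizontal 8-tap luma filter of this shape computes. It is not FFmpeg's actual C template: the names put_luma_h_8_ref, put_luma_h_8_ref_example and MAX_PB_SIZE are hypothetical, the unused vf argument is omitted, and the absence of a down-shift for 8-bit input is an assumption based on the assembly narrowing its 32-bit sums without shifting. The source offset of -3 and the destination pitch of 128 int16_t elements mirror "sub x1, x1, #3" and "mov x9, #(VVC_MAX_PB_SIZE * 2)" in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed dst row pitch in int16_t elements (VVC_MAX_PB_SIZE in the patch). */
#define MAX_PB_SIZE 128

/* Hypothetical scalar reference; vf is left out because a purely
 * horizontal filter does not need it. */
static void put_luma_h_8_ref(int16_t *dst, const uint8_t *src,
                             ptrdiff_t src_stride, int height,
                             const int8_t *hf, int width)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            /* Taps cover src[x - 3] .. src[x + 4]. */
            for (int k = 0; k < 8; k++)
                sum += hf[k] * src[x + k - 3];
            /* Assumption: 8-bit input needs no down-shift. */
            dst[x] = (int16_t)sum;
        }
        src += src_stride;
        dst += MAX_PB_SIZE;
    }
}

/* Usage example: a flat input of value 10 filtered with taps that
 * sum to 64 yields 640 at every output position. */
static void put_luma_h_8_ref_example(void)
{
    enum { W = 64, H = 2, PAD = 8 };
    static uint8_t src[H * (W + PAD)];
    static int16_t dst[H * MAX_PB_SIZE];
    static const int8_t hf[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = 10;

    /* Start 3 bytes in so the left-hand taps stay inside the buffer. */
    put_luma_h_8_ref(dst, src + 3, W + PAD, H, hf, W);

    assert(dst[0] == 640);
    assert(dst[MAX_PB_SIZE + W - 1] == 640);
}
```

A reference of this kind is mainly useful for spot-checking an optimised path on synthetic inputs; checkasm remains the authoritative comparison in the FFmpeg tree.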