Date: Wed, 12 Feb 2025 14:02:00 +0200 (EET)
From: Martin Storsjö
To: Krzysztof Pyrkosz via ffmpeg-devel
Cc: Krzysztof Pyrkosz
In-Reply-To: <20250211220642.116850-4-ffmpeg@szaka.eu>
Message-ID: <60217252-6f0-5563-bab4-4410e6cdab9@martin.st>
References: <20250211220642.116850-4-ffmpeg@szaka.eu>
Subject: Re: [FFmpeg-devel] [PATCH 2/2] swscale/aarch64/rgb2rgb_neon: Implemented {yuyv, uyvy}toyuv{420, 422}

On Tue, 11 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> ---
>  libswscale/aarch64/rgb2rgb.c      |  16 ++
>  libswscale/aarch64/rgb2rgb_neon.S | 262 ++++++++++++++++++++++++++++++
>  2 files changed, 278 insertions(+)
>
> diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
> index 7e1dba572d..f474228298 100644
> --- a/libswscale/aarch64/rgb2rgb.c
> +++ b/libswscale/aarch64/rgb2rgb.c
> @@ -67,6 +67,18 @@ void ff_shuffle_bytes_2013_neon(const uint8_t *src, uint8_t *dst, int src_size);
>  void ff_shuffle_bytes_2130_neon(const uint8_t *src, uint8_t *dst, int src_size);
>  void ff_shuffle_bytes_1203_neon(const uint8_t *src, uint8_t *dst, int src_size);
>
> +void ff_uyvytoyuv422_neon(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> +                          const uint8_t *src, int width, int height,
> +                          int lumStride, int chromStride, int srcStride);
> +void ff_uyvytoyuv420_neon(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> +                          const uint8_t *src, int width, int height,
> +                          int lumStride, int chromStride, int srcStride);
> +void ff_yuyvtoyuv420_neon(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> +                          const uint8_t *src, int width, int height,
> +                          int lumStride, int chromStride, int srcStride);
> +void ff_yuyvtoyuv422_neon(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> +                          const uint8_t *src, int width, int height,
> +                          int lumStride, int chromStride, int srcStride);
>  av_cold void rgb2rgb_init_aarch64(void)
>  {
>      int cpu_flags = av_get_cpu_flags();
> @@ -84,5 +96,9 @@ av_cold void rgb2rgb_init_aarch64(void)
>          shuffle_bytes_2013 = ff_shuffle_bytes_2013_neon;
>          shuffle_bytes_2130 = ff_shuffle_bytes_2130_neon;
>          shuffle_bytes_1203 = ff_shuffle_bytes_1203_neon;
> +        uyvytoyuv422 = ff_uyvytoyuv422_neon;
> +        uyvytoyuv420 = ff_uyvytoyuv420_neon;
> +        yuyvtoyuv422 = ff_yuyvtoyuv422_neon;
> +        yuyvtoyuv420 = ff_yuyvtoyuv420_neon;
>      }
>  }
> diff --git a/libswscale/aarch64/rgb2rgb_neon.S b/libswscale/aarch64/rgb2rgb_neon.S
> index 22ecdf7ac8..9002aa028f 100644
> --- a/libswscale/aarch64/rgb2rgb_neon.S
> +++ b/libswscale/aarch64/rgb2rgb_neon.S
> @@ -427,3 +427,265 @@ neon_shuf 2013
>  neon_shuf 1203
>  neon_shuf 2130
>  neon_shuf 3210
> +
> +/*
> +v0-v7 - two consecutive lines
> +x0 - upper Y destination
> +x1 - U destination
> +x2 - V destination
> +x3 - upper src line
> +w5 - width/iteration counter - count of line pairs for yuv420, of single lines for 422
> +x6 - lum padding
> +x7 - chrom padding
> +x8 - src padding
> +w9 - number of bytes remaining in the tail
> +x10 - lower Y destination
> +w12 - tmp
> +x13 - lower src line
> +w14 - tmp
> +w17 - set to 1 if last line has to be handled separately (odd height)
> +*/
> +
> +// one fast path iteration processes 16 uyvy tuples
> +// is_line_tail is set to 1 when final 16 tuples are being processed
> +// skip_storing_chroma is set to 1 when final line is processed and the height is odd
> +.macro fastpath_iteration src_fmt, dst_fmt, is_line_tail, skip_storing_chroma
> +        ld4             {v0.16b - v3.16b}, [x3], #64
> +.if ! \is_line_tail
> +        subs            w14, w14, #32
> +.endif
> +
> +.if ! \skip_storing_chroma
> +.if \dst_fmt == yuv420

This doesn't work as you want it to across all supported tools; .if
conditionals are meant for pure numerical comparisons, and yuv420 isn't a
numerical constant. In practice it does seem to work with binutils though,
but not with Clang (or with gas-preprocessor).

You can use .ifc for string comparisons, see
https://sourceware.org/binutils/docs/as/If.html for more references.

Also see
https://github.com/mstorsjo/FFmpeg/actions/runs/13282469154/job/37083639139
for the fallout from trying to build this patch with various tool setups.
Please do consider trying the aarch64 assembly testset from
https://github.com/mstorsjo/FFmpeg/commits/gha-aarch64 on your commits.

// Martin
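
To illustrate the .ifc suggestion, here is a minimal sketch of how a macro
argument can be compared as a string rather than as a number. The macro name
chroma_store_sketch is hypothetical and the branch bodies are placeholders;
this is not the patch's code, only an example of the directive under the
assumption that the argument is passed as a bare word such as yuv420.

```asm
// Hypothetical macro, for illustration only; not taken from the patch.
.macro chroma_store_sketch dst_fmt
.ifc \dst_fmt,yuv420
        // instructions specific to the yuv420 output would go here
.else
        // instructions for the other output format (yuv422) would go here
.endif
.endm

        chroma_store_sketch yuv420      // expands the .ifc branch
        chroma_store_sketch yuv422      // expands the .else branch
```

The two strings are written without whitespace around the comma, since
unquoted .ifc operands are split at the first comma and the comparison is
case sensitive.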