From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
 by master.gitmailbox.com (Postfix) with ESMTPS id 8E9844AA84
 for ; Tue, 20 May 2025 19:46:47 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id 123EE68DBC7;
 Tue, 20 May 2025 22:46:45 +0300 (EEST)
Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTPS id 515AB68DBB4
 for ; Tue, 20 May 2025 22:46:41 +0300 (EEST)
Received: by mail.gandi.net (Postfix) with ESMTPSA id 28BF51FD4B
 for ; Tue, 20 May 2025 19:46:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=niedermayer.cc;
 s=gm1; t=1747770401;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=7pTgMjN6gpBTP6AaV6V4CgDnhSqWRjIRRpQd1BWKgtQ=;
 b=StqRyIMsxeTc64VJy67Lq5fN0jKNgl0ZVLoMV1MPLBndDpuKUUw0p/R2SquIBtbC5oI1Ux
 ed5vvdvzGqqjxRjGdDA55xZznWa1Nylvy3fSQQI1PakJX9rUqPX3IF/oK5SekVTCO63x2R
 Ug3UQB5FCbUgpa0DM2VUrXMCFJKfXkE7mW0yA6W2dwiCTKKFCv8DCvIwNXpAObNyoNDQJz
 OF0+dpJPrDfcNtbd77gl6rQkztMSRLe5+xuc5XfU9UVG+t2N7fd8gTSY9skkw6kYJ+kHFp
 q7/qYuZXj/Y9yPdfj41A9cyWlXtcRYYHHl+TCre8pNQv5PCf2sbEdZpdTZqt6g==
Date: Tue, 20 May 2025 21:46:40 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20250520194640.GI29660@pb2>
References:
MIME-Version: 1.0
In-Reply-To:
X-GND-State: clean
X-GND-Score: -85
X-GND-Cause:
 gggruggvucftvghtrhhoucdtuddrgeeffedrtddtgddutdekucdltddurdegfedvrddttddmucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenfghrlhcuvffnffculdduhedmnecujfgurhepfffhvffukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefoihgthhgrvghlucfpihgvuggvrhhmrgihvghruceomhhitghhrggvlhesnhhivgguvghrmhgrhigvrhdrtggtqeenucggtffrrghtthgvrhhnpeekudehtdegjeejfeelheeivdeufeefieekheetueegueekgfffhfevkeetudetteenucffohhmrghinhepihhnphhuthdrshgsnecukfhppeeguddrieeirdeijedruddufeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepihhnvghtpeeguddrieeirdeijedruddufedphhgvlhhopehlohgtrghlhhhoshhtpdhmrghilhhfrhhomhepmhhitghhrggvlhesnhhivgguvghrmhgrhigvrhdrtggtpdhnsggprhgtphhtthhopedupdhrtghpthhtohepfhhfmhhpvghgqdguvghvvghlsehffhhmphgvghdrohhrgh
X-GND-Sasl: michael@niedermayer.cc
Subject: Re: [FFmpeg-devel] [PATCH] swscale: rgb_to_yuv neon optimizations
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Content-Type: multipart/mixed; boundary="===============7490750536632282787=="
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
Archived-At:
List-Archive:
List-Post:

--===============7490750536632282787==
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature"; boundary="OClFPYwZVEsRLy/w"
Content-Disposition: inline

--OClFPYwZVEsRLy/w
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, May 19, 2025 at 09:50:18PM +0200, Dmitriy Kovalenko wrote:
> I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
> subsampled conversion. In this patch stack I'll try to
> improve the performance.
>
> This particular set of changes is a small improvement to all the
> existing functions and macro. The biggest performance gain is
> coming from post loading increment of the pointer and immediate
> pref etching of the memory blocks and interleaving the multiplication
> shifting operations of
> different registers for better scheduling.
>
> Also changed a bunch of places where cmp + b.le was used instead
> of one instruction cbnz/tbnz and some other small cleanups.
>
> Here are checkasm results on the macbook pro with the latest M4 max
>
>
>
> bgra_to_uv_1080_c: 257.5 ( 1.00x)
> bgra_to_uv_1080_neon: 211.9 ( 1.22x)
> bgra_to_uv_1920_c: 467.1 ( 1.00x)
> bgra_to_uv_1920_neon: 379.3 ( 1.23x)
> bgra_to_uv_half_1080_c: 198.9 ( 1.00x)
> bgra_to_uv_half_1080_neon: 125.7 ( 1.58x)
> bgra_to_uv_half_1920_c: 346.3 ( 1.00x)
> bgra_to_uv_half_1920_neon: 223.7 ( 1.55x)
>
>
>
> bgra_to_uv_1080_c: 268.3 ( 1.00x)
> bgra_to_uv_1080_neon: 176.0 ( 1.53x)
> bgra_to_uv_1920_c: 456.6 ( 1.00x)
> bgra_to_uv_1920_neon: 307.7 ( 1.48x)
> bgra_to_uv_half_1080_c: 193.2 ( 1.00x)
> bgra_to_uv_half_1080_neon: 96.8 ( 2.00x)
> bgra_to_uv_half_1920_c: 347.2 ( 1.00x)
> bgra_to_uv_half_1920_neon: 182.6 ( 1.92x)
>
> With my proprietary test on IOS it gives around 70% of performance
> improvement converting bgra 1920x1920 image to yuv420p
>
> On my linux arm cortex-r processing the performance improvement not that
> visible but still consistently faster by 5-10% than the current
> implementation.
>
> Signed-off-by: Dmitriy Kovalenko
> ---
> libswscale/aarch64/input.S | 166 +++++++++++++++++++++++++------------
> 1 file changed, 112 insertions(+), 54 deletions(-)
>
> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
> index c1c0adffc8..ee8eb24c14 100644
> --- a/libswscale/aarch64/input.S
> +++ b/libswscale/aarch64/input.S
> @@ -1,5 +1,4 @@
> -/*
> - * Copyright (c) 2024 Zhao Zhili
> +/* Copyright (c) 2024 Zhao Zhili
> *
> * This file is part of FFmpeg.
> *
> @@ -57,20 +56,41 @@
> sqshrn2 \dst\().8h, \dst2\().4s, \right_shift //
> dst_higher_half = dst2 >> right_shift
> .endm
> +// interleaved product version of the rgb to yuv gives slightly better
> performance on non-performant mobile +.macro rgb_to_uv_interleaved_product
> r, g, b, u_coef0, u_coef1, u_coef2, v_coef0, v_coef1, v_coef2, u_dst1,
> u_dst2, v_dst1, v_dst2, u_dst, v_dst, right_shift

error: corrupt patch at line 58
Please make sure your line/word wrap settings don't damage patches.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

During times of universal deceit, telling the truth becomes a
revolutionary act. -- George Orwell

--OClFPYwZVEsRLy/w
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iF0EABEKAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCaCzcGQAKCRBhHseHBAsP
q9K2AJ4k7Xu4rpcGg2Xui4gNH0JiBYZapACeLMNWp1upSRnSBp7c8TtBVnBFfcE=
=gZuz
-----END PGP SIGNATURE-----

--OClFPYwZVEsRLy/w--

--===============7490750536632282787==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

--===============7490750536632282787==--