Date: Fri, 30 May 2025 10:33:27 +0300 (EEST)
From: Martin Storsjö
To: FFmpeg development discussions and patches
Cc: Dmitriy Kovalenko
Message-ID: <7ce22e41-56d9-6d6e-2e58-f1ea278e5648@martin.st>
Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] swscale: rgb_to_yuv neon optimizations

On Fri, 30 May 2025, Dmitriy Kovalenko wrote:

> I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
> subsampled conversion. In this patch stack I'll try to
> improve the performance.
>
> This particular set of changes is a small improvement to all the
> existing functions and macro. The biggest performance gain is
> coming from post loading increment of the pointer and immediate
> prefetching of the memory blocks and interleaving the multiplication
> shifting operations of different registers for better scheduling.
>
> Also changed a bunch of places where cmp + b.le was used instead
> of one instruction cbnz/tbnz and some other small cleanups.
>
> Here are checkasm results on the MacBook Pro with the latest M4 Max
>
> bgra_to_uv_1080_c:          257.5 ( 1.00x)
> bgra_to_uv_1080_neon:       211.9 ( 1.22x)
> bgra_to_uv_1920_c:          467.1 ( 1.00x)
> bgra_to_uv_1920_neon:       379.3 ( 1.23x)
> bgra_to_uv_half_1080_c:     198.9 ( 1.00x)
> bgra_to_uv_half_1080_neon:  125.7 ( 1.58x)
> bgra_to_uv_half_1920_c:     346.3 ( 1.00x)
> bgra_to_uv_half_1920_neon:  223.7 ( 1.55x)
>
> bgra_to_uv_1080_c:          268.3 ( 1.00x)
> bgra_to_uv_1080_neon:       176.0 ( 1.53x)
> bgra_to_uv_1920_c:          456.6 ( 1.00x)
> bgra_to_uv_1920_neon:       307.7 ( 1.48x)
> bgra_to_uv_half_1080_c:     193.2 ( 1.00x)
> bgra_to_uv_half_1080_neon:   96.8 ( 2.00x)
> bgra_to_uv_half_1920_c:     347.2 ( 1.00x)
> bgra_to_uv_half_1920_neon:  182.6 ( 1.92x)
>
> With my proprietary test on iOS it gives around 70% of performance
> improvement converting bgra 1920x1920 image to yuv420p
>
> On my linux arm cortex-r processing the performance improvement is not
> that visible but still consistently faster by 5-10% than the current
> implementation.
> ---
>  libswscale/aarch64/input.S | 181 +++++++++++++++++++++++--------------
>  1 file changed, 112 insertions(+), 69 deletions(-)

This doesn't compile even on macOS any longer, what is this?

I will not respond to this thread any more until _every_ _single_ inline
comment has been replied to and motivated, and until the prefetch
instructions have been split into a separate patch.

// Martin