Date: Fri, 30 May 2025 12:07:26 +0300 (EEST)
From: Martin Storsjö
To: FFmpeg development discussions and patches
Cc: Dmitriy Kovalenko
Subject: Re: [FFmpeg-devel] [PATCH v4 1/2] swscale: rgb_to_yuv neon optimizations
Message-ID: <544ef49a-b717-2175-1216-a23c63bf4fd8@martin.st>

On Fri, 30 May 2025, Dmitriy Kovalenko wrote:

>> If you with "non-performant mobile" mean small in-order cores, most of
>> them can handle repeated accumulation like these even faster, if you
>> sequence these so that all accumulations to one register is
>> sequentially. E.g. first all "smlal \u_dst1\().4s", followed by all
>> "smlal \u_dst2\().4s", followed by \v_dst1, followed by \v_dst2. It's
>> worth benchmarking if you do have access to such cores (e.g.
>> Cortex-A53/A55; perhaps that's also the case on the Cortex-R you
>> mentioned in the commit message).
>
> I mean generally mobile first CPUs. But I just verified even on macbook
> pro interleaving instruction per the component does not enable IRL

What does "does not enable IRL" mean?

> and but having a "hot-register" being multipled several times in
> parallel gives a difference.
> Here is checask results from macbook w/ my
> and interleaved by r/g/b component version

I'm sorry, but it is very hard to interpret what you're saying here;
which is the first measurement and which is the second?

In any case, now that this version of the patchset actually compiles
and passes checkasm on Linux, I tested reordering
rgb_to_uv_interleaved_product in the way I suggested, like this:

    smlal   \u_dst1\().4s
    smlal   \u_dst1\().4s
    smlal   \u_dst1\().4s
    smlal2  \u_dst2\().4s
    smlal2  \u_dst2\().4s
    smlal2  \u_dst2\().4s
    smlal   \v_dst1\().4s
    smlal   \v_dst1\().4s
    smlal   \v_dst1\().4s
    smlal2  \v_dst2\().4s
    smlal2  \v_dst2\().4s
    smlal2  \v_dst2\().4s

Such accumulation orders can sometimes give significant speedups on
in-order cores like Cortex-A53 and A55. In this case it didn't make any
difference, so there's no need to investigate it further.

>> Does this make any practical difference, as we're just storing the
>> lower 32 bits anyway?
>
> Not really but I found it quite confusing at first becuase it looks like
> this instruction will imply narrowing, but looking into the w13 / w13 is
> much more clear what is going on.

If it doesn't make any difference, then don't change it. The fewer
changes in a patch, the easier it is to accept the patch. Especially if
you are optimizing code, don't include unrelated changes in the same
patch.

If you feel strongly that it should be changed for
readability/understandability reasons, then factor out that change to a
separate patch.

// Martin