From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 2 Mar 2025 01:21:09 +0200 (EET)
From: Martin Storsjö
To: Krzysztof Pyrkosz via ffmpeg-devel
Cc: Krzysztof Pyrkosz
In-Reply-To: <34183bb-28d2-c340-bbf-dba42d724a87@martin.st>
Message-ID: <6276a82f-24dc-2a43-5aac-7176ceb466@martin.st>
References: <20250301125859.113969-2-ffmpeg@szaka.eu> <34183bb-28d2-c340-bbf-dba42d724a87@martin.st>
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64/hscale.S Refactor hscale_16_to_15__fs_4
List-Id: FFmpeg development discussions and patches

On Sun, 2 Mar 2025, Martin Storsjö wrote:

> On Sat, 1 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
>
>> Before/after:
>>
>> A78
>> hscale_16_to_15__fs_4_dstW_8_neon:       86.8 ( 1.72x)
>> hscale_16_to_15__fs_4_dstW_24_neon:     147.5 ( 2.73x)
>> hscale_16_to_15__fs_4_dstW_128_neon:    614.0 ( 3.14x)
>> hscale_16_to_15__fs_4_dstW_144_neon:    680.5 ( 3.18x)
>> hscale_16_to_15__fs_4_dstW_256_neon:   1193.2 ( 3.19x)
>> hscale_16_to_15__fs_4_dstW_512_neon:   2305.0 ( 3.27x)
>>
>> hscale_16_to_15__fs_4_dstW_8_neon:       86.0 ( 1.74x)
>> hscale_16_to_15__fs_4_dstW_24_neon:     106.8 ( 3.78x)
>> hscale_16_to_15__fs_4_dstW_128_neon:    404.0 ( 4.81x)
>> hscale_16_to_15__fs_4_dstW_144_neon:    451.8 ( 4.80x)
>> hscale_16_to_15__fs_4_dstW_256_neon:    760.5 ( 5.06x)
>> hscale_16_to_15__fs_4_dstW_512_neon:   1520.0 ( 5.01x)
>>
>> A72
>> hscale_16_to_15__fs_4_dstW_8_neon:      156.8 ( 1.52x)
>> hscale_16_to_15__fs_4_dstW_24_neon:     217.8 ( 2.52x)
>> hscale_16_to_15__fs_4_dstW_128_neon:    906.8 ( 2.90x)
>> hscale_16_to_15__fs_4_dstW_144_neon:   1014.5 ( 2.91x)
>> hscale_16_to_15__fs_4_dstW_256_neon:   1751.5 ( 2.96x)
>> hscale_16_to_15__fs_4_dstW_512_neon:   3469.3 ( 2.97x)
>>
>> hscale_16_to_15__fs_4_dstW_8_neon:      151.2 ( 1.54x)
>> hscale_16_to_15__fs_4_dstW_24_neon:     173.4 ( 3.15x)
>> hscale_16_to_15__fs_4_dstW_128_neon:    660.0 ( 3.98x)
>> hscale_16_to_15__fs_4_dstW_144_neon:    735.7 ( 4.00x)
>> hscale_16_to_15__fs_4_dstW_256_neon:   1273.5 ( 4.09x)
>> hscale_16_to_15__fs_4_dstW_512_neon:   2488.2 ( 4.16x)
>> ---
>>
>> This patch removes the use of stack for temporary state and replaces
>> interleaved ld4 loads with ld1.
>> I'm aware the component is being deprecated, however in my use case
>> (screen recording) the total time spent in this function is roughly 15%,
>> the improvement is significant and worth sharing.
>
> The patch looks good. I didn't follow it in exact detail, but it overall
> looks reasonable, and looks much better than the previous form. This
> description of what the patch does and why also is worth keeping in the final
> commit message, but as there's no need to repost the patch, I could just
> adjust the message myself before pushing it.

I pushed this one, and the second ac3dsp patch now, with the commit
messages readjusted a little bit. The first ac3dsp patch should be good
too if someone verifies that it's ok to handle 16 elements at a time.

// Martin
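
For readers who want the NEON-vs-NEON gain rather than the speedup over the C reference that the parenthesised figures show, the quoted timings can be divided directly. The following is a minimal sketch, assuming the first block under each core is the "before" run and the second is the "after" run, as the "Before/after" heading suggests. The `BENCH` table layout and the function names are illustrative and not part of the patch or of checkasm.

```python
# Timings copied from the quoted benchmark output, keyed by core and dstW.
# Each value is (before, after); the before/after ordering is an assumption
# based on the "Before/after" heading in the patch description.
BENCH = {
    "A78": {
        8: (86.8, 86.0),
        24: (147.5, 106.8),
        128: (614.0, 404.0),
        144: (680.5, 451.8),
        256: (1193.2, 760.5),
        512: (2305.0, 1520.0),
    },
    "A72": {
        8: (156.8, 151.2),
        24: (217.8, 173.4),
        128: (906.8, 660.0),
        144: (1014.5, 735.7),
        256: (1751.5, 1273.5),
        512: (3469.3, 2488.2),
    },
}


def relative_speedup(before: float, after: float) -> float:
    """Return how many times faster the new NEON code is than the old one."""
    return before / after


def summarize(bench: dict) -> dict:
    """Map each core and dstW to the old/new timing ratio, rounded to 2 places."""
    return {
        core: {
            width: round(relative_speedup(before, after), 2)
            for width, (before, after) in rows.items()
        }
        for core, rows in bench.items()
    }


if __name__ == "__main__":
    for core, rows in summarize(BENCH).items():
        for width, ratio in rows.items():
            print(f"{core} dstW={width}: {ratio:.2f}x faster than the old NEON code")
```

On these numbers the gain is negligible at dstW 8 and grows with width, which is consistent with the per-call setup cost dominating for very short rows.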