Date: Fri, 9 Sep 2022 10:32:48 +0300 (EEST)
From: Martin Storsjö
To: Hubert Mazur
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, ffmpeg-devel@ffmpeg.org, mw@semihalf.com, spop@amazon.com
Subject: Re: [FFmpeg-devel] [PATCH 0/5] Provide optimized neon implementation
Message-ID: <8d79d28-6336-fe5d-7c15-e5f0aa951731@martin.st>
In-Reply-To: <20220908092507.63319-1-hum@semihalf.com>
References: <20220908092507.63319-1-hum@semihalf.com>

On Thu, 8 Sep 2022, Hubert Mazur wrote:

> Fix minor issues in the patches.
> Regarding vsse16 I didn't change saba & umlal to sub & smlal.
> It doesn't affect the performance, so I left it as it was.
> The majority of changes refer to nsse16:
> - fixed indentation (thanks for pointing out),
> - applied the patch from Martin which fixes the balance
>   within instructions,
> - interleaved instructions - apparently this helped a little
>   to achieve better benchmarks.

Thanks! I measured a small further improvement on A53 with this change:
from 377 to 370 cycles.

> I have also updated the benchmark results for each function -
> not a huge performance improvement, but worth the effort.
> The results for nsse and vsse are shown below (these are the
> biggest changes).
> - vsse16 asm from 64.7 to 59.2,
> - nsse16 asm from 120.0 to 116.5.

It's kinda surprising that the difference is so small, since we reduced
the amount of work done in the functions quite significantly (IIRC on
A53, the speedup was something like 1.5x compared with the original),
but I guess it's understandable if the Graviton 3 is so powerful that
there are enough spare execution units that a bunch of redundant
instructions don't really matter.

Anyway, this revision of the patchset looked good to me, so I pushed it
now. Thanks!

// Martin
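To put the figures quoted in this message side by side, the relative improvements can be worked out with a few lines of Python. This is only an illustrative sketch: the helper names and dictionary labels are made up for this example, and the vsse16/nsse16 numbers are used exactly as quoted, with no unit assumed since the message does not state one.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """Return the speedup factor, assuming lower figures mean faster code."""
    return before / after


# (before, after) pairs copied from the message; labels are illustrative.
measurements = {
    "A53 measurement (cycles)": (377.0, 370.0),
    "vsse16 asm": (64.7, 59.2),
    "nsse16 asm": (120.0, 116.5),
}

for label, (before, after) in measurements.items():
    print(
        f"{label}: {percent_reduction(before, after):.1f}% lower, "
        f"{speedup(before, after):.3f}x"
    )
```

All three deltas come out below 10%, which is consistent with the remark that the change is small next to the roughly 1.5x speedup recalled for A53 relative to the original version.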