From: Martin Storsjö
To: FFmpeg development discussions and patches
Date: Mon, 10 Jun 2024 14:56:00 +0300 (EEST)
Subject: Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange
Message-ID: <606a3e3d-c38c-c4a-c6d1-929e2dfa79dc@martin.st>
In-Reply-To: <20240607140543.130761-4-ramiro.polla@gmail.com>
References: <20240607140543.130761-1-ramiro.polla@gmail.com> <20240607140543.130761-4-ramiro.polla@gmail.com>

On Fri, 7 Jun 2024, Ramiro Polla wrote:

> chrRangeFromJpeg_8_c: 28.5
> chrRangeFromJpeg_8_neon: 21.2
> chrRangeFromJpeg_24_c: 81.2
> chrRangeFromJpeg_24_neon: 34.7
> chrRangeFromJpeg_128_c: 425.2
> chrRangeFromJpeg_128_neon: 162.0
> chrRangeFromJpeg_144_c: 480.2
> chrRangeFromJpeg_144_neon: 180.2
> chrRangeFromJpeg_256_c: 838.2
> chrRangeFromJpeg_256_neon: 318.0
> chrRangeFromJpeg_512_c: 1698.2
> chrRangeFromJpeg_512_neon: 630.0
> chrRangeToJpeg_8_c: 56.0
> chrRangeToJpeg_8_neon: 23.5
> chrRangeToJpeg_24_c: 147.7
> chrRangeToJpeg_24_neon: 38.2
> chrRangeToJpeg_128_c: 760.2
> chrRangeToJpeg_128_neon: 182.5
> chrRangeToJpeg_144_c: 857.7
> chrRangeToJpeg_144_neon: 204.5
> chrRangeToJpeg_256_c: 1504.2
> chrRangeToJpeg_256_neon: 358.5
> chrRangeToJpeg_512_c: 3025.7
> chrRangeToJpeg_512_neon: 710.5
> lumRangeFromJpeg_8_c: 24.0
> lumRangeFromJpeg_8_neon: 18.2
> lumRangeFromJpeg_24_c: 64.0
> lumRangeFromJpeg_24_neon: 22.2
> lumRangeFromJpeg_128_c: 289.2
> lumRangeFromJpeg_128_neon: 79.2
> lumRangeFromJpeg_144_c: 334.7
> lumRangeFromJpeg_144_neon: 87.7
> lumRangeFromJpeg_256_c: 579.5
> lumRangeFromJpeg_256_neon: 152.0
> lumRangeFromJpeg_512_c: 1208.0
> lumRangeFromJpeg_512_neon: 299.0
> lumRangeToJpeg_8_c: 30.0
> lumRangeToJpeg_8_neon: 19.0
> lumRangeToJpeg_24_c: 82.2
> lumRangeToJpeg_24_neon: 24.0
> lumRangeToJpeg_128_c: 440.7
> lumRangeToJpeg_128_neon: 90.5
> lumRangeToJpeg_144_c: 502.0
> lumRangeToJpeg_144_neon: 102.2
> lumRangeToJpeg_256_c: 893.7
> lumRangeToJpeg_256_neon: 178.0
> lumRangeToJpeg_512_c: 1793.7
> lumRangeToJpeg_512_neon: 355.0
> ---
>  libswscale/aarch64/Makefile             |   1 +
>  libswscale/aarch64/range_convert_neon.S | 103 ++++++++++++++++++++++++
>  libswscale/aarch64/swscale.c            |  21 +++++
>  libswscale/swscale_internal.h           |   1 +
>  libswscale/utils.c                      |   4 +-
>  5 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 libswscale/aarch64/range_convert_neon.S
>
> diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
> index da1d909561..6923827f82 100644
> --- a/libswscale/aarch64/Makefile
> +++ b/libswscale/aarch64/Makefile
> @@ -4,5 +4,6 @@ OBJS += aarch64/rgb2rgb.o \
>
>  NEON-OBJS += aarch64/hscale.o \
>               aarch64/output.o \
> +             aarch64/range_convert_neon.o \
>               aarch64/rgb2rgb_neon.o \
>               aarch64/yuv2rgb_neon.o \
> diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S
> new file mode 100644
> index 0000000000..5e104971f0
> --- /dev/null
> +++ b/libswscale/aarch64/range_convert_neon.S
> @@ -0,0 +1,103 @@
> +/*
> + * Copyright (c) 2024 Ramiro Polla
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/aarch64/asm.S"
> +
> +.macro lumConvertRange name max mult offset shift

We usually use commas between the macro arguments here. Apparently it
doesn't make any difference for any of the tools we support, but it would
be nice for consistency.

(When invoking macros, commas between arguments are optional for most
platforms, but not when targeting Apple platforms, so being strict with
consistent use of commas is generally good.)

> +const offset_\name, align=4
> +        .word \offset, \offset, \offset, \offset
> +endconst
> +function ff_\name, export=1
> +.if \max != 0
> +        mov             w3, #\max
> +        dup             v24.8h, w3
> +.endif
> +        mov             w3, #\mult
> +        dup             v25.4s, w3
> +        movrel          x3, offset_\name
> +        ld1             {v26.4s}, [x3]

FWIW, I did see that you were recommended this form, over ld1r, based on
some microarchitectural performance numbers.
However, I would say that manually pre-splatting vectors like this is
unusual in our preexisting assembly. I don't have a strong opinion on the
matter, though.

Anyway, the assembly looks reasonable to me.

// Martin