From: Martin Storsjö
Date: Tue, 4 Mar 2025 10:27:42 +0200 (EET)
To: Krzysztof Pyrkosz via ffmpeg-devel
Cc: Krzysztof Pyrkosz
Subject: Re: [FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y
Message-ID: <37b82f-4ba6-58bb-0c8-403c259b19cc@martin.st>
In-Reply-To: <20250303210022.7884-2-ffmpeg@szaka.eu>
References: <20250303210022.7884-2-ffmpeg@szaka.eu>

On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> The idea is to split the 16 bit coefficients into lower and upper half,
> invoke udot for the lower half, shift by 8, and follow by udot for the
> upper half.
>
> Benchmark on A78:
> bgra_to_y_128_c:                                       682.0 ( 1.00x)
> bgra_to_y_128_neon:                                    181.2 ( 3.76x)
> bgra_to_y_128_dotprod:                                 117.8 ( 5.79x)
> bgra_to_y_1080_c:                                     5742.5 ( 1.00x)
> bgra_to_y_1080_neon:                                  1472.5 ( 3.90x)
> bgra_to_y_1080_dotprod:                                906.5 ( 6.33x)
> bgra_to_y_1920_c:                                    10194.0 ( 1.00x)
> bgra_to_y_1920_neon:                                  2589.8 ( 3.94x)
> bgra_to_y_1920_dotprod:                               1573.8 ( 6.48x)
> ---
>  libswscale/aarch64/input.S   | 88 ++++++++++++++++++++++++++++++++++++
>  libswscale/aarch64/swscale.c | 17 +++++++
>  2 files changed, 105 insertions(+)

LGTM, thanks, I pushed this one now.

// Martin
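
For readers following the thread, the coefficient-splitting idea from the
quoted patch description can be sketched in scalar C. This is not the code
from the patch. udot4 is a hypothetical stand-in for a single 32-bit lane of
the AArch64 UDOT instruction, the other function names are made up for
illustration, and the sketch assumes that "shift by 8" means a right shift of
the low-half partial sum, which the message does not spell out.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit UDOT lane:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3], all operands unsigned bytes. */
static uint32_t udot4(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Reference: multiply the four pixel bytes by full 16-bit coefficients,
 * then drop the low 8 bits of the sum. */
static uint32_t dot16_shift8(const uint8_t px[4], const uint16_t coef[4])
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (uint32_t)px[i] * coef[i];
    return sum >> 8;
}

/* Split form: byte dot product with the low coefficient halves, shift right
 * by 8, then accumulate the byte dot product with the high halves. */
static uint32_t dot_split(const uint8_t px[4], const uint16_t coef[4])
{
    uint8_t lo[4], hi[4];
    for (int i = 0; i < 4; i++) {
        lo[i] = (uint8_t)(coef[i] & 0xff);
        hi[i] = (uint8_t)(coef[i] >> 8);
    }
    uint32_t acc = udot4(0, px, lo); /* sum of px * low halves        */
    acc >>= 8;                       /* assumed direction of the shift */
    return udot4(acc, px, hi);       /* add sum of px * high halves    */
}
```

Both functions return the same value for any unsigned inputs, because the
high-half sum contributes an exact multiple of 256 to the full sum and so
passes through the shift unchanged: (256 * H + L) >> 8 == H + (L >> 8).
The worst case, 4 * 255 * 65535, still fits in 32 bits.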