Date: Mon, 2 Jun 2025 10:52:59 +0300 (EEST)
From: Martin Storsjö
To: FFmpeg development discussions and patches
Cc: Dash Santosh Sathyanarayanan
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64/output: Implement neon assembly for yuv2nv12cX_c()
Message-ID: <7ebeaff9-856f-1b4f-d9db-48d79ad9e3b@martin.st>

On Mon, 2 Jun 2025, Harshitha Sarangu Suresh wrote:

> From 7260822a578130a713c1455cca6cdd06f1540db8 Mon Sep 17 00:00:00 2001
> From: Harshitha Suresh
> Date: Mon, 19 May 2025 22:37:20 +0530
> Subject: [PATCH] swscale/aarch64/output: Implement neon assembly for yuv2nv12cX_c()
>
> yuv2nv12cX_2_512_accurate_c:         3508.8 ( 1.00x)
> yuv2nv12cX_2_512_accurate_neon:       369.2 ( 9.50x)
> yuv2nv12cX_2_512_approximate_c:      3499.0 ( 1.00x)
> yuv2nv12cX_2_512_approximate_neon:    370.2 ( 9.45x)
> yuv2nv12cX_4_512_accurate_c:         4683.0 ( 1.00x)
> yuv2nv12cX_4_512_accurate_neon:       568.8 ( 8.23x)
> yuv2nv12cX_4_512_approximate_c:      4682.6 ( 1.00x)
> yuv2nv12cX_4_512_approximate_neon:    569.9 ( 8.22x)
> yuv2nv12cX_8_512_accurate_c:         7243.0 ( 1.00x)
> yuv2nv12cX_8_512_accurate_neon:       937.6 ( 7.72x)
> yuv2nv12cX_8_512_approximate_c:      7235.9 ( 1.00x)
> yuv2nv12cX_8_512_approximate_neon:    938.3 ( 7.71x)
> yuv2nv12cX_16_512_accurate_c:       13749.7 ( 1.00x)
> yuv2nv12cX_16_512_accurate_neon:     1708.1 ( 8.05x)
> yuv2nv12cX_16_512_approximate_c:    13750.0 ( 1.00x)
> yuv2nv12cX_16_512_approximate_neon:  1708.6 ( 8.05x)
> ---
> libswscale/aarch64/output.S  | 308 +++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c |  18 ++
> 2 files changed, 326 insertions(+)
>
> diff --git a/libswscale/aarch64/output.S b/libswscale/aarch64/output.S
> index 190c438870..8eb89e8b54 100644
> --- a/libswscale/aarch64/output.S
> +++ b/libswscale/aarch64/output.S
> @@ -226,3 +226,311 @@ function ff_yuv2plane1_8_neon, export=1
> b.gt 2b // loop until width consumed
> ret
> endfunc
> +
> +// void ff_yuv2nv12cX_neon(enum AVPixelFormat dstFormat, const uint8_t *chrDither,
> +// const int16_t *chrFilter, int chrFilterSize,
> +// const int16_t **chrUSrc, const int16_t **chrVSrc,
> +// uint8_t *dest, int chrDstW)
> +
> +function ff_yuv2nv12cX_notswapped_neon, export=1
> + // x0 - dstFormat (unused)
> + // x1 - uint8_t *chrDither
> + // x2 - int16_t *chrFilter
> + // x3 - int chrFilterSize
> + // x4 - int16_t **chrUSrc
> + // x5 - int16_t **chrVSrc
> + // x6 - uint8_t *dest
> + // x7 - int chrDstW
> +
> + // Load dither pattern and compute U and V dither vectors
> + ld1 {v0.8b}, [x1] // chrDither[0..7]
> + ext v1.8b, v0.8b, v0.8b, #3 // Rotate for V: (i+3)&7

Please adhere to the indentation of the existing code; don't make up
your own.

This patchset causes a lot of FATE tests to fail, and makes ffmpeg
crash in a number of tests while running FATE. Please fix that.

In order to test the assembly in all relevant configurations before
submitting, I have set up GitHub Actions workflows that anybody can
use; they also include indentation checks for the assembly. If you make
a copy of my branch at https://github.com/mstorsjo/ffmpeg/commits/gha-aarch64,
add your own changes on top, and push it to a repo on GitHub, it will
run the tests in all the configurations.
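When chasing the FATE failures, it can help to have the scalar arithmetic that the NEON routine has to match written out. The following is a minimal C sketch, assumed to mirror the non-swapped (U stored before V) branch of the C fallback yuv2nv12cX_c(): the dither byte is scaled by << 12, every filter tap is accumulated, and the sum is shifted right by 19 and clipped to 8 bits. The names nv12cX_ref() and clip_uint8() are hypothetical stand-ins for illustration (clip_uint8() plays the role of av_clip_uint8()), not FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for av_clip_uint8(): clamp to 0..255. */
static uint8_t clip_uint8(int x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return (uint8_t)x;
}

/* Hypothetical reference for the non-swapped NV12 chroma output path,
 * assumed to mirror yuv2nv12cX_c() when U is stored before V.
 * chrFilter holds 12-bit fixed-point taps, chrUSrc/chrVSrc hold the
 * intermediate int16 samples, and dest receives interleaved U,V bytes. */
static void nv12cX_ref(const uint8_t *chrDither,
                       const int16_t *chrFilter, int chrFilterSize,
                       const int16_t **chrUSrc, const int16_t **chrVSrc,
                       uint8_t *dest, int chrDstW)
{
    assert(chrFilterSize >= 1);

    for (int i = 0; i < chrDstW; i++) {
        /* U uses dither[i & 7]; V uses the pattern rotated by 3, which is
         * what "ext v1.8b, v0.8b, v0.8b, #3" builds in the patch. */
        int u = chrDither[i & 7] << 12;
        int v = chrDither[(i + 3) & 7] << 12;

        for (int j = 0; j < chrFilterSize; j++) {
            u += chrUSrc[j][i] * chrFilter[j];
            v += chrVSrc[j][i] * chrFilter[j];
        }

        /* >> 19 assumes an arithmetic shift for negative sums, as the
         * scalar code does; clipping then maps them to 0. */
        dest[2 * i]     = clip_uint8(u >> 19);
        dest[2 * i + 1] = clip_uint8(v >> 19);
    }
}
```

Called with the dither pattern {0, 127, 0, 127, 0, 127, 0, 127}, two taps of 2048 and every sample equal to 12736 (99.5 when divided by 128), the first pixel comes out as U = 99, V = 100 and the second as U = 100, V = 99, because the rotated V dither lands on the other side of the rounding point. Comparing output like this against the NEON version over several widths is one way to narrow down where the two diverge.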
// Martin