Date: Mon, 26 May 2025 12:05:35 +0300 (EEST)
From: Martin Storsjö
To: FFmpeg development discussions and patches
Cc: Dash Santosh Sathyanarayanan
Subject: Re: [FFmpeg-devel] [PATCH] swscale/output: Implement neon intrinsics for yuv2planeX_10_c_template()

On Thu, 22 May 2025, Harshitha Sarangu Suresh wrote:

> This optimization provides 5x improvement for the module. The boost in performance was calculated by adding C timers inside the C function and the optimized neon intrinsic function.
>
>
> From 904144c2db9e5e72d56360c4c2eb38d426852901 Mon Sep 17 00:00:00 2001
> From: Harshitha Suresh
> Date: Thu, 22 May 2025 10:23:55 +0530
> Subject: [PATCH] swscale/output: Implement neon intrinsics for
>  yuv2planeX_10_c_template()
>
> ---
>  libswscale/output.c | 76 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 75 insertions(+), 1 deletion(-)
>
> diff --git a/libswscale/output.c b/libswscale/output.c
> index c37649e7ce..345df5ce59 100644
> --- a/libswscale/output.c
> +++ b/libswscale/output.c
> @@ -22,7 +22,9 @@
>  #include
>  #include
>  #include
> -
> +#if defined (__aarch64__)
> +#include
> +#endif
>  #include "libavutil/attributes.h"
>  #include "libavutil/avutil.h"
>  #include "libavutil/avassert.h"
> @@ -337,6 +339,77 @@ yuv2plane1_10_c_template(const int16_t *src, uint16_t *dest, int dstW,
>      }
>  }
>
> +
> +#if defined (__aarch64__) && !defined(__APPLE__)

Why is Apple excluded here?

In any case, this is not the right way to add arch specific optimizations.

1. We don't add unconditional cases in the main arch independent code. We add them with runtime detection in arch specific files, see libswscale/aarch64/*. In the case of aarch64 and neon, things are easier as this extension is available in the compiler baseline and doesn't, strictly, need runtime detection, but nevertheless, the code should be arranged that way.

2. We don't use intrinsics for aarch64, we use standalone assembly files.

// Martin
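
The arrangement described in point 1 can be sketched roughly as follows. This is a minimal, self-contained illustration of function-pointer dispatch with a runtime CPU-flag check, not FFmpeg's actual code: ScaleContext, CPU_FLAG_NEON, SKETCH_ARCH_AARCH64, scale_init(), scale_init_aarch64(), vscale_c() and vscale_neon_stub() are all hypothetical names, and the inner loop is a simplified stand-in for a 10-bit vertical scaler. Per point 2, the fast path in a real patch would be a symbol exported from a standalone assembly file; here a C stub takes its place so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the build configuration and the CPU-flag API. */
#define SKETCH_ARCH_AARCH64 1
#define CPU_FLAG_NEON (1 << 0)

typedef void (*vscale_fn)(const int16_t *filter, int filter_size,
                          const int16_t **src, uint16_t *dest, int dst_w);

/* Hypothetical context: the scaler is always called through this pointer. */
typedef struct ScaleContext {
    vscale_fn vscale;
} ScaleContext;

/* Shift down and clip to the 10-bit range [0, 1023]. */
static uint16_t shift_clip_u10(int val, int shift)
{
    if (val < 0)
        return 0;
    val >>= shift;
    return val > 1023 ? 1023 : (uint16_t)val;
}

/* Arch independent reference: the only version living in generic code. */
static void vscale_c(const int16_t *filter, int filter_size,
                     const int16_t **src, uint16_t *dest, int dst_w)
{
    const int shift = 17; /* assumed for this sketch: 11 + 16 - 10 output bits */

    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        int val = 1 << (shift - 1); /* rounding term */
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = shift_clip_u10(val, shift);
    }
}

/* Stand-in for the symbol an arch specific assembly file would export.
 * It forwards to the C version so results stay identical. */
static void vscale_neon_stub(const int16_t *filter, int filter_size,
                             const int16_t **src, uint16_t *dest, int dst_w)
{
    vscale_c(filter, filter_size, src, dest, dst_w);
}

/* Would live in the arch specific directory: overrides only when the
 * CPU reports the feature at runtime. */
static void scale_init_aarch64(ScaleContext *c, int cpu_flags)
{
    if (cpu_flags & CPU_FLAG_NEON)
        c->vscale = vscale_neon_stub;
}

/* Generic init: install the C fallback, then let the arch layer override it. */
static void scale_init(ScaleContext *c, int cpu_flags)
{
    c->vscale = vscale_c;
#if SKETCH_ARCH_AARCH64
    scale_init_aarch64(c, cpu_flags);
#endif
}
```

With this layout the generic file contains no `__aarch64__` or `__APPLE__` checks in the function body itself; callers only ever invoke `c->vscale(...)`, and the choice of implementation is made once at init time.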