From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 9 Mar 2025 20:41:39 +0100
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20250309194139.GL4991@pb2>
References: <20250308235342.GB669161@haasn.xyz>
In-Reply-To: <20250308235342.GB669161@haasn.xyz>
Subject: Re: [FFmpeg-devel] [RFC] New swscale internal design prototype

Hi Niklas

On Sat, Mar 08, 2025 at 11:53:42PM +0100, Niklas Haas wrote:
> Hi all,
>
> for the past two months, I have been working on a prototype for a radical
> redesign of the swscale internals, specifically the format handling layer.
> This includes, or will eventually expand to include, all format input/output
> and unscaled special conversion steps.
>
> I am not yet at a point where the new code can replace the scaling kernels,
> but for the time being, we could start using it for the simple unscaled cases,
> in theory, right away.
>
> Rather than repeating my entire design document here, I opted to collect my
> notes into a design document on my WIP branch:
>
> https://github.com/haasn/FFmpeg/blob/swscale3/doc/swscale-v2.txt
>
> I have spent the past week or so ironing out the last kinks and extensively
> benchmarking the new design at least on x86, and it is generally a roughly 1.9x
> improvement over the existing unscaled special converters across the board,
> before even adding any hand written ASM. (This speedup is *just* using the
> less-than-optimal compiler output from my reference C code!)
>
> In some cases we even measure ~3-4x or even ~6x speedups, especially those
> where swscale does not currently have hand written SIMD. Overall:
>
> cpu: 16-core AMD Ryzen Threadripper 1950X
> gcc 14.2.1:
>   single thread:
>     Overall speedup=1.887x faster, min=0.250x max=22.578x
>   multi thread:
>     Overall speedup=1.657x faster, min=0.190x max=87.972x
>
> (The 0.2x slowdown cases are for rgb8/gbr8 input, which requires LUT support
> for efficient decoding, but I wanted to focus on the core operations first
> before worrying about adding LUT-based optimizations to the design)
>
> I am (almost) ready to begin moving forwards with this design, merging it into
> swscale and using it at least for unscaled format conversions, XYZ decoding,
> colorspace transformations (subsuming the existing, horribly unoptimized,
> 3DLUT layer), gamma transformations, and so on.
>
> I wanted to post it here to gather some feedback on the approach. Where does
> it fall on the "madness" scale? Is the new operations and optimizer design
> comprehensible? Am I trying too hard to reinvent compilers? Are there any
> platforms where the high number of function calls per frame would be
> prohibitively expensive? What are the thoughts on the float-first approach? See
> also the list of limitations and improvement ideas at the bottom of my design
> document.

I think a more float-centric design probably makes sense.
Floats make things nicer and cleaner.

An integer-only path may be needed for architectures that have a weak FPU.
It may also be needed in some cases to get bit-exact results.

AVFloating, a rational float type or AVRational64, both interesting.
Do we have other places where either could be used?

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.