Re: [FFmpeg-devel] [RFC] New swscale internal design prototype

From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] New swscale internal design prototype
Date: Sun, 9 Mar 2025 20:41:39 +0100
Message-ID: <20250309194139.GL4991@pb2>
In-Reply-To: <20250308235342.GB669161@haasn.xyz>

Hi Niklas

On Sat, Mar 08, 2025 at 11:53:42PM +0100, Niklas Haas wrote:
> Hi all,
> 
> for the past two months, I have been working on a prototype for a radical
> redesign of the swscale internals, specifically the format handling layer.
> This includes, or will eventually expand to include, all format input/output
> and unscaled special conversion steps.
> 
> I am not yet at a point where the new code can replace the scaling kernels,
> but for the time being, we could start using it for the simple unscaled cases,
> in theory, right away.
> 
> Rather than repeating the entire design here, I opted to collect my
> notes into a design document on my WIP branch:
> 
> https://github.com/haasn/FFmpeg/blob/swscale3/doc/swscale-v2.txt
> 
> I have spent the past week or so ironing out the last kinks and extensively
> benchmarking the new design, at least on x86, and it is roughly a 1.9x
> improvement over the existing unscaled special converters across the board,
> before even adding any hand written ASM. (This speedup is *just* using the
> less-than-optimal compiler output from my reference C code!)
> 
> In some cases we even measure ~3-4x or even ~6x speedups, especially those
> where swscale does not currently have hand written SIMD. Overall:
> 
> cpu: 16-core AMD Ryzen Threadripper 1950X
> gcc 14.2.1:
>    single thread:
>      Overall speedup=1.887x faster, min=0.250x max=22.578x
>    multi thread:
>      Overall speedup=1.657x faster, min=0.190x max=87.972x
> 
> (The 0.2x slowdown cases are for rgb8/gbr8 input, which requires LUT support
>  for efficient decoding, but I wanted to focus on the core operations first
>  before worrying about adding LUT-based optimizations to the design)
> 
> I am (almost) ready to begin moving forwards with this design, merging it into
> swscale and using it at least for unscaled format conversions, XYZ decoding,
> colorspace transformations (subsuming the existing, horribly unoptimized,
> 3DLUT layer), gamma transformations, and so on.
> 
> I wanted to post it here to gather some feedback on the approach. Where does
> it fall on the "madness" scale? Is the new operations and optimizer design
> comprehensible? Am I trying too hard to reinvent compilers? Are there any
> platforms where the high number of function calls per frame would be
> prohibitively expensive? What are the thoughts on the float-first approach? See
> also the list of limitations and improvement ideas at the bottom of my design
> document.

I think a more float-centric design probably makes sense. Floats make things
nicer and cleaner.
An integer-only path may still be needed for architectures that have a weak
FPU, and in some cases to keep the output bitexact.
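
To make the float vs. integer-only trade-off concrete, here is a minimal
sketch (not swscale code) of one mapping written both ways: expanding 8-bit
limited-range luma (16..235) to full range (0..255). The function names, the
16.16 fixed-point format and the choice of mapping are assumptions made for
illustration only.

```c
#include <stdint.h>

/* Float path: expand limited-range luma (16..235) to full range (0..255). */
static uint8_t expand_float(uint8_t y)
{
    float f = ((float)y - 16.0f) * (255.0f / 219.0f);
    if (f <= 0.0f)
        return 0;
    if (f >= 255.0f)
        return 255;
    return (uint8_t)(f + 0.5f); /* round to nearest */
}

/* Integer-only path: the same mapping in 16.16 fixed point, no FPU needed. */
static uint8_t expand_fixed(uint8_t y)
{
    /* round(255 / 219 * 2^16) */
    const int32_t scale = (255 * 65536 + 219 / 2) / 219;
    int32_t v;

    if (y <= 16)
        return 0;
    v = ((int32_t)y - 16) * scale + (1 << 15); /* add 0.5 before the shift */
    v >>= 16;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Count the 8-bit inputs for which the two paths disagree. */
static int count_mismatches(void)
{
    int y, bad = 0;

    for (y = 0; y < 256; y++)
        if (expand_float((uint8_t)y) != expand_fixed((uint8_t)y))
            bad++;
    return bad;
}
```

A check like count_mismatches() is the kind of test that decides whether a
float path can stand in for an integer one where bitexact output is required.
For a single 8-bit step such as this one the two can be made to agree; longer
chains of operations are where the rounding behaviour tends to diverge.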

AVFloating, a rational float type or AVRational64, both interesting.
Do we have other places where either could be used?
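
For readers unfamiliar with the idea, a 64-bit rational can be sketched along
the lines of libavutil's existing AVRational (an int numerator/denominator
pair). The type and helpers below are hypothetical and are not the definitions
from the design document; they only show why exact rationals are attractive
for composing scale factors without rounding.

```c
#include <stdint.h>

/* Hypothetical 64-bit rational, modelled on libavutil's AVRational.
 * Not the AVRational64 from the design document. */
typedef struct Rational64 {
    int64_t num;
    int64_t den;
} Rational64;

static int64_t gcd64(int64_t a, int64_t b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce to lowest terms with a positive denominator. */
static Rational64 r64_reduce(Rational64 q)
{
    int64_t g = gcd64(q.num, q.den);

    if (g > 1) {
        q.num /= g;
        q.den /= g;
    }
    if (q.den < 0) {
        q.num = -q.num;
        q.den = -q.den;
    }
    return q;
}

/* Multiply two rationals, cross-reducing first so the intermediate
 * products stay small. Denominators are assumed non-zero; overflow for
 * very large operands is not handled in this sketch. */
static Rational64 r64_mul(Rational64 a, Rational64 b)
{
    int64_t g1 = gcd64(a.num, b.den);
    int64_t g2 = gcd64(b.num, a.den);
    Rational64 r;

    if (g1 == 0) g1 = 1;
    if (g2 == 0) g2 = 1;
    r.num = (a.num / g1) * (b.num / g2);
    r.den = (a.den / g2) * (b.den / g1);
    return r64_reduce(r);
}
```

For example, multiplying 255/219 by 1/255 with r64_mul() yields exactly 1/219,
so two consecutive scaling steps collapse into one without any rounding error.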

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
