Hi Niklas

On Sat, Mar 08, 2025 at 11:53:42PM +0100, Niklas Haas wrote:
> Hi all,
>
> for the past two months, I have been working on a prototype for a radical
> redesign of the swscale internals, specifically the format handling layer.
> This includes, or will eventually expand to include, all format input/output
> and unscaled special conversion steps.
>
> I am not yet at a point where the new code can replace the scaling kernels,
> but for the time being, we could start using it for the simple unscaled cases,
> in theory, right away.
>
> Rather than repeating my entire design document here, I opted to collect my
> notes into a design document on my WIP branch:
>
> https://github.com/haasn/FFmpeg/blob/swscale3/doc/swscale-v2.txt
>
> I have spent the past week or so ironing out the last kinks and extensively
> benchmarking the new design at least on x86, and it is generally a roughly 1.9x
> improvement over the existing unscaled special converters across the board,
> before even adding any hand written ASM. (This speedup is *just* using the
> less-than-optimal compiler output from my reference C code!)
>
> In some cases we even measure ~3-4x or even ~6x speedups, especially those
> where swscale does not currently have hand written SIMD. Overall:
>
> cpu: 16-core AMD Ryzen Threadripper 1950X
> gcc 14.2.1:
>   single thread:
>     Overall speedup=1.887x faster, min=0.250x max=22.578x
>   multi thread:
>     Overall speedup=1.657x faster, min=0.190x max=87.972x
>
> (The 0.2x slowdown cases are for rgb8/gbr8 input, which requires LUT support
> for efficient decoding, but I wanted to focus on the core operations first
> before worrying about adding LUT-based optimizations to the design)
>
> I am (almost) ready to begin moving forwards with this design, merging it into
> swscale and using it at least for unscaled format conversions, XYZ decoding,
> colorspace transformations (subsuming the existing, horribly unoptimized,
> 3DLUT layer), gamma transformations, and so on.
>
> I wanted to post it here to gather some feedback on the approach. Where does
> it fall on the "madness" scale? Is the new operations and optimizer design
> comprehensible? Am I trying too hard to reinvent compilers? Are there any
> platforms where the high number of function calls per frame would be
> prohibitively expensive? What are the thoughts on the float-first approach? See
> also the list of limitations and improvement ideas at the bottom of my design
> document.

I think a more float-centric design probably makes sense.
Floats make things nicer and cleaner.

It may be necessary to support an integer-only path for architectures that
have a weak FPU. An integer path may also be needed in some cases to get
bitexact output.

AVFloating (a rational float type) or AVRational64: both are interesting.
Do we have other places where either could be used?

thx

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
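To illustrate the bitexactness point: a float path and a fixed-point integer path for the same conversion do not, in general, agree bit-for-bit. The sketch below is hypothetical (luma_float(), luma_int8() and count_luma_mismatches() are made-up names, not swscale functions). It assumes BT.601 luma weights, and the integer coefficients are deliberately quantized to only 8 fractional bits so the effect is easy to see. Real fixed-point paths use more bits, which makes mismatches rarer but does not by itself guarantee agreement. Float results can additionally depend on compiler and FPU details such as FMA contraction, which is part of why bitexact output tends to call for an integer path.

```c
#include <stdint.h>

/* Hypothetical example, not swscale code: BT.601 luma from 8-bit RGB. */

/* Float path: nominal coefficients, rounded to nearest at the end. */
static uint8_t luma_float(uint8_t r, uint8_t g, uint8_t b)
{
    float y = 0.299f * r + 0.587f * g + 0.114f * b;
    /* y >= 0, so adding 0.5 and truncating rounds to nearest */
    return (uint8_t)(y + 0.5f);
}

/* Integer path: coefficients quantized to 8 fractional bits
 * (77, 150 and 29 out of 256; they sum to exactly 256), rounded to
 * nearest by adding half an LSB (128) before the shift. */
static uint8_t luma_int8(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t acc = 77u * r + 150u * g + 29u * b + 128u;
    return (uint8_t)(acc >> 8);
}

/* Counts the RGB triples (out of 2^24) where the two paths disagree. */
static unsigned long count_luma_mismatches(void)
{
    unsigned long n = 0;
    for (int r = 0; r < 256; r++)
        for (int g = 0; g < 256; g++)
            for (int b = 0; b < 256; b++)
                if (luma_float(r, g, b) != luma_int8(r, g, b))
                    n++;
    return n;
}
```

For pure red (255, 0, 0), the float path gives 76 (0.299 * 255 = 76.245), while the integer path gives 77 ((77 * 255 + 128) >> 8 = 77). Both paths agree on black and white.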
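On the AVRational64 idea, here is a minimal sketch of why a wider rational could matter. It assumes the type would simply be AVRational with 64-bit fields. Rational64, gcd64() and r64_mul() are hypothetical names, not existing libavutil API, and the design document's actual proposal may differ. libavutil's AVRational stores an int numerator and denominator, so an exact product such as (1/65535) * (1/65535) = 1/4294836225 cannot be represented and has to be approximated. With 64-bit fields, that product fits exactly.

```c
#include <stdint.h>

/* Hypothetical 64-bit rational; NOT an existing libavutil type.
 * Assumed invariants: den > 0 and num/den already in lowest terms. */
typedef struct Rational64 {
    int64_t num;
    int64_t den;
} Rational64;

/* Euclid's algorithm on absolute values (INT64_MIN is not handled). */
static int64_t gcd64(int64_t a, int64_t b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Exact product. Cross-cancelling before multiplying keeps the result in
 * lowest terms and postpones overflow. There is no overflow check, which a
 * real type would need. */
static Rational64 r64_mul(Rational64 x, Rational64 y)
{
    int64_t g1 = gcd64(x.num, y.den); /* >= 1 because y.den > 0 */
    int64_t g2 = gcd64(y.num, x.den); /* >= 1 because x.den > 0 */
    Rational64 r = {
        (x.num / g1) * (y.num / g2),
        (x.den / g2) * (y.den / g1),
    };
    return r;
}
```

Even 64-bit fields eventually overflow on long chains of operations, so such a type would still need overflow detection or a fallback.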