Re: [FFmpeg-devel] [PATCH 09/11] avutil/half2float: use native _Float16 if available

From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 09/11] avutil/half2float: use native _Float16 if available
Date: Wed, 10 Aug 2022 19:02:18 -0300
Message-ID: <36b75cba-40d2-1783-8aa8-e2987664f913@gmail.com>
In-Reply-To: <e14c11e9-87b8-89f7-543f-a7a1d78a53d8@rothenpieler.org>

On 8/10/2022 6:58 PM, Timo Rothenpieler wrote:
> On 10.08.2022 23:03, Andreas Rheinhardt wrote:
>> Timo Rothenpieler:
>>> _Float16 support has been available on arm/aarch64 for a while, and gcc
>>> 12 enabled it on x86 as long as SSE2 is supported.
>>>
>>> If the target arch supports f16c, gcc emits fairly efficient assembly,
>>> taking advantage of it. This is the case on x86-64-v3 or higher.
>>> Without f16c, it emulates it in software using sse2 instructions.
>>
>> How is the performance of this emulation compared to our current code?
>> And how is the native _Float16 performance compared to the current code?
> 
> The performance of the sse2 emulation is actually surprisingly poor, in 
> a quick test:
> 
> ./ffmpeg -s 512x512 -f rawvideo -pix_fmt rgbaf16 -i /dev/zero -vf format=yuv444p -f null -
> 
> _Float16 full SSE2 emulation:
> frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x
> 
> _Float16 f16c accelerated (Zen2, --cpu=znver2):
> frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x
> 
> classic half2float full software implementation:
> frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x
> 
> Unfortunately I don't see a good way to runtime-detect the presence of 
> f16c without going full self-written assembly, which would diminish the 
> compiler's ability to take advantage of f16c only ever operating on 4 or 
> 8 values at a time.
> But the HAVE_FLOAT16 checks could be paired with a check for __F16C__, 
> which seems to universally be the established define for "the code is 
> being built with f16c optimizations".

That should do it, yes. We do check for __SSE__ and similar for some 
other lavu functions after all.
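
A minimal sketch of what that pairing could look like, not the actual 
patch. HAVE_FLOAT16 below is a local stand-in for the configure result, 
and USE_NATIVE_FLOAT16, half_bits_to_float_bits() and half_to_float() are 
hypothetical names used only for illustration. The native path is taken 
only when the compiler also defines __F16C__; otherwise a portable 
bit-level conversion is used, so the sse2 emulation is never hit.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the value the build system would provide. */
#ifndef HAVE_FLOAT16
#define HAVE_FLOAT16 0
#endif

/* Use _Float16 only when the build also targets f16c. */
#if HAVE_FLOAT16 && defined(__F16C__)
#define USE_NATIVE_FLOAT16 1
#else
#define USE_NATIVE_FLOAT16 0
#endif

/* Portable conversion of IEEE 754 binary16 bits to binary32 bits. */
static inline uint32_t half_bits_to_float_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;

    if (exp == 0x1Fu)                 /* Inf or NaN */
        return sign | 0x7F800000u | (mant << 13);

    if (exp == 0) {
        uint32_t shift = 0;
        if (mant == 0)                /* signed zero */
            return sign;
        /* Subnormal half: normalize so the implicit bit (0x400) is set. */
        while (!(mant & 0x400u)) {
            mant <<= 1;
            shift++;
        }
        mant &= 0x3FFu;
        return sign | ((113u - shift) << 23) | (mant << 13);
    }

    /* Normal number: rebias the exponent from 15 to 127. */
    return sign | ((exp + 112u) << 23) | (mant << 13);
}

static inline float half_to_float(uint16_t h)
{
#if USE_NATIVE_FLOAT16
    /* Native path: the compiler can emit f16c conversions here. */
    _Float16 f16;
    memcpy(&f16, &h, sizeof(f16));
    return (float)f16;
#else
    uint32_t bits = half_bits_to_float_bits(h);
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
#endif
}
```

Both paths should agree bit for bit on finite inputs, so callers would 
not need to know which one was compiled in.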

> 
> That at least avoids the case of the apparently quite slow sse2 emulation.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".