From: Timo Rothenpieler
Date: Wed, 10 Aug 2022 23:58:44 +0200
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 09/11] avutil/half2float: use native _Float16 if available
References: <20220810204712.3123-1-timo@rothenpieler.org> <20220810204712.3123-9-timo@rothenpieler.org>

On 10.08.2022 23:03, Andreas Rheinhardt wrote:
> Timo Rothenpieler:
>> _Float16 support was available on arm/aarch64 for a while, and with gcc
>> 12 was enabled on x86 as long as SSE2 is supported.
>>
>> If the target arch supports f16c, gcc emits fairly efficient assembly,
>> taking advantage of it. This is the case on x86-64-v3 or higher.
>> Without f16c, it emulates it in software using sse2 instructions.
>
> How is the performance of this emulation compared to our current code?
> And how is the native _Float16 performance compared to the current code?

The performance of the sse2 emulation is actually surprisingly poor.
In a quick test:

./ffmpeg -s 512x512 -f rawvideo -pix_fmt rgbaf16 -i /dev/zero -vf format=yuv444p -f null -

_Float16 full SSE2 emulation:
frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x

_Float16 f16c accelerated (Zen2, --cpu=znver2):
frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x

classic half2float full software implementation:
frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x

So the SSE2 emulation runs at roughly half the speed of the current
software implementation, while the f16c path is about 20% faster than it.

Unfortunately I don't see a good way to runtime-detect the presence of
f16c without going full self-written assembly, which would diminish the
compiler's ability to take advantage of f16c, which only ever operates
on 4 or 8 values at a time.

But the HAVE_FLOAT16 checks could be paired with a check for __F16C__,
which seems to universally be the established define for "the code is
being built with f16c optimizations".
That at least avoids the case of the apparently quite slow sse2 emulation.
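To illustrate the pairing of HAVE_FLOAT16 with __F16C__ suggested above, here is a minimal sketch, not the actual patch. It assumes HAVE_FLOAT16 is a build-system switch (defaulted to 0 here), and USE_NATIVE_FLOAT16, half2float() and half2float_soft() are hypothetical names for this example. The fallback is a generic bitwise conversion written for illustration, not FFmpeg's existing half2float implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the build system defines HAVE_FLOAT16 to 1 when the compiler
 * accepts _Float16. It defaults to 0 in this sketch. */
#ifndef HAVE_FLOAT16
#define HAVE_FLOAT16 0
#endif

/* Take the native path only when the compiler is also generating F16C code
 * (e.g. -mf16c or -march=x86-64-v3), so the slow SSE2 emulation is never used. */
#if HAVE_FLOAT16 && defined(__F16C__)
#define USE_NATIVE_FLOAT16 1
#else
#define USE_NATIVE_FLOAT16 0
#endif

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Generic software fallback: expands IEEE 754 binary16 bits to binary32 bits. */
static uint32_t half2float_soft(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int      exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FFu;

    if (exp == 0x1F)                 /* Inf or NaN: keep the payload bits */
        return sign | 0x7F800000u | (mant << 13);

    if (exp == 0) {
        if (mant == 0)               /* signed zero */
            return sign;
        exp = 1;                     /* subnormal: shift until the implicit bit appears */
        while (!(mant & 0x400u)) {
            mant <<= 1;
            exp--;
        }
        mant &= 0x3FFu;
    }

    /* Rebias the exponent from 15 to 127 and widen the mantissa from 10 to 23 bits. */
    return sign | ((uint32_t)(exp + 112) << 23) | (mant << 13);
}

/* Returns the binary32 bit pattern for a binary16 bit pattern. */
static inline uint32_t half2float(uint16_t h)
{
#if USE_NATIVE_FLOAT16
    _Float16 f16;
    float    f32;
    uint32_t bits;

    memcpy(&f16, &h, sizeof(f16));   /* reinterpret the bits, no conversion yet */
    f32 = (float)f16;                /* the compiler can emit vcvtph2ps here */
    memcpy(&bits, &f32, sizeof(bits));
    return bits;
#else
    return half2float_soft(h);
#endif
}
```

Built without -mf16c (or with HAVE_FLOAT16 left at 0), half2float() resolves to the software path at compile time, so no runtime CPU detection is involved. For example, half2float(0x3C00) yields 0x3F800000, the bit pattern of 1.0f.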