From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id 4DDD6429D9
	for <ffmpegdev@gitmailbox.com>; Thu, 11 Aug 2022 00:15:08 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7DDBF68B8CC;
	Thu, 11 Aug 2022 03:15:06 +0300 (EEST)
Received: from mail-ua1-f48.google.com (mail-ua1-f48.google.com
 [209.85.222.48])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 3643668B7C4
 for <ffmpeg-devel@ffmpeg.org>; Thu, 11 Aug 2022 03:15:00 +0300 (EEST)
Received: by mail-ua1-f48.google.com with SMTP id f10so6432848uap.2
 for <ffmpeg-devel@ffmpeg.org>; Wed, 10 Aug 2022 17:15:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=content-transfer-encoding:in-reply-to:from:references:to
 :content-language:subject:user-agent:mime-version:date:message-id
 :from:to:cc; bh=xhefhxm2reyF39u0lmngABD7eI5yngI0KLiqRe6TyJM=;
 b=mQoxF8xw1ZqdlLMXZSOF0xlW3WfmSeVVElkkqbaDpWWaUljj+xpiZ2NVLF61lFaRVt
 OeEyYwIoKp1dvPdRKDVjSNyJbFDATGFXDzHKhEGRRjdXiPRBvp7O1DQbRNTtyqgeM/C7
 JwJbFwRO0D2ah5aSw8fseYpI6fpe0+dxSojEaNxinTTCA80O1cGtd/wEwC8+GFWrhJ8y
 U5iekyGV6653Y6wJZ540YMwPHIZc7BCLhYUTmf2rcdFTJLe7nfxlnd1G3wELOAVzWr5Z
 08nEgFrq0WhGMHO4hJMHdkr+ryIHg8ilHRtziMFerHYIue3vZRgskeUlJHdfSCYJViSV
 UjAQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=content-transfer-encoding:in-reply-to:from:references:to
 :content-language:subject:user-agent:mime-version:date:message-id
 :x-gm-message-state:from:to:cc;
 bh=xhefhxm2reyF39u0lmngABD7eI5yngI0KLiqRe6TyJM=;
 b=RvAnZlPKsRQ/kLDP969aXCwlPAs3KAgR43A/npXoYFHfcOZbii5Y4QelcAnosjVb59
 3nqWJRRMx2RYmLSLCAg3XJOWeeb/CC1XllvtZJZ26lZUmiRwlzhxD1FiqLS3Ai2mWJlq
 Q9+MW7CQyiKMmerKZhV21a3UN/Zn6TsgfWqW97VIX5CtQAPcMFvBPWVd5ahI3eNY7W+C
 KyUjZBKQTEOH7Ml53QqMd+9rSR130SgtGPfAuroKs5WpsDusQnI5WLgOM4NVqSmG0Xq7
 cskETN16kDiqwnvSZUQKpnlhw4CwdVROOlnYSJxpWTM5YE1v3sc0TuosxenFvBrE0VKd
 jT0Q==
X-Gm-Message-State: ACgBeo1bkgfopoFpAHhYy4ciHpQVMNS9BMhM9BxtQL/aP3bI51J31m50
 /RrQAvtHQLJJFQVbnD+jJmzk2wopBVI=
X-Google-Smtp-Source: AA6agR6PANDsayJT0y5qI8dmDmfW3TW6hIUXKNGYrQn5IE0XwIcjYxvtC9QjNwEgk/5YxsNvznBgMA==
X-Received: by 2002:ab0:2b06:0:b0:384:c4af:107c with SMTP id
 e6-20020ab02b06000000b00384c4af107cmr12661031uar.77.1660176897818; 
 Wed, 10 Aug 2022 17:14:57 -0700 (PDT)
Received: from [192.168.0.11] ([186.136.131.204])
 by smtp.gmail.com with ESMTPSA id
 t24-20020a67c798000000b003884503a077sm726498vsk.25.2022.08.10.17.14.56
 for <ffmpeg-devel@ffmpeg.org>
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 10 Aug 2022 17:14:57 -0700 (PDT)
Message-ID: <314f36cb-5fe1-551b-81bc-b3b902dd6c79@gmail.com>
Date: Wed, 10 Aug 2022 21:14:56 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.12.0
Content-Language: en-US
To: ffmpeg-devel@ffmpeg.org
References: <20220810204712.3123-9-timo@rothenpieler.org>
 <20220810225154.8435-1-timo@rothenpieler.org>
From: James Almer <jamrial@gmail.com>
In-Reply-To: <20220810225154.8435-1-timo@rothenpieler.org>
Subject: Re: [FFmpeg-devel] [PATCH v2 09/11] avutil/half2float: use native
 _Float16 if available
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/314f36cb-5fe1-551b-81bc-b3b902dd6c79@gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On 8/10/2022 7:51 PM, Timo Rothenpieler wrote:
> _Float16 support has been available on arm/aarch64 for a while, and gcc
> 12 enabled it on x86 as long as SSE2 is supported.
> 
> If the target arch supports f16c, gcc emits fairly efficient assembly,
> taking advantage of it. This is the case on x86-64-v3 or higher.
> Same goes on arm, which has native float16 support.
> On x86, without f16c, it emulates it in software using sse2 instructions.
> 
> This has been shown to perform rather poorly:
> 
> _Float16 full SSE2 emulation:
> frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x
> 
> _Float16 f16c accelerated (Zen2, --cpu=znver2):
> frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x
> 
> classic half2float full software implementation:
> frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x
> 
> Hence an additional check was introduced, that only enables use of
> _Float16 on x86 if f16c is being utilized.
> 
> On aarch64, a similar uplift in performance is seen:
> 
> RPi4 half2float full software implementation:
> frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x
> 
> RPi4 _Float16:
> frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x
> 
> Since arm/aarch64 always natively support 16 bit floats, it can always
> be considered fast there.
> 
> I'm not aware of any additional platforms that currently support
> _Float16. And if there are, they should be considered non-fast until
> proven fast.
> ---
>   configure              | 13 +++++++++++++
>   libavutil/float2half.c |  2 ++
>   libavutil/float2half.h | 16 ++++++++++++++++
>   libavutil/half2float.c |  4 ++++
>   libavutil/half2float.h | 16 ++++++++++++++++
>   5 files changed, 51 insertions(+)
> 
> diff --git a/configure b/configure
> index 6761d0cb32..6ede9a5a8f 100755
> --- a/configure
> +++ b/configure
> @@ -2143,6 +2143,8 @@ ARCH_FEATURES="
>       fast_64bit
>       fast_clz
>       fast_cmov
> +    fast_float16
> +    float16

If HAVE_FLOAT16 is not going to be used in the code (the patch only
checks HAVE_FAST_FLOAT16), then don't export it here. Keep float16 as a
configure-internal variable.

>       local_aligned
>       simd_align_16
>       simd_align_32
> @@ -5125,6 +5127,8 @@ elif enabled arm; then
>               ;;
>       esac
>   
> +    test_cflags -mfp16-format=ieee && add_cflags -mfp16-format=ieee
> +
>   elif enabled avr32; then
>   
>       case $cpu in
> @@ -6229,6 +6233,15 @@ check_builtin sync_val_compare_and_swap "" "int *ptr; int oldval, newval; __sync
>   check_builtin gmtime_r time.h "time_t *time; struct tm *tm; gmtime_r(time, tm)"
>   check_builtin localtime_r time.h "time_t *time; struct tm *tm; localtime_r(time, tm)"
>   
> +check_builtin float16 "" "_Float16 f16var"
> +if enabled float16; then
> +    if enabled x86; then
> +        test_cpp_condition stddef.h "defined(__F16C__)" && enable fast_float16
> +    elif enabled arm || enabled aarch64; then
> +        enable fast_float16
> +    fi
> +fi
> +
>   case "$custom_allocator" in
>       jemalloc)
>           # jemalloc by default does not use a prefix
> diff --git a/libavutil/float2half.c b/libavutil/float2half.c
> index dba14cef5d..7002612194 100644
> --- a/libavutil/float2half.c
> +++ b/libavutil/float2half.c
> @@ -20,6 +20,7 @@
>   
>   void ff_init_float2half_tables(float2half_tables *t)
>   {
> +#if !HAVE_FAST_FLOAT16
>       for (int i = 0; i < 256; i++) {
>           int e = i - 127;
>   
> @@ -50,4 +51,5 @@ void ff_init_float2half_tables(float2half_tables *t)
>               t->shifttable[i|0x100] = 13;
>           }
>       }
> +#endif
>   }
> diff --git a/libavutil/float2half.h b/libavutil/float2half.h
> index b8c9cdfc4f..437666966b 100644
> --- a/libavutil/float2half.h
> +++ b/libavutil/float2half.h
> @@ -20,21 +20,37 @@
>   #define AVUTIL_FLOAT2HALF_H
>   
>   #include <stdint.h>
> +#include "intfloat.h"
> +
> +#include "config.h"
>   
>   typedef struct float2half_tables {
> +#if HAVE_FAST_FLOAT16
> +    uint8_t dummy;
> +#else
>       uint16_t basetable[512];
>       uint8_t shifttable[512];
> +#endif
>   } float2half_tables;
>   
>   void ff_init_float2half_tables(float2half_tables *t);
>   
>   static inline uint16_t float2half(uint32_t f, const float2half_tables *t)
>   {
> +#if HAVE_FAST_FLOAT16
> +    union {
> +        _Float16 f;
> +        uint16_t i;
> +    } u;
> +    u.f = av_int2float(f);
> +    return u.i;
> +#else
>       uint16_t h;
>   
>       h = t->basetable[(f >> 23) & 0x1ff] + ((f & 0x007fffff) >> t->shifttable[(f >> 23) & 0x1ff]);
>   
>       return h;
> +#endif
>   }
>   
>   #endif /* AVUTIL_FLOAT2HALF_H */
> diff --git a/libavutil/half2float.c b/libavutil/half2float.c
> index baac8e4093..ff198a8187 100644
> --- a/libavutil/half2float.c
> +++ b/libavutil/half2float.c
> @@ -18,6 +18,7 @@
>   
>   #include "libavutil/half2float.h"
>   
> +#if !HAVE_FAST_FLOAT16
>   static uint32_t convertmantissa(uint32_t i)
>   {
>       int32_t m = i << 13; // Zero pad mantissa bits
> @@ -33,9 +34,11 @@ static uint32_t convertmantissa(uint32_t i)
>   
>       return m | e; // Return combined number
>   }
> +#endif
>   
>   void ff_init_half2float_tables(half2float_tables *t)
>   {
> +#if !HAVE_FAST_FLOAT16
>       t->mantissatable[0] = 0;
>       for (int i = 1; i < 1024; i++)
>           t->mantissatable[i] = convertmantissa(i);
> @@ -60,4 +63,5 @@ void ff_init_half2float_tables(half2float_tables *t)
>       t->offsettable[31] = 2048;
>       t->offsettable[32] = 0;
>       t->offsettable[63] = 2048;
> +#endif
>   }
> diff --git a/libavutil/half2float.h b/libavutil/half2float.h
> index cb58e44a1c..57ee8372fe 100644
> --- a/libavutil/half2float.h
> +++ b/libavutil/half2float.h
> @@ -20,22 +20,38 @@
>   #define AVUTIL_HALF2FLOAT_H
>   
>   #include <stdint.h>
> +#include "intfloat.h"
> +
> +#include "config.h"
>   
>   typedef struct half2float_tables {
> +#if HAVE_FAST_FLOAT16
> +    uint8_t dummy;
> +#else
>       uint32_t mantissatable[3072];
>       uint32_t exponenttable[64];
>       uint16_t offsettable[64];
> +#endif
>   } half2float_tables;
>   
>   void ff_init_half2float_tables(half2float_tables *t);
>   
>   static inline uint32_t half2float(uint16_t h, const half2float_tables *t)
>   {
> +#if HAVE_FAST_FLOAT16
> +    union {
> +        _Float16 f;
> +        uint16_t i;
> +    } u;
> +    u.i = h;
> +    return av_float2int(u.f);
> +#else
>       uint32_t f;
>   
>       f = t->mantissatable[t->offsettable[h >> 10] + (h & 0x3ff)] + t->exponenttable[h >> 10];
>   
>       return f;
> +#endif
>   }
>   
>   #endif /* AVUTIL_HALF2FLOAT_H */
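
For reference, a minimal standalone sketch of what the HAVE_FAST_FLOAT16
branch of half2float() amounts to, next to a table-free software fallback.
Everything named sketch_* / SKETCH_* is hypothetical and not part of
libavutil. The preprocessor guard is an assumption that stands in for the
configure check (gcc defines __FLT16_MANT_DIG__ when _Float16 is usable);
it is not what the patch itself does. memcpy is used in place of the
union and av_float2int() so the snippet has no libavutil dependency.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for configure's HAVE_FAST_FLOAT16 (assumption): take the native
 * path only when the compiler offers _Float16 and the target converts it in
 * hardware. Can be overridden with -DSKETCH_FAST_FLOAT16=0/1. */
#ifndef SKETCH_FAST_FLOAT16
#if defined(__FLT16_MANT_DIG__) && (defined(__F16C__) || defined(__aarch64__))
#define SKETCH_FAST_FLOAT16 1
#else
#define SKETCH_FAST_FLOAT16 0
#endif
#endif

/* Convert IEEE binary16 bits to IEEE binary32 bits. */
static inline uint32_t sketch_half2float(uint16_t h)
{
#if SKETCH_FAST_FLOAT16
    /* Native path: reinterpret the bits as _Float16 and let the compiler
     * emit the widening conversion. */
    _Float16 hf;
    float f;
    uint32_t out;

    memcpy(&hf, &h, sizeof(hf));
    f = (float)hf;
    memcpy(&out, &f, sizeof(out));
    return out;
#else
    /* Software path: plain bit manipulation, no lookup tables. */
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;
    uint32_t man  = h & 0x3ffu;

    if (exp == 0x1f)                 /* Inf or NaN */
        return sign | 0x7f800000u | (man << 13);
    if (exp == 0) {
        uint32_t shift = 0;

        if (man == 0)                /* signed zero */
            return sign;
        while (!(man & 0x400u)) {    /* normalize a subnormal input */
            man <<= 1;
            shift++;
        }
        return sign | ((113u - shift) << 23) | ((man & 0x3ffu) << 13);
    }
    /* Normal number: rebias the exponent from 15 to 127. */
    return sign | ((exp + 112u) << 23) | (man << 13);
#endif
}
```

Widening half to float is exact, so both branches should return the same
bits for zeros, subnormals, normal values and infinities. NaN payloads may
differ, since a hardware conversion can quiet a signalling NaN.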