From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 10 Aug 2022 23:03:45 +0200
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 09/11] avutil/half2float: use native _Float16 if available
References: <20220810204712.3123-1-timo@rothenpieler.org> <20220810204712.3123-9-timo@rothenpieler.org>
In-Reply-To: <20220810204712.3123-9-timo@rothenpieler.org>
X-Microsoft-Original-Message-ID: <05242965-ead6-6bd2-d778-0d712c93ca76@outlook.com>
List-Id: FFmpeg development discussions and patches
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Timo Rothenpieler:
> _Float16 support was available on arm/aarch64 for a while, and with gcc
> 12 was enabled on x86 as long as SSE2 is supported.
>
> If the target arch supports f16c, gcc emits fairly efficient assembly,
> taking advantage of it. This is the case on x86-64-v3 or higher.
> Without f16c, it emulates it in software using sse2 instructions.

How is the performance of this emulation compared to our current code?
And how is the native _Float16 performance compared to the current code?

> ---
>  configure              |  4 ++++
>  libavutil/float2half.c |  2 ++
>  libavutil/float2half.h | 16 ++++++++++++++++
>  libavutil/half2float.c |  4 ++++
>  libavutil/half2float.h | 16 ++++++++++++++++
>  5 files changed, 42 insertions(+)
>
> diff --git a/configure b/configure
> index 6761d0cb32..2536ae012d 100755
> --- a/configure
> +++ b/configure
> @@ -2143,6 +2143,7 @@ ARCH_FEATURES="
>      fast_64bit
>      fast_clz
>      fast_cmov
> +    float16
>      local_aligned
>      simd_align_16
>      simd_align_32
> @@ -5125,6 +5126,8 @@ elif enabled arm; then
>          ;;
>      esac
>
> +    test_cflags -mfp16-format=ieee && add_cflags -mfp16-format=ieee
> +
>  elif enabled avr32; then
>
>      case $cpu in
> @@ -6228,6 +6231,7 @@ check_builtin MemoryBarrier windows.h "MemoryBarrier()"
>  check_builtin sync_val_compare_and_swap "" "int *ptr; int oldval, newval; __sync_val_compare_and_swap(ptr, oldval, newval)"
>  check_builtin gmtime_r time.h "time_t *time; struct tm *tm; gmtime_r(time, tm)"
>  check_builtin localtime_r time.h "time_t *time; struct tm *tm; localtime_r(time, tm)"
> +check_builtin float16 "" "_Float16 f16var"
>
>  case "$custom_allocator" in
>      jemalloc)
> diff --git a/libavutil/float2half.c b/libavutil/float2half.c
> index dba14cef5d..1390d3acc0 100644
> --- a/libavutil/float2half.c
> +++ b/libavutil/float2half.c
> @@ -20,6 +20,7 @@
>
>  void ff_init_float2half_tables(float2half_tables *t)
>  {
> +#if !HAVE_FLOAT16
>      for (int i = 0; i < 256; i++) {
>          int e = i - 127;
>
> @@ -50,4 +51,5 @@ void ff_init_float2half_tables(float2half_tables *t)
>              t->shifttable[i|0x100] = 13;
>          }
>      }
> +#endif
>  }
> diff --git a/libavutil/float2half.h b/libavutil/float2half.h
> index b8c9cdfc4f..8c1fb804b7 100644
> --- a/libavutil/float2half.h
> +++ b/libavutil/float2half.h
> @@ -20,21 +20,37 @@
>  #define AVUTIL_FLOAT2HALF_H
>
>  #include
> +#include "intfloat.h"
> +
> +#include "config.h"
>
>  typedef struct float2half_tables {
> +#if HAVE_FLOAT16
> +    uint8_t dummy;
> +#else
>      uint16_t basetable[512];
>      uint8_t shifttable[512];
> +#endif
>  } float2half_tables;
>
>  void ff_init_float2half_tables(float2half_tables *t);
>
>  static inline uint16_t float2half(uint32_t f, const float2half_tables *t)
>  {
> +#if HAVE_FLOAT16
> +    union {
> +        _Float16 f;
> +        uint16_t i;
> +    } u;
> +    u.f = av_int2float(f);
> +    return u.i;
> +#else
>      uint16_t h;
>
>      h = t->basetable[(f >> 23) & 0x1ff] + ((f & 0x007fffff) >> t->shifttable[(f >> 23) & 0x1ff]);
>
>      return h;
> +#endif
>  }
>
>  #endif /* AVUTIL_FLOAT2HALF_H */
> diff --git a/libavutil/half2float.c b/libavutil/half2float.c
> index baac8e4093..873226d3a0 100644
> --- a/libavutil/half2float.c
> +++ b/libavutil/half2float.c
> @@ -18,6 +18,7 @@
>
>  #include "libavutil/half2float.h"
>
> +#if !HAVE_FLOAT16
>  static uint32_t convertmantissa(uint32_t i)
>  {
>      int32_t m = i << 13; // Zero pad mantissa bits
> @@ -33,9 +34,11 @@ static uint32_t convertmantissa(uint32_t i)
>
>      return m | e; // Return combined number
>  }
> +#endif
>
>  void ff_init_half2float_tables(half2float_tables *t)
>  {
> +#if !HAVE_FLOAT16
>      t->mantissatable[0] = 0;
>      for (int i = 1; i < 1024; i++)
>          t->mantissatable[i] = convertmantissa(i);
> @@ -60,4 +63,5 @@ void ff_init_half2float_tables(half2float_tables *t)
>      t->offsettable[31] = 2048;
>      t->offsettable[32] = 0;
>      t->offsettable[63] = 2048;
> +#endif
>  }
> diff --git a/libavutil/half2float.h b/libavutil/half2float.h
> index cb58e44a1c..b2a7c934a6 100644
> --- a/libavutil/half2float.h
> +++ b/libavutil/half2float.h
> @@ -20,22 +20,38 @@
>  #define AVUTIL_HALF2FLOAT_H
>
>  #include
> +#include "intfloat.h"
> +
> +#include "config.h"
>
>  typedef struct half2float_tables {
> +#if HAVE_FLOAT16
> +    uint8_t dummy;
> +#else
>      uint32_t mantissatable[3072];
>      uint32_t exponenttable[64];
>      uint16_t offsettable[64];
> +#endif
>  } half2float_tables;
>
>  void ff_init_half2float_tables(half2float_tables *t);
>
>  static inline uint32_t half2float(uint16_t h, const half2float_tables *t)
>  {
> +#if HAVE_FLOAT16
> +    union {
> +        _Float16 f;
> +        uint16_t i;
> +    } u;
> +    u.i = h;
> +    return av_float2int(u.f);
> +#else
>      uint32_t f;
>
>      f = t->mantissatable[t->offsettable[h >> 10] + (h & 0x3ff)] + t->exponenttable[h >> 10];
>
>      return f;
> +#endif
>  }
>
>  #endif /* AVUTIL_HALF2FLOAT_H */
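
The two performance questions above could be approached with a small harness along the following lines. This is only a sketch under stated assumptions, not a measurement of the patch itself: ref_half2float() is a generic bit-manipulation conversion used as a stand-in baseline (it is not the table-based half2float() from libavutil), the names native_half2float(), verify_native_matches_ref() and time_half2float() are hypothetical, and the native path is only compiled when HAVE_FLOAT16 is defined to 1 by hand (for instance with -DHAVE_FLOAT16=1 and a compiler that supports _Float16).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Portable baseline: IEEE 754 binary16 bits -> binary32 bits.
 * Assumption: a stand-in for comparison, NOT FFmpeg's table lookup. */
static inline uint32_t ref_half2float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;
    uint32_t man  = h & 0x3ffu;

    if (exp == 0x1f)                  /* Inf or NaN: keep payload bits */
        return sign | 0x7f800000u | (man << 13);
    if (exp == 0) {
        int shift = 0;
        if (man == 0)                 /* signed zero */
            return sign;
        while (!(man & 0x400u)) {     /* normalise a subnormal half */
            man <<= 1;
            shift++;
        }
        man &= 0x3ffu;
        return sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
    }
    return sign | ((exp + 112) << 23) | (man << 13);  /* rebias 15 -> 127 */
}

#if defined(HAVE_FLOAT16) && HAVE_FLOAT16
/* Native path: let the compiler emit the conversion (F16C or emulation). */
static inline uint32_t native_half2float(uint16_t h)
{
    _Float16 f16;
    float    f32;
    uint32_t out;

    memcpy(&f16, &h, sizeof(f16));
    f32 = (float)f16;
    memcpy(&out, &f32, sizeof(out));
    return out;
}

/* Both paths must agree for every non-NaN input; NaNs are skipped because
 * a native conversion may quiet signalling NaNs. */
static inline void verify_native_matches_ref(void)
{
    for (uint32_t h = 0; h <= 0xffff; h++) {
        int is_nan = (h & 0x7c00) == 0x7c00 && (h & 0x03ff) != 0;
        if (!is_nan)
            assert(native_half2float((uint16_t)h) == ref_half2float((uint16_t)h));
    }
}
#endif

typedef uint32_t (*half2float_fn)(uint16_t);

/* Convert all 65536 half values `rounds` times. Returns elapsed CPU seconds
 * and stores a checksum so the loop is not optimised away. */
static inline double time_half2float(half2float_fn fn, int rounds,
                                     uint32_t *checksum)
{
    uint32_t acc   = 0;
    clock_t  start = clock();

    for (int r = 0; r < rounds; r++)
        for (uint32_t h = 0; h <= 0xffff; h++)
            acc ^= fn((uint16_t)h) + (uint32_t)r;

    *checksum = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_half2float(ref_half2float, 1000, &sum) and, where available, time_half2float(native_half2float, 1000, &sum) in builds with and without -mf16c would give a rough comparison. The indirect call prevents inlining, so the numbers would only be indicative, and the table-based code would have to be timed the same way to answer the question properly.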