From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <36b75cba-40d2-1783-8aa8-e2987664f913@gmail.com>
Date: Wed, 10 Aug 2022 19:02:18 -0300
To: ffmpeg-devel@ffmpeg.org
References: <20220810204712.3123-1-timo@rothenpieler.org>
 <20220810204712.3123-9-timo@rothenpieler.org>
From: James Almer
Subject: Re: [FFmpeg-devel] [PATCH 09/11] avutil/half2float: use native
 _Float16 if available
List-Id: FFmpeg development discussions and patches

On 8/10/2022 6:58 PM, Timo Rothenpieler wrote:
> On 10.08.2022 23:03, Andreas Rheinhardt wrote:
>> Timo Rothenpieler:
>>> _Float16 support was available on arm/aarch64 for a while, and with gcc
>>> 12 was enabled on x86 as long as SSE2 is supported.
>>>
>>> If the target arch supports f16c, gcc emits fairly efficient assembly,
>>> taking advantage of it. This is the case on x86-64-v3 or higher.
>>> Without f16c, it emulates it in software using sse2 instructions.
>>
>> How is the performance of this emulation compared to our current code?
>> And how is the native _Float16 performance compared to the current code?
>
> The performance of the sse2 emulation is actually surprisingly poor, in
> a quick test:
>
> ./ffmpeg -s 512x512 -f rawvideo -pix_fmt rgbaf16 -i /dev/zero -vf format=yuv444p -f null -
>
> _Float16 full SSE2 emulation:
> frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A
> speed=33.9x
>
> _Float16 f16c accelerated (Zen2, --cpu=znver2):
> frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A
> speed=78.6x
>
> classic half2float full software implementation:
> frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A
> speed=64.2x
>
> Unfortunately I don't see a good way to runtime-detect the presence of
> f16c without going full self-written assembly, which would diminish the
> compiler's ability to take advantage of f16c only ever operating on 4 or
> 8 values at a time.
> But the HAVE_FLOAT16 checks could be paired with a check for __F16C__,
> which seems to universally be the established define for "the code is
> being built with f16c optimizations".

That should do it, yes. We do check for __SSE__ and similar for some
other lavu functions after all.

>
> That at least avoids the case of the apparently quite slow sse2 emulation.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
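
For illustration, the compile-time guard discussed in the thread could look
roughly like the sketch below. It is not the actual libavutil code.
HAVE_FLOAT16 is the configure macro named in the thread (defaulted to 0 here
so the file builds standalone), while USE_NATIVE_FLOAT16, half_to_float() and
the bit-twiddling fallback are hypothetical names and code introduced only for
this example.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: in a real build HAVE_FLOAT16 comes from configure.
 * It defaults to 0 here so this sketch compiles on its own. */
#ifndef HAVE_FLOAT16
#define HAVE_FLOAT16 0
#endif

/* Hypothetical helper macro: use _Float16 only when the compiler supports it
 * AND the build targets f16c, so the slow SSE2 emulation is never selected. */
#if HAVE_FLOAT16 && defined(__F16C__)
#define USE_NATIVE_FLOAT16 1
#else
#define USE_NATIVE_FLOAT16 0
#endif

/* Convert an IEEE 754 binary16 bit pattern to float (hypothetical name). */
static float half_to_float(uint16_t h)
{
#if USE_NATIVE_FLOAT16
    /* Native path: the compiler emits f16c conversion instructions. */
    _Float16 f;
    memcpy(&f, &h, sizeof(f));
    return (float)f;
#else
    /* Portable fallback: rebuild the binary32 bit pattern by hand. */
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int      exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float    out;

    if (exp == 0x1F) {
        /* Infinity or NaN: all-ones exponent, keep the mantissa payload. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp == 0 && mant == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        if (exp == 0) {
            /* Subnormal half: shift until the implicit leading bit appears. */
            exp = 1;
            while (!(mant & 0x400u)) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3FFu;
        }
        /* Rebias the exponent from 15 (half) to 127 (float). */
        bits = sign | ((uint32_t)(exp + 112) << 23) | (mant << 13);
    }
    memcpy(&out, &bits, sizeof(out));
    return out;
#endif
}
```

Building with something like -DHAVE_FLOAT16=1 -mf16c (or a --cpu/-march
setting that implies f16c) would select the native path; any other
configuration falls back to the software conversion.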