From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2dac775f-5076-14c2-8afd-a50bbef5531f@rothenpieler.org>
Date: Tue, 9 Aug 2022 00:37:09 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.1.1
Content-Language: en-US
To: ffmpeg-devel@ffmpeg.org
References: <20220808182358.24264-1-timo@rothenpieler.org> <7a75c699-5050-534f-d7e9-127207b66d59@rothenpieler.org>
From: Timo Rothenpieler
Subject: Re: [FFmpeg-devel] [PATCH] swscale/input: add rgbaf16 input support
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit

On 09.08.2022 00:07, Mark Reid wrote:
> On Mon, Aug 8, 2022 at 1:59 PM Timo Rothenpieler
> wrote:
>
>> On 08.08.2022 21:39, Mark Reid wrote:
>>> On Mon, Aug 8, 2022 at 11:24 AM Timo Rothenpieler
>>> wrote:
>>>
>>>> This is by no means perfect, since at least ddagrab will return scRGB
>>>> data with values outside of 0.0f to 1.0f for HDR values.
>>>> Its primary purpose is to be able to work with the format at all.
>>>>
>>>> _Float16 support was available on arm/aarch64 for a while, and with gcc
>>>> 12 was enabled on x86 as long as SSE2 is supported.
>>>>
>>>> If the target arch supports f16c, gcc emits fairly efficient assembly,
>>>> taking advantage of it. This is the case on x86-64-v3 or higher.
>>>> Without f16c, it emulates it in software using sse2 instructions.
>>>> ---
>>>>
>>>> I am by no means certain this is the correct way to implement this.
>>>> Tested it with ddagrab output in that format, and it looks like what I'd
>>>> expect.
>>>>
>>>> Specially the order of arguments is a bit of a mystery. I'd have
>>>> expected them to be in order of the planes, so for packed formats, only
>>>> the first one would matter.
>>>> But a bunch of other packed formats left the first src unused, and so I
>>>> followed along, and it ended up working fine.
>>>>
>>>>
>>> Have you looked at the exr decoder half2float.h? It already has f16 to f32
>>> decoding functions.
>>>
>>
>> For performance, using the compilers native, and potentially hardware
>> accelerated, support is probably preferable.
>> Though as a no-float16-fallback it's probably not too horrible.
>> Just not sure if it's worth the extra effort, given that by the time
>> this sees any use at all, gcc 12 will be very common.
>>
>> Might even think about _Float16 support for exr in that case.
>> Would be an interesting benchmark.
>>
>
> Having the fallback will likely be required to have this patch accepted,
> also this will need fate tests.
>
>> +static void rgbaf16ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV,
>> +                               const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
>> +                               int width, uint32_t *_rgb2yuv)
>> +{
>> +#if HAVE_FLOAT16
>> +    const _Float16 *src = (const _Float16*)src1;
>> +    uint16_t *dstU = (uint16_t*)_dstU;
>> +    uint16_t *dstV = (uint16_t*)_dstV;
>> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
>> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
>> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
>> +    int i;
>> +    av_assert1(src1==src2);
>> +    for (i = 0; i < width; i++) {
>> +        int r = (lrintf(av_clipf(65535.0f * src[i*8+0], 0.0f, 65535.0f)) +
>> +                 lrintf(av_clipf(65535.0f * src[i*8+4], 0.0f, 65535.0f))) >> 1;
>> +        int g = (lrintf(av_clipf(65535.0f * src[i*8+1], 0.0f, 65535.0f)) +
>> +                 lrintf(av_clipf(65535.0f * src[i*8+5], 0.0f, 65535.0f))) >> 1;
>> +        int b = (lrintf(av_clipf(65535.0f * src[i*8+2], 0.0f, 65535.0f)) +
>> +                 lrintf(av_clipf(65535.0f * src[i*8+6], 0.0f, 65535.0f))) >> 1;
>> +
>> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
>> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
>> +    }
>> +#endif
>> +}
>
>
> IF defining out the core of the function is not the best approach here,
> specifically for platforms without HAVE_FLOAT16.
> I would probably try and put the accelerated half2float conversion in
> half2float.h and move that header to libavutil instead.

The entire support for the format is removed from swscale in this case,
so the function ending up empty doesn't matter.

I'll see if it can be added to half2float, but I can't even tell if it
implements IEEE floats, or something else.

One issue is that the SIMD-accelerated half-to-single conversion operates
on either 4 or 8 values in parallel.
That doesn't fit how half2float.h is currently set up. For one, it
always converts exactly one value at a time, and it also takes in and
returns integers.

Looking at the current two consumers, it might be possible to make them
take advantage of the SIMD version. They seem to operate on blocks of
data most of the time.
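
To make the block-based idea a bit more concrete, here is a minimal sketch of what such an interface could look like, assuming the input really is IEEE 754 binary16. The names half_bits_to_float and half_to_float_block are made up for illustration; they are not part of half2float.h or libavutil. The F16C path is only compiled when the compiler defines __F16C__, and the scalar function doubles as the no-float16 fallback and as the tail handler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__F16C__)
#include <immintrin.h>
#endif

/* Hypothetical scalar fallback: decode one IEEE 754 binary16 bit pattern
 * into a float using integer operations only. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Inf or NaN: all-ones exponent, keep the mantissa payload. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant != 0) {
        /* Subnormal half: normalize it, since it is a normal float. */
        uint32_t shift = 0;
        while (!(mant & 0x400u)) {
            mant <<= 1;
            shift++;
        }
        mant &= 0x3FFu;
        bits = sign | ((113u - shift) << 23) | (mant << 13);
    } else {
        /* Signed zero. */
        bits = sign;
    }

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Hypothetical block interface: convert n half values to float.
 * Processes 8 values per iteration when F16C is available at compile
 * time, and falls back to the scalar path for the rest. */
static void half_to_float_block(float *dst, const uint16_t *src, int n)
{
    int i = 0;

    assert(n >= 0);
#if defined(__F16C__)
    for (; i + 8 <= n; i += 8) {
        __m128i h = _mm_loadu_si128((const __m128i *)(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(h));
    }
#endif
    for (; i < n; i++)
        dst[i] = half_bits_to_float(src[i]);
}
```

A consumer that already works on rows or blocks would call half_to_float_block once per row and then do its clipping and scaling on the float buffer. Whether that actually beats letting the compiler handle _Float16 directly would need to be benchmarked.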