From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id 39F7742E40
	for <ffmpegdev@gitmailbox.com>; Mon,  8 Aug 2022 22:07:26 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6E5EB68B760;
	Tue,  9 Aug 2022 01:07:23 +0300 (EEST)
Received: from mail-qk1-f180.google.com (mail-qk1-f180.google.com
 [209.85.222.180])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 33BF968B53E
 for <ffmpeg-devel@ffmpeg.org>; Tue,  9 Aug 2022 01:07:17 +0300 (EEST)
Received: by mail-qk1-f180.google.com with SMTP id w6so7528689qkf.3
 for <ffmpeg-devel@ffmpeg.org>; Mon, 08 Aug 2022 15:07:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=to:subject:message-id:date:from:in-reply-to:references:mime-version
 :from:to:cc; bh=2d2NeSkYIPkmKNUmnRW9JZja1DWWMA3ALdw1U9LuQXc=;
 b=k827/LL+JA4h7hPTqFsP9RLxNiM2w3iaO2CppzzwyqE8pE+RaEx1LvConws7YE70tR
 Ga1+hsGAm3XqeNU38Y0jSYLXHjJwcDoAJUHLgsrO8GCGGf35sIZYmnAJbEKQlu7fKuKX
 VaitNMFCJbHPv/YegBEnkZQOxqxWjOla4JhCBVDkWx4FeIVMCbgW85CQuST4Vn7kQCaw
 gu58rB4ESOVhJHPyRr70+unnffKa9wEgej9//ha6Bd6aUmPk8DqwZY94YxYOHWGHo5xi
 l4aPXLkpQ3Nlqsct+f9zgyzcPMvHb99oTO5en8GsoWdbzYLdAYpCEOzA66OYEhK+BeZV
 qQkQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=to:subject:message-id:date:from:in-reply-to:references:mime-version
 :x-gm-message-state:from:to:cc;
 bh=2d2NeSkYIPkmKNUmnRW9JZja1DWWMA3ALdw1U9LuQXc=;
 b=H/m94orki0XEwImEBSy9e0MWJpJTN32oXjEYypAAgkrj1ZKIF0eRczd/T3w3ziWJZl
 VJy2TCxL1iZteNSfSNvBk8l74BPkFk5hxsDnuY54XYZQK5PbAKpCRL5CMXdsgBQlpnHx
 RDb+sSzvF9gVgRCqlfKY21xmMr97QA3QShoHH1PUBs62mpehuGjGGwApRGQJL/9uWoHL
 eH+FHOjH31eDFffJ/+xoMxfkhOQFo0cBz6u5jr37GYTbhyvmqHrqh+JjppRfJeMHGcKn
 LJwX3HEbLhYMEDALpjIfKUrH7teivX/AHJYPKFY2IDOyHq0y7mCzq2ORqx94zLtEK2vq
 S8Zw==
X-Gm-Message-State: ACgBeo0xNn9p2QYNKaJOWivjaz/TH1dgR1w2grTxLpSi1LkcjJ4xtc/x
 i81D+EYDLu3/pNK2jAUSOP39F97yzKAL8rlmX5KW9hwtxjs=
X-Google-Smtp-Source: AA6agR5TZ95lR+g4fLGXzKgqQYO1PYPUaXnkTiYfNyXZek5/+bVXRzTQlZTFSlOBe502FmHXQL5z0+RB3C/VPEKixVo=
X-Received: by 2002:a05:620a:40c5:b0:6b6:c5:581b with SMTP id
 g5-20020a05620a40c500b006b600c5581bmr15100573qko.742.1659996434825; Mon, 08
 Aug 2022 15:07:14 -0700 (PDT)
MIME-Version: 1.0
References: <20220808182358.24264-1-timo@rothenpieler.org>
 <CA+anCRmwy6C-SS+ynM593y8nMgditqRd61J6XEHa86m8fqhaYg@mail.gmail.com>
 <7a75c699-5050-534f-d7e9-127207b66d59@rothenpieler.org>
In-Reply-To: <7a75c699-5050-534f-d7e9-127207b66d59@rothenpieler.org>
From: Mark Reid <mindmark@gmail.com>
Date: Mon, 8 Aug 2022 15:07:03 -0700
Message-ID: <CA+anCRme61uOiqqDeo5uv6BRuw-hjYzv6z4n-LG9Hms37cWRSg@mail.gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [FFmpeg-devel] [PATCH] swscale/input: add rgbaf16 input support
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/CA+anCRme61uOiqqDeo5uv6BRuw-hjYzv6z4n-LG9Hms37cWRSg@mail.gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Mon, Aug 8, 2022 at 1:59 PM Timo Rothenpieler <timo@rothenpieler.org>
wrote:

> On 08.08.2022 21:39, Mark Reid wrote:
> > On Mon, Aug 8, 2022 at 11:24 AM Timo Rothenpieler <timo@rothenpieler.org>
> > wrote:
> >
> >> This is by no means perfect, since at least ddagrab will return scRGB
> >> data with values outside of 0.0f to 1.0f for HDR values.
> >> Its primary purpose is to be able to work with the format at all.
> >>
> >> _Float16 support was available on arm/aarch64 for a while, and with gcc
> >> 12 was enabled on x86 as long as SSE2 is supported.
> >>
> >> If the target arch supports f16c, gcc emits fairly efficient assembly,
> >> taking advantage of it. This is the case on x86-64-v3 or higher.
> >> Without f16c, it emulates it in software using sse2 instructions.
> >> ---
> >>
> >> I am by no means certain this is the correct way to implement this.
> >> Tested it with ddagrab output in that format, and it looks like what I'd
> >> expect.
> >>
> >> Specially the order of arguments is a bit of a mystery. I'd have
> >> expected them to be in order of the planes, so for packed formats, only
> >> the first one would matter.
> >> But a bunch of other packed formats left the first src unused, and so I
> >> followed along, and it ended up working fine.
> >>
> >>
> > Have you looked at the exr decoder half2float.h? It already has f16 to
> > f32 decoding functions.
> >
>
> For performance, using the compilers native, and potentially hardware
> accelerated, support is probably preferable.
> Though as a no-float16-fallback it's probably not too horrible.
> Just not sure if it's worth the extra effort, given that by the time
> this sees any use at all, gcc 12 will be very common.
>
> Might even think about _Float16 support for exr in that case.
> Would be an interesting benchmark.
>

Having the fallback will likely be required for this patch to be accepted.
It will also need FATE tests.

> +static void rgbaf16ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV,
> +                               const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
> +                               int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)src1;
> +    uint16_t *dstU = (uint16_t*)_dstU;
> +    uint16_t *dstV = (uint16_t*)_dstV;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> +    int i;
> +    av_assert1(src1==src2);
> +    for (i = 0; i < width; i++) {
> +        int r = (lrintf(av_clipf(65535.0f * src[i*8+0], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+4], 0.0f, 65535.0f))) >> 1;
> +        int g = (lrintf(av_clipf(65535.0f * src[i*8+1], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+5], 0.0f, 65535.0f))) >> 1;
> +        int b = (lrintf(av_clipf(65535.0f * src[i*8+2], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+6], 0.0f, 65535.0f))) >> 1;
> +
> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}


#if-ing out the core of the function is not the best approach here: on
platforms without HAVE_FLOAT16 the function compiles to an empty body and
writes nothing to its outputs.
I would probably try and put the accelerated half2float conversion in
half2float.h and move that header to libavutil instead.
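
A rough sketch of the shape I have in mind, so that only one small helper
depends on HAVE_FLOAT16 and the per-pixel loops stay unconditional. The
helper name half_bits_to_float is hypothetical and is not the existing
half2float.h API (that one is table based, as far as I remember); the
portable branch below is just a plain IEEE 754 binary16 bit-expansion that
I am assuming would be acceptable as the fallback:

```c
#include <stdint.h>
#include <string.h>

/* HAVE_FLOAT16 is assumed to come from configure, as in the patch.
 * Default to the portable path when it is not defined. */
#ifndef HAVE_FLOAT16
#define HAVE_FLOAT16 0
#endif

/* Hypothetical helper: convert raw binary16 bits to a float. */
static inline float half_bits_to_float(uint16_t h)
{
#if HAVE_FLOAT16
    /* Let the compiler pick f16c or its own soft-float routine. */
    _Float16 f;
    memcpy(&f, &h, sizeof(f));
    return (float)f;
#else
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t mant = h & 0x3ff;
    uint32_t bits;
    float out;

    if (exp == 0x1f) {          /* Inf or NaN: widen exponent, keep payload */
        bits = sign | 0x7f800000 | (mant << 13);
    } else if (exp) {           /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    } else if (mant) {          /* subnormal: shift until the implicit bit is set */
        int e = 113;
        while (!(mant & 0x400)) {
            mant <<= 1;
            e--;
        }
        bits = sign | ((uint32_t)e << 23) | ((mant & 0x3ff) << 13);
    } else {                    /* signed zero */
        bits = sign;
    }
    memcpy(&out, &bits, sizeof(out));
    return out;
#endif
}
```

With something like this, the loops in the patch could read src as
const uint16_t * and call half_bits_to_float(src[i*8+0]) before the
multiply and av_clipf, with no #if around the function body.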


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".