From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id D044942E64
	for <ffmpegdev@gitmailbox.com>; Mon,  8 Aug 2022 19:39:57 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A0A0B68B77C;
	Mon,  8 Aug 2022 22:39:53 +0300 (EEST)
Received: from mail-qt1-f181.google.com (mail-qt1-f181.google.com
 [209.85.160.181])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 2EC1068B6D1
 for <ffmpeg-devel@ffmpeg.org>; Mon,  8 Aug 2022 22:39:47 +0300 (EEST)
Received: by mail-qt1-f181.google.com with SMTP id e23so7303847qts.1
 for <ffmpeg-devel@ffmpeg.org>; Mon, 08 Aug 2022 12:39:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc;
 bh=Hu0HX81Pa1G5mlRw3MlBn505k1Hle/8fvbBjm4hjyUg=;
 b=giFbP4cFXTefa2JdKf4z34geOGNziR7YMKVO8M5SZ2o0bYq8P8/zbBkrwqDnvJFMgd
 AM1tlRwMMO4S4Q1S/rM5uj/a0oT5klgGPPsmlqMHLJk0xddkQW8MtuDs11C1lW4rdwMd
 KxyrTV0Sz0cERqFYJFuHN/DCfBTqUTlhHYuRCU18Ba5ilRu/8rqV5qvj40jOhstevaDh
 CvHJrgHCiylJYjeGhm0ve/K+JmOhFfuFgZ/lXE7AxqvNJc2rnXH3tiKGig9ybiSe7BAV
 gwz3f4K1c95qzA20Tq5xW30jiakWCo6BUM26n1cHV870FbLzQ5v2p5MsTzDwrEUkir0S
 a/vg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc;
 bh=Hu0HX81Pa1G5mlRw3MlBn505k1Hle/8fvbBjm4hjyUg=;
 b=zmdpZ0BK5oSLuOUk1JpZd+brdjvdvC+p5RRGTqELJoU/ikF18L0702aOH9gt19YbD7
 IaDDptzb7TncatiT69uAk3I2WJegIU8oYrZ3p+I7gyrdElcfPIHUVUi9JNBCvVzm+tHI
 n6HWQFSrrKUDYnhSAtkicxmOVZvuKIcYRSkY3L/x6kQCoaRYc1CT5hSPps5qQ1J2UfZ3
 In/93/Tsjm9N5WKS1p7Aj2V0Av+9VX/DfPMgdIe3uQRSROMoOagWip0UnUoARATrffoZ
 JIwKQ3vnTddh3SRHa+TK1W2yjbCrc9LUFvo40GKdBYw9Dza5EWOxrgIhSJnfM2Y28wrr
 3w8g==
X-Gm-Message-State: ACgBeo1pRXuKQjLPXtHOS4ph7e6XL7ETK5LzyCWfsgqGmmudHRr2WWd2
 ek6V6ena2usp3KvwmA/RUupCn8JCv9+pVXn7MKuWfzHRDfg=
X-Google-Smtp-Source: AA6agR4mQcrj+WzSRIw5Hjc84/xdFqX9D5vMeL8thYlkKng2OlfAt0dxSevkKpzmrXDwWHIUBP+T7xSioHVurWRESgM=
X-Received: by 2002:ac8:7f04:0:b0:343:36d:9a1f with SMTP id
 f4-20020ac87f04000000b00343036d9a1fmr2383827qtk.566.1659987585267; Mon, 08
 Aug 2022 12:39:45 -0700 (PDT)
MIME-Version: 1.0
References: <20220808182358.24264-1-timo@rothenpieler.org>
In-Reply-To: <20220808182358.24264-1-timo@rothenpieler.org>
From: Mark Reid <mindmark@gmail.com>
Date: Mon, 8 Aug 2022 12:39:34 -0700
Message-ID: <CA+anCRmwy6C-SS+ynM593y8nMgditqRd61J6XEHa86m8fqhaYg@mail.gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [FFmpeg-devel] [PATCH] swscale/input: add rgbaf16 input support
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Timo Rothenpieler <timo@rothenpieler.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/CA+anCRmwy6C-SS+ynM593y8nMgditqRd61J6XEHa86m8fqhaYg@mail.gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Mon, Aug 8, 2022 at 11:24 AM Timo Rothenpieler <timo@rothenpieler.org> wrote:

> This is by no means perfect, since at least ddagrab will return scRGB
> data with values outside the 0.0f to 1.0f range for HDR content.
> Its primary purpose is to make it possible to work with the format at all.
>
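
To make that limitation concrete: every converter in the patch maps a sample through lrintf(av_clipf(65535.0f * v, 0.0f, 65535.0f)), so scRGB values above 1.0 (HDR highlights) saturate to 65535 and negative values clip to 0. Below is a minimal standalone sketch of that mapping. It assumes no libavutil, so clip_float is a local stand-in for av_clipf, and it rounds by adding 0.5 and truncating instead of calling lrintf(). The two only differ on exact .5 ties.

```c
#include <stdint.h>

/* Local stand-in for libavutil's av_clipf(): clamp a to [amin, amax]. */
static float clip_float(float a, float amin, float amax)
{
    return a < amin ? amin : (a > amax ? amax : a);
}

/* Scale-and-clip applied by the patch to every R, G, B and A sample:
 * [0.0, 1.0] maps onto [0, 65535]; anything outside that range saturates.
 * Rounds with "add 0.5 and truncate" instead of lrintf(), so no libm is
 * needed; the two only differ on exact .5 ties. */
static uint16_t unorm16_from_float(float v)
{
    float scaled = clip_float(65535.0f * v, 0.0f, 65535.0f);
    return (uint16_t)(scaled + 0.5f);
}
```

An HDR sample of 1.5f and one of 12.5f both come out as 65535, so highlight detail above 1.0 is lost rather than tone-mapped.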
> _Float16 support has been available on arm/aarch64 for a while, and gcc
> 12 enabled it on x86 as long as SSE2 is supported.
>
> If the target arch supports f16c, gcc emits fairly efficient assembly,
> taking advantage of it. This is the case on x86-64-v3 or higher.
> Without f16c, gcc emulates the conversion in software using SSE2 instructions.
> ---
>
> I am by no means certain this is the correct way to implement this.
> I tested it with ddagrab output in that format, and the result looks like
> what I'd expect.
>
> Especially the order of arguments is a bit of a mystery. I'd have
> expected them to be in plane order, so that for packed formats only
> the first one would matter.
> But a number of other packed formats leave the first src unused, so I
> followed their lead, and it ended up working fine.
>
>
Have you looked at half2float.h, used by the EXR decoder? It already has
f16 to f32 conversion functions.
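
For comparison, here is a minimal sketch of an IEEE 754 binary16 to float conversion in plain C that needs no compiler _Float16 support, so code built on something like it would not depend on HAVE_FLOAT16. It is independent of half2float.h (whose functions are not reproduced here), and the name half_to_float is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to float without _Float16.
 * Handles signed zeros, subnormals, normals, infinities and NaNs. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t mant = h & 0x3ff;
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                          /* +/- zero */
        } else {
            /* Subnormal half: normalize until the implicit 1 appears,
             * adjusting the float exponent for each shift. */
            exp = 127 - 15 + 1;
            while (!(mant & 0x400)) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3ff;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1f) {
        bits = sign | 0x7f800000 | (mant << 13);  /* inf or NaN */
    } else {
        /* Normal: rebias exponent from 15 to 127, widen mantissa. */
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }
    memcpy(&f, &bits, sizeof(f));                 /* type-pun safely */
    return f;
}
```

With a helper like this, the input functions could read the samples as uint16_t and call, e.g., half_to_float(src[i*4+0]) in place of the _Float16 load. Because it reinterprets raw bits, it assumes the 16-bit words are already in host byte order.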


>  configure            |  2 +
>  libswscale/input.c   | 95 ++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/utils.c   |  3 ++
>  libswscale/version.h |  2 +-
>  4 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 6761d0cb32..d989498bba 100755
> --- a/configure
> +++ b/configure
> @@ -2143,6 +2143,7 @@ ARCH_FEATURES="
>      fast_64bit
>      fast_clz
>      fast_cmov
> +    float16
>      local_aligned
>      simd_align_16
>      simd_align_32
> @@ -6228,6 +6229,7 @@ check_builtin MemoryBarrier windows.h "MemoryBarrier()"
>  check_builtin sync_val_compare_and_swap "" "int *ptr; int oldval, newval; __sync_val_compare_and_swap(ptr, oldval, newval)"
>  check_builtin gmtime_r time.h "time_t *time; struct tm *tm; gmtime_r(time, tm)"
>  check_builtin localtime_r time.h "time_t *time; struct tm *tm; localtime_r(time, tm)"
> +check_builtin float16 "" "_Float16 f16var"
>
>  case "$custom_allocator" in
>      jemalloc)
> diff --git a/libswscale/input.c b/libswscale/input.c
> index 68abc4d62c..0b5bd952e8 100644
> --- a/libswscale/input.c
> +++ b/libswscale/input.c
> @@ -1111,6 +1111,89 @@ static void grayf32##endian_name##ToY16_c(uint8_t *dst, const uint8_t *src,
>  rgbf32_planar_funcs_endian(le, 0)
>  rgbf32_planar_funcs_endian(be, 1)
>
> +static void rgbaf16ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV,
> +                               const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
> +                               int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)src1;
> +    uint16_t *dstU = (uint16_t*)_dstU;
> +    uint16_t *dstV = (uint16_t*)_dstV;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> +    int i;
> +    av_assert1(src1==src2);
> +    for (i = 0; i < width; i++) {
> +        int r = (lrintf(av_clipf(65535.0f * src[i*8+0], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+4], 0.0f, 65535.0f))) >> 1;
> +        int g = (lrintf(av_clipf(65535.0f * src[i*8+1], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+5], 0.0f, 65535.0f))) >> 1;
> +        int b = (lrintf(av_clipf(65535.0f * src[i*8+2], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+6], 0.0f, 65535.0f))) >> 1;
> +
> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
> +
> +static void rgbaf16ToUV_c(uint8_t *_dstU, uint8_t *_dstV,
> +                          const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
> +                          int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)src1;
> +    uint16_t *dstU = (uint16_t*)_dstU;
> +    uint16_t *dstV = (uint16_t*)_dstV;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> +    int i;
> +    av_assert1(src1==src2);
> +    for (i = 0; i < width; i++) {
> +        int r = lrintf(av_clipf(65535.0f * src[i*4+0], 0.0f, 65535.0f));
> +        int g = lrintf(av_clipf(65535.0f * src[i*4+1], 0.0f, 65535.0f));
> +        int b = lrintf(av_clipf(65535.0f * src[i*4+2], 0.0f, 65535.0f));
> +
> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
> +
> +static void rgbaf16ToY_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,
> +                         int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)_src;
> +    uint16_t *dst = (uint16_t*)_dst;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
> +    int i;
> +    for (i = 0; i < width; i++) {
> +        int r = lrintf(av_clipf(65535.0f * src[i*4+0], 0.0f, 65535.0f));
> +        int g = lrintf(av_clipf(65535.0f * src[i*4+1], 0.0f, 65535.0f));
> +        int b = lrintf(av_clipf(65535.0f * src[i*4+2], 0.0f, 65535.0f));
> +
> +        dst[i] = (ry*r + gy*g + by*b + (0x2001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
> +
> +static void rgbaf16ToA_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,
> +                         int width, uint32_t *unused2)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)_src;
> +    uint16_t *dst = (uint16_t*)_dst;
> +    int i;
> +    for (i=0; i<width; i++) {
> +        dst[i] = lrintf(av_clipf(65535.0f * src[i*4+3], 0.0f, 65535.0f));
> +    }
> +#endif
> +}
> +
>  av_cold void ff_sws_init_input_funcs(SwsContext *c)
>  {
>      enum AVPixelFormat srcFormat = c->srcFormat;
> @@ -1375,6 +1458,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>          case AV_PIX_FMT_X2BGR10LE:
>              c->chrToYV12 = bgr30leToUV_half_c;
>              break;
> +        case AV_PIX_FMT_RGBAF16:
> +            c->chrToYV12 = rgbaf16ToUV_half_c;
> +            break;
>          }
>      } else {
>          switch (srcFormat) {
> @@ -1462,6 +1548,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>          case AV_PIX_FMT_X2BGR10LE:
>              c->chrToYV12 = bgr30leToUV_c;
>              break;
> +        case AV_PIX_FMT_RGBAF16:
> +            c->chrToYV12 = rgbaf16ToUV_c;
> +            break;
>          }
>      }
>
> @@ -1750,6 +1839,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>      case AV_PIX_FMT_X2BGR10LE:
>          c->lumToYV12 = bgr30leToY_c;
>          break;
> +    case AV_PIX_FMT_RGBAF16:
> +        c->lumToYV12 = rgbaf16ToY_c;
> +        break;
>      }
>      if (c->needAlpha) {
>          if (is16BPS(srcFormat) || isNBPS(srcFormat)) {
> @@ -1769,6 +1861,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>          case AV_PIX_FMT_ARGB:
>              c->alpToYV12 = abgrToA_c;
>              break;
> +        case AV_PIX_FMT_RGBAF16:
> +            c->alpToYV12 = rgbaf16ToA_c;
> +            break;
>          case AV_PIX_FMT_YA8:
>              c->alpToYV12 = uyvyToY_c;
>              break;
> diff --git a/libswscale/utils.c b/libswscale/utils.c
> index 34503e57f4..c5c22017ff 100644
> --- a/libswscale/utils.c
> +++ b/libswscale/utils.c
> @@ -259,6 +259,9 @@ static const FormatEntry format_entries[] = {
>      [AV_PIX_FMT_P416LE]      = { 1, 1 },
>      [AV_PIX_FMT_NV16]        = { 1, 1 },
>      [AV_PIX_FMT_VUYA]        = { 1, 1 },
> +#if HAVE_FLOAT16
> +    [AV_PIX_FMT_RGBAF16]     = { 1, 0 },
> +#endif
>  };
>
>  int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
> diff --git a/libswscale/version.h b/libswscale/version.h
> index 3193562d18..d8694bb5c0 100644
> --- a/libswscale/version.h
> +++ b/libswscale/version.h
> @@ -29,7 +29,7 @@
>  #include "version_major.h"
>
>  #define LIBSWSCALE_VERSION_MINOR   8
> -#define LIBSWSCALE_VERSION_MICRO 102
> +#define LIBSWSCALE_VERSION_MICRO 103
>
>  #define LIBSWSCALE_VERSION_INT  AV_VERSION_INT(LIBSWSCALE_VERSION_MAJOR, \
>                                                 LIBSWSCALE_VERSION_MINOR, \
> --
> 2.34.1
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
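
The rounding constants in the quoted conversions can be read as an offset plus a rounding term. (0x10001 << (RGB2YUV_SHIFT-1)) equals the chroma midpoint 0x8000 shifted up by RGB2YUV_SHIFT, plus half an output LSB. (0x2001 << (RGB2YUV_SHIFT-1)) equals 0x1000 (16 << 8, limited-range black at 16 bits) shifted the same way, plus the same half LSB. The sketch below checks that identity for arbitrary shifts. The helper names are hypothetical, it uses int64_t to avoid overflow, and it does not assume any particular value of RGB2YUV_SHIFT.

```c
#include <stdint.h>

/* Bias the patch adds before ">> shift" in its U/V conversion. */
static int64_t chroma_bias(int shift)
{
    return (int64_t)0x10001 << (shift - 1);
}

/* Bias the patch adds before ">> shift" in its Y conversion. */
static int64_t luma_bias(int shift)
{
    return (int64_t)0x2001 << (shift - 1);
}

/* offset << shift, plus half an output LSB so ">> shift" rounds. */
static int64_t offset_plus_half(int64_t offset, int shift)
{
    return (offset << shift) + ((int64_t)1 << (shift - 1));
}

/* One U sample, same formula as the patch but in 64-bit arithmetic. */
static int64_t rgb_to_u(int64_t ru, int64_t gu, int64_t bu,
                        int64_t r, int64_t g, int64_t b, int shift)
{
    return (ru * r + gu * g + bu * b + chroma_bias(shift)) >> shift;
}
```

With chroma coefficients that sum to zero, any grey input (r == g == b) lands on exactly 0x8000, which is the point of the chroma bias.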