From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Reid
Date: Mon, 8 Aug 2022 12:39:34 -0700
To: FFmpeg development discussions and patches
Cc: Timo Rothenpieler
Subject: Re: [FFmpeg-devel] [PATCH] swscale/input: add rgbaf16 input support
References: <20220808182358.24264-1-timo@rothenpieler.org>
In-Reply-To: <20220808182358.24264-1-timo@rothenpieler.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Aug 8, 2022 at 11:24 AM Timo Rothenpieler wrote:

> This is by no means perfect, since at least ddagrab will return scRGB
> data with values outside of 0.0f to 1.0f for HDR values.
> Its primary purpose is to be able to work with the format at all.
>
> _Float16 support was available on arm/aarch64 for a while, and with gcc
> 12 was enabled on x86 as long as SSE2 is supported.
>
> If the target arch supports f16c, gcc emits fairly efficient assembly,
> taking advantage of it. This is the case on x86-64-v3 or higher.
> Without f16c, it emulates it in software using sse2 instructions.
> ---
>
> I am by no means certain this is the correct way to implement this.
> Tested it with ddagrab output in that format, and it looks like what I'd
> expect.
>
> Specially the order of arguments is a bit of a mystery.
> I'd have expected them to be in order of the planes, so for packed formats,
> only the first one would matter.
> But a bunch of other packed formats left the first src unused, and so I
> followed along, and it ended up working fine.
>
>

Have you looked at the exr decoder half2float.h? It already has f16 to f32
decoding functions.

>  configure            |  2 +
>  libswscale/input.c   | 95 ++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/utils.c   |  3 ++
>  libswscale/version.h |  2 +-
>  4 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 6761d0cb32..d989498bba 100755
> --- a/configure
> +++ b/configure
> @@ -2143,6 +2143,7 @@ ARCH_FEATURES="
>      fast_64bit
>      fast_clz
>      fast_cmov
> +    float16
>      local_aligned
>      simd_align_16
>      simd_align_32
> @@ -6228,6 +6229,7 @@ check_builtin MemoryBarrier windows.h "MemoryBarrier()"
>  check_builtin sync_val_compare_and_swap "" "int *ptr; int oldval, newval; __sync_val_compare_and_swap(ptr, oldval, newval)"
>  check_builtin gmtime_r time.h "time_t *time; struct tm *tm; gmtime_r(time, tm)"
>  check_builtin localtime_r time.h "time_t *time; struct tm *tm; localtime_r(time, tm)"
> +check_builtin float16 "" "_Float16 f16var"
>
>  case "$custom_allocator" in
>      jemalloc)
> diff --git a/libswscale/input.c b/libswscale/input.c
> index 68abc4d62c..0b5bd952e8 100644
> --- a/libswscale/input.c
> +++ b/libswscale/input.c
> @@ -1111,6 +1111,89 @@ static void grayf32##endian_name##ToY16_c(uint8_t *dst, const uint8_t *src,
>  rgbf32_planar_funcs_endian(le, 0)
>  rgbf32_planar_funcs_endian(be, 1)
>
> +static void rgbaf16ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV,
> +                               const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
> +                               int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)src1;
> +    uint16_t *dstU = (uint16_t*)_dstU;
> +    uint16_t *dstV = (uint16_t*)_dstV;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> +    int i;
> +    av_assert1(src1==src2);
> +    for (i = 0; i < width; i++) {
> +        int r = (lrintf(av_clipf(65535.0f * src[i*8+0], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+4], 0.0f, 65535.0f))) >> 1;
> +        int g = (lrintf(av_clipf(65535.0f * src[i*8+1], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+5], 0.0f, 65535.0f))) >> 1;
> +        int b = (lrintf(av_clipf(65535.0f * src[i*8+2], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+6], 0.0f, 65535.0f))) >> 1;
> +
> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
> +
> +static void rgbaf16ToUV_c(uint8_t *_dstU, uint8_t *_dstV,
> +                          const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
> +                          int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)src1;
> +    uint16_t *dstU = (uint16_t*)_dstU;
> +    uint16_t *dstV = (uint16_t*)_dstV;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> +    int i;
> +    av_assert1(src1==src2);
> +    for (i = 0; i < width; i++) {
> +        int r = lrintf(av_clipf(65535.0f * src[i*4+0], 0.0f, 65535.0f));
> +        int g = lrintf(av_clipf(65535.0f * src[i*4+1], 0.0f, 65535.0f));
> +        int b = lrintf(av_clipf(65535.0f * src[i*4+2], 0.0f, 65535.0f));
> +
> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
> +
> +static void rgbaf16ToY_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,
> +                         int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)_src;
> +    uint16_t *dst = (uint16_t*)_dst;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
> +    int i;
> +    for (i = 0; i < width; i++) {
> +        int r = lrintf(av_clipf(65535.0f * src[i*4+0], 0.0f, 65535.0f));
> +        int g = lrintf(av_clipf(65535.0f * src[i*4+1], 0.0f, 65535.0f));
> +        int b = lrintf(av_clipf(65535.0f * src[i*4+2], 0.0f, 65535.0f));
> +
> +        dst[i] = (ry*r + gy*g + by*b + (0x2001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
> +
> +static void rgbaf16ToA_c(uint8_t *_dst, const uint8_t *_src, const uint8_t *unused0, const uint8_t *unused1,
> +                         int width, uint32_t *unused2)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)_src;
> +    uint16_t *dst = (uint16_t*)_dst;
> +    int i;
> +    for (i=0; i<width; i++) {
> +        dst[i] = lrintf(av_clipf(65535.0f * src[i*4+3], 0.0f, 65535.0f));
> +    }
> +#endif
> +}
> +
>  av_cold void ff_sws_init_input_funcs(SwsContext *c)
>  {
>      enum AVPixelFormat srcFormat = c->srcFormat;
> @@ -1375,6 +1458,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>          case AV_PIX_FMT_X2BGR10LE:
>              c->chrToYV12 = bgr30leToUV_half_c;
>              break;
> +        case AV_PIX_FMT_RGBAF16:
> +            c->chrToYV12 = rgbaf16ToUV_half_c;
> +            break;
>          }
>      } else {
>          switch (srcFormat) {
> @@ -1462,6 +1548,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>          case AV_PIX_FMT_X2BGR10LE:
>              c->chrToYV12 = bgr30leToUV_c;
>              break;
> +        case AV_PIX_FMT_RGBAF16:
> +            c->chrToYV12 = rgbaf16ToUV_c;
> +            break;
>          }
>      }
>
> @@ -1750,6 +1839,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>      case AV_PIX_FMT_X2BGR10LE:
>          c->lumToYV12 = bgr30leToY_c;
>          break;
> +    case AV_PIX_FMT_RGBAF16:
> +        c->lumToYV12 = rgbaf16ToY_c;
> +        break;
>      }
>      if (c->needAlpha) {
>          if (is16BPS(srcFormat) || isNBPS(srcFormat)) {
> @@ -1769,6 +1861,9 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
>          case AV_PIX_FMT_ARGB:
>              c->alpToYV12 = abgrToA_c;
>              break;
> +        case AV_PIX_FMT_RGBAF16:
> +            c->alpToYV12 = rgbaf16ToA_c;
> +            break;
>          case AV_PIX_FMT_YA8:
>              c->alpToYV12 = uyvyToY_c;
>              break;
> diff --git a/libswscale/utils.c b/libswscale/utils.c
> index 34503e57f4..c5c22017ff 100644
> --- a/libswscale/utils.c
> +++ b/libswscale/utils.c
> @@ -259,6 +259,9 @@ static const FormatEntry format_entries[] = {
>      [AV_PIX_FMT_P416LE] = { 1, 1 },
>      [AV_PIX_FMT_NV16] = { 1, 1 },
>      [AV_PIX_FMT_VUYA] = { 1, 1 },
> +#if HAVE_FLOAT16
> +    [AV_PIX_FMT_RGBAF16] = { 1, 0 },
> +#endif
>  };
>
>  int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
> diff --git a/libswscale/version.h b/libswscale/version.h
> index 3193562d18..d8694bb5c0 100644
> --- a/libswscale/version.h
> +++ b/libswscale/version.h
> @@ -29,7 +29,7 @@
>  #include "version_major.h"
>
>  #define LIBSWSCALE_VERSION_MINOR 8
> -#define LIBSWSCALE_VERSION_MICRO 102
> +#define LIBSWSCALE_VERSION_MICRO 103
>
>  #define LIBSWSCALE_VERSION_INT AV_VERSION_INT(LIBSWSCALE_VERSION_MAJOR, \
>                                                LIBSWSCALE_VERSION_MINOR, \
> --
> 2.34.1
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>