From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id 6D0A742FAE
	for <ffmpegdev@gitmailbox.com>; Mon, 14 Nov 2022 01:51:01 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 834B468BD2E;
	Mon, 14 Nov 2022 03:50:58 +0200 (EET)
Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com
 [209.85.216.53])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 695D568B1F1
 for <ffmpeg-devel@ffmpeg.org>; Mon, 14 Nov 2022 03:50:52 +0200 (EET)
Received: by mail-pj1-f53.google.com with SMTP id b11so9106716pjp.2
 for <ffmpeg-devel@ffmpeg.org>; Sun, 13 Nov 2022 17:50:52 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=to:subject:message-id:date:from:in-reply-to:references:mime-version
 :from:to:cc:subject:date:message-id:reply-to;
 bh=xdMUFXv+AAXzuqVB7TVo+ByC1s8lxYn1d7R65KZfP5c=;
 b=YC1649JNpiXS96BfMz3QAvpj1KNFUvwt+qPL/jaUhqVll75Q7ia+7tVKP+R3gDvC/M
 E4OP0Qs9JHJCpjTKRiKI+JelqAhOkX3PfCg9mmo7LBkUhGJkvXp20T3Nhsw/U5BHOqOS
 iHZ1/24lVYyEZM7MReF3mRDo8aABQVhdwcne0VeGBqHD6K0bB8dsPyDQdyMUQLeE98du
 bmxkZ2CeaXFkAtGajvV7+3P615PHPD+IvfYQPkS/6l0DXFUFZHrn07YnkRvsOkinv1tI
 ESBGSAmhPq902bHVXCqIc0U8K3tudzsrBEIR6OMGaKav6+w0e85c6yb+pKc9eGiumnVQ
 9nRg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=to:subject:message-id:date:from:in-reply-to:references:mime-version
 :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
 bh=xdMUFXv+AAXzuqVB7TVo+ByC1s8lxYn1d7R65KZfP5c=;
 b=SRjHtFipC4sGXVjSrtZiwk4XvQImIqZviCpO9Dw1Q15s90V/EBQJ2PYA9NXs3HDugG
 H29dqpe5wB4JZpaIpT8RnLOoQNPNOy/4i9sGofo8YhRNnW27oKbjA6F2Kg1slrqWA39x
 KPtAZWVLahAvvZ72Yui0Uf5iCrmuPbZdd/k7Z9NIMYZ0ZY7rnnkE6zFfzGlhScOiNCzz
 qRpjyvtgU3IOWQhYxe7jKTlpG7zaEUSVyLwY5n1j8MHdTvopMoiaKhUDsnmZnPguJlxR
 ecqJktwCvGRFE5pCB+e/2C59JmVafHggVRodXJ7Kjdtu8QZwh/ibbkEqCzX4H/WvWsVH
 kWuA==
X-Gm-Message-State: ANoB5pkHZajRWXlQoZiNR5VrxBK2asinp72gum+pK3fxKT8B+MuGHeaF
 haMz5aIi0m3PW6QjwLzqBXUWdI7VihO5OGMADBfrIrqI
X-Google-Smtp-Source: AA0mqf5VLCC+0YXQQwgY8cs6QYELXt+5xxqvyyN49GO3RRpb+x/XaPffOSc5RZAp0J/r+luaWujKzZaLV6KIQq3hQlw=
X-Received: by 2002:a17:90a:fe8b:b0:212:f169:140e with SMTP id
 co11-20020a17090afe8b00b00212f169140emr11469818pjb.215.1668390650044; Sun, 13
 Nov 2022 17:50:50 -0800 (PST)
MIME-Version: 1.0
References: <20221103040010.1134-1-mindmark@gmail.com>
 <20221103040010.1134-2-mindmark@gmail.com>
 <20221113212453.GF1814017@pb2>
In-Reply-To: <20221113212453.GF1814017@pb2>
From: Mark Reid <mindmark@gmail.com>
Date: Sun, 13 Nov 2022 17:50:37 -0800
Message-ID: <CA+anCRmx-hO38tJAmF84+6RbpZ9Vcm6tAMT4peP5TX7uEXLXqw@mail.gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [FFmpeg-devel] [PATCH v3 1/4] swscale/input: add rgbaf32 input
 support
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/CA+anCRmx-hO38tJAmF84+6RbpZ9Vcm6tAMT4peP5TX7uEXLXqw@mail.gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Sun, Nov 13, 2022 at 1:25 PM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Wed, Nov 02, 2022 at 09:00:07PM -0700, mindmark@gmail.com wrote:
> > From: Mark Reid <mindmark@gmail.com>
> >
> > ---
> >  libswscale/input.c | 172 +++++++++++++++++++++++++++++++++++++++++++++
> >  libswscale/utils.c |   4 ++
> >  2 files changed, 176 insertions(+)
> >
> > diff --git a/libswscale/input.c b/libswscale/input.c
> > index 7ff7bfaa01..4683284b0b 100644
> > --- a/libswscale/input.c
> > +++ b/libswscale/input.c
> > @@ -1284,6 +1284,136 @@ static void rgbaf16##endian_name##ToA_c(uint8_t *_dst, const uint8_t *_src, cons
> >  rgbaf16_funcs_endian(le, 0)
> >  rgbaf16_funcs_endian(be, 1)
> >
> > +#define rdpx(src) (is_be ? av_int2float(AV_RB32(&src)): av_int2float(AV_RL32(&src)))
> > +
> > +static av_always_inline void rgbaf32ToUV_half_endian(uint16_t *dstU, uint16_t *dstV, int is_be,
> > +                                                     const float *src, int width,
> > +                                                     int32_t *rgb2yuv, int comp)
> > +{
> > +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> > +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> > +    int i;
> > +    for (i = 0; i < width; i++) {
>
> > +        int r = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+0]), 0.0f, 65535.0f)) +
> > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+4]), 0.0f, 65535.0f))) >> 1;
> > +        int g = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+1]), 0.0f, 65535.0f)) +
> > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+5]), 0.0f, 65535.0f))) >> 1;
> > +        int b = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+2]), 0.0f, 65535.0f)) +
> > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+6]), 0.0f, 65535.0f))) >> 1;
> > +
> > +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
>
> I would expect this sort of code to use 2 lrintf() and 2 av_clipf() not 6
>
>
Yeah, it is a bit excessive. I'll just remove the _half conversions for now;
they aren't strictly necessary as far as I can tell.


>
> > +    }
> > +}
> > +
> > +static av_always_inline void rgbaf32ToUV_endian(uint16_t *dstU, uint16_t *dstV, int is_be,
> > +                                                const float *src, int width,
> > +                                                int32_t *rgb2yuv, int comp)
> > +{
> > +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> > +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> > +    int i;
> > +    for (i = 0; i < width; i++) {
> > +        int r = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+0]), 0.0f, 65535.0f));
> > +        int g = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+1]), 0.0f, 65535.0f));
> > +        int b = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+2]), 0.0f, 65535.0f));
> > +
> > +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > +    }
> > +}
> > +
>
> > +static av_always_inline void rgbaf32ToY_endian(uint16_t *dst, const float *src, int is_be,
> > +                                               int width, int32_t *rgb2yuv, int comp)
> > +{
> > +    int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
> > +    int i;
> > +    for (i = 0; i < width; i++) {
> > +        int r = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+0]), 0.0f, 65535.0f));
> > +        int g = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+1]), 0.0f, 65535.0f));
> > +        int b = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+2]), 0.0f, 65535.0f));
> > +
>
> > +        dst[i] = (ry*r + gy*g + by*b + (0x2001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
>
> there is one output so there should be only need for one clip and one
> float->int
>

This matches the f32 planar version. I think I was paranoid about keeping
things bitexact for the tests, which is why it's currently done this way.
I'll see what happens if I introduce more float operations. Could I perhaps
do this in a later patch? Some asm might have to change too.


> thx
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Any man who breaks a law that conscience tells him is unjust and willingly
> accepts the penalty by staying in jail in order to arouse the conscience of
> the community on the injustice of the law is at that moment expressing the
> very highest respect for law. - Martin Luther King Jr
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>