From: Mark Reid
Date: Mon, 14 Nov 2022 14:23:43 -0800
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH v3 1/4] swscale/input: add rgbaf32 input support
References: <20221103040010.1134-1-mindmark@gmail.com> <20221103040010.1134-2-mindmark@gmail.com> <20221113212453.GF1814017@pb2> <20221114210753.GI1814017@pb2>
In-Reply-To: <20221114210753.GI1814017@pb2>

On Mon, Nov 14, 2022 at 1:08 PM Michael Niedermayer wrote:

> On Sun, Nov 13, 2022 at 05:50:37PM -0800, Mark Reid wrote:
> > On Sun, Nov 13, 2022 at 1:25 PM Michael Niedermayer <michael@niedermayer.cc> wrote:
> >
> > > On Wed, Nov 02, 2022 at 09:00:07PM -0700, mindmark@gmail.com wrote:
> > > > From: Mark Reid
> > > >
> > > > ---
> > > >  libswscale/input.c | 172 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  libswscale/utils.c |   4 ++
> > > >  2 files changed, 176 insertions(+)
> > > >
> > > > diff --git a/libswscale/input.c b/libswscale/input.c
> > > > index 7ff7bfaa01..4683284b0b 100644
> > > > --- a/libswscale/input.c
> > > > +++ b/libswscale/input.c
> > > > @@ -1284,6 +1284,136 @@ static void rgbaf16##endian_name##ToA_c(uint8_t
> > > > *_dst, const uint8_t *_src, cons
> > > >  rgbaf16_funcs_endian(le, 0)
> > > >  rgbaf16_funcs_endian(be, 1)
> > > >
> > > > +#define rdpx(src) (is_be ? av_int2float(AV_RB32(&src)): av_int2float(AV_RL32(&src)))
> > > > +
> > > > +static av_always_inline void rgbaf32ToUV_half_endian(uint16_t *dstU, uint16_t *dstV, int is_be,
> > > > +                                                     const float *src, int width,
> > > > +                                                     int32_t *rgb2yuv, int comp)
> > > > +{
> > > > +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> > > > +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> > > > +    int i;
> > > > +    for (i = 0; i < width; i++) {
> > >
> > > > +        int r = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+0]), 0.0f, 65535.0f)) +
> > > > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+4]), 0.0f, 65535.0f))) >> 1;
> > > > +        int g = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+1]), 0.0f, 65535.0f)) +
> > > > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+5]), 0.0f, 65535.0f))) >> 1;
> > > > +        int b = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+2]), 0.0f, 65535.0f)) +
> > > > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+6]), 0.0f, 65535.0f))) >> 1;
> > > > +
> > > > +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > > > +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > >
> > > I would expect this sort of code to use 2 lrintf() and 2 av_clipf(), not 6
> > >
> >
> > ya it is a bit excessive, I'll just remove the _half conversions for now,
> > they aren't strictly necessary as far as I can tell.
>
> do you see a problem with just factorizing them out?
> it shouldn't be hard to reorder the operations
>

It's just FATE checksums and float math that make me apprehensive :p.
Hmm, this code path doesn't actually seem to get tested by FATE.
Now that I look at it again, the indexing looks wrong for the 3-channel formats too.

>
> >
> > >
> > > > +    }
> > > > +}
> > > > +
> > > > +static av_always_inline void rgbaf32ToUV_endian(uint16_t *dstU, uint16_t *dstV, int is_be,
> > > > +                                                const float *src, int width,
> > > > +                                                int32_t *rgb2yuv, int comp)
> > > > +{
> > > > +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> > > > +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> > > > +    int i;
> > > > +    for (i = 0; i < width; i++) {
> > > > +        int r = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+0]), 0.0f, 65535.0f));
> > > > +        int g = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+1]), 0.0f, 65535.0f));
> > > > +        int b = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+2]), 0.0f, 65535.0f));
> > > > +
> > > > +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > > > +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > > > +    }
> > > > +}
> > > > +
> > >
> > > > +static av_always_inline void rgbaf32ToY_endian(uint16_t *dst, const float *src, int is_be,
> > > > +                                               int width, int32_t *rgb2yuv, int comp)
> > > > +{
> > > > +    int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
> > > > +    int i;
> > > > +    for (i = 0; i < width; i++) {
> > > > +        int r = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+0]), 0.0f, 65535.0f));
> > > > +        int g = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+1]), 0.0f, 65535.0f));
> > > > +        int b = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+2]), 0.0f, 65535.0f));
> > > > +
> > >
> > > > +        dst[i] = (ry*r + gy*g + by*b + (0x2001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > >
> > > there is one output, so there should only be a need for one clip and one
> > > float->int
> > >
> >
> > This is matching the f32 planar version.
> > I think I was paranoid about things being bit-exact for the tests, and
> > that's why it's currently done this way.
> > I'll see what happens if I introduce more float operations. Could I perhaps
> > do this in a later patch? Some asm might have to change too.
>
> Of course, it can be a separate patch in the set. Maybe f32 planar can be
> changed at the same time.
>

Great, I'll make that change, together with the f32 planar one, in a later patch.

>
> thx
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> In a rich man's house there is no place to spit but his face.
> -- Diogenes of Sinope
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".