Date: Mon, 14 Nov 2022 22:07:53 +0100
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20221114210753.GI1814017@pb2>
References: <20221103040010.1134-1-mindmark@gmail.com> <20221103040010.1134-2-mindmark@gmail.com> <20221113212453.GF1814017@pb2>
Subject: Re: [FFmpeg-devel] [PATCH v3 1/4] swscale/input: add rgbaf32 input support

On Sun, Nov 13, 2022 at 05:50:37PM -0800, Mark Reid wrote:
> On Sun, Nov 13, 2022 at 1:25 PM Michael Niedermayer
> wrote:
>
> > On Wed, Nov 02, 2022 at 09:00:07PM -0700, mindmark@gmail.com wrote:
> > > From: Mark Reid
> > >
> > > ---
> > >  libswscale/input.c | 172 +++++++++++++++++++++++++++++++++++++++++++++
> > >  libswscale/utils.c |   4 ++
> > >  2 files changed, 176 insertions(+)
> > >
> > > diff --git a/libswscale/input.c b/libswscale/input.c
> > > index 7ff7bfaa01..4683284b0b 100644
> > > --- a/libswscale/input.c
> > > +++ b/libswscale/input.c
> > > @@ -1284,6 +1284,136 @@ static void rgbaf16##endian_name##ToA_c(uint8_t *_dst, const uint8_t *_src, cons
> > >  rgbaf16_funcs_endian(le, 0)
> > >  rgbaf16_funcs_endian(be, 1)
> > >
> > > +#define rdpx(src) (is_be ? av_int2float(AV_RB32(&src)): av_int2float(AV_RL32(&src)))
> > > +
> > > +static av_always_inline void rgbaf32ToUV_half_endian(uint16_t *dstU, uint16_t *dstV, int is_be,
> > > +                                                     const float *src, int width,
> > > +                                                     int32_t *rgb2yuv, int comp)
> > > +{
> > > +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> > > +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> > > +    int i;
> > > +    for (i = 0; i < width; i++) {
> >
> > > +        int r = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+0]), 0.0f, 65535.0f)) +
> > > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+4]), 0.0f, 65535.0f))) >> 1;
> > > +        int g = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+1]), 0.0f, 65535.0f)) +
> > > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+5]), 0.0f, 65535.0f))) >> 1;
> > > +        int b = (lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+2]), 0.0f, 65535.0f)) +
> > > +                 lrintf(av_clipf(65535.0f * rdpx(src[i*(comp*2)+6]), 0.0f, 65535.0f))) >> 1;
> > > +
> > > +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > > +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> >
> > I would expect this sort of code to use 2 lrintf() and 2 av_clipf() not 6
> >
> >
> ya it is a bit excessive, I'll just remove the _half conversions for now,
> they aren't strictly necessary as far as I can tell.

do you see a problem with just factorizing them out?
it shouldn't be hard to reorder the operations

>
>
> >
> > > +    }
> > > +}
> > > +
> > > +static av_always_inline void rgbaf32ToUV_endian(uint16_t *dstU, uint16_t *dstV, int is_be,
> > > +                                                const float *src, int width,
> > > +                                                int32_t *rgb2yuv, int comp)
> > > +{
> > > +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> > > +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> > > +    int i;
> > > +    for (i = 0; i < width; i++) {
> > > +        int r = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+0]), 0.0f, 65535.0f));
> > > +        int g = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+1]), 0.0f, 65535.0f));
> > > +        int b = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+2]), 0.0f, 65535.0f));
> > > +
> > > +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > > +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> > > +    }
> > > +}
> > > +
> >
> > > +static av_always_inline void rgbaf32ToY_endian(uint16_t *dst, const float *src, int is_be,
> > > +                                               int width, int32_t *rgb2yuv, int comp)
> > > +{
> > > +    int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
> > > +    int i;
> > > +    for (i = 0; i < width; i++) {
> > > +        int r = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+0]), 0.0f, 65535.0f));
> > > +        int g = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+1]), 0.0f, 65535.0f));
> > > +        int b = lrintf(av_clipf(65535.0f * rdpx(src[i*comp+2]), 0.0f, 65535.0f));
> > > +
> >
> > > +        dst[i] = (ry*r + gy*g + by*b + (0x2001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> >
> > there is one output so there should be only need for one clip and one
> > float->int
> >
>
> This is matching the f32 planar version. I think I was paranoid about
> things being bitexact for tests and that's why it's currently being done
> this way.
> I'll see what happens if I introduce more float operations, could I perhaps
> do this in a later patch? some asm might have to change too.

of course can be a separate patch in a set.
Maybe f32 planar can be changed at the same time

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope