From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id E214647E19 for ; Fri, 27 Oct 2023 18:38:23 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C7E0D68CAA2; Fri, 27 Oct 2023 21:38:21 +0300 (EEST) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 46EC968C477 for ; Fri, 27 Oct 2023 21:38:16 +0300 (EEST) Received: by mail.gandi.net (Postfix) with ESMTPSA id 9E88E1C0002 for ; Fri, 27 Oct 2023 18:38:15 +0000 (UTC) Date: Fri, 27 Oct 2023 20:38:14 +0200 From: Michael Niedermayer To: FFmpeg development discussions and patches Message-ID: <20231027183814.GW3543730@pb2> References: <20231024150443.7438-1-michael@niedermayer.cc> <20231024150443.7438-2-michael@niedermayer.cc> MIME-Version: 1.0 In-Reply-To: X-GND-Sasl: michael@niedermayer.cc Subject: Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Content-Type: multipart/mixed; boundary="===============7449224587420255389==" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: --===============7449224587420255389== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="2IfSHBTcwkaAacpK" Content-Disposition: inline --2IfSHBTcwkaAacpK Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 27, 2023 at 05:10:32AM +0200, Andreas Rheinhardt wrote: > Michael Niedermayer: > > Signed-off-by: Michael Niedermayer > > --- > > libavcodec/get_bits.h | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > >=20 > > diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h > > index cfcf97c021c..86cea00494a 100644 > > --- a/libavcodec/get_bits.h > > +++ b/libavcodec/get_bits.h > > @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBit= Context *s) > > n =3D table[index].len; \ > > \ > > if (max_depth > 1 && n < 0) { \ > > - LAST_SKIP_BITS(name, gb, bits); \ > > - UPDATE_CACHE(name, gb); \ > > + if (av_builtin_constant_p(bits <=3D MIN_CACHE_BITS/2) && b= its <=3D MIN_CACHE_BITS/2) { \ > > + SKIP_BITS(name, gb, bits); \ > > + } else { \ > > + LAST_SKIP_BITS(name, gb, bits); \ > > + UPDATE_CACHE(name, gb); \ > > + } \ > > \ > > nb_bits =3D -n; \ > > \ >=20 > This is problematic: The GET_VLC macro does not presume that > MIN_CACHE_BITS are available; there is code that directly uses GET_VLC > instead of get_vlc2(). >=20 > I had the same idea when I made my VLC patchset, yet I wanted to first > apply it (which I forgot). While investigating the above issue, I found > out that all users of GET_VLC always call UPDATE_CACHE immediately > before GET_VLC, so UPDATE_CACHE should be moved into GET_VLC; > furthermore, no user of GET_VLC relies on the reloads inside of GET_VLC. > The patches for this are here: > https://github.com/mkver/FFmpeg/commits/vlc Shall I send them? >=20 > Notice that making GET_VLC more standalone enables improvements over the > current approach; yet it will not lead to optimal code: E.g. the VLCs in > decode_alpha_block() in speedhqdec.c are so short that one could read > both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c > which currently does this: >=20 > GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2); >=20 > i +=3D ((unsigned)code) >> 4; > code &=3D 0xf; > if (code) { > if (code > MIN_CACHE_BITS - 16) > UPDATE_CACHE(re, &s->gb); >=20 > { > int cache =3D GET_CACHE(re, &s->gb); > int sign =3D (~cache) >> 31; > level =3D (NEG_USR32(sign ^ cache,code) ^ sign) - sig= n; > } >=20 > LAST_SKIP_BITS(re, &s->gb, code); >=20 > Because of the reloads in GET_VLC, there will always be at least > MIN_CACHE_BITS - 9 (=3D 16) bits available after GET_VLC, so one can read > code (<=3D 15) bits without updating the cache at all (16 in > MIN_CACHE_BITS - 16 is the maximum length of a VLC code used here); this > will no longer be possible with this optimization. > Btw: I am surprised that there is a branch before UPDATE_CACHE instead > of an unconditional UPDATE_CACHE. I also do not really see why this uses > these macros directly and not the functions. >=20 > Given my objection to your patch #1, magicyuv will not benefit from > this; a different approach (see > https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef= 6201dc) > is to add a get_vlc() that uses the nb of bits used to create the VLC > and a compile-time upper bound for the maximum length of a VLC code as > parameters instead of the maximum depth of the VLC. >=20 > Reading VLCs for the cached bitstream reader can btw also be improved: > https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d5472= 9fd92a speaking of that, i was wondering if the alternatives we had in get_bits.h A32_BITSTREAM_READER wouldnt be worth reinvestigating especially when extended to 64bit some of these readers might perform better There are then just more bits available and fewer reads and fewer mispredic= ted branches for cached ones It would be somewhat nice if we could avoid having 2 different APIs as we h= ave now with the cached and normal reader. Also the normal one with 64bit would be interresting, more available bits so fewer reads also i was wondering about a vlc reader thats entirely free of conditional branches. Just a loop that in each iteration would step by 0-n symbols forward and update a pointer to which table to use next but i dont think i will have the time to try to implement this. I have alot more ideas than i have time to try sadly, if you wan/can/or did try any of above that would be interresting thx [...] --=20 Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB "You are 36 times more likely to die in a bathtub than at the hands of a terrorist. Also, you are 2.5 times more likely to become a president and 2 times more likely to become an astronaut, than to die in a terrorist attack." -- Thoughty2 --2IfSHBTcwkaAacpK Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iF0EABEIAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCZTwDkwAKCRBhHseHBAsP q2RLAJ991bv8IXKWCMtLyk5o3lWe8WEciwCgl/+n5ziKfNfHWuGnYrUL7lqpZGE= =K9QZ -----END PGP SIGNATURE----- --2IfSHBTcwkaAacpK-- --===============7449224587420255389== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". --===============7449224587420255389==--