From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> To: ffmpeg-devel@ffmpeg.org Subject: Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Date: Mon, 30 Oct 2023 21:49:07 +0100 Message-ID: <AS8P250MB07445F7442F4618AB0C6AF108FA1A@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM> (raw) In-Reply-To: <20231027183814.GW3543730@pb2> Michael Niedermayer: > On Fri, Oct 27, 2023 at 05:10:32AM +0200, Andreas Rheinhardt wrote: >> Michael Niedermayer: >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> >>> --- >>> libavcodec/get_bits.h | 8 ++++++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h >>> index cfcf97c021c..86cea00494a 100644 >>> --- a/libavcodec/get_bits.h >>> +++ b/libavcodec/get_bits.h >>> @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s) >>> n = table[index].len; \ >>> \ >>> if (max_depth > 1 && n < 0) { \ >>> - LAST_SKIP_BITS(name, gb, bits); \ >>> - UPDATE_CACHE(name, gb); \ >>> + if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \ >>> + SKIP_BITS(name, gb, bits); \ >>> + } else { \ >>> + LAST_SKIP_BITS(name, gb, bits); \ >>> + UPDATE_CACHE(name, gb); \ >>> + } \ >>> \ >>> nb_bits = -n; \ >>> \ >> >> This is problematic: The GET_VLC macro does not presume that >> MIN_CACHE_BITS are available; there is code that directly uses GET_VLC >> instead of get_vlc2(). >> >> I had the same idea when I made my VLC patchset, yet I wanted to first >> apply it (which I forgot). While investigating the above issue, I found >> out that all users of GET_VLC always call UPDATE_CACHE immediately >> before GET_VLC, so UPDATE_CACHE should be moved into GET_VLC; >> furthermore, no user of GET_VLC relies on the reloads inside of GET_VLC. >> The patches for this are here: >> https://github.com/mkver/FFmpeg/commits/vlc Shall I send them? >> >> Notice that making GET_VLC more standalone enables improvements over the >> current approach; yet it will not lead to optimal code: E.g. the VLCs in >> decode_alpha_block() in speedhqdec.c are so short that one could read >> both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c >> which currently does this: >> >> GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2); >> >> i += ((unsigned)code) >> 4; >> code &= 0xf; >> if (code) { >> if (code > MIN_CACHE_BITS - 16) >> UPDATE_CACHE(re, &s->gb); >> >> { >> int cache = GET_CACHE(re, &s->gb); >> int sign = (~cache) >> 31; >> level = (NEG_USR32(sign ^ cache,code) ^ sign) - sign; >> } >> >> LAST_SKIP_BITS(re, &s->gb, code); >> >> Because of the reloads in GET_VLC, there will always be at least >> MIN_CACHE_BITS - 9 (= 16) bits available after GET_VLC, so one can read >> code (<= 15) bits without updating the cache at all (16 in >> MIN_CACHE_BITS - 16 is the maximum length of a VLC code used here); this >> will no longer be possible with this optimization. >> Btw: I am surprised that there is a branch before UPDATE_CACHE instead >> of an unconditional UPDATE_CACHE. I also do not really see why this uses >> these macros directly and not the functions. >> >> Given my objection to your patch #1, magicyuv will not benefit from >> this; a different approach (see >> https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc) >> is to add a get_vlc() that uses the nb of bits used to create the VLC >> and a compile-time upper bound for the maximum length of a VLC code as >> parameters instead of the maximum depth of the VLC. >> > >> Reading VLCs for the cached bitstream reader can btw also be improved: >> https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a > > speaking of that, i was wondering if the alternatives we had in get_bits.h > A32_BITSTREAM_READER wouldnt be worth reinvestigating > especially when extended to 64bit some of these readers might perform better > There are then just more bits available and fewer reads and fewer mispredicted > branches for cached ones > > It would be somewhat nice if we could avoid having 2 different APIs as we have > now with the cached and normal reader. > Also the normal one with 64bit would be interresting, more available bits > so fewer reads I already did something like that in b0fb8e82dde375efb8d5602f6cd479f210c1e93c. But be aware that the most important reason we read more often than we necessarily have to is not the number of bits in the cache, but that we stopped using the macros (like SKIP_CACHE, SKIP_COUNTER) directly and used functions which are more standalone (and IMO also more readable). Furthermore, creating replacements of these macros is complicated for cached bitstream readers, because of the implicit overread checks. Here is the important part of read_nz: if (n > bc->bits_valid) { if (BS_FUNC(priv_refill_32)(bc) < 0) bc->bits_valid = n; } return BS_FUNC(priv_val_get)(bc, n); priv_val_get presumes there to be enough valid bits available; otherwise bits_valid wraps around. This does not mean that there is no hope for avoiding unnecessary reads: We could add an unsigned get_bits_cached(GetBitContext *gb, int n) (basically a get_bits(), but the valid bits are the most significant bits (for BE)) and get_bits_from_cache(unsigned *cache, int n) (basically, SHOW_UBITS+SKIP_CACHE (or priv_val_get for the cached API) and that could then be used as follows: unsigned cache = get_bits_cached(gb, 4 * 4); s->mpeg_f_code[0][0] = get_bits_from_cache(&cache, 4); s->mpeg_f_code[0][1] = get_bits_from_cache(&cache, 4); s->mpeg_f_code[1][0] = get_bits_from_cache(&cache, 4); s->mpeg_f_code[1][1] = get_bits_from_cache(&cache, 4); This works because get_bits_cached() would already remove the specified number of bits from the bitstream; but this also shows its restrictions: It is only usable when the number of bits to be read subsequently is known in advance. Another potential API is to add a show_bits_cached() that is the same as show_bits(), but returns the valid bits in the most significant bits (for BE). This could then be complemented with a get_bits_from_cache() and in the end one needs to skip the number of bits actually consumed. The latter would of course need to validate/sanitize the number of bits consumed for both the cached and uncached reader. For the cached reader, doing so incurs a check that is avoided in case one uses the get_bits_cached() API outlined above (of course, this only applies to the scenarios where one can actually use said API). > > also i was wondering about a vlc reader thats entirely free of conditional > branches. Just a loop that in each iteration would step by 0-n symbols > forward and update a pointer to which table to use next Doesn't this have the downside that short symbols need as many iterations as the longest one? > but i dont think i will have the time to try to implement this. > > I have alot more ideas than i have time to try sadly, if you wan/can/or did > try any of above that would be interresting > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2023-10-30 20:48 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-10-24 15:04 [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Michael Niedermayer 2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Michael Niedermayer 2023-10-27 3:10 ` Andreas Rheinhardt 2023-10-27 18:38 ` Michael Niedermayer 2023-10-30 20:49 ` Andreas Rheinhardt [this message] 2023-10-31 0:25 ` Michael Niedermayer 2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 3/4] avcodec/get_bits: Implement get_vlc_multi() Michael Niedermayer 2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 4/4] avcodec/magicyuv: Set UNCHECKED_BITSTREAM_READER Michael Niedermayer 2023-10-26 21:37 ` [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Andreas Rheinhardt
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=AS8P250MB07445F7442F4618AB0C6AF108FA1A@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM \ --to=andreas.rheinhardt@outlook.com \ --cc=ffmpeg-devel@ffmpeg.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git