Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
* [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits
@ 2023-10-24 15:04 Michael Niedermayer
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Michael Niedermayer
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Michael Niedermayer @ 2023-10-24 15:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

This will permit further optimizations

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/magicyuv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c
index 4a143cdbbf7..78d7f44cd65 100644
--- a/libavcodec/magicyuv.c
+++ b/libavcodec/magicyuv.c
@@ -124,13 +124,13 @@ static void magicyuv_median_pred16(uint16_t *dst, const uint16_t *src1,
     x = 0; \
     for (; x < width-c && get_bits_left(&gb) > 0;) {\
         ret = get_vlc_multi(&gb, (uint8_t *)dst + x * b, multi, \
-                            vlc, vlc_bits, 3); \
+                            vlc, VLC_BITS, 3); \
         if (ret <= 0) \
             return AVERROR_INVALIDDATA; \
         x += ret; \
     } \
     for (; x < width && get_bits_left(&gb) > 0; x++) \
-        dst[x] = get_vlc2(&gb, vlc, vlc_bits, 3); \
+        dst[x] = get_vlc2(&gb, vlc, VLC_BITS, 3); \
     dst += stride; \
 }
 
@@ -155,7 +155,6 @@ static int magy_decode_slice10(AVCodecContext *avctx, void *tdata,
         ptrdiff_t stride = p->linesize[i] / 2;
         const VLC_MULTI_ELEM *const multi = s->multi[i].table;
         const VLCElem *const vlc = s->vlc[i].table;
-        const int vlc_bits = s->vlc[i].bits;
         int flags, pred;
         int ret = init_get_bits8(&gb, s->buf + s->slices[i][j].start,
                                  s->slices[i][j].size);
-- 
2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small
  2023-10-24 15:04 [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Michael Niedermayer
@ 2023-10-24 15:04 ` Michael Niedermayer
  2023-10-27  3:10   ` Andreas Rheinhardt
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 3/4] avcodec/get_bits: Implement get_vlc_multi() Michael Niedermayer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Michael Niedermayer @ 2023-10-24 15:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/get_bits.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index cfcf97c021c..86cea00494a 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
         n     = table[index].len;                               \
                                                                 \
         if (max_depth > 1 && n < 0) {                           \
-            LAST_SKIP_BITS(name, gb, bits);                     \
-            UPDATE_CACHE(name, gb);                             \
+            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
+                SKIP_BITS(name, gb, bits);                      \
+            } else {                                            \
+                LAST_SKIP_BITS(name, gb, bits);                 \
+                UPDATE_CACHE(name, gb);                         \
+            }                                                   \
                                                                 \
             nb_bits = -n;                                       \
                                                                 \
-- 
2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [FFmpeg-devel] [PATCH 3/4] avcodec/get_bits: Implement get_vlc_multi()
  2023-10-24 15:04 [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Michael Niedermayer
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Michael Niedermayer
@ 2023-10-24 15:04 ` Michael Niedermayer
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 4/4] avcodec/magicyuv: Set UNCHECKED_BITSTREAM_READER Michael Niedermayer
  2023-10-26 21:37 ` [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Andreas Rheinhardt
  3 siblings, 0 replies; 9+ messages in thread
From: Michael Niedermayer @ 2023-10-24 15:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/get_bits.h | 61 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index 86cea00494a..806e4b16f49 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -668,13 +668,68 @@ static av_always_inline int get_vlc2(GetBitContext *s, const VLCElem *table,
     return code;
 }
 
-static inline int get_vlc_multi(GetBitContext *s, uint8_t *dst,
+/**
+ * Parse a vlc / vlc_multi code.
+ * @param bits is the number of bits which will be read at once, must be
+ *             identical to nb_bits in vlc_init(), should be known at compile time
+ * @param max_depth is the number of times bits bits must be read to completely
+ *                  read the longest vlc code
+ *                  = (max_vlc_length + bits - 1) / bits
+ * @param dst the parsed symbol(s) will be stored here. Up to 8 bytes are written
+ * @returns number of symbols parsed
+ * If the vlc code is invalid and max_depth=1, then no bits will be removed.
+ * If the vlc code is invalid and max_depth>1, then the number of bits removed
+ * is undefined.
+ */
+static inline int get_vlc_multi(GetBitContext *s,  uint8_t * restrict dst,
                                 const VLC_MULTI_ELEM *const Jtable,
                                 const VLCElem *const table,
                                 const int bits, const int max_depth)
 {
-    dst[0] = get_vlc2(s, table, bits, max_depth);
-    return 1;
+    int ret, nb_bits, n;
+    unsigned int index;
+
+    OPEN_READER(re, s);
+    UPDATE_CACHE(re, s);
+
+    index = SHOW_UBITS(re, s, bits);
+    n = Jtable[index].len;
+    if (Jtable[index].num) {
+        AV_COPY64U(dst, Jtable[index].val);
+        ret = Jtable[index].num;
+    } else {
+        int code = table[index].sym;
+
+        if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) {
+            SKIP_BITS(re, s, bits);
+        } else {
+            LAST_SKIP_BITS(re, s, bits);
+            UPDATE_CACHE(re, s);
+        }
+
+        nb_bits = -n;
+
+        index = SHOW_UBITS(re, s, nb_bits) + code;
+        code  = table[index].sym;
+        n     = table[index].len;
+        if (max_depth > 2 && n < 0) {
+            LAST_SKIP_BITS(re, s, nb_bits);
+            UPDATE_CACHE(re, s);
+
+            nb_bits = -n;
+
+            index = SHOW_UBITS(re, s, nb_bits) + code;
+            code  = table[index].sym;
+            n     = table[index].len;
+        }
+        dst[0] = code;
+        ret = n > 0;
+    }
+    LAST_SKIP_BITS(re, s, n);
+
+    CLOSE_READER(re, s);
+
+    return ret;
 }
 
 static inline int decode012(GetBitContext *gb)
-- 
2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [FFmpeg-devel] [PATCH 4/4] avcodec/magicyuv: Set UNCHECKED_BITSTREAM_READER
  2023-10-24 15:04 [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Michael Niedermayer
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Michael Niedermayer
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 3/4] avcodec/get_bits: Implement get_vlc_multi() Michael Niedermayer
@ 2023-10-24 15:04 ` Michael Niedermayer
  2023-10-26 21:37 ` [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Andreas Rheinhardt
  3 siblings, 0 replies; 9+ messages in thread
From: Michael Niedermayer @ 2023-10-24 15:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

The code already checks for the end everywhere

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/magicyuv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c
index 78d7f44cd65..2ab29df2f51 100644
--- a/libavcodec/magicyuv.c
+++ b/libavcodec/magicyuv.c
@@ -23,6 +23,7 @@
 #include <string.h>
 
 #define CACHED_BITSTREAM_READER !ARCH_X86_32
+#define UNCHECKED_BITSTREAM_READER 1
 
 #include "libavutil/pixdesc.h"
 
-- 
2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits
  2023-10-24 15:04 [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Michael Niedermayer
                   ` (2 preceding siblings ...)
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 4/4] avcodec/magicyuv: Set UNCHECKED_BITSTREAM_READER Michael Niedermayer
@ 2023-10-26 21:37 ` Andreas Rheinhardt
  3 siblings, 0 replies; 9+ messages in thread
From: Andreas Rheinhardt @ 2023-10-26 21:37 UTC (permalink / raw)
  To: ffmpeg-devel

Michael Niedermayer:
> This will permit further optimizations
> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/magicyuv.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c
> index 4a143cdbbf7..78d7f44cd65 100644
> --- a/libavcodec/magicyuv.c
> +++ b/libavcodec/magicyuv.c
> @@ -124,13 +124,13 @@ static void magicyuv_median_pred16(uint16_t *dst, const uint16_t *src1,
>      x = 0; \
>      for (; x < width-c && get_bits_left(&gb) > 0;) {\
>          ret = get_vlc_multi(&gb, (uint8_t *)dst + x * b, multi, \
> -                            vlc, vlc_bits, 3); \
> +                            vlc, VLC_BITS, 3); \
>          if (ret <= 0) \
>              return AVERROR_INVALIDDATA; \
>          x += ret; \
>      } \
>      for (; x < width && get_bits_left(&gb) > 0; x++) \
> -        dst[x] = get_vlc2(&gb, vlc, vlc_bits, 3); \
> +        dst[x] = get_vlc2(&gb, vlc, VLC_BITS, 3); \
>      dst += stride; \
>  }
>  
> @@ -155,7 +155,6 @@ static int magy_decode_slice10(AVCodecContext *avctx, void *tdata,
>          ptrdiff_t stride = p->linesize[i] / 2;
>          const VLC_MULTI_ELEM *const multi = s->multi[i].table;
>          const VLCElem *const vlc = s->vlc[i].table;
> -        const int vlc_bits = s->vlc[i].bits;
>          int flags, pred;
>          int ret = init_get_bits8(&gb, s->buf + s->slices[i][j].start,
>                                   s->slices[i][j].size);

The VLCs are created via "ff_vlc_init_multi_from_lengths(vlc, multi,
FFMIN(he[0].len, VLC_BITS), nb_elems, nb_elems,", so the number of bits
of the VLC is not a compile-time constant.

- Andreas

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small
  2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Michael Niedermayer
@ 2023-10-27  3:10   ` Andreas Rheinhardt
  2023-10-27 18:38     ` Michael Niedermayer
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Rheinhardt @ 2023-10-27  3:10 UTC (permalink / raw)
  To: ffmpeg-devel

Michael Niedermayer:
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/get_bits.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
> index cfcf97c021c..86cea00494a 100644
> --- a/libavcodec/get_bits.h
> +++ b/libavcodec/get_bits.h
> @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
>          n     = table[index].len;                               \
>                                                                  \
>          if (max_depth > 1 && n < 0) {                           \
> -            LAST_SKIP_BITS(name, gb, bits);                     \
> -            UPDATE_CACHE(name, gb);                             \
> +            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
> +                SKIP_BITS(name, gb, bits);                      \
> +            } else {                                            \
> +                LAST_SKIP_BITS(name, gb, bits);                 \
> +                UPDATE_CACHE(name, gb);                         \
> +            }                                                   \
>                                                                  \
>              nb_bits = -n;                                       \
>                                                                  \

This is problematic: The GET_VLC macro does not presume that
MIN_CACHE_BITS are available; there is code that directly uses GET_VLC
instead of get_vlc2().

I had the same idea when I made my VLC patchset, yet I wanted to first
apply it (which I forgot). While investigating the above issue, I found
out that all users of GET_VLC always call UPDATE_CACHE immediately
before GET_VLC, so UPDATE_CACHE should be moved into GET_VLC;
furthermore, no user of GET_VLC relies on the reloads inside of GET_VLC.
The patches for this are here:
https://github.com/mkver/FFmpeg/commits/vlc Shall I send them?

Notice that making GET_VLC more standalone enables improvements over the
current approach; yet it will not lead to optimal code: E.g. the VLCs in
decode_alpha_block() in speedhqdec.c are so short that one could read
both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c
which currently does this:

        GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2);

        i += ((unsigned)code) >> 4;
            code &= 0xf;
        if (code) {
            if (code > MIN_CACHE_BITS - 16)
                UPDATE_CACHE(re, &s->gb);

            {
                int cache = GET_CACHE(re, &s->gb);
                int sign  = (~cache) >> 31;
                level     = (NEG_USR32(sign ^ cache,code) ^ sign) - sign;
            }

            LAST_SKIP_BITS(re, &s->gb, code);

Because of the reloads in GET_VLC, there will always be at least
MIN_CACHE_BITS - 9 (= 16) bits available after GET_VLC, so one can read
code (<= 15) bits without updating the cache at all (16 in
MIN_CACHE_BITS - 16 is the maximum length of a VLC code used here); this
will no longer be possible with this optimization.
Btw: I am surprised that there is a branch before UPDATE_CACHE instead
of an unconditional UPDATE_CACHE. I also do not really see why this uses
these macros directly and not the functions.

Given my objection to your patch #1, magicyuv will not benefit from
this; a different approach (see
https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc)
is to add a get_vlc() that uses the nb of bits used to create the VLC
and a compile-time upper bound for the maximum length of a VLC code as
parameters instead of the maximum depth of the VLC.

Reading VLCs for the cached bitstream reader can btw also be improved:
https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a

- Andreas

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small
  2023-10-27  3:10   ` Andreas Rheinhardt
@ 2023-10-27 18:38     ` Michael Niedermayer
  2023-10-30 20:49       ` Andreas Rheinhardt
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Niedermayer @ 2023-10-27 18:38 UTC (permalink / raw)
  To: FFmpeg development discussions and patches


[-- Attachment #1.1: Type: text/plain, Size: 5428 bytes --]

On Fri, Oct 27, 2023 at 05:10:32AM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  libavcodec/get_bits.h | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
> > index cfcf97c021c..86cea00494a 100644
> > --- a/libavcodec/get_bits.h
> > +++ b/libavcodec/get_bits.h
> > @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
> >          n     = table[index].len;                               \
> >                                                                  \
> >          if (max_depth > 1 && n < 0) {                           \
> > -            LAST_SKIP_BITS(name, gb, bits);                     \
> > -            UPDATE_CACHE(name, gb);                             \
> > +            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
> > +                SKIP_BITS(name, gb, bits);                      \
> > +            } else {                                            \
> > +                LAST_SKIP_BITS(name, gb, bits);                 \
> > +                UPDATE_CACHE(name, gb);                         \
> > +            }                                                   \
> >                                                                  \
> >              nb_bits = -n;                                       \
> >                                                                  \
> 
> This is problematic: The GET_VLC macro does not presume that
> MIN_CACHE_BITS are available; there is code that directly uses GET_VLC
> instead of get_vlc2().
> 
> I had the same idea when I made my VLC patchset, yet I wanted to first
> apply it (which I forgot). While investigating the above issue, I found
> out that all users of GET_VLC always call UPDATE_CACHE immediately
> before GET_VLC, so UPDATE_CACHE should be moved into GET_VLC;
> furthermore, no user of GET_VLC relies on the reloads inside of GET_VLC.
> The patches for this are here:
> https://github.com/mkver/FFmpeg/commits/vlc Shall I send them?
> 
> Notice that making GET_VLC more standalone enables improvements over the
> current approach; yet it will not lead to optimal code: E.g. the VLCs in
> decode_alpha_block() in speedhqdec.c are so short that one could read
> both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c
> which currently does this:
> 
>         GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2);
> 
>         i += ((unsigned)code) >> 4;
>             code &= 0xf;
>         if (code) {
>             if (code > MIN_CACHE_BITS - 16)
>                 UPDATE_CACHE(re, &s->gb);
> 
>             {
>                 int cache = GET_CACHE(re, &s->gb);
>                 int sign  = (~cache) >> 31;
>                 level     = (NEG_USR32(sign ^ cache,code) ^ sign) - sign;
>             }
> 
>             LAST_SKIP_BITS(re, &s->gb, code);
> 
> Because of the reloads in GET_VLC, there will always be at least
> MIN_CACHE_BITS - 9 (= 16) bits available after GET_VLC, so one can read
> code (<= 15) bits without updating the cache at all (16 in
> MIN_CACHE_BITS - 16 is the maximum length of a VLC code used here); this
> will no longer be possible with this optimization.
> Btw: I am surprised that there is a branch before UPDATE_CACHE instead
> of an unconditional UPDATE_CACHE. I also do not really see why this uses
> these macros directly and not the functions.
> 
> Given my objection to your patch #1, magicyuv will not benefit from
> this; a different approach (see
> https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc)
> is to add a get_vlc() that uses the nb of bits used to create the VLC
> and a compile-time upper bound for the maximum length of a VLC code as
> parameters instead of the maximum depth of the VLC.
> 

> Reading VLCs for the cached bitstream reader can btw also be improved:
> https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a

speaking of that, i was wondering if the alternatives we had in get_bits.h
A32_BITSTREAM_READER wouldnt be worth reinvestigating
especially when extended to 64bit some of these readers might perform better
There are then just more bits available and fewer reads and fewer mispredicted
branches for cached ones

It would be somewhat nice if we could avoid having 2 different APIs as we have
now with the cached and normal reader.
Also the normal one with 64bit would be interresting, more available bits
so fewer reads

also i was wondering about a vlc reader thats entirely free of conditional
branches. Just a loop that in each iteration would step by 0-n symbols
forward and update a pointer to which table to use next
but i dont think i will have the time to try to implement this.

I have alot more ideas than i have time to try sadly, if you wan/can/or did
try any of above that would be interresting

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

"You are 36 times more likely to die in a bathtub than at the hands of a
terrorist. Also, you are 2.5 times more likely to become a president and
2 times more likely to become an astronaut, than to die in a terrorist
attack." -- Thoughty2


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 251 bytes --]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small
  2023-10-27 18:38     ` Michael Niedermayer
@ 2023-10-30 20:49       ` Andreas Rheinhardt
  2023-10-31  0:25         ` Michael Niedermayer
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Rheinhardt @ 2023-10-30 20:49 UTC (permalink / raw)
  To: ffmpeg-devel

Michael Niedermayer:
> On Fri, Oct 27, 2023 at 05:10:32AM +0200, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>> ---
>>>  libavcodec/get_bits.h | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
>>> index cfcf97c021c..86cea00494a 100644
>>> --- a/libavcodec/get_bits.h
>>> +++ b/libavcodec/get_bits.h
>>> @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
>>>          n     = table[index].len;                               \
>>>                                                                  \
>>>          if (max_depth > 1 && n < 0) {                           \
>>> -            LAST_SKIP_BITS(name, gb, bits);                     \
>>> -            UPDATE_CACHE(name, gb);                             \
>>> +            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
>>> +                SKIP_BITS(name, gb, bits);                      \
>>> +            } else {                                            \
>>> +                LAST_SKIP_BITS(name, gb, bits);                 \
>>> +                UPDATE_CACHE(name, gb);                         \
>>> +            }                                                   \
>>>                                                                  \
>>>              nb_bits = -n;                                       \
>>>                                                                  \
>>
>> This is problematic: The GET_VLC macro does not presume that
>> MIN_CACHE_BITS are available; there is code that directly uses GET_VLC
>> instead of get_vlc2().
>>
>> I had the same idea when I made my VLC patchset, yet I wanted to first
>> apply it (which I forgot). While investigating the above issue, I found
>> out that all users of GET_VLC always call UPDATE_CACHE immediately
>> before GET_VLC, so UPDATE_CACHE should be moved into GET_VLC;
>> furthermore, no user of GET_VLC relies on the reloads inside of GET_VLC.
>> The patches for this are here:
>> https://github.com/mkver/FFmpeg/commits/vlc Shall I send them?
>>
>> Notice that making GET_VLC more standalone enables improvements over the
>> current approach; yet it will not lead to optimal code: E.g. the VLCs in
>> decode_alpha_block() in speedhqdec.c are so short that one could read
>> both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c
>> which currently does this:
>>
>>         GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2);
>>
>>         i += ((unsigned)code) >> 4;
>>             code &= 0xf;
>>         if (code) {
>>             if (code > MIN_CACHE_BITS - 16)
>>                 UPDATE_CACHE(re, &s->gb);
>>
>>             {
>>                 int cache = GET_CACHE(re, &s->gb);
>>                 int sign  = (~cache) >> 31;
>>                 level     = (NEG_USR32(sign ^ cache,code) ^ sign) - sign;
>>             }
>>
>>             LAST_SKIP_BITS(re, &s->gb, code);
>>
>> Because of the reloads in GET_VLC, there will always be at least
>> MIN_CACHE_BITS - 9 (= 16) bits available after GET_VLC, so one can read
>> code (<= 15) bits without updating the cache at all (16 in
>> MIN_CACHE_BITS - 16 is the maximum length of a VLC code used here); this
>> will no longer be possible with this optimization.
>> Btw: I am surprised that there is a branch before UPDATE_CACHE instead
>> of an unconditional UPDATE_CACHE. I also do not really see why this uses
>> these macros directly and not the functions.
>>
>> Given my objection to your patch #1, magicyuv will not benefit from
>> this; a different approach (see
>> https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc)
>> is to add a get_vlc() that uses the nb of bits used to create the VLC
>> and a compile-time upper bound for the maximum length of a VLC code as
>> parameters instead of the maximum depth of the VLC.
>>
> 
>> Reading VLCs for the cached bitstream reader can btw also be improved:
>> https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a
> 
> speaking of that, i was wondering if the alternatives we had in get_bits.h
> A32_BITSTREAM_READER wouldnt be worth reinvestigating
> especially when extended to 64bit some of these readers might perform better
> There are then just more bits available and fewer reads and fewer mispredicted
> branches for cached ones
> 
> It would be somewhat nice if we could avoid having 2 different APIs as we have
> now with the cached and normal reader.
> Also the normal one with 64bit would be interresting, more available bits
> so fewer reads

I already did something like that in
b0fb8e82dde375efb8d5602f6cd479f210c1e93c. But be aware that the most
important reason we read more often than we necessarily have to is not
the number of bits in the cache, but that we stopped using the macros
(like SKIP_CACHE, SKIP_COUNTER) directly and used functions which are
more standalone (and IMO also more readable).
Furthermore, creating replacements of these macros is complicated for
cached bitstream readers, because of the implicit overread checks. Here
is the important part of read_nz:

    if (n > bc->bits_valid) {
        if (BS_FUNC(priv_refill_32)(bc) < 0)
            bc->bits_valid = n;
    }

    return BS_FUNC(priv_val_get)(bc, n);

priv_val_get presumes there to be enough valid bits available; otherwise
bits_valid wraps around.

This does not mean that there is no hope for avoiding unnecessary reads:
We could add an unsigned get_bits_cached(GetBitContext *gb, int n)
(basically a get_bits(), but the valid bits are the most significant
bits (for BE)) and get_bits_from_cache(unsigned *cache, int n)
(basically, SHOW_UBITS+SKIP_CACHE (or priv_val_get for the cached API)
and that could then be used as follows:

    unsigned cache = get_bits_cached(gb, 4 * 4);

    s->mpeg_f_code[0][0] = get_bits_from_cache(&cache, 4);
    s->mpeg_f_code[0][1] = get_bits_from_cache(&cache, 4);
    s->mpeg_f_code[1][0] = get_bits_from_cache(&cache, 4);
    s->mpeg_f_code[1][1] = get_bits_from_cache(&cache, 4);

This works because get_bits_cached() would already remove the specified
number of bits from the bitstream; but this also shows its restrictions:
It is only usable when the number of bits to be read subsequently is
known in advance.

Another potential API is to add a show_bits_cached() that is the same as
show_bits(), but returns the valid bits in the most significant bits
(for BE). This could then be complemented with a get_bits_from_cache()
and in the end one needs to skip the number of bits actually consumed.
The latter would of course need to validate/sanitize the number of bits
consumed for both the cached and uncached reader. For the cached reader,
doing so incurs a check that is avoided in case one uses the
get_bits_cached() API outlined above (of course, this only applies to
the scenarios where one can actually use said API).

> 
> also i was wondering about a vlc reader thats entirely free of conditional
> branches. Just a loop that in each iteration would step by 0-n symbols
> forward and update a pointer to which table to use next

Doesn't this have the downside that short symbols need as many
iterations as the longest one?

> but i dont think i will have the time to try to implement this.
> 
> I have alot more ideas than i have time to try sadly, if you wan/can/or did
> try any of above that would be interresting
> 

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small
  2023-10-30 20:49       ` Andreas Rheinhardt
@ 2023-10-31  0:25         ` Michael Niedermayer
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Niedermayer @ 2023-10-31  0:25 UTC (permalink / raw)
  To: FFmpeg development discussions and patches


[-- Attachment #1.1: Type: text/plain, Size: 2151 bytes --]

On Mon, Oct 30, 2023 at 09:49:07PM +0100, Andreas Rheinhardt wrote:
> Michael Niedermayer:
[...]

> > 
> > also i was wondering about a vlc reader thats entirely free of conditional
> > branches. Just a loop that in each iteration would step by 0-n symbols
> > forward and update a pointer to which table to use next
> 
> Doesn't this have the downside that short symbols need as many
> iterations as the longest one?

i dont think so but maybe iam thinking of something else

lets assume our main table is 10bits
so we read 10 bits look it up in the table and that tells us what there is
and lets assume these 10 bits contain 2 complete symbols and one partial

first we copy from the table a block into our symbol output list (that contains 2 symbols)
second we take a pointer from the table to the next table
    the incomplete can either now be handled in the next iteration or we
    could point back to the main 10bit table and only handle the
    2 complete in this iteration
third we move bits (10) and symbols output pointer (2) forward

now we go back to the start of the loop and continue handling the partial symbol
first we copy from the table a block into our symbol output list (that contains 0 symbols as our partial one still is unfinished)
second we take a pointer from the table to the next table
    this is the next table to decode the long symbol
third we move bits (10) and symbols output pointer (0) forward

now we go back to the start of the loop and continue handling the partial symbol
first we copy from the table a block into our symbol output list (that finally completes the long symbol so we output it)
second we take a pointer from the table to the next which is back at the main table
third we move bits (x) and symbols output pointer (1) forward


thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 251 bytes --]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-10-31  0:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-24 15:04 [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Michael Niedermayer
2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small Michael Niedermayer
2023-10-27  3:10   ` Andreas Rheinhardt
2023-10-27 18:38     ` Michael Niedermayer
2023-10-30 20:49       ` Andreas Rheinhardt
2023-10-31  0:25         ` Michael Niedermayer
2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 3/4] avcodec/get_bits: Implement get_vlc_multi() Michael Niedermayer
2023-10-24 15:04 ` [FFmpeg-devel] [PATCH 4/4] avcodec/magicyuv: Set UNCHECKED_BITSTREAM_READER Michael Niedermayer
2023-10-26 21:37 ` [FFmpeg-devel] [PATCH 1/4] avcodec/magicyuv: Use a compile time constant for vlc_bits Andreas Rheinhardt

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git