[FFmpeg-devel] [RFC] AVDictionary2

* [FFmpeg-devel] [RFC] AVDictionary2
@ 2025-04-08 10:19 Michael Niedermayer
  2025-04-08 16:10 ` Romain Beauxis
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Michael Niedermayer @ 2025-04-08 10:19 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

[-- Attachment #1.1: Type: text/plain, Size: 2757 bytes --]

Hi all

As i have too many things to do already i did the most logic thing and
started thinking about a new and unrelated idea.

This is a list of problems and ideas, that everyone is welcome to add to and
comment on.

AVDictionary is just bad.

* its complicated internally with
  unneeded alternative (AV_DICT_DONT_STRDUP_VAL/KEY) these are rarely used
  and probably not relevant for performance.

* all basic operations are as slow as possible.
  you want to find, update or remove an entry, search through all entries

* its heavy on memory allocations
  1 malloc for key, 1 malloc for value, 1 realloc on the AVDictionaryEntry array
  that makes 2+ malloc() for every "foo"="bar"

Ideas:
1. put the node struct (AVDictionaryEntry), the key and value in the same
   allocated block, 1 malloc() instead of 2.
   We can simply concatenate the key and value string, we could even use the
   0 terminator instead of the 2nd pointer. Either way the whole
   can go to the end of the Node structure for a tree
1b. Now if we did put the key and value together, we can order in the tree
   by this combined entity. Why ? because now we have a unique ordering
   and also the key+value could be required to be always unique. Simplifying
   things from what we have now and making it more replicatable, no
   more changes in output because order changed
2. We have a simple AVL tree implementation which we could use to make
   all operations O(log n) instead of O(n)
3. We could go with hash tables, splay trees, critbit trees or something
   else. hash tables have issues with malicious/odd input which would
   require more complexity to workaround.

Of course we could also go a step further and eliminate the malloc per
node and put it all in a linear array.
        As in, insert -> append at the end,
        realloc with every power of 2 size increase
        complete rebuild once enough elements are removed
    not sure this isnt overkill for a metadata string dictionary

I probably wont have time to implement this in the near future but as i
was thinking about this, it seemed to make sense to write this down and
post here

git grep av_dict | wc is 1436

So its used a bit, justifying looking at improving it

git grep AV_DICT_DONT_STRDUP | wc is 87
git grep AV_DICT_DONT_STRDUP libavutil/ tests doc | wc is 20

Seems not too common and one malloc/copy of a string once per metadata entry
which is once per file generally, seems a strange optimization to me

thx

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 251 bytes --]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 19+ messages in thread