From: "softworkz ." <softworkz-at-hotmail.com@ffmpeg.org>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
Date: Tue, 8 Apr 2025 21:30:16 +0000
Message-ID: <DM8P223MB036537514113E407B2437CF3BAB52@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>
In-Reply-To: <20250408194502.GR4991@pb2>
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Michael Niedermayer
> Sent: Tuesday, 8 April 2025 21:45
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
>
> On Tue, Apr 08, 2025 at 06:36:55PM +0000, softworkz . wrote:
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > > Michael Niedermayer
> > > > Sent: Tuesday, 8 April 2025 20:16
> > > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
> > >
> > > Hi softworkz
> > >
> > > On Tue, Apr 08, 2025 at 04:56:36PM +0000, softworkz . wrote:
> > > [...]
> > > > Hi Michael,
> > > >
> > > > > it's been a while, but as far as memory serves, wasn't a linear
> > > > > search even more efficient than other methods as long as we're
> > > > > dealing with no more than a few dozen items?
> > >
> > > a dozen is 12, so a few dozen would minimally be 24
> > >
> > > > on average, to find an entry in a list of 24 you need 12
> > > > comparisons with a linear search, and 24 in the worst case
> > > >
> > > > an AVL tree with 24 entries, I think, needs 7 comparisons in the
> > > > worst case
> > > > So it's certainly faster in number of comparisons
> > > >
> > > > the cost of strcmp() and overhead then come into play, but small
> > > > sets aren't really what separates the 2 choices.
> > > > The separation happens when there are many entries. A dictionary
> > > > is generic:
> > > > if you had a million entries, a linear search would take about a
> > > > million comparisons, while the AVL tree should need fewer than ~30
> > > > in the worst case
> > > > that's 5 orders of magnitude difference
> > >
> > >
> > > >
> > > > > In turn, my question would be whether we even have use cases
> > > > > with hundreds or thousands of dictionary entries?
> > >
> > > We use dictionary for metadata and options mainly.
> > > It would be possible to also use a linear list until the number of
> > > entries reaches a threshold
> >
> > LOL, sorry I really didn't want to make it even more complicated.
> >
> > Sticking on that side for a moment though, what you have skipped in
> > the comparison above is the insertion cost, because the insertion
> > cost is what buys you the 7 instead of 24 (worst) or x instead of 12
> > (average) comparisons on lookup. One of my takeaways in that area was
> > that there's always a break-even point below which there's nothing to
> > win.
> >
> > At the bottom line, I love optimizations, and for dictionaries with
> > larger numbers of entries, everything you said is perfectly valid of
> > course. What I tried to ask is just whether we actually have any case
> > of dictionary use that would benefit from that kind of optimization?
>
> I know that years ago there was some case in the command line option
> handling where some linear search resulted in some O(n^3) behavior,
> which was noticeable
> I don't remember if that was an AVDictionary
>
> also, if we use a linear search, what should we do with a file that
> contains 10k or 100k+ entries?
> and then something checks, for example, for each of these entries
> whether there's a corresponding one in the local language, so for 100k
> entries someone could do a linear lookup that fails, thus 100k * 100k
> This is a constructed case but it sounds plausible to me with such a
> file
>
> If we do a linear search then everyone needs to be careful what they
> use AVDictionary for.
All granted, and surely, the current implementation hardly deserves its name because it's definitely not what you would expect from a dictionary.
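As a rough check of the comparison counts quoted above, here is a minimal sketch in C. It is not FFmpeg code, and the function names are made up for illustration. It assumes one three-way strcmp() per tree level and uses the standard recurrence for the fewest nodes an AVL tree of a given height can hold, N(h) = N(h-1) + N(h-2) + 1.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case comparisons for a linear search over n entries:
 * a failing lookup has to compare against every entry. */
static size_t linear_worst_case(size_t n)
{
    return n;
}

/* Largest height (counted in node levels) an AVL tree with n entries can
 * have. The sparsest AVL tree of height h holds
 * N(h) = N(h-1) + N(h-2) + 1 nodes, with N(0) = 0 and N(1) = 1.
 * Assuming one three-way comparison per level, this is also the
 * worst-case number of comparisons for a lookup (an assumption of this
 * sketch, not a measurement). */
static size_t avl_worst_case(size_t n)
{
    size_t prev = 0;   /* N(h-1) */
    size_t cur  = 1;   /* N(h)   */
    size_t h    = 1;

    assert(n > 0);
    while (prev + cur + 1 <= n) {   /* N(h+1) still fits into n nodes */
        size_t next = prev + cur + 1;
        prev = cur;
        cur  = next;
        h++;
    }
    return h;
}
```

Under these assumptions, 24 entries give at most 6 levels and a million entries at most 28, which is in line with the estimates of roughly 7 and fewer than 30 quoted above. It says nothing about the cost of insertion or of each strcmp().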
I'm not responding arbitrarily to this topic. I recently spent some thought on it, because you quickly find out how AVDictionary behaves, at least when working inside the FFmpeg source.
One of those cases is ffprobe output filtering, which uses -show_entries to select specific fields. There are two actual hot paths: the printing of frame data and of packet data (each including descendants). When -show_entries is specified, the desired fields are stored in an AVDictionary (one per section).
The frame section has 34 fields atm (https://softworkz.github.io/ffmpeg_output_apis/ffprobe_schema.html), so the maximum reasonable number of entries is 33, while in the typical case the number is likely a lot lower. Insert performance is irrelevant, as insertion happens only once at startup, but the lookup count can be excessive, e.g. 34k lookups for 1k frames.
In this context I was wondering whether a different dictionary implementation might provide any benefit, and I concluded that it probably won't, at least not for small numbers of selected fields.
In that context I made another experiment, trying to find the fastest way to print all video keyframe times using packet data only. The keyframe packet filter was hard-coded and I used -show_entries, specifying only pts_time for packets. The packet section has 11 fields atm, which makes 11k dictionary lookups for 1k packets. I compared that to a patch which prints only pts_time, completely skipping the regular printing code (no dictionary lookups), and gained only 15% (3s instead of 3.5s). So in the end, the cost of those 11k lookups (albeit in a single-entry dictionary) was a lot less than expected.
To tell you the truth - at that point I was thinking: "Ah, clever! That's why the AVDictionary is done like that" 😊
So, this is the background of my previous replies - otherwise of course, I have nothing against a better dictionary.
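To make the lookup counts above concrete, here is a minimal sketch in C of the access pattern described: every field of a section is checked against a small set of selected names once per frame or packet. FieldFilter, filter_contains() and simulate_section() are hypothetical names for illustration only. This is neither the AVDictionary API nor the actual ffprobe code, and it only counts operations; it does not measure time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a linear-search dictionary of selected field
 * names. This is NOT the libavutil AVDictionary API. */
typedef struct FieldFilter {
    const char *const *keys;
    size_t             nb_keys;
    unsigned long      lookups;      /* number of lookup calls  */
    unsigned long      comparisons;  /* number of strcmp() calls */
} FieldFilter;

/* Linear lookup: returns 1 if name is selected, 0 otherwise. */
static int filter_contains(FieldFilter *f, const char *name)
{
    assert(f != NULL && name != NULL);
    f->lookups++;
    for (size_t i = 0; i < f->nb_keys; i++) {
        f->comparisons++;
        if (strcmp(f->keys[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Simulate printing nb_items frames or packets: every field of the
 * section is checked against the filter once per item. Returns the
 * number of fields that would be printed. */
static unsigned long simulate_section(FieldFilter *f,
                                      const char *const *fields,
                                      size_t nb_fields, size_t nb_items)
{
    unsigned long printed = 0;

    for (size_t item = 0; item < nb_items; item++)
        for (size_t i = 0; i < nb_fields; i++)
            printed += (unsigned long)filter_contains(f, fields[i]);
    return printed;
}
```

With a section of 11 fields, a single selected key and 1,000 items, this performs 11,000 lookups and, because the filter holds only one entry, exactly 11,000 strcmp() calls. With more selected fields, each failing lookup costs one comparison per entry, which is where a different data structure would start to matter.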
Best,
sw