Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "softworkz ." <softworkz-at-hotmail.com@ffmpeg.org>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
Date: Tue, 8 Apr 2025 21:30:16 +0000
Message-ID: <DM8P223MB036537514113E407B2437CF3BAB52@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>
In-Reply-To: <20250408194502.GR4991@pb2>



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Michael Niedermayer
> Sent: Tuesday, 8 April 2025 21:45
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
> 
> On Tue, Apr 08, 2025 at 06:36:55PM +0000, softworkz . wrote:
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > > Michael Niedermayer
> > > Sent: Tuesday, 8 April 2025 20:16
> > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
> > >
> > > Hi softworkz
> > >
> > > On Tue, Apr 08, 2025 at 04:56:36PM +0000, softworkz . wrote:
> > > [...]
> > > > Hi Michael,
> > > >
> > > > it's been a while, but as far as memory serves, wasn't a linear search even more efficient than other methods as long as we're dealing with no more than a few dozen items?
> > >
> > > a dozen is 12, so a few dozen would minimally be 24
> > >
> > > on average, to find an entry in a list of 24 you need 12 comparisons
> > > with a linear search, and 24 in the worst case
> > >
> > > an AVL tree with 24 entries, I think, needs 7 comparisons in the
> > > worst case
> > > So it's certainly faster in number of comparisons
> > >
> > > the cost of strcmp() and overhead then come into play, but small sets
> > > aren't really what separates the 2 choices.
> > > The separation happens when there are many entries. A dictionary is
> > > generic:
> > > if you had a million entries, a linear search would take about a
> > > million comparisons; the AVL tree should need less than ~30 in the
> > > worst case.
> > > That's 5 orders of magnitude difference
> > >
> > >
> > > >
> > > > In turn, my question would be whether we even have use cases with hundreds or thousands of dictionary entries?
> > >
> > > We use dictionary for metadata and options mainly.
> > > It would be possible to also use a linear list until the number of
> > > entries reaches a threshold
> >
> > LOL, sorry I really didn't want to make it even more complicated.
> >
> > Sticking on that side for a moment though, what you have skipped in the comparison above is the insertion cost, because the insertion cost is what buys you the 7 instead of 24 (worst) or x instead of 12 (average) comparisons on lookup. One of my takeaways in that area was that there's always a break-even point below which there's nothing to win.
> >
> > At the bottom line, I love optimizations, and for dictionaries with larger numbers of entries, everything you said is perfectly valid of course. What I tried to ask is just whether we actually have any case of dictionary use that would benefit from that kind of optimization?
> 
> I know that years ago there was some case in the command line option
> handling where some linear search resulted in some O(n^3), which was
> noticeable.
> I don't remember if that was an AVDictionary
> 
> also, if we use a linear search, what should we do with a file that
> contains 10k or 100k+ entries?
> and then something checks, for example, for each of these entries
> whether there's a corresponding one in the local language, so for 100k
> entries someone could do a linear lookup that fails, thus 100k * 100k
> This is a constructed case but it sounds plausible to me with such a
> file
> 
> If we do a linear search then everyone needs to be careful what they
> use AVDictionary for.


All granted, and surely, the current implementation hardly deserves its name because it's definitely not what you would expect from a dictionary.

I'm not responding arbitrarily to this topic. I recently spent some thought on it, because you quickly find out what AVDictionary does - at least when working inside the FFmpeg source.
One such case is ffprobe output filtering, using -show_entries to select specific fields. There are two actual hot paths: the printing of frame data and of packet data (each including descendants). When -show_entries is specified, the desired fields are stored in an AVDictionary (one per section).

The frame section has 34 fields at the moment (https://softworkz.github.io/ffmpeg_output_apis/ffprobe_schema.html), so the maximum reasonable number of entries is 33, while in the typical case the number is likely much lower. Insert performance is irrelevant, as insertion happens only once at startup, but the lookup count can be excessive - e.g. 34k lookups for 1k frames.

In this context I was wondering whether a different dictionary implementation might provide any benefit and I concluded that it probably won't - at least not for small numbers of selected fields.
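
To put rough numbers on the small-dictionary case, here is a minimal C sketch that computes worst-case comparison counts for a linear scan versus a binary search over a sorted array. The binary search is only a stand-in for a balanced tree (an AVL tree can be somewhat deeper than this bound), and the function names are hypothetical, not part of FFmpeg.

```c
#include <stddef.h>

/* Worst case for a linear scan: the key is last or missing, so all n
 * entries are compared. */
static size_t worst_case_linear(size_t n)
{
    return n;
}

/* Worst case for binary search over n sorted entries: each three-way
 * comparison halves the remaining range, giving floor(log2(n)) + 1
 * comparisons for n > 0 (and 0 for an empty set). */
static size_t worst_case_binary(size_t n)
{
    size_t steps = 0;

    while (n > 0) {
        steps++;
        n /= 2;
    }
    return steps;
}
```

For the 33-entry upper bound this gives 33 versus 6 comparisons, and for a single selected field both give 1, so any gain has to come from larger sets.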

In that context I ran another experiment, trying to find the fastest way to print all video keyframe times using packet data only. The keyframe packet filter was hard-coded and I used -show_entries, specifying only pts_time for packets. The packet section has 11 fields at the moment, so that's 11k dictionary lookups for 1k packets. I compared that to a patch which only prints pts_time, completely skipping the regular printing code (no dictionary lookups), and gained only about 15% (3s instead of 3.5s). So in the end, the cost of those 11k lookups (albeit in a single-entry dictionary) was a lot less than expected.
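
The lookup pattern in that experiment can be sketched in C as follows. This is a simplified model under stated assumptions, not the ffprobe code: LinearDict is a hypothetical stand-in for a linear-search dictionary (it is not AVDictionary), the field names are illustrative, and the only thing measured is the number of string comparisons.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a small linear-search dictionary (keys only).
 * This is NOT AVDictionary; it just counts string comparisons. */
typedef struct {
    const char *const *keys;   /* selected field names */
    size_t nb_keys;
    unsigned long comparisons; /* strcmp() calls made so far */
} LinearDict;

/* Illustrative packet field names; the exact list is an assumption. */
static const char *const packet_fields[] = {
    "codec_type", "stream_index", "pts", "pts_time", "dts", "dts_time",
    "duration", "duration_time", "size", "pos", "flags",
};
#define NB_PACKET_FIELDS (sizeof(packet_fields) / sizeof(packet_fields[0]))

/* Return 1 if key is present, scanning the entries front to back. */
static int linear_dict_has(LinearDict *d, const char *key)
{
    for (size_t i = 0; i < d->nb_keys; i++) {
        d->comparisons++;
        if (strcmp(d->keys[i], key) == 0)
            return 1;
    }
    return 0;
}

/* One lookup per field per packet; returns how many fields get printed. */
static unsigned long simulate_printing(LinearDict *selected,
                                       unsigned long nb_packets)
{
    unsigned long printed = 0;

    for (unsigned long p = 0; p < nb_packets; p++)
        for (size_t f = 0; f < NB_PACKET_FIELDS; f++)
            printed += linear_dict_has(selected, packet_fields[f]);
    return printed;
}
```

With only pts_time selected, 1,000 packets cause 11,000 lookups but also just 11,000 string comparisons, one per lookup. A tree holding a single node would need the same one comparison per lookup, which fits the small difference measured.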

To tell you the truth - at that point I was thinking: "Ah, clever! That's why the AVDictionary is done like that" 😊 

So, this is the background of my previous replies - otherwise of course, I have nothing against a better dictionary.


Best,
sw



