From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 6777C4C64C
	for <ffmpegdev@gitmailbox.com>; Tue,  8 Apr 2025 20:29:48 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 767876898A6;
	Tue,  8 Apr 2025 23:29:45 +0300 (EEST)
Received: from relay2-d.mail.gandi.net (relay2-d.mail.gandi.net
 [217.70.183.194])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 2348B687FCC
 for <ffmpeg-devel@ffmpeg.org>; Tue,  8 Apr 2025 23:29:39 +0300 (EEST)
Received: by mail.gandi.net (Postfix) with ESMTPSA id 69639442A8
 for <ffmpeg-devel@ffmpeg.org>; Tue,  8 Apr 2025 20:29:38 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=niedermayer.cc;
 s=gm1; t=1744144178;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=aWOzMh8veXAqZ+uHaqlyJxZiTPfEw9c5htxxXH/Fc50=;
 b=HVoS/s0PitETavVhO8yPjgD/0qZFr8Nz3Td8Lpk2xMu6pJoEOIW9Jhsfs5cGSaxUphwFyT
 OaNLBWL8d6zuDLGGmQcl8K83Tadr6SICnyLVa/d2OdV1APbwsckmOIVe9knXQywxktP2PN
 fJGzWWx1RJiczNTeRaOqgxZgUSJIj9anoRKxgLdnyylMtKdhKMQEBKG+0kJ5zpltc4IjUI
 vSpcw80tCqb1+cK56I/QuTe4UaHG2UcKiQygYQwgCrlG1f0imMP6LWABADR+Tyns6O6jj0
 fFIs8L7PqYNw7fZ+goYsaNTN9hwpdS/UlGUnEYpOo88g6tsluE0IaXM+MGnhaQ==
Date: Tue, 8 Apr 2025 22:29:37 +0200
From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Message-ID: <20250408202937.GS4991@pb2>
References: <20250408101959.GP4991@pb2>
 <CABWZ6OTR-Yuxo5g-ttq20LX2PHsAWteFGr_oz6FHc1p=3X5VwQ@mail.gmail.com>
MIME-Version: 1.0
In-Reply-To: <CABWZ6OTR-Yuxo5g-ttq20LX2PHsAWteFGr_oz6FHc1p=3X5VwQ@mail.gmail.com>
X-GND-State: clean
X-GND-Score: -70
X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdegtdehucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenfghrlhcuvffnffculdeftddmnecujfgurhepfffhvffukfhfgggtuggjsehgtderredttddunecuhfhrohhmpefoihgthhgrvghlucfpihgvuggvrhhmrgihvghruceomhhitghhrggvlhesnhhivgguvghrmhgrhigvrhdrtggtqeenucggtffrrghtthgvrhhnpedutedvhfduuedugedufefghefhvedvgffgffekhfdvgfdvtefftdejkeehteefheenucfkphepgedurdeiiedrieejrdduudefnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehinhgvthepgedurdeiiedrieejrdduudefpdhhvghloheplhhotggrlhhhohhsthdpmhgrihhlfhhrohhmpehmihgthhgrvghlsehnihgvuggvrhhmrgihvghrrdgttgdpnhgspghrtghpthhtohepuddprhgtphhtthhopehffhhmphgvghdquggvvhgvlhesfhhfmhhpvghgrdhorhhg
X-GND-Sasl: michael@niedermayer.cc
Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: multipart/mixed; boundary="===============8953897302461974718=="
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20250408202937.GS4991@pb2/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>


--===============8953897302461974718==
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="ST1+xW5JvyJKAn+j"
Content-Disposition: inline


--ST1+xW5JvyJKAn+j
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Apr 08, 2025 at 11:10:21AM -0500, Romain Beauxis wrote:
> On Tue, Apr 8, 2025 at 05:20, Michael Niedermayer
> <michael@niedermayer.cc> wrote:
> >
> > Hi all
> >
> > As i have too many things to do already i did the most logic thing and
> > started thinking about a new and unrelated idea.
> >
> > This is a list of problems and ideas, that everyone is welcome to add to and
> > comment on.
> >
> > AVDictionary is just bad.
> >
> > * its complicated internally with
> >   unneeded alternative (AV_DICT_DONT_STRDUP_VAL/KEY) these are rarely used
> >   and probably not relevant for performance.
> >
> > * all basic operations are as slow as possible.
> >   you want to find, update or remove an entry, search through all entries
> >
> > * its heavy on memory allocations
> >   1 malloc for key, 1 malloc for value, 1 realloc on the AVDictionaryEntry array
> >   that makes 2+ malloc() for every "foo"="bar"
> >
> > Ideas:
> > 1. put the node struct (AVDictionaryEntry), the key and value in the same
> >    allocated block, 1 malloc() instead of 2.
> >    We can simply concatenate the key and value string, we could even use the
> >    0 terminator instead of the 2nd pointer. Either way the whole
> >    can go to the end of the Node structure for a tree
> > 1b. Now if we did put the key and value together, we can order in the tree
> >    by this combined entity. Why ? because now we have a unique ordering
> >    and also the key+value could be required to be always unique. Simplifying
> >    things from what we have now and making it more replicatable, no
> >    more changes in output because order changed
> > 2. We have a simple AVL tree implementation which we could use to make
> >    all operations O(log n) instead of O(n)
> > 3. We could go with hash tables, splay trees, critbit trees or something
> >    else. hash tables have issues with malicious/odd input which would
> >    require more complexity to workaround.
> >
> > Of course we could also go a step further and eliminate the malloc per
> > node and put it all in a linear array.
> >         As in, insert -> append at the end,
> >         realloc with every power of 2 size increase
> >         complete rebuild once enough elements are removed
> >     not sure this isnt overkill for a metadata string dictionary
> >
> > I probably wont have time to implement this in the near future but as i
> > was thinking about this, it seemed to make sense to write this down and
> > post here
> >
> > git grep av_dict | wc is 1436
> >
> > So its used a bit, justifying looking at improving it
> >
> >
> > git grep AV_DICT_DONT_STRDUP | wc is 87
> > git grep AV_DICT_DONT_STRDUP libavutil/ tests doc | wc is 20
> >
> > Seems not too common and one malloc/copy of a string once per metadata entry
> > which is once per file generally, seems a strange optimization to me
>
> Some questions that could be relevant:
[...]
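
For reference, ideas 1 and 1b from the list above could look roughly like
the following. This is only a sketch under assumptions: DictNode and the
dict_node_*() helpers are hypothetical names, not existing libavutil API,
and the tree links are declared but not used here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: header, key and value share one allocation. */
typedef struct DictNode {
    struct DictNode *child[2]; /* tree links, unused in this sketch */
    char data[];               /* "key\0value\0" */
} DictNode;

/* Allocate a node holding copies of key and value with a single malloc(). */
static DictNode *dict_node_new(const char *key, const char *value)
{
    size_t klen = strlen(key), vlen = strlen(value);
    DictNode *n = malloc(sizeof(*n) + klen + 1 + vlen + 1);

    if (!n)
        return NULL;
    n->child[0] = n->child[1] = NULL;
    memcpy(n->data, key, klen + 1);              /* key plus its 0 terminator */
    memcpy(n->data + klen + 1, value, vlen + 1); /* value right behind it */
    return n;
}

static const char *dict_node_key(const DictNode *n)
{
    return n->data;
}

/* The value starts right after the key's 0 terminator, no 2nd pointer. */
static const char *dict_node_value(const DictNode *n)
{
    return n->data + strlen(n->data) + 1;
}

/* Order by key first, then by value: the unique ordering of idea 1b. */
static int dict_node_cmp(const DictNode *a, const DictNode *b)
{
    int c = strcmp(dict_node_key(a), dict_node_key(b));
    return c ? c : strcmp(dict_node_value(a), dict_node_value(b));
}
```

A node created with dict_node_new("foo", "bar") is released with a single
free(), and dict_node_cmp() could serve as the comparison callback of
whatever tree ends up being used.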

>
> * Any interest in storing multiple values for the same key? This seems
> like a niche case but, as you pointed out in another thread, typically
> vorbis metadata do allow multiple key/values for the same field.

Multiple values should not be stored for a single key.
You can do
Author1=Eve
Author2=Adam
or
Author=Adam and Eve

But don't do
Author=Eve
Author=Adam
because if you do that and later get an
Author=Lilith
what does that mean? Is it now 1 author, or 3 authors,
or 2, and if 2, then which 2?

Or said another way, you can't have multiple identical keys like that AND
allow updates.
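
To illustrate why unique keys keep updates unambiguous, here is a minimal
sketch. TinyDict, tiny_set() and tiny_get() are hypothetical and exist
only for this example; they are not the AVDictionary API, and the fixed
sizes are arbitrary.

```c
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_LEN     64

/* Hypothetical fixed-size dictionary with unique keys, illustration only. */
typedef struct TinyDict {
    char key[MAX_ENTRIES][MAX_LEN];
    char val[MAX_ENTRIES][MAX_LEN];
    int  count;
} TinyDict;

/* Set key to value. An existing key is updated in place, never duplicated.
 * Returns 0 on success, -1 if the table is full or a string is too long. */
static int tiny_set(TinyDict *d, const char *key, const char *value)
{
    int i;

    if (strlen(key) >= MAX_LEN || strlen(value) >= MAX_LEN)
        return -1;
    for (i = 0; i < d->count; i++)
        if (!strcmp(d->key[i], key))
            break;
    if (i == d->count) {
        if (d->count == MAX_ENTRIES)
            return -1;
        strcpy(d->key[i], key);
        d->count++;
    }
    strcpy(d->val[i], value);
    return 0;
}

/* Return the value stored for key, or NULL if the key is absent. */
static const char *tiny_get(const TinyDict *d, const char *key)
{
    for (int i = 0; i < d->count; i++)
        if (!strcmp(d->key[i], key))
            return d->val[i];
    return NULL;
}
```

With Author1 and Author2 stored as separate keys, a later
tiny_set(&d, "Author1", "Lilith") can only mean one thing: Author1 is
replaced and Author2 is untouched.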



>
> * Any interest in storing an optional encoding value for text strings?

The encoding is UTF-8 Unicode.
You can use "Private Use Areas" within Unicode if you want to export
characters that the source failed to map to Unicode.

How we do this exactly is up for debate. But it seems more powerful
and simpler for the user than requiring the user app to handle every
encoding.

There are 4 potential cases, I think:
1. We are sure what a symbol means and we return only that in Unicode
2. We are sure what a symbol means and we return that in Unicode AND the
   source 8bit char in a PUA
3. We are not sure what a symbol means and we return our best guess in
   Unicode AND the source 8bit char in a PUA
4. We are not sure what a symbol means and we return the source 8bit char
   in a PUA only

If we do that, we would need 512 values from a PUA: 2 sets of 256, one that
follows up on the preceding Unicode symbol and one that stands alone.
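
A rough sketch of how the 2 sets of 256 could be laid out. The code points
U+E000 and U+E100 are assumptions picked for illustration only, nothing
has been decided, and all function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ranges inside the BMP private use area (U+E000..U+F8FF). */
#define PUA_ALONE_BASE  0xE000u /* source byte that comes alone (case 4) */
#define PUA_FOLLOW_BASE 0xE100u /* source byte following its symbol (cases 2, 3) */

/* Map a source 8bit char to its private use code point. */
static uint32_t pua_from_byte(uint8_t byte, int follows_symbol)
{
    return (follows_symbol ? PUA_FOLLOW_BASE : PUA_ALONE_BASE) + byte;
}

/* Recover the source byte, or return -1 if cp is in neither range. */
static int pua_to_byte(uint32_t cp)
{
    if (cp >= PUA_ALONE_BASE && cp <= PUA_ALONE_BASE + 0xFF)
        return (int)(cp - PUA_ALONE_BASE);
    if (cp >= PUA_FOLLOW_BASE && cp <= PUA_FOLLOW_BASE + 0xFF)
        return (int)(cp - PUA_FOLLOW_BASE);
    return -1;
}

/* Encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes.
 * Both hypothetical ranges fall inside that interval. */
static size_t utf8_put3(uint32_t cp, uint8_t out[3])
{
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
}
```

A muxer that wants the original bytes back would call pua_to_byte() on each
code point and ignore the regular Unicode symbols. One that wants Unicode
would drop every code point for which pua_to_byte() does not return -1.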


> This could be very useful to increase interoperability between legacy
> systems. Typically, a lot of icecast ICY metadata are still passed as
> latin1. This way, the library could pass them unchanged and let the
> system decide what to do with them.

If we do the PUA scheme suggested above, then a muxer could use either the
standard Unicode or the PUA values.


>
> * Any interest in having alternative value for key names? Most
> metadata systems carry their own naming conventions that are then
> mapped to conventional/normalized names like TIT2 for title in ID3v2
> frames. Having key name aliases could allow the library to refer to
> their own normalized values while allowing a transparent end-to-end
> handling of e.g. ID3v2 where you could dump the exact same frames
> using their native keys.

Maybe the native keys could be attached somehow as extra information.
I am not sure about the complexity though.

Title(TIT2)=...
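
Splitting such a combined key could be sketched as below. The
"Name(NATIVE)" syntax is just the notation from the line above, not a
decided format, and split_key() is a hypothetical helper.

```c
#include <string.h>

/* Split "Name(NATIVE)" into its two parts. A key without a "(...)" suffix
 * yields an empty native part. Returns 0 on success, -1 if the key is
 * malformed or a part does not fit into its buffer. */
static int split_key(const char *key, char *name, size_t name_size,
                     char *native, size_t native_size)
{
    const char *open  = strchr(key, '(');
    size_t len        = strlen(key);
    size_t name_len   = open ? (size_t)(open - key) : len;
    size_t native_len = 0;

    if (open) {
        if (key[len - 1] != ')')
            return -1;                   /* "(" without a closing ")" */
        native_len = len - name_len - 2; /* minus the two parentheses */
    }
    if (name_len >= name_size || native_len >= native_size)
        return -1;

    memcpy(name, key, name_len);
    name[name_len] = '\0';
    if (native_len)
        memcpy(native, open + 1, native_len);
    native[native_len] = '\0';
    return 0;
}
```

For "Title(TIT2)" this gives name "Title" and native "TIT2", while a plain
"Title" gives an empty native part.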


>
> * Similarly, any interest in carrying a source indicator? One of the
> reasons the recent AV_DICT_DEDUP commit was suggested was to deal with
> the same metadata key coming from two different sources. With a
> source indicator you can let the metadata flow end-to-end and let the
> user make decisions about what to do in these cases.

This feels very similar to the "TIT2" case above

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Never trust a computer, one day, it may think you are the virus. -- Compn

--ST1+xW5JvyJKAn+j
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iF0EABEKAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCZ/WHLQAKCRBhHseHBAsP
q7nEAJ49+lV3hHTRhfhr7PaKTxKJC3l3swCghCx5DDlvJWa1i65chq3hb7zMEow=
=1oTz
-----END PGP SIGNATURE-----

--ST1+xW5JvyJKAn+j--

--===============8953897302461974718==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


--===============8953897302461974718==--