From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 004A34CBB9
	for <ffmpegdev@gitmailbox.com>; Fri, 11 Apr 2025 20:51:00 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 2CF9768C40A;
	Fri, 11 Apr 2025 23:50:56 +0300 (EEST)
Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net
 [217.70.183.197])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A32A268C37D
 for <ffmpeg-devel@ffmpeg.org>; Fri, 11 Apr 2025 23:50:49 +0300 (EEST)
Received: by mail.gandi.net (Postfix) with ESMTPSA id 925CC43A22
 for <ffmpeg-devel@ffmpeg.org>; Fri, 11 Apr 2025 20:50:48 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=niedermayer.cc;
 s=gm1; t=1744404648;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=pjuMR0Y8H/5NvMDqVSnJxTCNBG7/ahnOLwYXVcx6sS0=;
 b=UntrMC8q1R6Fq5uR/OEkdXBgOKa5V/YjuEKW20KtEcunRMVd6/oBzM7lBz6i4ObGOhrEqF
 dlutTcques7mWY8Ezi7BF4e4RzSjb5UhGVnhP2eWFnEsNhU+dYajwGyIt7kjQkOfKUOaLV
 S9zyJ7mBxfv+/je/m6j7TYaedvDm5zfzLNvSWPjkTAkgbXJyA/w9fB0EizAbreGapYWXug
 CiiIyqXk3pmETDFC71hW9l/eQw7mB/ReJNKJHth2Hch0/o+R3oh9Cw414kIMLFaEr9fC/m
 swHwAaZSk2mr8+8HIcUA3sMQdmwKJNc7TaibnzdkiLwK5/eMCsFqzwdecd5PCA==
Date: Fri, 11 Apr 2025 22:50:47 +0200
From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Message-ID: <20250411205047.GD4991@pb2>
References: <20250408101959.GP4991@pb2>
MIME-Version: 1.0
In-Reply-To: <20250408101959.GP4991@pb2>
X-GND-State: clean
X-GND-Score: -85
X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvuddvkeduucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenfghrlhcuvffnffculdduhedmnecujfgurhepfffhvffukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefoihgthhgrvghlucfpihgvuggvrhhmrgihvghruceomhhitghhrggvlhesnhhivgguvghrmhgrhigvrhdrtggtqeenucggtffrrghtthgvrhhnpeffledtfeevfeffheeuuefhtdejieelueeftdeitdfgheetgefffeefteekffdthfenucffohhmrghinhepfhhfmhhpvghgrdhorhhgnecukfhppeeguddrieeirdeijedruddufeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepihhnvghtpeeguddrieeirdeijedruddufedphhgvlhhopehlohgtrghlhhhoshhtpdhmrghilhhfrhhomhepmhhitghhrggvlhesnhhivgguvghrmhgrhigvrhdrtggtpdhnsggprhgtphhtthhopedupdhrtghpthhtohepfhhfmhhpvghgqdguvghvvghlsehffhhmphgvghdrohhrgh
X-GND-Sasl: michael@niedermayer.cc
Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: multipart/mixed; boundary="===============4991605088193853599=="
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20250411205047.GD4991@pb2/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>


--===============4991605088193853599==
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="D5cEKevPIgUj4x1u"
Content-Disposition: inline


--D5cEKevPIgUj4x1u
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Apr 08, 2025 at 12:19:59PM +0200, Michael Niedermayer wrote:
> Hi all
>=20
> As i have too many things to do already i did the most logic thing and
> started thinking about a new and unrelated idea.
>=20
> This is a list of problems and ideas, that everyone is welcome to add to =
and
> comment on.
>=20
> AVDictionary is just bad.
>=20
> * its complicated internally with
>   unneeded alternative (AV_DICT_DONT_STRDUP_VAL/KEY) these are rarely used
>   and probably not relevant for performance.
>=20
> * all basic operations are as slow as possible.
>   you want to find, update or remove an entry, search through all entries
>=20
> * its heavy on memory allocations
>   1 malloc for key, 1 malloc for value, 1 realloc on the AVDictionaryEntr=
y array
>   that makes 2+ malloc() for every "foo"=3D"bar"
>=20
> Ideas:
> 1. put the node struct (AVDictionaryEntry), the key and value in the same
>    allocated block, 1 malloc() instead of 2.
>    We can simply concatenate the key and value string, we could even use =
the
>    0 terminator instead of the 2nd pointer. Either way the whole
>    can go to the end of the Node structure for a tree
> 1b. Now if we did put the key and value together, we can order in the tree
>    by this combined entity. Why ? because now we have a unique ordering
>    and also the key+value could be required to be always unique. Simplify=
ing
>    things from what we have now and making it more replicatable, no
>    more changes in output because order changed
> 2. We have a simple AVL tree implementation which we could use to make
>    all operations O(log n) instead of O(n)
> 3. We could go with hash tables, splay trees, critbit trees or something
>    else. hash tables have issues with malicious/odd input which would
>    require more complexity to workaround.
>=20
> Of course we could also go a step further and eliminate the malloc per
> node and put it all in a linear array.
>         As in, insert -> append at the end,
>         realloc with every power of 2 size increase
>         complete rebuild once enough elements are removed
>     not sure this isnt overkill for a metadata string dictionary
>=20
> I probably wont have time to implement this in the near future but as i
> was thinking about this, it seemed to make sense to write this down and
> post here
>=20
> git grep av_dict | wc is 1436
>=20
> So its used a bit, justifying looking at improving it
>=20
>=20
> git grep AV_DICT_DONT_STRDUP | wc is 87
> git grep AV_DICT_DONT_STRDUP libavutil/ tests doc | wc is 20
>=20
> Seems not too common and one malloc/copy of a string once per metadata en=
try
> which is once per file generally, seems a strange optimization to me

If anyone wants a testcase that shows dictionary speed issues, try to do
anything with this file:
https://samples.ffmpeg.org/nut/meta.nut

its a single image 256x256 jpeg with a lot of metadata

thx

[...]

--=20
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable

--D5cEKevPIgUj4x1u
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iF0EABEKAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCZ/mAowAKCRBhHseHBAsP
q9CzAJ9oOQcaaplYQCXY5mWqe+LnEoqrjACglKrZaPgj45SmlN2Y/8JdIHsC7Bk=
=37Z4
-----END PGP SIGNATURE-----

--D5cEKevPIgUj4x1u--

--===============4991605088193853599==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

--===============4991605088193853599==--