On Thu, Feb 13, 2025 at 10:29:33PM +0100, Tomas Härdin wrote:
> Thu 2025-02-13 at 13:03 +0100, Michael Niedermayer wrote:
> > On Thu, Feb 13, 2025 at 12:40:24PM +0100, Tomas Härdin wrote:
> > > Wed 2025-02-12 at 23:03 +0100, Michael Niedermayer wrote:
> > > > On Wed, Feb 12, 2025 at 12:03:37PM +0100, Tomas Härdin wrote:
> > > > > Thu 2025-02-06 at 15:58 +0100, Michael Niedermayer wrote:
> > > > > > Hi Tomas
> > > > > >
> > > > > > On Wed, Feb 05, 2025 at 03:24:24PM +0100, Tomas Härdin wrote:
> > > > > > > Seems reasonable to me and passes FATE
> > > > > > >
> > > > > > > /Tomas
> > > > > > >
> > > > > > >  avformat.h   |    2 +-
> > > > > > >  format.c     |    8 ++++----
> > > > > > >  libopenmpt.c |    2 +-
> > > > > > >  3 files changed, 6 insertions(+), 6 deletions(-)
> > > > > > > 01f04f79202640330d6be91b0215f92f14d1845a  0008-Make-mime-type-award-a-bonus-probe-score.patch
> > > > > > > From ecc3459990f2871fd907f96fe66362b8fea41bd8 Mon Sep 17 00:00:00 2001
> > > > > > > From: Peter Zebühr
> > > > > > > Date: Tue, 21 Nov 2023 14:16:49 +0100
> > > > > > > Subject: [PATCH 8/8] Make mime-type award a bonus probe score
> > > > > > >
> > > > > > > This changes the default behaviour of ffmpeg where content-type
> > > > > > > headers on an input give an absolute probe score (of 75) to
> > > > > > > instead give a bonus score (of 30). This gives the probe a better
> > > > > > > chance to arrive at the correct format by (hopefully) giving a
> > > > > > > large enough bonus to push edge cases in the right direction
> > > > > > > (MPEG-PS vs MP3, I am looking at you) while also not adversely
> > > > > > > punishing clearer cases (raw ADTS marked as "audio/mpeg", for
> > > > > > > example).
> > > > > > >
> > > > > > > This patch was regression tested against 20 million recent podcast
> > > > > > > submissions (after content-type propagation was added to
> > > > > > > original-storage), and 50k Juno vodcast submissions (ditto). No
> > > > > > > adverse effects observed (but the bonus may still need tweaking if
> > > > > > > other edge cases are detected in production).
> > > > > > > ---
> > > > > > >  libavformat/avformat.h   | 2 +-
> > > > > > >  libavformat/format.c     | 8 ++++----
> > > > > > >  libavformat/libopenmpt.c | 2 +-
> > > > > > >  3 files changed, 6 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > What is the score?
> > > > > > A higher score means more likely, but how much more?
> > > > > > Maybe we should come up with a more formal definition, like the
> > > > > > score being the number of bits of entropy that were checked, or
> > > > > > something like that.
> > > > > > In such a framework, adding 30 for a mime type match would
> > > > > > probably make sense.
> > > > > >
> > > > > > Without such a framework, adding 30 to an abstract score is hard
> > > > > > to review.
> > > > > > Beyond that, I don't see anything breaking from this, but then I
> > > > > > don't think we have real tests for mime types.
> > > > >
> > > > > We don't really have tests for the probe scores at all, which is a
> > > > > problem. Perhaps if we collected some tricky samples we could
> > > > > construct a test that demands a certain ordering of probe scores
> > > > > for them?
> > > > > For now, scores are tested indirectly by the fact that most tests
> > > > > rely on correct probing.
> > > >
> > > > We have tools/probetest:
> > > >
> > > > probetest [-f ] [ []]
> > >
> > > Yeah, but that only tests with random data, not, say, an ordering of
> > > probe scores for actual test files.
> >
> > Yes, it could/should be extended.
> >
> > probetest as it is is still quite useful, though, as it catches probe
> > functions which give high scores on random trash.
>
> Might be better to leverage afl-fuzz, since it is more wily in its
> tricks to provoke different program behavior. Then exit(1) whenever the
> test program probes something incorrectly. For example, you could start
> with a small, valid MPEG-PS file and have afl-fuzz generate slightly
> different versions of it that don't probe as such.

A real fuzzer will get every probe function to probe incorrectly.
Maybe I misunderstood what you suggested.

What we want is that
1. random binary, random ASCII, random UTF-8 and intermediates do not get
   detected as any format (that's what probetest does), and
2. format A is detected more as format A than as format B, where B != A.

We and our users test this by simply using ffmpeg and FATE.

As for testing that a "randomly damaged" A is still detected as A, I am
not sure this is actually generally useful. When such an A doesn't exist,
it would constrain our probing code for no gain. And I think real-world
files are poorly modelled by randomly (bitwise) damaged files.

Having a really large corpus of real-world odd files and testing probing
on them seems the "ideal" way to test probing to me.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

"You are 36 times more likely to die in a bathtub than at the hands of a
terrorist. Also, you are 2.5 times more likely to become a president and
2 times more likely to become an astronaut, than to die in a terrorist
attack." -- Thoughty2