On Wed, Feb 12, 2025 at 12:03:37PM +0100, Tomas Härdin wrote:
> On Thu, 2025-02-06 at 15:58 +0100, Michael Niedermayer wrote:
> > Hi Tomas
> > 
> > On Wed, Feb 05, 2025 at 03:24:24PM +0100, Tomas Härdin wrote:
> > > Seems reasonable to me and passes FATE
> > > 
> > > /Tomas
> > 
> > >  avformat.h   |    2 +-
> > >  format.c     |    8 ++++----
> > >  libopenmpt.c |    2 +-
> > >  3 files changed, 6 insertions(+), 6 deletions(-)
> > > 01f04f79202640330d6be91b0215f92f14d1845a  0008-Make-mime-type-award-a-bonus-probe-score.patch
> > > From ecc3459990f2871fd907f96fe66362b8fea41bd8 Mon Sep 17 00:00:00 2001
> > > From: =?UTF-8?q?Peter=20Zeb=C3=BChr?= <peterz@spotify.com>
> > > Date: Tue, 21 Nov 2023 14:16:49 +0100
> > > Subject: [PATCH 8/8] Make mime-type award a bonus probe score
> > > 
> > > This changes the default behaviour of ffmpeg where content-type headers
> > > on an input give an absolute probe score (of 75) to instead give a bonus
> > > score (of 30). This gives the probe a better chance to arrive at the
> > > correct format by (hopefully) giving a large enough bonus to push edge
> > > cases in the right direction (MPEG-PS vs MP3, I am looking at you) while
> > > also not adversely punishing clearer cases (raw ADTS marked as
> > > "audio/mpeg" for example).
> > > 
> > > This patch was regression tested against 20 million recent podcast
> > > submissions (after content-type propagation was added to
> > > original-storage), and 50k Juno vodcast submissions (ditto). No adverse
> > > effects observed (but the bonus may still need tweaking if other edge
> > > cases are detected in production).
> > > ---
> > >  libavformat/avformat.h   | 2 +-
> > >  libavformat/format.c     | 8 ++++----
> > >  libavformat/libopenmpt.c | 2 +-
> > >  3 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > what is the score?
> > a higher score means more likely, but how much more?
> > maybe we should come up with a more formal definition,
> > like that score is the number of bits of entropy that were checked or
> > something like that.
> > in such a framework, adding 30 for a mime type match would probably
> > make sense
> > 
> > without such a framework, adding 30 to an abstract score is hard to
> > review
> > beyond that, i don't see anything breaking from this, but then i
> > don't think we have real tests for mime types
> 
> We don't really have tests for the probe scores at all, which is a
> problem. Perhaps if we collected some tricky samples we could construct
> a test that demands a certain ordering of probe scores for them? For
> now scores are tested indirectly by the fact that most tests rely on
> correct probing

we have
tools/probetest

probetest [-f <input format>] [<retry_count> [<max_size>]]
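
As a rough illustration of the ordering test suggested in the quoted
paragraph: each tricky sample lists the formats in the order their probe
scores are expected to rank, and the check reports every sample whose
measured scores disagree. This is only a Python sketch of the shape such a
test could take. The helper names, sample names and scores are made up, and
`results` stands in for whatever would actually collect scores from the
probe functions.

```python
def check_ordering(expected_order, measured):
    """True if the measured scores strictly decrease along expected_order."""
    scores = [measured[name] for name in expected_order]
    return all(a > b for a, b in zip(scores, scores[1:]))


def failing_samples(expectations, results):
    """Names of samples whose measured scores violate the expected ranking."""
    return sorted(
        sample
        for sample, order in expectations.items()
        if not check_ordering(order, results[sample])
    )


# Hypothetical data: two samples, each with the ranking we would demand.
expectations = {
    "tricky1": ["fmt_x", "fmt_y"],
    "tricky2": ["fmt_y", "fmt_x"],
}
results = {
    "tricky1": {"fmt_x": 51, "fmt_y": 25},  # ranks as expected
    "tricky2": {"fmt_x": 50, "fmt_y": 50},  # a tie counts as a failure
}
print(failing_samples(expectations, results))  # ['tricky2']
```

Demanding only an ordering, rather than exact values, would let the scores
be retuned without rewriting the test.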


> 
> Also you can't really "formalize" social relations. The reason why
> certain files probe as one thing and not another is down to certain
> workflows that demand such behavior, which also entails some workflows
> being rejected, or at least requiring explicit -f. 
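
To make the change under review concrete, here is a minimal Python sketch of
the two behaviours described in the commit message: a MIME type match as an
absolute score of 75 versus a bonus of 30. The function names, the cap at 100
and the example content scores are assumptions for illustration, not the code
in format.c.

```python
# Sketch only: models the two MIME-type scoring rules from the commit message.
SCORE_MAX = 100      # assumed upper bound on a probe score
MIME_ABSOLUTE = 75   # current behaviour, per the commit message
MIME_BONUS = 30      # proposed behaviour, per the commit message


def score_absolute(content_score, mime_matches):
    """A MIME match raises the score to at least MIME_ABSOLUTE."""
    if mime_matches:
        return max(content_score, MIME_ABSOLUTE)
    return content_score


def score_bonus(content_score, mime_matches):
    """A MIME match adds MIME_BONUS, capped at SCORE_MAX (the cap is assumed)."""
    if mime_matches:
        return min(content_score + MIME_BONUS, SCORE_MAX)
    return content_score


def best_format(scores):
    """Name of the format with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical input: the bytes clearly look like format A, but the
# content-type header only matches format B.
content = {"A": 60, "B": 25}
mime = {"A": False, "B": True}

absolute = {f: score_absolute(content[f], mime[f]) for f in content}
bonus = {f: score_bonus(content[f], mime[f]) for f in content}

print(best_format(absolute))  # B
print(best_format(bonus))     # A
```

With these made-up numbers the absolute rule lets the header override a
fairly clear content match, while the bonus rule only tips the balance when
the content scores are within 30 of each other.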

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Answering whether a program halts or runs forever is
On a Turing machine, in general impossible (Turing's halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.