From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 8/8] Make mime-type award a bonus probe score
Date: Thu, 20 Feb 2025 22:08:20 +0100
Message-ID: <20250220210820.GK4991@pb2>
In-Reply-To: <707df676aa8c92a0ffd72e095d52404466bdb7b4.camel@haerdin.se>
On Thu, Feb 13, 2025 at 10:29:33PM +0100, Tomas Härdin wrote:
> On Thu, 2025-02-13 at 13:03 +0100, Michael Niedermayer wrote:
> > On Thu, Feb 13, 2025 at 12:40:24PM +0100, Tomas Härdin wrote:
> > > On Wed, 2025-02-12 at 23:03 +0100, Michael Niedermayer wrote:
> > > > On Wed, Feb 12, 2025 at 12:03:37PM +0100, Tomas Härdin wrote:
> > > > > On Thu, 2025-02-06 at 15:58 +0100, Michael Niedermayer wrote:
> > > > > > Hi Tomas
> > > > > >
> > > > > > On Wed, Feb 05, 2025 at 03:24:24PM +0100, Tomas Härdin wrote:
> > > > > > > Seems reasonable to me and passes FATE
> > > > > > >
> > > > > > > /Tomas
> > > > > >
> > > > > > > avformat.h | 2 +-
> > > > > > > format.c | 8 ++++----
> > > > > > > libopenmpt.c | 2 +-
> > > > > > > 3 files changed, 6 insertions(+), 6 deletions(-)
> > > > > > > > 01f04f79202640330d6be91b0215f92f14d1845a 0008-Make-mime-type-award-a-bonus-probe-score.patch
> > > > > > > > From ecc3459990f2871fd907f96fe66362b8fea41bd8 Mon Sep 17 00:00:00 2001
> > > > > > > > From: Peter Zebühr <peterz@spotify.com>
> > > > > > > > Date: Tue, 21 Nov 2023 14:16:49 +0100
> > > > > > > > Subject: [PATCH 8/8] Make mime-type award a bonus probe score
> > > > > > >
> > > > > > > > This changes the default behaviour of ffmpeg, where content-type
> > > > > > > > headers on an input give an absolute probe score (of 75), to
> > > > > > > > instead give a bonus score (of 30). This gives the probe a
> > > > > > > > better chance to arrive at the correct format by (hopefully)
> > > > > > > > giving a large enough bonus to push edge cases in the right
> > > > > > > > direction (MPEG-PS vs MP3, I am looking at you) while also not
> > > > > > > > adversely punishing clearer cases (raw ADTS marked as
> > > > > > > > "audio/mpeg" for example).
> > > > > > > >
> > > > > > > > This patch was regression tested against 20 million recent
> > > > > > > > podcast submissions (after content-type propagation was added
> > > > > > > > to original-storage), and 50k Juno vodcast submissions (ditto).
> > > > > > > > No adverse effects observed (but the bonus may still need
> > > > > > > > tweaking if other edge cases are detected in production).
> > > > > > > ---
> > > > > > > libavformat/avformat.h | 2 +-
> > > > > > > libavformat/format.c | 8 ++++----
> > > > > > > libavformat/libopenmpt.c | 2 +-
> > > > > > > 3 files changed, 6 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > > What is the score?
> > > > > > > A higher score means more likely, but how much more?
> > > > > > > Maybe we should come up with a more formal definition, e.g. that
> > > > > > > the score is the number of bits of entropy that were checked, or
> > > > > > > something like that.
> > > > > > > In such a framework, adding 30 for a mime type match would
> > > > > > > probably make sense.
> > > > > > >
> > > > > > > Without such a framework, adding 30 to an abstract score is hard
> > > > > > > to review.
> > > > > > > Beyond that, I don't see anything breaking from this, but then I
> > > > > > > don't think we have real tests for mime types.
> > > > >
> > > > > > We don't really have tests for the probe scores at all, which is a
> > > > > > problem. Perhaps if we collected some tricky samples we could
> > > > > > construct a test that demands a certain ordering of probe scores
> > > > > > for them? For now scores are tested indirectly by the fact that
> > > > > > most tests rely on correct probing.
> > > >
> > > > we have
> > > > tools/probetest
> > > >
> > > > probetest [-f <input format>] [<retry_count> [<max_size>]]
> > >
> > > > Yeah, but that only tests with random data, not, say, an ordering of
> > > > probe scores for actual test files.
> >
> > yes, it could/should be extended
> >
> > > probetest as is is still quite useful though, as it catches probe
> > > functions which give high scores on random trash.
>
> Might be better to leverage afl-fuzz since it is more wily in its
> tricks to provoke different program behavior. Then exit(1) whenever the
> test program probes something incorrectly. For example you could start
> with a small, valid MPEG-PS file and have afl-fuzz generate slightly
> different versions of it that don't probe as such
A real fuzzer will make every probe function probe incorrectly. Maybe I
misunderstood what you suggested.

What we want is that:
1. Random binary, random ASCII, random UTF-8 and intermediates do not get
   detected as any format (that's what probetest does).
2. Format A is detected more as format A than as format B, where B != A.
   We and our users test this simply by using ffmpeg and FATE.
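
To make the two requirements concrete, here is a rough sketch of them as checks against toy probe functions. This is not libavformat code: `probe_a`, `probe_b`, the magic bytes and the scores are all invented for illustration, and Python is used only to keep the example short.

```python
import random
import string

# Toy stand-ins for probe functions; NOT libavformat code.
# The magic bytes and the scores are invented for illustration.
MAGIC_A = b"FMT-AAAA"
MAGIC_B = b"FMT-BBBB"
SHARED_PREFIX = b"FMT-"


def probe_a(buf: bytes) -> int:
    """Score 100 on the full magic, 10 on the shared prefix only, else 0."""
    if buf.startswith(MAGIC_A):
        return 100
    return 10 if buf.startswith(SHARED_PREFIX) else 0


def probe_b(buf: bytes) -> int:
    """Same shape as probe_a, but for the second hypothetical format."""
    if buf.startswith(MAGIC_B):
        return 100
    return 10 if buf.startswith(SHARED_PREFIX) else 0


PROBES = {"a": probe_a, "b": probe_b}


def false_positives(samples, threshold=1):
    """Requirement 1: count junk buffers that any probe scores >= threshold."""
    return sum(
        1 for buf in samples
        if any(probe(buf) >= threshold for probe in PROBES.values())
    )


def ordering_holds(buf, expected):
    """Requirement 2: the expected format strictly outscores every other one."""
    scores = {name: probe(buf) for name, probe in PROBES.items()}
    return all(
        scores[expected] > score
        for name, score in scores.items() if name != expected
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
junk = [bytes(rng.randrange(256) for _ in range(64)) for _ in range(500)]
junk += [
    "".join(rng.choice(string.printable) for _ in range(64)).encode("ascii")
    for _ in range(500)
]

print("false positives on junk:", false_positives(junk))
print("A sample ordered correctly:", ordering_holds(MAGIC_A + b"\x00" * 56, "a"))
```

A buffer that carries only the shared prefix scores the same for both toy formats, so `ordering_holds` rejects it for either one. That is the kind of ambiguous case where an extra hint such as a mime type would matter.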
Testing that a "randomly damaged" A is still detected as A: I am not sure
this is actually generally useful. When such an A doesn't exist, it would
constrain our probing code for no gain.
And I think real-world files are poorly modelled by randomly (bit-wise)
damaged files.

Having a really large corpus of real-world odd files and testing probing on
them seems the "ideal" way to test probing to me.
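
As a rough illustration of what such a corpus check could look like for the patch under discussion, the sketch below compares the two mime policies on a tiny, made-up manifest. Only the values 75 and 30 come from the patch description. The per-format content scores, the `FORMAT_MIME` table, the cap at 100 and the way the absolute score is applied (as a floor) are assumptions of this toy model, not libavformat behaviour.

```python
# Toy model of mime handling in probing; NOT libavformat code.
MIME_ABSOLUTE = 75  # from the patch description: current absolute score
MIME_BONUS = 30     # from the patch description: proposed bonus

# Hypothetical mapping from format name to the mime type it claims.
FORMAT_MIME = {"mp3": "audio/mpeg", "aac": "audio/aac", "mpegps": "video/mpeg"}


def score_absolute(content_score, mime_match):
    """Assumed model: a mime match lifts the score to at least 75."""
    return max(content_score, MIME_ABSOLUTE) if mime_match else content_score


def score_bonus(content_score, mime_match):
    """Assumed model: a mime match adds 30, capped at 100."""
    return min(content_score + MIME_BONUS, 100) if mime_match else content_score


def pick(content_scores, mime, policy):
    """Return the format with the highest final score under the given policy."""
    best_name, best_score = None, 0
    for name, content_score in content_scores.items():
        final = policy(content_score, FORMAT_MIME.get(name) == mime)
        if final > best_score:
            best_name, best_score = name, final
    return best_name


# Each entry: (label, invented content scores, served mime type, expected format).
CORPUS = [
    ("raw ADTS served as audio/mpeg",
     {"aac": 51, "mp3": 1}, "audio/mpeg", "aac"),
    ("MP3 that also resembles MPEG-PS",
     {"mp3": 25, "mpegps": 26}, "audio/mpeg", "mp3"),
]


def failures(policy):
    """Labels of corpus entries that the policy probes as the wrong format."""
    return [
        label for label, scores, mime, expected in CORPUS
        if pick(scores, mime, policy) != expected
    ]


print("absolute policy failures:", failures(score_absolute))
print("bonus policy failures:", failures(score_bonus))
```

In a real test the content scores would come from running the actual probe functions over the sample files; the manifest would then only need the file path, the served mime type and the expected format.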
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
"You are 36 times more likely to die in a bathtub than at the hands of a
terrorist. Also, you are 2.5 times more likely to become a president and
2 times more likely to become an astronaut, than to die in a terrorist
attack." -- Thoughty2