Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "Tomas Härdin" <git@haerdin.se>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avformat/webvttdec: improve WebVTT parsing
Date: Mon, 09 Jun 2025 23:51:17 +0200
Message-ID: <5d3bd41e77e388369dd3a69469a57f1a1131e00e.camel@haerdin.se> (raw)
In-Reply-To: <F20F7F57-A3E4-4AA1-925E-AB1C2CA8C358@orca.pet>

On Fri, 2025-06-06 at 22:22 +0200, Marcos Del Sol Vives wrote:
> 
> 
> On 6 June 2025 21:43:58 CEST, "Tomas Härdin" <git@haerdin.se>
> wrote:
> > 
> > Sounds like the demuxer correctly rejected some broken files
> > 
> 
> The WebVTT standard does not call for a fatal error unless the magic
> header does not match. The current implementation is not only non-
> compliant with the standard, but will also break on future changes.

The linked file does not follow the syntax specified in section 4 of
the WebVTT standard. Therefore it is not WebVTT.

The probe function should ensure that the file starts with the
necessary bytes. I see it can let some non-compliant files slip by,
since it does not check whether [BOM]WEBVTT is followed either by a
newline, or by a space or tab, any non-newline characters, and then a
newline. We could fix that in the main parsing loop. We also shouldn't
expect any "WEBVTT" chunks after the first one.
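
To make the check described above concrete, here is a minimal sketch in
C. It is not the existing probe code in webvttdec.c; the function name
webvtt_signature_ok and the decision to accept a buffer that ends right
after the magic are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the existing FFmpeg probe: returns 1 if buf
 * begins with a well-formed WebVTT signature, 0 otherwise. */
static int webvtt_signature_ok(const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    assert(buf != NULL);

    /* Optional UTF-8 byte order mark (EF BB BF). */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        pos = 3;

    /* The literal magic string. */
    if (len - pos < 6 || memcmp(buf + pos, "WEBVTT", 6) != 0)
        return 0;
    pos += 6;

    /* Assumption: a buffer that ends right after the magic is accepted. */
    if (pos == len)
        return 1;

    /* The magic may be followed directly by a line terminator ... */
    if (buf[pos] == '\n' || buf[pos] == '\r')
        return 1;

    /* ... or by a space or tab; the rest of that line is free-form
     * and is not inspected here. */
    return buf[pos] == ' ' || buf[pos] == '\t';
}
```

With this, "WEBVTT\n" and "WEBVTT - some title\n" pass, while
"WEBVTTX\n" is rejected because the byte after the magic is neither a
line terminator nor a space or tab.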

webvttdec.c also allows REGION etc. chunks outside of where section 4
says they are allowed. In my opinion this is bad, since it means the
demuxer allows more than just WebVTT.
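
As a rough sketch of the ordering rule, as I read section 4: REGION and
STYLE blocks may only appear before the first cue, while NOTE blocks
may appear anywhere after the header. The enum, the function
vtt_first_misplaced and the two example arrays below are hypothetical
names for illustration and do not exist in webvttdec.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block classification, for illustration only. */
enum vtt_block { VTT_REGION, VTT_STYLE, VTT_NOTE, VTT_CUE };

/* Returns the index of the first REGION or STYLE block that appears
 * after a cue, or -1 if the block order is acceptable. */
static long vtt_first_misplaced(const enum vtt_block *blocks, size_t n)
{
    int seen_cue = 0;

    assert(blocks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        switch (blocks[i]) {
        case VTT_CUE:
            seen_cue = 1;
            break;
        case VTT_REGION:
        case VTT_STYLE:
            if (seen_cue)
                return (long)i;  /* only allowed before the first cue */
            break;
        case VTT_NOTE:
            break;               /* comments may appear anywhere */
        }
    }
    return -1;
}

/* Example inputs: the first is well ordered, the second has a REGION
 * block after a cue at index 2. */
static const enum vtt_block example_ok[]  = { VTT_REGION, VTT_STYLE, VTT_NOTE,
                                              VTT_CUE, VTT_NOTE, VTT_CUE };
static const enum vtt_block example_bad[] = { VTT_NOTE, VTT_CUE, VTT_REGION };
```

A strict demuxer could reject the file at the reported index instead
of silently accepting the block.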

I've been harping on the permissive attitude towards parsing on this
list for a while. The reason why I do this is because every time we're
lax with parsing, some user will come to rely on said laxness rather
than fixing their workflow. Therefore we're perpetually unable to fix
our demuxers. My opinion is that it is best to nip this permissiveness
in the bud. The fact that the demuxer does the wrong thing right now is
no excuse to make it behave even more incorrectly.

What I'd like to see is the project moving towards either parser
combinators or a domain specific language for grammars like a PEG
variant extended with length fields.
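
To give a flavour of what parser combinators could look like in C,
here is a toy sketch. Nothing like this exists in FFmpeg today as far
as I know; the parser struct, the LIT/OPT/SEQ macros and the parse()
helper are all made up for this example, and it only covers the
"[BOM]WEBVTT" prefix rather than a full grammar.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A parser consumes a prefix of the input and returns the number of
 * bytes consumed, or -1 on failure. */
typedef struct parser parser;
struct parser {
    long (*run)(const parser *self, const char *s, size_t len);
    const char *lit;      /* used by literal parsers */
    const parser *a, *b;  /* used by combinators */
};

/* Match a fixed string. */
static long run_lit(const parser *p, const char *s, size_t len)
{
    size_t n = strlen(p->lit);
    return (len >= n && memcmp(s, p->lit, n) == 0) ? (long)n : -1;
}

/* Try p->a; consume nothing if it fails. */
static long run_opt(const parser *p, const char *s, size_t len)
{
    long x = p->a->run(p->a, s, len);
    return x < 0 ? 0 : x;
}

/* Run p->a, then p->b on the remaining input; fail if either fails. */
static long run_seq(const parser *p, const char *s, size_t len)
{
    long x = p->a->run(p->a, s, len);
    if (x < 0)
        return -1;
    long y = p->b->run(p->b, s + x, len - (size_t)x);
    return y < 0 ? -1 : x + y;
}

#define LIT(str)  { run_lit, (str), NULL, NULL }
#define OPT(p)    { run_opt, NULL, (p), NULL }
#define SEQ(p, q) { run_seq, NULL, (p), (q) }

/* Grammar: header <- BOM? "WEBVTT" */
static const parser bom     = LIT("\xEF\xBB\xBF");
static const parser magic   = LIT("WEBVTT");
static const parser opt_bom = OPT(&bom);
static const parser header  = SEQ(&opt_bom, &magic);

static long parse(const parser *p, const char *s, size_t len)
{
    assert(p != NULL && p->run != NULL);
    return p->run(p, s, len);
}
```

The point of the design is that the grammar reads almost like the
spec text, and the accepted language is exactly what the grammar says,
no more.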

We can't do much about the W3C making breaking changes to their
standards in the future, other than updating our code when that
happens. We're lucky that the spec is quite narrow, so the space for
making non-breaking changes to it is quite small. They could, for
example, reuse NOTE chunks for future functionality. The current
syntax does not allow STYLE chunks in the middle of the file, but if
the W3C wanted that they could amend it by using "NOTE STYLE" for
stylesheets between cues.

All of this is just my view, of course. Other devs might feel very
differently. I'd point out that it's no longer the wild west of the
early 2000s.

/Tomas

