From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 18 Jul 2025 15:00:16 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20250718130016.GZ29660@pb2>
References: <20250527102811.369474-1-marcos@orca.pet>
MIME-Version: 1.0
In-Reply-To: <20250527102811.369474-1-marcos@orca.pet>
Subject: Re: [FFmpeg-devel] [PATCH] avformat/webvttdec: improve WebVTT parsing
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Marcos Del Sol Vives
Content-Type: multipart/mixed; boundary="===============2681277248752359329=="

--===============2681277248752359329==
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="5WoePNz4a+0N9i6Q"
Content-Disposition: inline

--5WoePNz4a+0N9i6Q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi

On Tue, May 27, 2025 at 12:28:11PM +0200, Marcos Del Sol Vives via ffmpeg-d=
evel wrote:
> The parser will now strictly check if WebVTT files start with the correct
> "WEBVTT" marker. Before, files were not checked if they truly started
> with it.
>=20
> It will also now ignore all non-cue blocks, instead of only a hardcoded
> list. This is closer to the specification that calls for no action
> if unknown blocks are encountered.
>=20
> Signed-off-by: Marcos Del Sol Vives
> ---
>  libavformat/webvttdec.c | 178 ++++++++++++++++++++++------------------
>  1 file changed, 98 insertions(+), 80 deletions(-)
>=20
> diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
> index 6feda1585e..b454b2c1cf 100644
> --- a/libavformat/webvttdec.c
> +++ b/libavformat/webvttdec.c
> @@ -58,6 +58,79 @@ static int64_t read_ts(const char *s)
>      return AV_NOPTS_VALUE;
>  }
> =20
> +static int webvtt_parse_cue(WebVTTContext *webvtt, AVBPrint *cue, int64_=
t pos)
> +{
> +    int i;
> +    AVPacket *sub;
> +    const char *p, *identifier, *settings;
> +    size_t identifier_len, settings_len;
> +    int64_t ts_start, ts_end;
> +
> +    p =3D identifier =3D cue->str;
> +
> +    /* optional cue identifier (can be a number like in SRT or some kind=
 of
> +     * chaptering id) */
> +    for (i =3D 0; p[i] && p[i] !=3D '\n' && p[i] !=3D '\r'; i++) {
> +        if (!strncmp(p + i, "-->", 3)) {
> +            identifier =3D NULL;
> +            break;
> +        }
> +    }
> +    if (!identifier)
> +        identifier_len =3D 0;
> +    else {
> +        identifier_len =3D strcspn(p, "\r\n");
> +        p +=3D identifier_len;
> +        if (*p =3D=3D '\r')
> +            p++;
> +        if (*p =3D=3D '\n')
> +            p++;
> +    }
> +
> +    /* cue timestamps */
> +    if ((ts_start =3D read_ts(p)) =3D=3D AV_NOPTS_VALUE)
> +        return AVERROR_INVALIDDATA;
> +    if (!(p =3D strstr(p, "-->")))
> +        return AVERROR_INVALIDDATA;
> +    p +=3D 2;
> +    do p++; while (*p =3D=3D ' ' || *p =3D=3D '\t');
> +    if ((ts_end =3D read_ts(p)) =3D=3D AV_NOPTS_VALUE)
> +        return AVERROR_INVALIDDATA;
> +
> +    /* optional cue settings */
> +    p +=3D strcspn(p, "\n\r\t ");
> +    while (*p =3D=3D '\t' || *p =3D=3D ' ')
> +        p++;
> +    settings =3D p;
> +    settings_len =3D strcspn(p, "\r\n");
> +    p +=3D settings_len;
> +    if (*p =3D=3D '\r')
> +        p++;
> +    if (*p =3D=3D '\n')
> +        p++;
> +
> +    /* create packet */
> +    sub =3D ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
> +    if (!sub)
> +        return AVERROR(ENOMEM);
> +    sub->pos =3D pos;
> +    sub->pts =3D ts_start;
> +    sub->duration =3D ts_end - ts_start;
> +
> +#define SET_SIDE_DATA(name, type) do { \
> +    if (name##_len) { \
> +        uint8_t *buf =3D av_packet_new_side_data(sub, type, name##_len);=
 \
> +        if (!buf) \
> +            return AVERROR(ENOMEM); \
> +        memcpy(buf, name, name##_len); \
> +    } \
> +} while (0)
> +
> +    SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
> +    SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
> +    return 0;
> +}
> +
>  static int webvtt_read_header(AVFormatContext *s)
>  {
>      WebVTTContext *webvtt =3D s->priv_data;
> @@ -74,13 +147,27 @@ static int webvtt_read_header(AVFormatContext *s)
> =20
>      av_bprint_init(&cue, 0, AV_BPRINT_SIZE_UNLIMITED);
> =20
> +    res =3D ff_subtitles_read_chunk(s->pb, &cue);
> +    if (res < 0) {
> +        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
> +        goto end;
> +    }
> +
> +    if (!cue.len) {
> +        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
> +        res =3D AVERROR_EOF;
> +        goto end;
> +    }
> +
> +    if (!strncmp(cue.str, "\xEF\xBB\xBFWEBVTT", 9) &&
> +        !strncmp(cue.str, "WEBVTT", 6)) {
> +        av_log(s, AV_LOG_ERROR, "Invalid file header\n");
> +        res =3D AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
>      for (;;) {
> -        int i;
> -        int64_t pos;
> -        AVPacket *sub;
> -        const char *p, *identifier, *settings;
> -        size_t identifier_len, settings_len;
> -        int64_t ts_start, ts_end;
> +        int64_t pos =3D avio_tell(s->pb);
> =20
>          res =3D ff_subtitles_read_chunk(s->pb, &cue);
>          if (res < 0)
> @@ -89,81 +176,12 @@ static int webvtt_read_header(AVFormatContext *s)
>          if (!cue.len)
>              break;
> =20
> -        p =3D identifier =3D cue.str;
> -        pos =3D avio_tell(s->pb);
> -
> -        /* ignore header chunk */
> -        if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
> -            !strncmp(p, "WEBVTT", 6) ||
> -            !strncmp(p, "STYLE", 5) ||
> -            !strncmp(p, "REGION", 6) ||
> -            !strncmp(p, "NOTE", 4))
> -            continue;
> -
> -        /*
optional cue identifier (can be a number like in SRT or some =
kind of
> -         * chaptering id) */
> -        for (i =3D 0; p[i] && p[i] !=3D '\n' && p[i] !=3D '\r'; i++) {
> -            if (!strncmp(p + i, "-->", 3)) {
> -                identifier =3D NULL;
> -                break;
> -            }
> -        }
> -        if (!identifier)
> -            identifier_len =3D 0;
> -        else {
> -            identifier_len =3D strcspn(p, "\r\n");
> -            p +=3D identifier_len;
> -            if (*p =3D=3D '\r')
> -                p++;
> -            if (*p =3D=3D '\n')
> -                p++;
> +        res =3D webvtt_parse_cue(webvtt, &cue, pos);
> +        if (res < 0) {
> +            if (res !=3D AVERROR_INVALIDDATA)
> +                goto end;
> +            av_log(s, AV_LOG_DEBUG, "Ignoring non-cue block at 0x%"PRIx6=
4"\n", pos);
>          }
> -
> -        /* cue timestamps */
> -        if ((ts_start =3D read_ts(p)) =3D=3D AV_NOPTS_VALUE)
> -            break;
> -        if (!(p =3D strstr(p, "-->")))
> -            break;
> -        p +=3D 2;
> -        do p++; while (*p =3D=3D ' ' || *p =3D=3D '\t');
> -        if ((ts_end =3D read_ts(p)) =3D=3D AV_NOPTS_VALUE)
> -            break;
> -
> -        /* optional cue settings */
> -        p +=3D strcspn(p, "\n\r\t ");
> -        while (*p =3D=3D '\t' || *p =3D=3D ' ')
> -            p++;
> -        settings =3D p;
> -        settings_len =3D strcspn(p, "\r\n");
> -        p +=3D settings_len;
> -        if (*p =3D=3D '\r')
> -            p++;
> -        if (*p =3D=3D '\n')
> -            p++;
> -
> -        /* create packet */
> -        sub =3D ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
> -        if (!sub) {
> -            res =3D AVERROR(ENOMEM);
> -            goto end;
> -        }
> -        sub->pos =3D pos;
> -        sub->pts =3D ts_start;
> -        sub->duration =3D ts_end - ts_start;
> -
> -#define SET_SIDE_DATA(name, type) do { \
> -    if (name##_len) { \
> -        uint8_t *buf =3D av_packet_new_side_data(sub, type, name##_len);=
 \
> -        if (!buf) { \
> -            res =3D AVERROR(ENOMEM); =
 \
> -            goto end; \
> -        } \
> -        memcpy(buf, name, name##_len); \
> -    } \
> -} while (0)
> -
> -        SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
> -        SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
>      }

This factors the code out and modifies it at the same time, which makes
the modification hard to review. Can you maybe split it into 2 (or
more) patches: one that just moves the code, and the other(s) that then
change it?

I can confirm that it fixes decoding the sample and passes FATE.

Also, a FATE test for the odd sample could be useful.

thx

[...]
--=20
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data

--5WoePNz4a+0N9i6Q
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iF0EABEKAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCaHpFXQAKCRBhHseHBAsP
q/qhAJ9ndJ257p0pB9B+gIdxgjZ8dlSJIACfd132MDxfB5+5WG2er7s2cPG9oQ0=
=DDCk
-----END PGP SIGNATURE-----

--5WoePNz4a+0N9i6Q--

--===============2681277248752359329==--