From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 11 Nov 2025 13:38:14 +0000 (UTC)
From: Marcos Del Sol via ffmpeg-devel
To: Michael Niedermayer
Cc: FFmpeg development discussions and patches, Marcos Del Sol
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] Re: [PATCH] avformat/webvttdec: improve WebVTT parsing
Message-ID: <1813421686.26011235.1762868294127.JavaMail.zimbra@orca.pet>
In-Reply-To: <20250718130016.GZ29660@pb2>
References: <20250527102811.369474-1-marcos@orca.pet> <20250718130016.GZ29660@pb2>
List-Id: FFmpeg development discussions and patches
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello Michael.

I've discovered there's a web Git so I've uploaded the changes you requested there: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20875

Greetings,
Marcos

-----Original Message-----
From: Michael
To: FFmpeg
Cc: Marcos
Date: Friday, 18 July 2025 15:00 CEST
Subject: Re: [FFmpeg-devel] [PATCH] avformat/webvttdec: improve WebVTT parsing

Hi

On Tue, May 27, 2025 at 12:28:11PM +0200, Marcos Del Sol Vives via ffmpeg-devel wrote:
> The parser will now strictly check if WebVTT files start with the correct
> "WEBVTT" marker. Before, files were not checked if they truly started
> with it.
>
> It will also now ignore all non-cue blocks, instead of only a hardcoded
> list. This is closer to the specification that calls for no action
> if unknown blocks are encountered.
>
> Signed-off-by: Marcos Del Sol Vives
> ---
>  libavformat/webvttdec.c | 178 ++++++++++++++++++++++------------------
>  1 file changed, 98 insertions(+), 80 deletions(-)
>
> diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
> index 6feda1585e..b454b2c1cf 100644
> --- a/libavformat/webvttdec.c
> +++ b/libavformat/webvttdec.c
> @@ -58,6 +58,79 @@ static int64_t read_ts(const char *s)
>      return AV_NOPTS_VALUE;
>  }
>
> +static int webvtt_parse_cue(WebVTTContext *webvtt, AVBPrint *cue, int64_t pos)
> +{
> +    int i;
> +    AVPacket *sub;
> +    const char *p, *identifier, *settings;
> +    size_t identifier_len, settings_len;
> +    int64_t ts_start, ts_end;
> +
> +    p = identifier = cue->str;
> +
> +    /* optional cue identifier (can be a number like in SRT or some kind of
> +     * chaptering id) */
> +    for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
> +        if (!strncmp(p + i, "-->", 3)) {
> +            identifier = NULL;
> +            break;
> +        }
> +    }
> +    if (!identifier)
> +        identifier_len = 0;
> +    else {
> +        identifier_len = strcspn(p, "\r\n");
> +        p += identifier_len;
> +        if (*p == '\r')
> +            p++;
> +        if (*p == '\n')
> +            p++;
> +    }
> +
> +    /* cue timestamps */
> +    if ((ts_start = read_ts(p)) == AV_NOPTS_VALUE)
> +        return AVERROR_INVALIDDATA;
> +    if (!(p = strstr(p, "-->")))
> +        return AVERROR_INVALIDDATA;
> +    p += 2;
> +    do p++; while (*p == ' ' || *p == '\t');
> +    if ((ts_end = read_ts(p)) == AV_NOPTS_VALUE)
> +        return AVERROR_INVALIDDATA;
> +
> +    /* optional cue settings */
> +    p += strcspn(p, "\n\r\t ");
> +    while (*p == '\t' || *p == ' ')
> +        p++;
> +    settings = p;
> +    settings_len = strcspn(p, "\r\n");
> +    p += settings_len;
> +    if (*p == '\r')
> +        p++;
> +    if (*p == '\n')
> +        p++;
> +
> +    /* create packet */
> +    sub = ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
> +    if (!sub)
> +        return AVERROR(ENOMEM);
> +    sub->pos = pos;
> +    sub->pts = ts_start;
> +    sub->duration = ts_end - ts_start;
> +
> +#define SET_SIDE_DATA(name, type) do { \
> +    if (name##_len) { \
> +        uint8_t *buf = av_packet_new_side_data(sub, type, name##_len); \
> +        if (!buf) \
> +            return AVERROR(ENOMEM); \
> +        memcpy(buf, name, name##_len); \
> +    } \
> +} while (0)
> +
> +    SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
> +    SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
> +    return 0;
> +}
> +
>  static int webvtt_read_header(AVFormatContext *s)
>  {
>      WebVTTContext *webvtt = s->priv_data;
> @@ -74,13 +147,27 @@ static int webvtt_read_header(AVFormatContext *s)
>
>      av_bprint_init(&cue, 0, AV_BPRINT_SIZE_UNLIMITED);
>
> +    res = ff_subtitles_read_chunk(s->pb, &cue);
> +    if (res < 0) {
> +        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
> +        goto end;
> +    }
> +
> +    if (!cue.len) {
> +        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
> +        res = AVERROR_EOF;
> +        goto end;
> +    }
> +
> +    if (!strncmp(cue.str, "\xEF\xBB\xBFWEBVTT", 9) &&
> +        !strncmp(cue.str, "WEBVTT", 6)) {
> +        av_log(s, AV_LOG_ERROR, "Invalid file header\n");
> +        res = AVERROR_INVALIDDATA;
> +        goto end;
> +    }
> +
>      for (;;) {
> -        int i;
> -        int64_t pos;
> -        AVPacket *sub;
> -        const char *p, *identifier, *settings;
> -        size_t identifier_len, settings_len;
> -        int64_t ts_start, ts_end;
> +        int64_t pos = avio_tell(s->pb);
>
>          res = ff_subtitles_read_chunk(s->pb, &cue);
>          if (res < 0)
> @@ -89,81 +176,12 @@ static int webvtt_read_header(AVFormatContext *s)
>          if (!cue.len)
>              break;
>
> -        p = identifier = cue.str;
> -        pos = avio_tell(s->pb);
> -
> -        /* ignore header chunk */
> -        if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
> -            !strncmp(p, "WEBVTT", 6) ||
> -            !strncmp(p, "STYLE", 5) ||
> -            !strncmp(p, "REGION", 6) ||
> -            !strncmp(p, "NOTE", 4))
> -            continue;
> -
> -        /* optional cue identifier (can be a number like in SRT or some kind of
> -         * chaptering id) */
> -        for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
> -            if (!strncmp(p + i, "-->", 3)) {
> -                identifier = NULL;
> -                break;
> -            }
> -        }
> -        if (!identifier)
> -            identifier_len = 0;
> -        else {
> -            identifier_len = strcspn(p, "\r\n");
> -            p += identifier_len;
> -            if (*p == '\r')
> -                p++;
> -            if (*p == '\n')
> -                p++;
> +        res = webvtt_parse_cue(webvtt, &cue, pos);
> +        if (res < 0) {
> +            if (res != AVERROR_INVALIDDATA)
> +                goto end;
> +            av_log(s, AV_LOG_DEBUG, "Ignoring non-cue block at 0x%"PRIx64"\n", pos);
>          }
> -
> -        /* cue timestamps */
> -        if ((ts_start = read_ts(p)) == AV_NOPTS_VALUE)
> -            break;
> -        if (!(p = strstr(p, "-->")))
> -            break;
> -        p += 2;
> -        do p++; while (*p == ' ' || *p == '\t');
> -        if ((ts_end = read_ts(p)) == AV_NOPTS_VALUE)
> -            break;
> -
> -        /* optional cue settings */
> -        p += strcspn(p, "\n\r\t ");
> -        while (*p == '\t' || *p == ' ')
> -            p++;
> -        settings = p;
> -        settings_len = strcspn(p, "\r\n");
> -        p += settings_len;
> -        if (*p == '\r')
> -            p++;
> -        if (*p == '\n')
> -            p++;
> -
> -        /* create packet */
> -        sub = ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
> -        if (!sub) {
> -            res = AVERROR(ENOMEM);
> -            goto end;
> -        }
> -        sub->pos = pos;
> -        sub->pts = ts_start;
> -        sub->duration = ts_end - ts_start;
> -
> -#define SET_SIDE_DATA(name, type) do { \
> -    if (name##_len) { \
> -        uint8_t *buf = av_packet_new_side_data(sub, type, name##_len); \
> -        if (!buf) { \
> -            res = AVERROR(ENOMEM); \
> -            goto end; \
> -        } \
> -        memcpy(buf, name, name##_len); \
> -    } \
> -} while (0)
> -
> -        SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
> -        SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
>      }

This factors the code out and modifies it at the same time, which makes
the modification hard to review. Can you maybe split it into 2 (or more)
patches, one that just moves the code and the other(s) then changing it?

I can confirm that it fixes decoding the sample and passes FATE.
Also, a FATE test for the odd sample could be useful.

thx

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
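
One detail in the quoted header check may be worth a second look: it negates both strncmp() calls and joins them with &&. Since strncmp() returns 0 on a match and no string can start with both prefixes at once, that condition appears never to be true, so the "Invalid file header" path would not be reached. Below is a minimal sketch of the check the commit message describes. The helper name has_webvtt_signature() is hypothetical and not part of the patch; the real code works on cue.str inside webvtt_read_header().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not in the patch): returns true when the first
 * chunk starts with the "WEBVTT" signature, optionally preceded by a
 * UTF-8 byte order mark (EF BB BF). */
static bool has_webvtt_signature(const char *chunk)
{
    /* strncmp() returns 0 on a match, so the header is acceptable when
     * either comparison returns 0. The BOM and the marker are written as
     * two adjacent literals only for readability. */
    return !strncmp(chunk, "\xEF\xBB\xBF" "WEBVTT", 9) ||
           !strncmp(chunk, "WEBVTT", 6);
}
```

Under that assumption, the rejection branch would be entered when has_webvtt_signature(cue.str) is false, which is equivalent to requiring that both strncmp() calls return non-zero.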