From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 09 Nov 2025 14:25:22 -0000
Message-ID: <176269832305.25.13954307878253090850@2cb04c0e5124>
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] Ignore unknown blocks in WebVTT decoding (PR #20875)
List-Id: FFmpeg development discussions and patches
From: socram via ffmpeg-devel
Cc: socram
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20875 opened by socram
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20875
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20875.patch

[As previously discussed on the mailing list](https://ffmpeg.org/pipermail/ffmpeg-devel/2025-July/346806.html), this change makes the decoder ignore all unknown blocks and fail only if the file header is not valid, which follows the official specification more closely.

This is required to properly parse some WebVTT files that use [an older specification of the "Region" chunk](https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html), such as https://statics.3cat.cat/multimedia/vtt/2/4/1745967620742.vtt used by 3Cat, the streaming service of a national TV broadcaster.

As requested on the mailing list, the patch is split in two for easier verification: the first only moves the cue processing logic into its own function, and the second makes the actual change in logic.

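The header rule described above can be illustrated with a minimal standalone sketch. It assumes only what the description states: the first block must start with `WEBVTT`, optionally preceded by a UTF-8 byte order mark. The function name `has_webvtt_signature` is hypothetical and not part of libavformat, and the check is a prefix test only; it does not validate whatever follows the signature.

```c
#include <string.h>

/* Hypothetical helper, not part of libavformat: returns 1 if `block` starts
 * with the WebVTT signature, optionally preceded by a UTF-8 byte order mark,
 * and 0 otherwise. Prefix check only; the rest of the line is not examined. */
int has_webvtt_signature(const char *block)
{
    /* strncmp() returns 0 on a match, so the signature is present when
     * either comparison returns 0. */
    return !strncmp(block, "\xEF\xBB\xBFWEBVTT", 9) ||
           !strncmp(block, "WEBVTT", 6);
}
```

A caller would reject the file when `has_webvtt_signature()` returns 0 for the first block, and treat every later block that does not parse as a cue as skippable rather than fatal.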
>>From f2c112a6e07d3aa9cb79e2d8598a409d69c4b393 Mon Sep 17 00:00:00 2001
From: Marcos Del Sol Vives
Date: Sun, 9 Nov 2025 14:26:33 +0100
Subject: [PATCH 1/2] avformat/webvttdec: move cue processing logic to its own function

---
 libavformat/webvttdec.c | 166 ++++++++++++++++++++--------------------
 1 file changed, 85 insertions(+), 81 deletions(-)

diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
index 4ca1e939b1..2e9e9a50b4 100644
--- a/libavformat/webvttdec.c
+++ b/libavformat/webvttdec.c
@@ -58,6 +58,89 @@ static int64_t read_ts(const char *s)
     return AV_NOPTS_VALUE;
 }
 
+static int webvtt_parse_cue(WebVTTContext *webvtt, AVBPrint *cue, int64_t pos)
+{
+    int i;
+    AVPacket *sub;
+    const char *p, *identifier, *settings;
+    size_t identifier_len, settings_len;
+    int64_t ts_start, ts_end;
+
+    p = identifier = cue->str;
+
+    /* ignore header chunk */
+    if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
+        !strncmp(p, "WEBVTT", 6) ||
+        !strncmp(p, "STYLE", 5) ||
+        !strncmp(p, "REGION", 6) ||
+        !strncmp(p, "NOTE", 4))
+        return 0;
+
+    /* optional cue identifier (can be a number like in SRT or some kind of
+     * chaptering id) */
+    for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
+        if (!strncmp(p + i, "-->", 3)) {
+            identifier = NULL;
+            break;
+        }
+    }
+    if (!identifier)
+        identifier_len = 0;
+    else {
+        identifier_len = strcspn(p, "\r\n");
+        p += identifier_len;
+        if (*p == '\r')
+            p++;
+        if (*p == '\n')
+            p++;
+    }
+
+    /* cue timestamps */
+    if ((ts_start = read_ts(p)) == AV_NOPTS_VALUE)
+        return AVERROR_INVALIDDATA;
+    if (!(p = strstr(p, "-->")))
+        return AVERROR_INVALIDDATA;
+    p += 2;
+    do p++; while (*p == ' ' || *p == '\t');
+    if ((ts_end = read_ts(p)) == AV_NOPTS_VALUE)
+        return AVERROR_INVALIDDATA;
+
+    /* optional cue settings */
+    p += strcspn(p, "\n\r\t ");
+    while (*p == '\t' || *p == ' ')
+        p++;
+    settings = p;
+    settings_len = strcspn(p, "\r\n");
+    p += settings_len;
+    if (*p == '\r')
+        p++;
+    if (*p == '\n')
+        p++;
+
+    /* create packet */
+    sub = ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
+    if (!sub) {
+        return AVERROR(ENOMEM);
+    }
+    sub->pos = pos;
+    sub->pts = ts_start;
+    sub->duration = ts_end - ts_start;
+
+#define SET_SIDE_DATA(name, type) do { \
+    if (name##_len) { \
+        uint8_t *buf = av_packet_new_side_data(sub, type, name##_len); \
+        if (!buf) \
+            return AVERROR(ENOMEM); \
+        memcpy(buf, name, name##_len); \
+    } \
+} while (0)
+
+    SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
+    SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
+
+    return 0;
+}
+
 static int webvtt_read_header(AVFormatContext *s)
 {
     WebVTTContext *webvtt = s->priv_data;
@@ -75,13 +158,6 @@ static int webvtt_read_header(AVFormatContext *s)
     av_bprint_init(&cue, 0, AV_BPRINT_SIZE_UNLIMITED);
 
     for (;;) {
-        int i;
-        int64_t pos;
-        AVPacket *sub;
-        const char *p, *identifier, *settings;
-        size_t identifier_len, settings_len;
-        int64_t ts_start, ts_end;
-
         res = ff_subtitles_read_chunk(s->pb, &cue);
         if (res < 0)
             goto end;
@@ -89,81 +165,9 @@ static int webvtt_read_header(AVFormatContext *s)
         if (!cue.len)
             break;
 
-        p = identifier = cue.str;
-        pos = avio_tell(s->pb);
-
-        /* ignore header chunk */
-        if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
-            !strncmp(p, "WEBVTT", 6) ||
-            !strncmp(p, "STYLE", 5) ||
-            !strncmp(p, "REGION", 6) ||
-            !strncmp(p, "NOTE", 4))
-            continue;
-
-        /* optional cue identifier (can be a number like in SRT or some kind of
-         * chaptering id) */
-        for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
-            if (!strncmp(p + i, "-->", 3)) {
-                identifier = NULL;
-                break;
-            }
-        }
-        if (!identifier)
-            identifier_len = 0;
-        else {
-            identifier_len = strcspn(p, "\r\n");
-            p += identifier_len;
-            if (*p == '\r')
-                p++;
-            if (*p == '\n')
-                p++;
-        }
-
-        /* cue timestamps */
-        if ((ts_start = read_ts(p)) == AV_NOPTS_VALUE)
+        res = webvtt_parse_cue(webvtt, &cue, avio_tell(s->pb));
+        if (res < 0)
             break;
-        if (!(p = strstr(p, "-->")))
-            break;
-        p += 2;
-        do p++; while (*p == ' ' || *p == '\t');
-        if ((ts_end = read_ts(p)) == AV_NOPTS_VALUE)
-            break;
-
-        /* optional cue settings */
-        p += strcspn(p, "\n\r\t ");
-        while (*p == '\t' || *p == ' ')
-            p++;
-        settings = p;
-        settings_len = strcspn(p, "\r\n");
-        p += settings_len;
-        if (*p == '\r')
-            p++;
-        if (*p == '\n')
-            p++;
-
-        /* create packet */
-        sub = ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
-        if (!sub) {
-            res = AVERROR(ENOMEM);
-            goto end;
-        }
-        sub->pos = pos;
-        sub->pts = ts_start;
-        sub->duration = ts_end - ts_start;
-
-#define SET_SIDE_DATA(name, type) do { \
-    if (name##_len) { \
-        uint8_t *buf = av_packet_new_side_data(sub, type, name##_len); \
-        if (!buf) { \
-            res = AVERROR(ENOMEM); \
-            goto end; \
-        } \
-        memcpy(buf, name, name##_len); \
-    } \
-} while (0)
-
-        SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
-        SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
     }
 
     ff_subtitles_queue_finalize(s, &webvtt->q);
-- 
2.49.1


>>From caf1c80be0fa7431ce7c3799dcae846b25180ebe Mon Sep 17 00:00:00 2001
From: Marcos Del Sol Vives
Date: Sun, 9 Nov 2025 15:18:26 +0100
Subject: [PATCH 2/2] avformat/webvttdec: ignore unknown blocks

---
 libavformat/webvttdec.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
index 2e9e9a50b4..19289b1e0a 100644
--- a/libavformat/webvttdec.c
+++ b/libavformat/webvttdec.c
@@ -68,14 +68,6 @@ static int webvtt_parse_cue(WebVTTContext *webvtt, AVBPrint *cue, int64_t pos)
 
     p = identifier = cue->str;
 
-    /* ignore header chunk */
-    if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
-        !strncmp(p, "WEBVTT", 6) ||
-        !strncmp(p, "STYLE", 5) ||
-        !strncmp(p, "REGION", 6) ||
-        !strncmp(p, "NOTE", 4))
-        return 0;
-
     /* optional cue identifier (can be a number like in SRT or some kind of
      * chaptering id) */
     for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
@@ -157,7 +149,28 @@ static int webvtt_read_header(AVFormatContext *s)
 
     av_bprint_init(&cue, 0, AV_BPRINT_SIZE_UNLIMITED);
 
+    res = ff_subtitles_read_chunk(s->pb, &cue);
+    if (res < 0) {
+        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
+        goto end;
+    }
+
+    if (!cue.len) {
+        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
+        res = AVERROR_EOF;
+        goto end;
+    }
+
+    if (strncmp(cue.str, "\xEF\xBB\xBFWEBVTT", 9) &&
+        strncmp(cue.str, "WEBVTT", 6)) {
+        av_log(s, AV_LOG_ERROR, "Invalid file header\n");
+        res = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
     for (;;) {
+        int64_t pos = avio_tell(s->pb);
+
         res = ff_subtitles_read_chunk(s->pb, &cue);
         if (res < 0)
             goto end;
@@ -165,9 +178,12 @@ static int webvtt_read_header(AVFormatContext *s)
         if (!cue.len)
             break;
 
-        res = webvtt_parse_cue(webvtt, &cue, avio_tell(s->pb));
-        if (res < 0)
-            break;
+        res = webvtt_parse_cue(webvtt, &cue, pos);
+        if (res < 0) {
+            if (res != AVERROR_INVALIDDATA)
+                goto end;
+            av_log(s, AV_LOG_DEBUG, "Ignoring non-cue block at 0x%"PRIx64"\n", pos);
+        }
     }
 
     ff_subtitles_queue_finalize(s, &webvtt->q);
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org