From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcos Del Sol Vives via ffmpeg-devel
To: ffmpeg-devel@ffmpeg.org
Cc: Marcos Del Sol Vives
Reply-To: FFmpeg development discussions and patches
Date: Tue, 27 May 2025 12:28:11 +0200
Message-Id: <20250527102811.369474-1-marcos@orca.pet>
Subject: [FFmpeg-devel] [PATCH] avformat/webvttdec: improve WebVTT parsing
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
List-Id: FFmpeg development discussions and patches

The parser now strictly checks that WebVTT files start with the
"WEBVTT" marker. Previously, the presence of the marker was never
verified.

It also now ignores all non-cue blocks instead of only a hardcoded
list. This is closer to the specification, which calls for no action
when unknown blocks are encountered.

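For illustration only, the marker check described above can be sketched as a standalone function. This is not the code in the patch: `has_webvtt_signature` is a hypothetical helper name, and the sketch assumes the input is a NUL-terminated buffer holding the first block of the file. It only tests the prefix and does not look at what follows the marker.

```c
#include <string.h>

/* Hypothetical helper, not part of libavformat: returns 1 if buf starts
 * with "WEBVTT", optionally preceded by a UTF-8 byte order mark, and 0
 * otherwise. buf must be NUL-terminated. */
static int has_webvtt_signature(const char *buf)
{
    static const char bom[] = "\xEF\xBB\xBF";

    if (!strncmp(buf, bom, 3))
        buf += 3;                       /* skip the optional BOM */
    return !strncmp(buf, "WEBVTT", 6);  /* marker must come first */
}
```

With this helper, a buffer such as "WEBVTT\n" is accepted with or without a leading BOM, while an SRT-style file that starts directly with a cue number is rejected.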
Signed-off-by: Marcos Del Sol Vives
---
 libavformat/webvttdec.c | 178 ++++++++++++++++++++++------------------
 1 file changed, 98 insertions(+), 80 deletions(-)

diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
index 6feda1585e..b454b2c1cf 100644
--- a/libavformat/webvttdec.c
+++ b/libavformat/webvttdec.c
@@ -58,6 +58,79 @@ static int64_t read_ts(const char *s)
     return AV_NOPTS_VALUE;
 }
 
+static int webvtt_parse_cue(WebVTTContext *webvtt, AVBPrint *cue, int64_t pos)
+{
+    int i;
+    AVPacket *sub;
+    const char *p, *identifier, *settings;
+    size_t identifier_len, settings_len;
+    int64_t ts_start, ts_end;
+
+    p = identifier = cue->str;
+
+    /* optional cue identifier (can be a number like in SRT or some kind of
+     * chaptering id) */
+    for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
+        if (!strncmp(p + i, "-->", 3)) {
+            identifier = NULL;
+            break;
+        }
+    }
+    if (!identifier)
+        identifier_len = 0;
+    else {
+        identifier_len = strcspn(p, "\r\n");
+        p += identifier_len;
+        if (*p == '\r')
+            p++;
+        if (*p == '\n')
+            p++;
+    }
+
+    /* cue timestamps */
+    if ((ts_start = read_ts(p)) == AV_NOPTS_VALUE)
+        return AVERROR_INVALIDDATA;
+    if (!(p = strstr(p, "-->")))
+        return AVERROR_INVALIDDATA;
+    p += 2;
+    do p++; while (*p == ' ' || *p == '\t');
+    if ((ts_end = read_ts(p)) == AV_NOPTS_VALUE)
+        return AVERROR_INVALIDDATA;
+
+    /* optional cue settings */
+    p += strcspn(p, "\n\r\t ");
+    while (*p == '\t' || *p == ' ')
+        p++;
+    settings = p;
+    settings_len = strcspn(p, "\r\n");
+    p += settings_len;
+    if (*p == '\r')
+        p++;
+    if (*p == '\n')
+        p++;
+
+    /* create packet */
+    sub = ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
+    if (!sub)
+        return AVERROR(ENOMEM);
+    sub->pos = pos;
+    sub->pts = ts_start;
+    sub->duration = ts_end - ts_start;
+
+#define SET_SIDE_DATA(name, type) do { \
+    if (name##_len) { \
+        uint8_t *buf = av_packet_new_side_data(sub, type, name##_len); \
+        if (!buf) \
+            return AVERROR(ENOMEM); \
+        memcpy(buf, name, name##_len); \
+    } \
+} while (0)
+
+    SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
+    SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
+    return 0;
+}
+
 static int webvtt_read_header(AVFormatContext *s)
 {
     WebVTTContext *webvtt = s->priv_data;
@@ -74,13 +147,27 @@ static int webvtt_read_header(AVFormatContext *s)
 
     av_bprint_init(&cue, 0, AV_BPRINT_SIZE_UNLIMITED);
 
+    res = ff_subtitles_read_chunk(s->pb, &cue);
+    if (res < 0) {
+        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
+        goto end;
+    }
+
+    if (!cue.len) {
+        av_log(s, AV_LOG_ERROR, "Unable to read file header\n");
+        res = AVERROR_EOF;
+        goto end;
+    }
+
+    if (strncmp(cue.str, "\xEF\xBB\xBFWEBVTT", 9) &&
+        strncmp(cue.str, "WEBVTT", 6)) {
+        av_log(s, AV_LOG_ERROR, "Invalid file header\n");
+        res = AVERROR_INVALIDDATA;
+        goto end;
+    }
+
     for (;;) {
-        int i;
-        int64_t pos;
-        AVPacket *sub;
-        const char *p, *identifier, *settings;
-        size_t identifier_len, settings_len;
-        int64_t ts_start, ts_end;
+        int64_t pos = avio_tell(s->pb);
 
         res = ff_subtitles_read_chunk(s->pb, &cue);
         if (res < 0)
@@ -89,81 +176,12 @@ static int webvtt_read_header(AVFormatContext *s)
         if (!cue.len)
             break;
 
-        p = identifier = cue.str;
-        pos = avio_tell(s->pb);
-
-        /* ignore header chunk */
-        if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
-            !strncmp(p, "WEBVTT", 6) ||
-            !strncmp(p, "STYLE", 5) ||
-            !strncmp(p, "REGION", 6) ||
-            !strncmp(p, "NOTE", 4))
-            continue;
-
-        /* optional cue identifier (can be a number like in SRT or some kind of
-         * chaptering id) */
-        for (i = 0; p[i] && p[i] != '\n' && p[i] != '\r'; i++) {
-            if (!strncmp(p + i, "-->", 3)) {
-                identifier = NULL;
-                break;
-            }
-        }
-        if (!identifier)
-            identifier_len = 0;
-        else {
-            identifier_len = strcspn(p, "\r\n");
-            p += identifier_len;
-            if (*p == '\r')
-                p++;
-            if (*p == '\n')
-                p++;
+        res = webvtt_parse_cue(webvtt, &cue, pos);
+        if (res < 0) {
+            if (res != AVERROR_INVALIDDATA)
+                goto end;
+            av_log(s, AV_LOG_DEBUG, "Ignoring non-cue block at 0x%"PRIx64"\n", pos);
         }
-
-        /* cue timestamps */
-        if ((ts_start = read_ts(p)) == AV_NOPTS_VALUE)
-            break;
-        if (!(p = strstr(p, "-->")))
-            break;
-        p += 2;
-        do p++; while (*p == ' ' || *p == '\t');
-        if ((ts_end = read_ts(p)) == AV_NOPTS_VALUE)
-            break;
-
-        /* optional cue settings */
-        p += strcspn(p, "\n\r\t ");
-        while (*p == '\t' || *p == ' ')
-            p++;
-        settings = p;
-        settings_len = strcspn(p, "\r\n");
-        p += settings_len;
-        if (*p == '\r')
-            p++;
-        if (*p == '\n')
-            p++;
-
-        /* create packet */
-        sub = ff_subtitles_queue_insert(&webvtt->q, p, strlen(p), 0);
-        if (!sub) {
-            res = AVERROR(ENOMEM);
-            goto end;
-        }
-        sub->pos = pos;
-        sub->pts = ts_start;
-        sub->duration = ts_end - ts_start;
-
-#define SET_SIDE_DATA(name, type) do { \
-    if (name##_len) { \
-        uint8_t *buf = av_packet_new_side_data(sub, type, name##_len); \
-        if (!buf) { \
-            res = AVERROR(ENOMEM); \
-            goto end; \
-        } \
-        memcpy(buf, name, name##_len); \
-    } \
-} while (0)
-
-        SET_SIDE_DATA(identifier, AV_PKT_DATA_WEBVTT_IDENTIFIER);
-        SET_SIDE_DATA(settings, AV_PKT_DATA_WEBVTT_SETTINGS);
     }
 
     ff_subtitles_queue_finalize(s, &webvtt->q);
-- 
2.34.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
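
As a rough, standalone illustration of how the patch above separates cues from other blocks (NOTE, STYLE, REGION, or anything unknown): a block is only treated as a cue if a timing line containing "-->" appears on its first line, or on its second line after an identifier. The sketch below is a simplification under that assumption. `line_has_arrow` and `looks_like_cue` are hypothetical names, and unlike `webvtt_parse_cue` it does not validate the timestamps themselves.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the first line of s (up to CR, LF or
 * NUL) contains the "-->" separator. */
static int line_has_arrow(const char *s)
{
    size_t len = strcspn(s, "\r\n");

    for (size_t i = 0; i + 3 <= len; i++)
        if (!strncmp(s + i, "-->", 3))
            return 1;
    return 0;
}

/* Hypothetical helper: returns 1 if block looks like a cue, i.e. its
 * first line is a timing line, or its first line is an identifier and
 * its second line is a timing line. Timestamps are not validated. */
static int looks_like_cue(const char *block)
{
    const char *p = block;

    if (line_has_arrow(p))
        return 1;                 /* cue without identifier */

    /* skip the candidate identifier line and its line terminator */
    p += strcspn(p, "\r\n");
    if (*p == '\r')
        p++;
    if (*p == '\n')
        p++;
    return line_has_arrow(p);     /* identifier followed by timing line */
}
```

A caller following the same policy as the patch would skip any block for which `looks_like_cue` returns 0 and keep reading, rather than stopping at the first block that fails to parse.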