* [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n @ 2024-03-28 22:55 Tomas Härdin 2024-03-28 22:56 ` [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input Tomas Härdin ` (3 more replies) 0 siblings, 4 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-28 22:55 UTC (permalink / raw) To: ffmpeg-devel _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-28 22:55 [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin @ 2024-03-28 22:56 ` Tomas Härdin 2024-03-28 22:57 ` Tomas Härdin 2024-03-28 22:57 ` [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin ` (2 subsequent siblings) 3 siblings, 1 reply; 20+ messages in thread From: Tomas Härdin @ 2024-03-28 22:56 UTC (permalink / raw) To: FFmpeg development discussions and patches I am once again asking more people on this list to peruse https://langsec.org/ /Tomas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-28 22:56 ` [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input Tomas Härdin @ 2024-03-28 22:57 ` Tomas Härdin 2024-03-29 23:35 ` Michael Niedermayer 0 siblings, 1 reply; 20+ messages in thread From: Tomas Härdin @ 2024-03-28 22:57 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1: Type: text/plain, Size: 13 bytes --] Here as well [-- Attachment #2: 0002-lavf-srtdec-Permit-streaming-input.patch --] [-- Type: text/x-patch, Size: 11574 bytes --] From 6d0684ca6fe02d80fc07a622fb85445a6917c29f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> Date: Thu, 28 Mar 2024 22:15:18 +0100 Subject: [PATCH 2/3] lavf/srtdec: Permit streaming input This is largely a rewrite. Read packets in srt_read_packet() rather than reading the entire file in srt_read_header(). Rely on AVFMT_GENERIC_INDEX for seeking. Allow zero-length packets (same as WebVTT). The implementation before this is broken in at least the following ways: Decimals like .999999 are silently accepted and converted to 999.999 seconds. This is because no verification is done on the milliseconds part. This patch enforces that the minutes and seconds parts are 00-59, and the milliseconds 000-999. It's not perfect since FFmpeg doesn't have regex functionality or indeed any kind of parsing framework, but it's better than before. Segmenting cues by lines that consist of just a single integer is incredibly wrong, since the subtitle text itself may well contain lines that are just a lone integer. This means files written with CR line endings that have text with lone integers are parsed in a completely broken manner. Neither can we segment by lines containing --> since this is permissible in SubRip (as far as I can tell). WebVTT explicitly forbids it however. --- libavformat/srtdec.c | 211 ++++++++++++++++--------------- tests/ref/fate/sub-srt-rrn-remux | 4 + 2 files changed, 116 insertions(+), 99 deletions(-) diff --git a/libavformat/srtdec.c b/libavformat/srtdec.c index 6bf73599a7..c74d40b726 100644 --- a/libavformat/srtdec.c +++ b/libavformat/srtdec.c @@ -28,7 +28,8 @@ #include "libavutil/intreadwrite.h" typedef struct { - FFDemuxSubtitlesQueue q; + AVBPrint buf; + FFTextReader tr; } SRTContext; static int srt_probe(const AVProbeData *p) @@ -72,54 +73,66 @@ struct event_info { static int get_event_info(const char *line, struct event_info *ei) { - int hh1, mm1, ss1, ms1; - int hh2, mm2, ss2, ms2; - - ei->x1 = ei->x2 = ei->y1 = ei->y2 = ei->duration = -1; - ei->pts = AV_NOPTS_VALUE; - ei->pos = -1; - if (sscanf(line, "%d:%d:%d%*1[,.]%d --> %d:%d:%d%*1[,.]%d" + unsigned int hh1, mm1, ss1, ms1; + unsigned int hh2, mm2, ss2, ms2; + int n, m = 0; + + // require exactly two digits for mm and ss, three digits for ms + n = sscanf(line, "%*u:%*1u%*1u:%*1u%*1u%*1[,.]%*1u%*1u%*1u --> %*u:%*1u%*1u:%*1u%*1u%*1[,.]%*1u%*1u%*1u%n", &m); + if (n < 0 || m <= 0) + return -1; + n = sscanf(line, "%u:%u:%u%*1[,.]%u --> %u:%u:%u%*1[,.]%u" "%*[ ]X1:%"PRId32" X2:%"PRId32" Y1:%"PRId32" Y2:%"PRId32, &hh1, &mm1, &ss1, &ms1, &hh2, &mm2, &ss2, &ms2, - &ei->x1, &ei->x2, &ei->y1, &ei->y2) >= 8) { + &ei->x1, &ei->x2, &ei->y1, &ei->y2); + // do not accept partial positions (9 <= n <= 11) + if ((n == 8 || n == 12) && + // require timestamps to be well-formed + mm1 < 60 && ss1 < 60 && ms1 < 1000 && + mm2 < 60 && ss2 < 60 && ms2 < 1000) { + // hh is converted to long long before the multiplication + // hence this cannot overflow even if hh == UINT_MAX const int64_t start = (hh1*3600LL + mm1*60LL + ss1) * 1000LL + ms1; const int64_t end = (hh2*3600LL + mm2*60LL + ss2) * 1000LL + ms2; - ei->duration = end - start; - ei->pts = start; - return 0; + // accept start == end (hidden subtitles) since a FATE test requires it + // WebVTT by contrast does not allow this + if (start <= end) { + ei->duration = end - start; + ei->pts = start; + return 0; + } } return -1; } -static int add_event(FFDemuxSubtitlesQueue *q, AVBPrint *buf, char *line_cache, - const struct event_info *ei, int append_cache) +static int output_packet(AVBPrint *buf, + const struct event_info *ei, AVPacket *pkt) { - if (append_cache && line_cache[0]) - av_bprintf(buf, "%s\n", line_cache); - line_cache[0] = 0; + int ret; if (!av_bprint_is_complete(buf)) return AVERROR(ENOMEM); + // len < size throughout this loop, so we don't need to + // call av_bprint_is_complete() twice like the old code did while (buf->len > 0 && buf->str[buf->len - 1] == '\n') buf->str[--buf->len] = 0; - if (buf->len) { - AVPacket *sub = ff_subtitles_queue_insert_bprint(q, buf, 0); - if (!sub) - return AVERROR(ENOMEM); - av_bprint_clear(buf); - sub->pos = ei->pos; - sub->pts = ei->pts; - sub->duration = ei->duration; - if (ei->x1 != -1) { - uint8_t *p = av_packet_new_side_data(sub, AV_PKT_DATA_SUBTITLE_POSITION, 16); - if (p) { - AV_WL32(p, ei->x1); - AV_WL32(p + 4, ei->y1); - AV_WL32(p + 8, ei->x2); - AV_WL32(p + 12, ei->y2); - } + ret = av_new_packet(pkt, buf->len); + if (ret) + return ret; + memcpy(pkt->data, buf->str, buf->len); + pkt->flags |= AV_PKT_FLAG_KEY; + pkt->pos = ei->pos; + pkt->dts = pkt->pts = ei->pts; + pkt->duration = ei->duration; + if (ei->x1 != -1) { + uint8_t *p = av_packet_new_side_data(pkt, AV_PKT_DATA_SUBTITLE_POSITION, 16); + if (p) { + AV_WL32(p, ei->x1); + AV_WL32(p + 4, ei->y1); + AV_WL32(p + 8, ei->x2); + AV_WL32(p + 12, ei->y2); } } @@ -129,14 +142,8 @@ static int add_event(FFDemuxSubtitlesQueue *q, AVBPrint *buf, char *line_cache, static int srt_read_header(AVFormatContext *s) { SRTContext *srt = s->priv_data; - AVBPrint buf; AVStream *st = avformat_new_stream(s, NULL); - int res = 0; - char line[4096], line_cache[4096]; - int has_event_info = 0; - struct event_info ei; - FFTextReader tr; - ff_text_init_avio(s, &tr, s->pb); + ff_text_init_avio(s, &srt->tr, s->pb); if (!st) return AVERROR(ENOMEM); @@ -144,84 +151,90 @@ static int srt_read_header(AVFormatContext *s) st->codecpar->codec_type = AVMEDIA_TYPE_SUBTITLE; st->codecpar->codec_id = AV_CODEC_ID_SUBRIP; - av_bprint_init(&buf, 0, AV_BPRINT_SIZE_UNLIMITED); + av_bprint_init(&srt->buf, 0, AV_BPRINT_SIZE_UNLIMITED); - line_cache[0] = 0; + return 0; +} - while (!ff_text_eof(&tr)) { - struct event_info tmp_ei; - const int64_t pos = ff_text_pos(&tr); - ptrdiff_t len = ff_subtitles_read_line(&tr, line, sizeof(line)); +static int srt_read_packet(AVFormatContext *s, AVPacket *pkt) +{ + SRTContext *srt = s->priv_data; + struct event_info ei = {-1, -1, -1, -1}; + /** + * Three states: + * + * 0: discarding any leading blank lines + * 1: have cue number, will read event info + * 2: have event info, will read lines until blank line or EOF + */ + int state = 0; + av_bprint_clear(&srt->buf); + + if (ff_text_eof(&srt->tr)) + return AVERROR_EOF; + + while (!ff_text_eof(&srt->tr)) { + char line[4096]; + const int64_t pos = ff_text_pos(&srt->tr); + ptrdiff_t len = ff_subtitles_read_line(&srt->tr, line, sizeof(line)); + + if (len < 0) { + return len; + } - if (len < 0) - break; + if (!len || !line[0]) { + if (state == 0) + continue; // skip leading blank lines + else + break; + } - if (!len || !line[0]) + if (state == 0) { + state = 1; + ei.pos = pos; + // we don't care about the line number continue; - - if (get_event_info(line, &tmp_ei) < 0) { - char *pline; - - if (!has_event_info) - continue; - - if (line_cache[0]) { - /* We got some cache and a new line so we assume the cached - * line was actually part of the payload */ - av_bprintf(&buf, "%s\n", line_cache); - line_cache[0] = 0; + } else if (state == 1) { + // expect event into + int ret = get_event_info(line, &ei); + if (ret < 0) { + av_log(s, AV_LOG_ERROR, "malformed event info\n"); + return AVERROR_INVALIDDATA; } - - /* If the line doesn't start with a number, we assume it's part of - * the payload, otherwise is likely an event number preceding the - * timing information... but we can't be sure of this yet, so we - * cache it */ - if (strtol(line, &pline, 10) < 0 || line == pline) - av_bprintf(&buf, "%s\n", line); - else - strcpy(line_cache, line); + state = 2; } else { - if (has_event_info) { - /* We have the information of previous event, append it to the - * queue. We insert the cached line if and only if the payload - * is empty and the cached line is not a standalone number. */ - char *pline = NULL; - const int standalone_number = strtol(line_cache, &pline, 10) >= 0 && pline && !*pline; - res = add_event(&srt->q, &buf, line_cache, &ei, !buf.len && !standalone_number); - if (res < 0) - goto end; - } else { - has_event_info = 1; - } - tmp_ei.pos = pos; - ei = tmp_ei; + av_bprintf(&srt->buf, "%s\n", line); } } - /* Append the last event. Here we force the cache to be flushed, because a - * trailing number is more likely to be geniune (for example a copyright - * date) and not the event index of an inexistant event */ - if (has_event_info) { - res = add_event(&srt->q, &buf, line_cache, &ei, 1); - if (res < 0) - goto end; + if (state == 2) { + return output_packet(&srt->buf, &ei, pkt); + } else if (state != 0) { + // we got a line number but no event info + av_log(s, AV_LOG_ERROR, "malformed end of file\n"); + return AVERROR_INVALIDDATA; + } else { + // ignore trailing blank lines + return AVERROR_EOF; } +} - ff_subtitles_queue_finalize(s, &srt->q); - -end: - av_bprint_finalize(&buf, NULL); - return res; +static int srt_read_close(AVFormatContext *s) +{ + SRTContext *srt = s->priv_data; + av_bprint_finalize(&srt->buf, NULL); + return 0; } + const FFInputFormat ff_srt_demuxer = { .p.name = "srt", .p.long_name = NULL_IF_CONFIG_SMALL("SubRip subtitle"), .priv_data_size = sizeof(SRTContext), + .p.flags = AVFMT_GENERIC_INDEX, .flags_internal = FF_INFMT_FLAG_INIT_CLEANUP, .read_probe = srt_probe, .read_header = srt_read_header, - .read_packet = ff_subtitles_read_packet, - .read_seek2 = ff_subtitles_read_seek, - .read_close = ff_subtitles_read_close, + .read_packet = srt_read_packet, + .read_close = srt_read_close, }; diff --git a/tests/ref/fate/sub-srt-rrn-remux b/tests/ref/fate/sub-srt-rrn-remux index 1cb66d21bc..48c49e6fa1 100644 --- a/tests/ref/fate/sub-srt-rrn-remux +++ b/tests/ref/fate/sub-srt-rrn-remux @@ -2111,3 +2111,7 @@ girly, girly girl ¶ 00:20:53,053 --> 00:20:55,954 ¶ girly, girly girl-- ¶ +452 +00:20:56,000 --> 00:20:59,128 + + -- 2.39.2 [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-28 22:57 ` Tomas Härdin @ 2024-03-29 23:35 ` Michael Niedermayer 2024-03-30 0:03 ` Tomas Härdin 2024-03-30 8:31 ` Tomas Härdin 0 siblings, 2 replies; 20+ messages in thread From: Michael Niedermayer @ 2024-03-29 23:35 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1.1: Type: text/plain, Size: 3055 bytes --] On Thu, Mar 28, 2024 at 11:57:57PM +0100, Tomas Härdin wrote: > Here as well > libavformat/srtdec.c | 211 ++++++++++++++++++++------------------- > tests/ref/fate/sub-srt-rrn-remux | 4 > 2 files changed, 116 insertions(+), 99 deletions(-) > 699d8b957286e190de6d5ca5cd17e67bb59dab7c 0002-lavf-srtdec-Permit-streaming-input.patch > From 6d0684ca6fe02d80fc07a622fb85445a6917c29f Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> > Date: Thu, 28 Mar 2024 22:15:18 +0100 > Subject: [PATCH 2/3] lavf/srtdec: Permit streaming input > > This is largely a rewrite. > > Read packets in srt_read_packet() rather than reading the entire file in srt_read_header(). > Rely on AVFMT_GENERIC_INDEX for seeking. > Allow zero-length packets (same as WebVTT). > The implementation before this is broken in at least the following ways: > > Decimals like .999999 are silently accepted and converted to 999.999 seconds. > This is because no verification is done on the milliseconds part. > This patch enforces that the minutes and seconds parts are 00-59, and the milliseconds 000-999. > It's not perfect since FFmpeg doesn't have regex functionality or indeed any kind of parsing framework, > but it's better than before. > > Segmenting cues by lines that consist of just a single integer is incredibly wrong, > since the subtitle text itself may well contain lines that are just a lone integer. > This means files written with CR line endings that have text with lone integers are > parsed in a completely broken manner. Neither can we segment by lines containing --> > since this is permissible in SubRip (as far as I can tell). WebVTT explicitly forbids it however. > --- > libavformat/srtdec.c | 211 ++++++++++++++++--------------- > tests/ref/fate/sub-srt-rrn-remux | 4 + > 2 files changed, 116 insertions(+), 99 deletions(-) breaks fate here: --- ./tests/ref/fate/sub-srt-madness-timeshift 2024-03-29 20:43:34.617419731 +0100 +++ tests/data/fate/sub-srt-madness-timeshift 2024-03-30 00:30:08.776949369 +0100 @@ -3,34 +3,7 @@ okay, let's make things easy 2 -00:00:05,160 --> 00:00:05,263 -31 i'm a number but the only payload so please keep me :) - -3 00:00:06,473 --> 00:00:07,584 hello 5 -don't forget me. - -4 -00:00:08,695 --> 00:00:09,806 -no. -let's add some fun - -5 -00:00:10,917 --> 00:00:12,028 -let's do it in reverse bc wtf not -45 yes this is a number but i'm actually part of the sub - -6 -00:00:12,028 --> 00:00:13,139 -1 -0 -next is negative, not a chapnum ;) --1 - -7 -00:00:13,241 --> 00:00:13,263 -credits -2015 Test sub-srt-madness-timeshift failed. Look at tests/data/fate/sub-srt-madness-timeshift.err for details. tests/Makefile:309: recipe for target 'fate-sub-srt-madness-timeshift' failed make: *** [fate-sub-srt-madness-timeshift] Error 1 [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB You can kill me, but you cannot change the truth. [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] [-- Attachment #2: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-29 23:35 ` Michael Niedermayer @ 2024-03-30 0:03 ` Tomas Härdin 2024-03-30 8:31 ` Tomas Härdin 1 sibling, 0 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-30 0:03 UTC (permalink / raw) To: FFmpeg development discussions and patches lör 2024-03-30 klockan 00:35 +0100 skrev Michael Niedermayer: > On Thu, Mar 28, 2024 at 11:57:57PM +0100, Tomas Härdin wrote: > > Here as well > > > libavformat/srtdec.c | 211 ++++++++++++++++++++------ > > ------------- > > tests/ref/fate/sub-srt-rrn-remux | 4 > > 2 files changed, 116 insertions(+), 99 deletions(-) > > 699d8b957286e190de6d5ca5cd17e67bb59dab7c 0002-lavf-srtdec-Permit- > > streaming-input.patch > > From 6d0684ca6fe02d80fc07a622fb85445a6917c29f Mon Sep 17 00:00:00 > > 2001 > > From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> > > Date: Thu, 28 Mar 2024 22:15:18 +0100 > > Subject: [PATCH 2/3] lavf/srtdec: Permit streaming input > > > > This is largely a rewrite. > > > > Read packets in srt_read_packet() rather than reading the entire > > file in srt_read_header(). > > Rely on AVFMT_GENERIC_INDEX for seeking. > > Allow zero-length packets (same as WebVTT). > > The implementation before this is broken in at least the following > > ways: > > > > Decimals like .999999 are silently accepted and converted to > > 999.999 seconds. > > This is because no verification is done on the milliseconds part. > > This patch enforces that the minutes and seconds parts are 00-59, > > and the milliseconds 000-999. > > It's not perfect since FFmpeg doesn't have regex functionality or > > indeed any kind of parsing framework, > > but it's better than before. > > > > Segmenting cues by lines that consist of just a single integer is > > incredibly wrong, > > since the subtitle text itself may well contain lines that are just > > a lone integer. > > This means files written with CR line endings that have text with > > lone integers are > > parsed in a completely broken manner. Neither can we segment by > > lines containing --> > > since this is permissible in SubRip (as far as I can tell). WebVTT > > explicitly forbids it however. > > --- > > libavformat/srtdec.c | 211 ++++++++++++++++----------- > > ---- > > tests/ref/fate/sub-srt-rrn-remux | 4 + > > 2 files changed, 116 insertions(+), 99 deletions(-) > > breaks fate here: > > --- ./tests/ref/fate/sub-srt-madness-timeshift 2024-03-29 > 20:43:34.617419731 +0100 > +++ tests/data/fate/sub-srt-madness-timeshift 2024-03-30 > 00:30:08.776949369 +0100 > @@ -3,34 +3,7 @@ > okay, let's make things easy > > 2 > -00:00:05,160 --> 00:00:05,263 > -31 i'm a number but the only payload so please keep me :) > - > -3 > 00:00:06,473 --> 00:00:07,584 > hello > 5 > -don't forget me. > - > -4 > -00:00:08,695 --> 00:00:09,806 > -no. > -let's add some fun > - > -5 > -00:00:10,917 --> 00:00:12,028 > -let's do it in reverse bc wtf not > -45 yes this is a number but i'm actually part of the sub > - > -6 > -00:00:12,028 --> 00:00:13,139 > -1 > -0 > -next is negative, not a chapnum ;) > --1 > - > -7 > -00:00:13,241 --> 00:00:13,263 > -credits > -2015 > > Test sub-srt-madness-timeshift failed. Look at tests/data/fate/sub- > srt-madness-timeshift.err for details. > tests/Makefile:309: recipe for target 'fate-sub-srt-madness- > timeshift' failed > make: *** [fate-sub-srt-madness-timeshift] Error 1 Can confirm. I was busy with fate-sub-srt-rrn-remux which popped up when running fate that I forgot to run a full fate after fixing it. Will submit a new patch series later /Tomas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-29 23:35 ` Michael Niedermayer 2024-03-30 0:03 ` Tomas Härdin @ 2024-03-30 8:31 ` Tomas Härdin 2024-03-30 11:36 ` Paul B Mahol 2024-03-30 11:44 ` Nicolas George 1 sibling, 2 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-30 8:31 UTC (permalink / raw) To: FFmpeg development discussions and patches lör 2024-03-30 klockan 00:35 +0100 skrev Michael Niedermayer: > breaks fate here: > > --- ./tests/ref/fate/sub-srt-madness-timeshift 2024-03-29 > 20:43:34.617419731 +0100 > +++ tests/data/fate/sub-srt-madness-timeshift 2024-03-30 Sorry but this file is utter crap and shouldn't be part of FATE. Look at this: > 53 > 00:00:01,111 --> 00:00:02,222 > okay, let's make things easy > > 21 lol jk > 00:00:03,333 --> 00:00:04,444 > hello > 5 > > > don't forget me. > 3 > > > 00:00:05,555 --> 00:00:06,666 > > > no. > let's add some fun > 44 yes we can > 00:00:07,777 --> 00:00:06,666 > let's do it in reverse bc wtf not > This file is crafted to test srtdec as it is, rather than following what passes for an SRT spec (that doom9 forum post[1] and the videolan wiki[2]). We should remove it, or keep it as a sample that should fail parsing. More interesting is fate-sub-srt-badsyntax. Despite the name it doesn't really have bad syntax, but its cues are in a random order. I guess it exists to test the cue sorting logic. But should subtitle demuxers really sort subtitles in this way? We don't do that for other formats that can have non-decreasing timestamps. For comparison, the WebVTT spec explicitly disallows decreasing timestamps. I also wonder how this file was created. My guess is "via a text editor" since it seems to consist of bits from different SRT files. There's little reason to support such deliberately broken files, at least having a bunch of sorting logic just for it. There's no reason we couldn't output packets in stored order. The rest of the subtitle test cases pass. /Tomas [1] https://forum.doom9.org/showthread.php?p=470941#post470941 [2] https://wiki.videolan.org/SubRip/ _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 8:31 ` Tomas Härdin @ 2024-03-30 11:36 ` Paul B Mahol 2024-03-30 11:44 ` Nicolas George 1 sibling, 0 replies; 20+ messages in thread From: Paul B Mahol @ 2024-03-30 11:36 UTC (permalink / raw) To: FFmpeg development discussions and patches On Sat, Mar 30, 2024 at 9:31 AM Tomas Härdin <git@haerdin.se> wrote: > lör 2024-03-30 klockan 00:35 +0100 skrev Michael Niedermayer: > > breaks fate here: > > > > --- ./tests/ref/fate/sub-srt-madness-timeshift 2024-03-29 > > 20:43:34.617419731 +0100 > > +++ tests/data/fate/sub-srt-madness-timeshift 2024-03-30 > > Sorry but this file is utter crap and shouldn't be part of FATE. Look > at this: > > > 53 > > 00:00:01,111 --> 00:00:02,222 > > okay, let's make things easy > > > > 21 lol jk > > 00:00:03,333 --> 00:00:04,444 > > hello > > 5 > > > > > > don't forget me. > > 3 > > > > > > 00:00:05,555 --> 00:00:06,666 > > > > > > no. > > let's add some fun > > 44 yes we can > > 00:00:07,777 --> 00:00:06,666 > > let's do it in reverse bc wtf not > > > > This file is crafted to test srtdec as it is, rather than following > what passes for an SRT spec (that doom9 forum post[1] and the videolan > wiki[2]). We should remove it, or keep it as a sample that should fail > parsing. > > More interesting is fate-sub-srt-badsyntax. Despite the name it doesn't > really have bad syntax, but its cues are in a random order. I guess it > exists to test the cue sorting logic. But should subtitle demuxers > really sort subtitles in this way? We don't do that for other formats > that can have non-decreasing timestamps. For comparison, the WebVTT > spec explicitly disallows decreasing timestamps. I also wonder how this > file was created. My guess is "via a text editor" since it seems to > consist of bits from different SRT files. There's little reason to > support such deliberately broken files, at least having a bunch of > sorting logic just for it. There's no reason we couldn't output packets > in stored order. > > The rest of the subtitle test cases pass. > I agree, current subtitles not being handled in streamed way is bad practice. If some subtitle have wrong order of items, than new generic subtitle filter could be implemented which would fix that by either using queue or seeking around in subtitle stream. > > /Tomas > > [1] https://forum.doom9.org/showthread.php?p=470941#post470941 > [2] https://wiki.videolan.org/SubRip/ > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 8:31 ` Tomas Härdin 2024-03-30 11:36 ` Paul B Mahol @ 2024-03-30 11:44 ` Nicolas George 2024-03-30 14:44 ` Tomas Härdin 1 sibling, 1 reply; 20+ messages in thread From: Nicolas George @ 2024-03-30 11:44 UTC (permalink / raw) To: FFmpeg development discussions and patches Tomas Härdin (12024-03-30): > More interesting is fate-sub-srt-badsyntax. Despite the name it doesn't > really have bad syntax, but its cues are in a random order. I guess it > exists to test the cue sorting logic. But should subtitle demuxers > really sort subtitles in this way? We don't do that for other formats > that can have non-decreasing timestamps. For comparison, the WebVTT > spec explicitly disallows decreasing timestamps. On the other hand, I remember seeing a lot of ASS files from the fansub world where titles, signs and karaokes are added at the end after the dialogues, relying on sorting by the player. (But I guess in the New and Improved FFmpeg, files originating from the fansub world are not worth our time, it is enough to be able to decode files for Crunchyroll…) -- Nicolas George _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 11:44 ` Nicolas George @ 2024-03-30 14:44 ` Tomas Härdin 2024-03-30 14:49 ` Nicolas George 0 siblings, 1 reply; 20+ messages in thread From: Tomas Härdin @ 2024-03-30 14:44 UTC (permalink / raw) To: FFmpeg development discussions and patches lör 2024-03-30 klockan 12:44 +0100 skrev Nicolas George: > Tomas Härdin (12024-03-30): > > More interesting is fate-sub-srt-badsyntax. Despite the name it > > doesn't > > really have bad syntax, but its cues are in a random order. I guess > > it > > exists to test the cue sorting logic. But should subtitle demuxers > > really sort subtitles in this way? We don't do that for other > > formats > > that can have non-decreasing timestamps. For comparison, the WebVTT > > spec explicitly disallows decreasing timestamps. > > On the other hand, I remember seeing a lot of ASS files from the > fansub > world where titles, signs and karaokes are added at the end after the > dialogues, relying on sorting by the player. Players can implement sorting if they wish. Why should we misrepresent what the file says? These people could also fix their workflows, put karaoke lyrics in a separate stream etc.. Business logic does not belong in lavf, and certainly not deep within demuxers. > (But I guess in the New and Improved FFmpeg, files originating from > the > fansub world are not worth our time, it is enough to be able to > decode > files for Crunchyroll…) Snark doesn't help your case. One potential solution is to do this style of parsing when the input is non-seekable. But then we have the silly situation where streamed and non-streamed behavior differs considerably. Another way could be to move the sorting further up, into demux.c or so, extending the generic index functionality. Finally I will note that sorting does not happen when subtitles are muxed in say mkv or avi, so the behavior is not even consistent across demuxers that support subtitles. With logic further up, and proper discarding, sorting could be done there as well. /Tomas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 14:44 ` Tomas Härdin @ 2024-03-30 14:49 ` Nicolas George 2024-03-30 15:23 ` Tomas Härdin 0 siblings, 1 reply; 20+ messages in thread From: Nicolas George @ 2024-03-30 14:49 UTC (permalink / raw) To: FFmpeg development discussions and patches Tomas Härdin (12024-03-30): > Players can implement sorting if they wish. API break. > One potential solution is to do this style of parsing when the input is > non-seekable. But then we have the silly situation where streamed and > non-streamed behavior differs considerably. Sure, better break both cases than only one. > Finally I will note that sorting does not happen when subtitles are > muxed in say mkv or avi, so the behavior is not even consistent across > demuxers that support subtitles. AVI or MKV demuxer do not sort their packets because the packets are supposed to be already sorted; ASS demuxer sorts its packets because there is no guarantee the text are sorted in the file. [exploding head emoji] -- Nicolas George _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 14:49 ` Nicolas George @ 2024-03-30 15:23 ` Tomas Härdin 2024-03-30 15:34 ` Nicolas George 2024-03-30 16:02 ` Andreas Rheinhardt 0 siblings, 2 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-30 15:23 UTC (permalink / raw) To: FFmpeg development discussions and patches lör 2024-03-30 klockan 15:49 +0100 skrev Nicolas George: > Tomas Härdin (12024-03-30): > > Players can implement sorting if they wish. > > API break. lavf's API provides no guarantees regarding presentation order > > > Finally I will note that sorting does not happen when subtitles are > > muxed in say mkv or avi, so the behavior is not even consistent > > across > > demuxers that support subtitles. > > AVI or MKV demuxer do not sort their packets because the packets are > supposed to be already sorted; "Supposed to" is doing a lot of work here. IIRC AVI is fundamentally incapable of providing out-of-order anything, this is true (B-frames being notably haram in AVI). It is however capable of providing poorly muxed files. For example it is perfecectly legal in AVI to mux all video, then all audio, rather than the typical case where audio and video are interleaved (the I in AVI). The same goes for many formats. MOV supports basically any ordering via ctts shenanigans if I'm not mistaken. > ASS demuxer sorts its packets because > there is no guarantee the text are sorted in the file So? I'm making a normative argument. /Tomas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 15:23 ` Tomas Härdin @ 2024-03-30 15:34 ` Nicolas George 2024-03-30 16:02 ` Andreas Rheinhardt 1 sibling, 0 replies; 20+ messages in thread From: Nicolas George @ 2024-03-30 15:34 UTC (permalink / raw) To: FFmpeg development discussions and patches Tomas Härdin (12024-03-30): > lavf's API provides no guarantees regarding presentation order It used to work, you are about to require new code from applications for it to work. That is an API break, and pretending otherwise like you do here is just a cop out. > "Supposed to" is doing a lot of work here. IIRC AVI is fundamentally > incapable of providing out-of-order anything, this is true (B-frames > being notably haram in AVI). It is however capable of providing poorly > muxed files. And these files are widely considered broken and annoying. > So? I'm making a normative argument. I do not know what you try to mean here and I do not want to know. I hope you realize you have no authority to decide which ASS file is valid and which is not. -- Nicolas George _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 15:23 ` Tomas Härdin 2024-03-30 15:34 ` Nicolas George @ 2024-03-30 16:02 ` Andreas Rheinhardt 2024-03-30 16:28 ` Tomas Härdin ` (2 more replies) 1 sibling, 3 replies; 20+ messages in thread From: Andreas Rheinhardt @ 2024-03-30 16:02 UTC (permalink / raw) To: ffmpeg-devel Tomas Härdin: > lör 2024-03-30 klockan 15:49 +0100 skrev Nicolas George: >> Tomas Härdin (12024-03-30): >>> Players can implement sorting if they wish. >> >> API break. > > lavf's API provides no guarantees regarding presentation order >> > >>> Finally I will note that sorting does not happen when subtitles are >>> muxed in say mkv or avi, so the behavior is not even consistent >>> across >>> demuxers that support subtitles. >> >> AVI or MKV demuxer do not sort their packets because the packets are >> supposed to be already sorted; > > "Supposed to" is doing a lot of work here. IIRC AVI is fundamentally > incapable of providing out-of-order anything, this is true (B-frames > being notably haram in AVI). It is however capable of providing poorly > muxed files. For example it is perfecectly legal in AVI to mux all > video, then all audio, rather than the typical case where audio and > video are interleaved (the I in AVI). The same goes for many formats. > MOV supports basically any ordering via ctts shenanigans if I'm not > mistaken. > 1. AVI does not have a way to signal pts, but you can simply store stuff with reordering in it (in coding order); you just need something (most likely a decoder) that can properly reorder the frames for presentation. 2. IIRC our AVI demuxer tries to properly interleave the packets returned by AVI even if the input file is non-interleaved; the same goes for mov/mp4 (where the index and not the file position is used). >> ASS demuxer sorts its packets because >> there is no guarantee the text are sorted in the file > > So? I'm making a normative argument. > Normative about what? The ASS specification [1] explicitly says: "SSA does not care what order events are entered in. They could be entered in complete reverse order, and SSA would still play everything correctly in the right order ie. you cannot assume that each dialogue line is in chronological order in the script file." If you force reordering on our users, then this is a breaking change and it would impair the usefulness of libavformat: If users have to implement workaround for issues of certain file formats (for files which are not even broken according to the specification of the format in question!), then what point is there in using libavformat at all? - Andreas [1]: www.tcax.org/docs/ass-specs.htm _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 16:02 ` Andreas Rheinhardt @ 2024-03-30 16:28 ` Tomas Härdin 2024-04-01 13:15 ` arch1t3cht 2024-04-01 14:34 ` Tomas Härdin 2 siblings, 0 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-30 16:28 UTC (permalink / raw) To: FFmpeg development discussions and patches lör 2024-03-30 klockan 17:02 +0100 skrev Andreas Rheinhardt: > > > ASS demuxer sorts its packets because > > > there is no guarantee the text are sorted in the file > > > > So? I'm making a normative argument. > > > > Normative about what? The ASS specification [1] explicitly says: > > "SSA does not care what order events are entered in. > > They could be entered in complete reverse order, and SSA would > still play everything correctly in the right order ie. you cannot > assume > that each dialogue line is in chronological order in the script > file." This describes what SSA does, not what lavf should do. lavf does not guarantee ordered subtitles in general, as far as I can tell. > If you force reordering on our users, then this is a breaking change Hence why we shouldn't put business logic in lavf. It would have been better to put it in ffmpeg.c. Not putting business logic in lavf is a point I've been making for years. Here we see an excellent reason why. We can maintain the current behavior by putting the sorting logic further up in lavf, assuming API users care. If API users don't care but CLI users do then we could put it in ffmpeg.c. If we are to maintain compatibility with an ill-defined set of API users rather than say specific packages (vlc, mpv, melt etc) then the only sensible solution is to put the sorting logic further up in lavf, say demux.c. /Tomas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 16:02 ` Andreas Rheinhardt 2024-03-30 16:28 ` Tomas Härdin @ 2024-04-01 13:15 ` arch1t3cht 2024-04-01 14:34 ` Tomas Härdin 2 siblings, 0 replies; 20+ messages in thread From: arch1t3cht @ 2024-04-01 13:15 UTC (permalink / raw) To: ffmpeg-devel On 30/03/2024 17:02, Andreas Rheinhardt wrote: > Tomas Härdin: >> lör 2024-03-30 klockan 15:49 +0100 skrev Nicolas George: >>> ASS demuxer sorts its packets because >>> there is no guarantee the text are sorted in the file >> >> So? I'm making a normative argument. >> > > Normative about what? The ASS specification [1] explicitly says: > > "SSA does not care what order events are entered in. > > [1]: www.tcax.org/docs/ass-specs.htm It's worth noting that this document, while often referred to as the ASS specification, is not actually an authoritative specification. The ASS format is implementation-defined: https://github.com/libass/libass/wiki/ASS-File-Format-Guide#preface-other-format-guidesspecification But of course this doesn't change the fact that the format allows events to be in any order. (Though this is also not the same as not caring at all what order events are entered in: While any order is valid, the order of events will affect their layering and collision detection.) _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input 2024-03-30 16:02 ` Andreas Rheinhardt 2024-03-30 16:28 ` Tomas Härdin 2024-04-01 13:15 ` arch1t3cht @ 2024-04-01 14:34 ` Tomas Härdin 2 siblings, 0 replies; 20+ messages in thread From: Tomas Härdin @ 2024-04-01 14:34 UTC (permalink / raw) To: FFmpeg development discussions and patches lör 2024-03-30 klockan 17:02 +0100 skrev Andreas Rheinhardt: > If you force reordering on our users, then this is a breaking change Another thing that bears keeping in mind: sorting isn't consistent even across demuxers that sort. Most sort by timestamp (SUB_SORT_TS_POS), but vobsub sorts by byte position (SUB_SORT_POS_TS). It does this so as to be able to merge split subtitles later, but it does then not sort by timestamp after merging, as far as I can tell. I'm a bit busy with other stuff in the coming days, but hopefully I should be able to knock out a prototype by the end of the week that still provides sorted subs when possible, except when the input is a stream (which doesn't work now anyways). This kind of nonsense is why being strict with input is a good idea. /Tomas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n 2024-03-28 22:55 [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin 2024-03-28 22:56 ` [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input Tomas Härdin @ 2024-03-28 22:57 ` Tomas Härdin 2024-03-28 23:06 ` [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 Tomas Härdin 2024-03-30 0:08 ` [FFmpeg-devel] [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings Tomas Härdin 3 siblings, 0 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-28 22:57 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1: Type: text/plain, Size: 42 bytes --] Oops, forgot to actually attach the patch [-- Attachment #2: 0001-lavf-subtitles-Do-not-eat-n-n.patch --] [-- Type: text/x-patch, Size: 1516 bytes --] From fccc13f728c50a676d20f3ca5b822fa1da60b259 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> Date: Thu, 28 Mar 2024 20:30:37 +0100 Subject: [PATCH 1/3] lavf/subtitles: Do not eat \n\n --- libavformat/subtitles.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c index 3413763c7b..bda549abd0 100644 --- a/libavformat/subtitles.c +++ b/libavformat/subtitles.c @@ -446,11 +446,12 @@ int ff_subtitles_read_chunk(AVIOContext *pb, AVBPrint *buf) ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) { size_t cur = 0; + unsigned char c; if (!size) return 0; buf[0] = '\0'; while (cur + 1 < size) { - unsigned char c = ff_text_r8(tr); + c = ff_text_r8(tr); if (!c) return ff_text_eof(tr) ? cur : AVERROR_INVALIDDATA; if (c == '\r' || c == '\n') @@ -458,9 +459,13 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) buf[cur++] = c; buf[cur] = '\0'; } - while (ff_text_peek_r8(tr) == '\r') - ff_text_r8(tr); - if (ff_text_peek_r8(tr) == '\n') - ff_text_r8(tr); + // don't eat \n\n + if (c == '\r') { + // sub/ticket5032-rrn.srt has \r\r\n + while (ff_text_peek_r8(tr) == '\r') + ff_text_r8(tr); + if (ff_text_peek_r8(tr) == '\n') + ff_text_r8(tr); + } return cur; } -- 2.39.2 [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 2024-03-28 22:55 [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin 2024-03-28 22:56 ` [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input Tomas Härdin 2024-03-28 22:57 ` [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin @ 2024-03-28 23:06 ` Tomas Härdin 2024-03-29 12:29 ` Tomas Härdin 2024-03-30 0:08 ` [FFmpeg-devel] [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings Tomas Härdin 3 siblings, 1 reply; 20+ messages in thread From: Tomas Härdin @ 2024-03-28 23:06 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1: Type: text/plain, Size: 776 bytes --] Obviously the most controversial patch in this set. I'd like to know what program exactly that produced the broken file in ticket #5032 so that we can send the guilty party to parsing jail More seriously, eating any run of CR CR CR... is incredibly broken and is part of the reason for the convoluted parsing logic in srtdec before this patchset. A compromise solution could be to peek two bytes forward and see if the line ending is CR CR LF. My guess is the file was produced by running unix2dos on a file with CR LF, thus converting the LF to CR LF. webvttdec is better with this, probably because these kinds of line ending is explicitly banned by the WebVTT spec. That is, CR CR LF is to be treated as two line endings, no ifs or buts about it. /Tomas [-- Attachment #2: 0003-lavf-subtitles-Unfix-ticket-5032.patch --] [-- Type: text/x-patch, Size: 1953 bytes --] From 784672af12da1d5fefeefad83cffa4b31f8b50fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> Date: Thu, 28 Mar 2024 23:30:06 +0100 Subject: [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 This ticket should have been marked as wontfix so as to not encourage completely broken muxers. --- libavformat/subtitles.c | 10 ++-------- tests/fate/subtitles.mak | 3 --- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c index bda549abd0..11fa89f5eb 100644 --- a/libavformat/subtitles.c +++ b/libavformat/subtitles.c @@ -459,13 +459,7 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) buf[cur++] = c; buf[cur] = '\0'; } - // don't eat \n\n - if (c == '\r') { - // sub/ticket5032-rrn.srt has \r\r\n - while (ff_text_peek_r8(tr) == '\r') - ff_text_r8(tr); - if (ff_text_peek_r8(tr) == '\n') - ff_text_r8(tr); - } + if (c == '\r' && ff_text_peek_r8(tr) == '\n') + ff_text_r8(tr); return cur; } diff --git a/tests/fate/subtitles.mak b/tests/fate/subtitles.mak index 90412e9ac1..940cd9a9ec 100644 --- a/tests/fate/subtitles.mak +++ b/tests/fate/subtitles.mak @@ -73,9 +73,6 @@ fate-sub-srt: CMD = fmtstdout ass -i $(TARGET_SAMPLES)/sub/SubRip_capability_tes FATE_SUBTITLES_ASS-$(call DEMDEC, SRT, SUBRIP) += fate-sub-srt-badsyntax fate-sub-srt-badsyntax: CMD = fmtstdout ass -i $(TARGET_SAMPLES)/sub/badsyntax.srt -FATE_SUBTITLES-$(call ALLYES, SRT_DEMUXER SUBRIP_DECODER SRT_MUXER) += fate-sub-srt-rrn-remux -fate-sub-srt-rrn-remux: CMD = fmtstdout srt -i $(TARGET_SAMPLES)/sub/ticket5032-rrn.srt -c:s copy - FATE_SUBTITLES-$(call ALLYES, SRT_DEMUXER SUBRIP_DECODER SRT_MUXER) += fate-sub-srt-madness-timeshift fate-sub-srt-madness-timeshift: CMD = fmtstdout srt -itsoffset 3.14 -i $(TARGET_SAMPLES)/sub/madness.srt -c:s copy -- 2.39.2 [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 2024-03-28 23:06 ` [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 Tomas Härdin @ 2024-03-29 12:29 ` Tomas Härdin 0 siblings, 0 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-29 12:29 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1: Type: text/plain, Size: 116 bytes --] Here's an alternative solution which maintains support for \r\r\n by peeking two bytes into the future. /Tomas [-- Attachment #2: 0001-lavf-subtitles-Add-ff_text_peek_r16-accept-r-n-r-n-a.patch --] [-- Type: text/x-patch, Size: 2906 bytes --] From ed6f0b2e6c8e52574fcfdac73fcfca560434c079 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> Date: Thu, 28 Mar 2024 23:30:06 +0100 Subject: [PATCH] lavf/subtitles: Add ff_text_peek_r16(), accept \r, \n, \r\n and \r\r\n line endings --- libavformat/subtitles.c | 46 +++++++++++++++++++++++++++++++++++++---- libavformat/subtitles.h | 5 +++++ 2 files changed, 47 insertions(+), 4 deletions(-) diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c index bda549abd0..01187df6ab 100644 --- a/libavformat/subtitles.c +++ b/libavformat/subtitles.c @@ -22,6 +22,7 @@ #include "subtitles.h" #include "avio_internal.h" #include "libavutil/avstring.h" +#include "libavutil/intreadwrite.h" void ff_text_init_avio(void *s, FFTextReader *r, AVIOContext *pb) { @@ -106,6 +107,42 @@ int ff_text_peek_r8(FFTextReader *r) return c; } +int ff_text_peek_r16(FFTextReader *r) +{ + int c1, c2; + if (r->buf_pos < r->buf_len - 1) + return AV_RB16(&r->buf[r->buf_pos]); + + // missing one or two bytes + c1 = ff_text_r8(r); + if (avio_feof(r->pb)) + return 0; + + if (r->buf_pos == r->buf_len - 1) { + // missing one byte + r->buf[0] = r->buf[r->buf_pos]; + r->buf[1] = c1; + r->buf_pos = 0; + r->buf_len = 2; + return AV_RB16(r->buf); + } + + // missing two bytes + c2 = ff_text_r8(r); + if (avio_feof(r->pb)) { + r->buf[0] = c1; + r->buf_pos = 0; + r->buf_len = 1; + return 0; + } + + r->buf[0] = c1; + r->buf[1] = c2; + r->buf_pos = 0; + r->buf_len = 2; + return AV_RB16(r->buf); +} + AVPacket *ff_subtitles_queue_insert(FFDemuxSubtitlesQueue *q, const uint8_t *event, size_t len, int merge) { @@ -459,13 +496,14 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) buf[cur++] = c; buf[cur] = '\0'; } - // don't eat \n\n if (c == '\r') { - // sub/ticket5032-rrn.srt has \r\r\n - while (ff_text_peek_r8(tr) == '\r') - ff_text_r8(tr); if (ff_text_peek_r8(tr) == '\n') ff_text_r8(tr); + else if (ff_text_peek_r16(tr) == AV_RB16("\r\n")) { + // ticket5032-rrn.srt has \r\r\n + ff_text_r8(tr); + ff_text_r8(tr); + } } return cur; } diff --git a/libavformat/subtitles.h b/libavformat/subtitles.h index 88665663c5..2a92044976 100644 --- a/libavformat/subtitles.h +++ b/libavformat/subtitles.h @@ -94,6 +94,11 @@ int ff_text_eof(FFTextReader *r); */ int ff_text_peek_r8(FFTextReader *r); +/** + * Like ff_text_peek_r8(), but peek two bytes and return them as a big-endian number. + */ +int ff_text_peek_r16(FFTextReader *r); + /** * Read the given number of bytes (in UTF-8). On error or EOF, \0 bytes are * written. -- 2.39.2 [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
* [FFmpeg-devel] [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings 2024-03-28 22:55 [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin ` (2 preceding siblings ...) 2024-03-28 23:06 ` [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 Tomas Härdin @ 2024-03-30 0:08 ` Tomas Härdin 3 siblings, 0 replies; 20+ messages in thread From: Tomas Härdin @ 2024-03-30 0:08 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1: Type: text/plain, Size: 505 bytes --] Here's an alternative first patch that rolls patch 1+3 into one. I'd like some feedback on this before I continue hacking on patch 2. While I don't like that we accept any old broken srt file, especially without knowing what software made it, I'm not completely opposed to compromising in this specific case. But I'd rather we didn't, and stuck to \r, \n and \r\n. What I really don't want is runs of \r being eaten without being "terminated" by a \n, because this messes up Mac support. /Tomas [-- Attachment #2: 0001-lavf-subtitles-Add-ff_text_peek_r16-only-accept-r-n-.patch --] [-- Type: text/x-patch, Size: 3379 bytes --] From 2ec68c51e4599b8493a2e103793f571451d872d3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se> Date: Thu, 28 Mar 2024 20:30:37 +0100 Subject: [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings --- libavformat/subtitles.c | 53 +++++++++++++++++++++++++++++++++++++---- libavformat/subtitles.h | 5 ++++ 2 files changed, 53 insertions(+), 5 deletions(-) diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c index 3413763c7b..01187df6ab 100644 --- a/libavformat/subtitles.c +++ b/libavformat/subtitles.c @@ -22,6 +22,7 @@ #include "subtitles.h" #include "avio_internal.h" #include "libavutil/avstring.h" +#include "libavutil/intreadwrite.h" void ff_text_init_avio(void *s, FFTextReader *r, AVIOContext *pb) { @@ -106,6 +107,42 @@ int ff_text_peek_r8(FFTextReader *r) return c; } +int ff_text_peek_r16(FFTextReader *r) +{ + int c1, c2; + if (r->buf_pos < r->buf_len - 1) + return AV_RB16(&r->buf[r->buf_pos]); + + // missing one or two bytes + c1 = ff_text_r8(r); + if (avio_feof(r->pb)) + return 0; + + if (r->buf_pos == r->buf_len - 1) { + // missing one byte + r->buf[0] = r->buf[r->buf_pos]; + r->buf[1] = c1; + r->buf_pos = 0; + r->buf_len = 2; + return AV_RB16(r->buf); + } + + // missing two bytes + c2 = ff_text_r8(r); + if (avio_feof(r->pb)) { + r->buf[0] = c1; + r->buf_pos = 0; + r->buf_len = 1; + return 0; + } + + r->buf[0] = c1; + r->buf[1] = c2; + r->buf_pos = 0; + r->buf_len = 2; + return AV_RB16(r->buf); +} + AVPacket *ff_subtitles_queue_insert(FFDemuxSubtitlesQueue *q, const uint8_t *event, size_t len, int merge) { @@ -446,11 +483,12 @@ int ff_subtitles_read_chunk(AVIOContext *pb, AVBPrint *buf) ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) { size_t cur = 0; + unsigned char c; if (!size) return 0; buf[0] = '\0'; while (cur + 1 < size) { - unsigned char c = ff_text_r8(tr); + c = ff_text_r8(tr); if (!c) return ff_text_eof(tr) ? cur : AVERROR_INVALIDDATA; if (c == '\r' || c == '\n') @@ -458,9 +496,14 @@ ptrdiff_t ff_subtitles_read_line(FFTextReader *tr, char *buf, size_t size) buf[cur++] = c; buf[cur] = '\0'; } - while (ff_text_peek_r8(tr) == '\r') - ff_text_r8(tr); - if (ff_text_peek_r8(tr) == '\n') - ff_text_r8(tr); + if (c == '\r') { + if (ff_text_peek_r8(tr) == '\n') + ff_text_r8(tr); + else if (ff_text_peek_r16(tr) == AV_RB16("\r\n")) { + // ticket5032-rrn.srt has \r\r\n + ff_text_r8(tr); + ff_text_r8(tr); + } + } return cur; } diff --git a/libavformat/subtitles.h b/libavformat/subtitles.h index 88665663c5..2a92044976 100644 --- a/libavformat/subtitles.h +++ b/libavformat/subtitles.h @@ -94,6 +94,11 @@ int ff_text_eof(FFTextReader *r); */ int ff_text_peek_r8(FFTextReader *r); +/** + * Like ff_text_peek_r8(), but peek two bytes and return them as a big-endian number. + */ +int ff_text_peek_r16(FFTextReader *r); + /** * Read the given number of bytes (in UTF-8). On error or EOF, \0 bytes are * written. -- 2.39.2 [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-04-01 14:34 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-03-28 22:55 [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin 2024-03-28 22:56 ` [FFmpeg-devel] [PATCH 2/3] lavf/srtdec: Permit streaming input Tomas Härdin 2024-03-28 22:57 ` Tomas Härdin 2024-03-29 23:35 ` Michael Niedermayer 2024-03-30 0:03 ` Tomas Härdin 2024-03-30 8:31 ` Tomas Härdin 2024-03-30 11:36 ` Paul B Mahol 2024-03-30 11:44 ` Nicolas George 2024-03-30 14:44 ` Tomas Härdin 2024-03-30 14:49 ` Nicolas George 2024-03-30 15:23 ` Tomas Härdin 2024-03-30 15:34 ` Nicolas George 2024-03-30 16:02 ` Andreas Rheinhardt 2024-03-30 16:28 ` Tomas Härdin 2024-04-01 13:15 ` arch1t3cht 2024-04-01 14:34 ` Tomas Härdin 2024-03-28 22:57 ` [FFmpeg-devel] [PATCH 1/3] lavf/subtitles: Do not eat \n\n Tomas Härdin 2024-03-28 23:06 ` [FFmpeg-devel] [PATCH 3/3] lavf/subtitles: Unfix ticket #5032 Tomas Härdin 2024-03-29 12:29 ` Tomas Härdin 2024-03-30 0:08 ` [FFmpeg-devel] [PATCH 1/2] lavf/subtitles: Add ff_text_peek_r16(), only accept \r, \n, \r\n and \r\r\n line endings Tomas Härdin
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git