* [FFmpeg-devel] [PATCH 0/1] Handle ASS format subtitle encoding ambiguity
@ 2023-01-27 17:20 Tim Angus

From: Tim Angus @ 2023-01-27 17:20 UTC
To: ffmpeg-devel; +Cc: Tim Angus

Some Matroska files embed ASS format subtitles. The track headers for
these subtitles carry the header of the subtitle stream in the "codec
private data" section. It's not clear whether the last byte of this
data is supposed to be 0, i.e. a null terminator for the string data.
Among other tools, older versions of Handbrake do include the null
terminator, and there are many files out in the wild created using
Handbrake.

When ffmpeg is used to extract the subtitles from such a file, this
header is copied directly to the output file, including the null
terminator if it was present. The result is a file with a null
terminator after the header but preceding the actual content of the
subtitle file. Obviously this is not correct.

As a data point, of the ~600 MKVs I have locally, 22 have ASS
subtitles, and 20 of those include the null terminator, so it doesn't
appear to be a rare phenomenon.

As another data point, mkvextract from mkvtoolnix avoids the ambiguity
by first assuming that the source buffer is *not* null terminated, and
then manually adding a (possibly second) null terminator. The buffer is
then interpreted as a null terminated string and processed that way.
(https://gitlab.com/mbunkus/mkvtoolnix/-/blob/main/src/extract/xtr_textsubs.cpp#L117)

My change here refactors the way the output file is created by
treating the source buffer as a string. I posted an earlier iteration
of this patch to the list that dealt with the extra null explicitly,
but following some #ffmpeg-devel discussion, I suggested reworking it
to behave more like mkvtoolnix does, and this was positively received,
so here we are.
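A minimal C sketch of the mkvtoolnix-style normalization described
above, under stated assumptions: `codec_private_to_string` is a
hypothetical helper (not an FFmpeg or mkvtoolnix function), and plain
`malloc`/`free` stand in for `av_malloc`/`av_free`. It copies the
buffer into an allocation one byte larger and always writes a
terminator, so `strlen()` on the result gives the length of the text
alone, whether or not the source already ended in a NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of FFmpeg or mkvtoolnix): copy a
 * codec-private buffer that may or may not already end in a NUL byte
 * into a new buffer with one extra byte, and always terminate it.
 * The caller owns the result and must free() it. */
char *codec_private_to_string(const unsigned char *data, size_t size)
{
    assert(data != NULL || size == 0);

    char *str = malloc(size + 1);
    if (!str)
        return NULL;
    if (size > 0)
        memcpy(str, data, size); /* skip memcpy(NULL, ...) for empty input */
    str[size] = '\0';            /* may be a second terminator; harmless */
    return str;
}
```

Whether `size` counts a trailing NUL or not, anything later written
from the result as a string (e.g. with a `"%s"` format) stops at the
first NUL, so the stray byte never reaches the output.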
FATE succeeds as there are no MKVs in the suite that have ASS
subtitles embedded. A small test file that exhibits the problem may be
found here: https://0x0.st/oFTT.mkv

Tim Angus (1):
  avformat/assenc: avoid incorrect copy of null terminator

 libavformat/assenc.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

-- 
2.25.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 1/1] avformat/assenc: avoid incorrect copy of null terminator
@ 2023-01-27 17:20 Tim Angus

From: Tim Angus @ 2023-01-27 17:20 UTC
To: ffmpeg-devel; +Cc: Tim Angus

When writing an SSA/ASS subtitle file, the
AVCodecParameters::extradata buffer is written directly to the output.
In the case where the buffer is filled from a matroska source file
produced by some older versions of Handbrake, this buffer ends with a
null terminating character, which is then erroneously copied into the
middle of the output file. The refactoring here avoids this problem by
copying the source buffer, manually null terminating it, then treating
it as a string rather than a raw buffer. This way it is agnostic as to
whether the source buffer was null terminated or not.

Signed-off-by: Tim Angus <tim@ngus.net>
---
 libavformat/assenc.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/libavformat/assenc.c b/libavformat/assenc.c
index 1600f0a02b..4c9ea6f982 100644
--- a/libavformat/assenc.c
+++ b/libavformat/assenc.c
@@ -24,6 +24,7 @@
 #include "internal.h"
 
 #include "libavutil/opt.h"
+#include "libavutil/mem.h"
 
 typedef struct DialogueLine {
     int readorder;
@@ -55,6 +56,7 @@ static int write_header(AVFormatContext *s)
     avpriv_set_pts_info(s->streams[0], 64, 1, 100);
     if (par->extradata_size > 0) {
         size_t header_size = par->extradata_size;
+        char *header_string = NULL;
         uint8_t *trailer = strstr(par->extradata, "\n[Events]");
 
         if (trailer)
@@ -69,9 +71,20 @@ static int write_header(AVFormatContext *s)
                 ass->trailer = trailer;
         }
 
-        avio_write(s->pb, par->extradata, header_size);
-        if (par->extradata[header_size - 1] != '\n')
-            avio_write(s->pb, "\r\n", 2);
+        header_string = av_malloc(header_size + 1);
+        if (!header_string)
+            return AVERROR(ENOMEM);
+
+        memcpy(header_string, par->extradata, header_size);
+        header_string[header_size] = 0;
+
+        avio_printf(s->pb, "%s", header_string);
+
+        if (header_string[strlen(header_string) - 1] != '\n')
+            avio_printf(s->pb, "\r\n");
+
+        av_free(header_string);
+
         ass->ssa_mode = !strstr(par->extradata, "\n[V4+ Styles]");
         if (!strstr(par->extradata, "\n[Events]"))
             avio_printf(s->pb, "[Events]\r\nFormat: %s, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\r\n",
-- 
2.25.1
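For comparison, a hedged sketch of the same header-writing step
without the temporary copy. This is not what the patch does: it swaps
the `av_malloc` + `memcpy` copy for a `memchr` search for the first NUL
byte, which yields the same effective length as `strlen()` on the
terminated copy. It writes to a standard `FILE *` rather than an
`AVIOContext`, and `header_text_length`/`write_ass_header` are
hypothetical names. It also checks `len > 0` before looking at the last
character: in the patch as posted, a (malformed) header whose first
byte is NUL makes `strlen(header_string) - 1` wrap around to
`SIZE_MAX`, so the index lands outside the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the usable header text ends at the first NUL
 * byte, or at `size` if the buffer contains none. This matches what
 * strlen() returns on a NUL-terminated copy of the buffer. */
size_t header_text_length(const unsigned char *data, size_t size)
{
    if (size == 0)
        return 0; /* avoid memchr(NULL, ...) on empty input */
    const unsigned char *nul = memchr(data, '\0', size);
    return nul ? (size_t)(nul - data) : size;
}

/* Hypothetical stand-in for the write step in write_header(): write the
 * text without any trailing NUL, then append "\r\n" unless the text
 * already ends with '\n'. Testing len == 0 first avoids reading
 * data[-1] when the buffer starts with a NUL; in that case only "\r\n"
 * is written. Returns 0 on success, -1 on a write error. */
int write_ass_header(FILE *out, const unsigned char *data, size_t size)
{
    assert(out != NULL);
    size_t len = header_text_length(data, size);

    if (len > 0 && fwrite(data, 1, len, out) != len)
        return -1;
    if (len == 0 || data[len - 1] != '\n') {
        if (fputs("\r\n", out) == EOF)
            return -1;
    }
    return 0;
}
```

The allocation-free variant trades the patch's "copy, terminate, treat
as a string" framing for an explicit length; both stop at the first
NUL, so neither copies a Handbrake-style terminator into the output.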
* Re: [FFmpeg-devel] [PATCH 1/1] avformat/assenc: avoid incorrect copy of null terminator
@ 2023-02-08 15:00 Tim Angus

From: Tim Angus @ 2023-02-08 15:00 UTC
To: ffmpeg-devel

On 27/01/2023 17:20, Tim Angus wrote:
> When writing an SSA/ASS subtitle file, the
> AVCodecParameters::extradata buffer is written directly to the output.
> In the case where the buffer is filled from a matroska source file
> produced by some older versions of Handbrake, this buffer ends with a
> null terminating character, which is then erroneously copied into the
> middle of the output file. The refactoring here avoids this problem by
> copying the source buffer, manually null terminating it, then treating
> it as a string rather than a raw buffer. This way it is agnostic as to
> whether the source buffer was null terminated or not.
>

Could somebody give this a look please? Thanks.