From: Tim Angus
To: ffmpeg-devel@ffmpeg.org
Cc: Tim Angus
Date: Fri, 27 Jan 2023 17:20:46 +0000
Message-Id: <20230127172047.1024276-1-tim@ngus.net>
Subject: [FFmpeg-devel] [PATCH 0/1] Handle ASS format subtitle encoding ambiguity

Some Matroska files embed ASS format subtitles. Such files store the ASS header for the subtitle stream in the "codec private data" section.
It's not clear whether the last byte of this data is supposed to be 0, i.e. a null terminator for the string data. Among other tools, older versions of Handbrake do include the null terminator, and there are many files in the wild that were created using Handbrake.

When ffmpeg is used to extract subtitles from such a file, this header is copied directly to the output file, including the null terminator if it was present. This results in a file in which there is a null terminator after the header but preceding the actual content of the subtitle file. Obviously this is not correct.

As a data point, of the ~600 mkvs I have locally, 22 have ASS subtitles, and 20 of those include the null terminator, so it doesn't appear to be a rare phenomenon.

As another data point, the tool mkvextract from mkvtoolnix avoids the ambiguity by first assuming that the source buffer is *not* null terminated, and then manually adding a (possibly second) null terminator. The buffer is then interpreted as a null terminated string and processed that way.
(https://gitlab.com/mbunkus/mkvtoolnix/-/blob/main/src/extract/xtr_textsubs.cpp#L117)

My change here refactors the way the output file is created, by treating the source buffer as a string. I posted an earlier iteration of this patch to the list that dealt with the extra null explicitly, but following some #ffmpeg-devel discussion, I suggested I rework it to work more like how mkvtoolnix does things, and this was positively received, so here we are.

FATE succeeds as there are no mkvs in the suite that have ASS subtitles embedded.
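For illustration, the "treat the buffer as a string" idea described above can be sketched roughly as follows. This is not the code from the patch or from mkvtoolnix; the helper names `header_as_string` and `write_header_text` are hypothetical, and the sketch uses plain C library calls rather than FFmpeg's I/O layer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of FFmpeg or the patch.
 * Copies the codec private data as if it were NOT null terminated and
 * always appends one terminator. If the source already ended in a 0 byte,
 * the copy simply ends in two of them, and strlen() on the result stops at
 * the first. The caller frees the returned buffer.
 */
static char *header_as_string(const uint8_t *data, size_t size)
{
    char *str;

    assert(data || size == 0);

    str = malloc(size + 1);
    if (!str)
        return NULL;
    if (size)
        memcpy(str, data, size);
    str[size] = '\0';
    return str;
}

/*
 * Hypothetical helper: writes only the text part of the header, so a
 * trailing 0 byte in the source never reaches the output file.
 * Returns 0 on success, -1 on allocation or write failure.
 */
static int write_header_text(FILE *out, const uint8_t *data, size_t size)
{
    char *str = header_as_string(data, size);
    size_t len;
    int ret;

    if (!str)
        return -1;
    len = strlen(str);
    ret = fwrite(str, 1, len, out) == len ? 0 : -1;
    free(str);
    return ret;
}
```

With this shape, a header that ends in a 0 byte and one that does not both produce the same bytes in the output, which is the behaviour the cover letter is aiming for.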
A small test file that exhibits the problem may be found here:
https://0x0.st/oFTT.mkv

Tim Angus (1):
  avformat/assenc: avoid incorrect copy of null terminator

 libavformat/assenc.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

-- 
2.25.1