From: llyyr
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 24 Sep 2023 04:50:49 +0530
Message-ID: <20230923232049.14119-1-llyyr.public@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avformat/subtitles: check for double BOM in UTF-16 files

While these files certainly aren't the norm, and might not even be
considered valid by many programs, there are plenty of older ASS tracks
in UTF-16 LE/BE encoding that contain double BOMs.

This patch teaches ff_text_init_avio about double BOMs and makes it
check for them in UTF-16 LE/BE files.

This works by reading two more bytes after the first BOM check and
comparing the first four bytes against a repeated BOM (FF FE FF FE or
FE FF FE FF). If a second BOM is found, we simply proceed with buf_pos
two bytes further ahead. Otherwise we drop the two extra bytes from the
buffer and seek back, so they are decoded as regular UTF-16 text.
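For reviewers, the detection rule above can be sketched on an in-memory
buffer. This is a standalone illustration under stated assumptions, not
the patch itself: Utf16Kind, ENC_* and utf16_bom_skip() are hypothetical
names, and it uses memcmp() rather than the patch's strncmp() since BOM
bytes are raw binary data rather than C strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only; not FFmpeg API. */
typedef enum { ENC_UNKNOWN, ENC_UTF16LE, ENC_UTF16BE } Utf16Kind;

/*
 * Returns how many leading bytes of buf are UTF-16 byte order marks:
 * 0 (no UTF-16 BOM), 2 (one BOM) or 4 (the same BOM twice in a row).
 * *kind receives the byte order implied by the first BOM.
 */
static size_t utf16_bom_skip(const unsigned char *buf, size_t len,
                             Utf16Kind *kind)
{
    static const unsigned char le_bom[2] = { 0xFF, 0xFE };
    static const unsigned char be_bom[2] = { 0xFE, 0xFF };
    const unsigned char *bom;

    *kind = ENC_UNKNOWN;
    if (len < 2)
        return 0;

    /* First BOM check: FF FE = little endian, FE FF = big endian. */
    if (memcmp(buf, le_bom, 2) == 0) {
        *kind = ENC_UTF16LE;
        bom   = le_bom;
    } else if (memcmp(buf, be_bom, 2) == 0) {
        *kind = ENC_UTF16BE;
        bom   = be_bom;
    } else {
        return 0; /* not UTF-16 with a BOM */
    }

    /* Look at the next two bytes: only an identical BOM counts as a
     * double BOM, so FF FE FE FF is treated as a single LE BOM. */
    if (len >= 4 && memcmp(buf + 2, bom, 2) == 0)
        return 4;
    return 2;
}
```

For example, a buffer starting with FF FE FF FE yields 4 and
ENC_UTF16LE, while FE FF followed by ordinary text yields 2. Because the
sketch works on a buffer it needs no seek-back; the patch instead reads
from the AVIOContext and seeks back when the two extra bytes turn out
not to be a second BOM.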
While this hack could live in assdec.c instead, and would be much
simpler that way, there isn't any harm in letting the other subtitle
format readers that go through ff_text_init_avio be aware of double
BOMs too.
---
 libavformat/subtitles.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/libavformat/subtitles.c b/libavformat/subtitles.c
index 3413763c7b..a7b83cbb69 100644
--- a/libavformat/subtitles.c
+++ b/libavformat/subtitles.c
@@ -44,6 +44,20 @@ void ff_text_init_avio(void *s, FFTextReader *r, AVIOContext *pb)
             r->buf_pos += 3;
         }
     }
+    if (r->type != FF_UTF_8) {
+        // Check for double BOM in UTF-16 LE/BE files
+        for (i = 0; i < 2; i++)
+            r->buf[r->buf_len++] = avio_r8(r->pb);
+        if (strncmp("\xFF\xFE\xFF\xFE", r->buf, 4) == 0 ||
+            strncmp("\xFE\xFF\xFE\xFF", r->buf, 4) == 0) {
+            // We did find a second BOM, so move buf_pos two bytes ahead
+            r->buf_pos += 2;
+        } else {
+            // We did not find a second BOM, undo the seek
+            r->buf_len -= 2;
+            avio_seek(r->pb, -2, SEEK_CUR); // Seek back two bytes
+        }
+    }
     if (s && (r->type == FF_UTF16LE || r->type == FF_UTF16BE))
         av_log(s, AV_LOG_INFO,
                "UTF16 is automatically converted to UTF8, do not specify a character encoding\n");
-- 
2.42.0