From: Oneric <oneric@oneric.de>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
Date: Fri, 4 Feb 2022 22:52:08 +0100
Message-ID: <Yf2gCH+2EzYdhUqy@dismail.de>
In-Reply-To: <DM8P223MB03658E1710E3005D39CE5BC9BA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM> <AM7PR03MB6660AA28DB9F9B2895314CCE8F299@AM7PR03MB6660.eurprd03.prod.outlook.com>

On Fri, Feb 04, 2022 at 02:30:37 +0100, Andreas Rheinhardt wrote:
> All text-based subtitles are supposed to be UTF-8 when they reach the
> decoder; if it isn't, the user has to set the appropriate -sub_charenc
> and -sub_charenc_mode.
>
> - Andreas

Thanks for the info! Then at least the UTF-8 assumption is no problem
after all.

On Fri, Feb 04, 2022 at 01:57:48 +0000, Soft Works wrote:
> > There's no way of knowing whether the word-joiner comes from
> > a conversion performed by ffmpeg in the past or already existed
> > in the original source.
>
> That might be true, but I think it's valid to say that such characters
> are very unusual "original" subtitle sources and that's why I don't
> think it's a good idea for ffmpeg to start injecting them.

Don't underestimate what subtitle authors can come up with :)

> Subtitle implementations are often rather minimal, especially in
> hardware devices and might not always cover the full range of
> UTF-8 specifics.

The word-joiner lies in the Basic Multilingual Plane, so even ancient
UTF-8 implementations that assume all of Unicode’s codepoints fit in
16 bits (i.e. at most 3 bytes per codepoint in UTF-8) will be able to
understand it.

> > However, the wordjoiner does not alter the visually appearance and
> > is unlikely to change line-breaking properties; that's why I chose
> > a word-joiner. Therefore I don't think removing (only) the inserted
> > word-joiners is possible,
>
> Why not? As it seems to be required for ASS encoding only, all other
> output formats should remain unaffected.

Because, as written before, it can exist in the original source.
Unicode recommends using the word-joiner e.g. to prevent line breaks
between two characters without any of the additional side effects that
e.g. the combining-grapheme-joiner would cause.

> > but also not necessary.
>
> I'm not sure whether all ffmpeg text-sub encoders can handle
> those chars - which could be verified of course.

Since it's in the BMP and ffmpeg already seems happy to assume some
UTF-8 support by converting everything to it, I'm not worried about
this until proven wrong.

> Finally, those chars are a pest. I'm using them myself for a
> specific use case, but when you don't know they are there, it can
> drive you totally mad, eventually even thinking your system or
> software is faulty.
>
> Example:
>
> Open your patch file [2/2] and search for the string
> "123456\NAscending". You can see the string in two lines, but search
> will only find one of them.
>
> Or just look at the two lines directly. They are preceded by + and -
> even though both appear identical.

Actually, I see this (with the helpful colouring lost here):

  -Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456^M
  +Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: <200f>123456<200e>\NAscending: 123456^M

More plain-text oriented editors likely won't show them though, yes.

On this topic, and to give an example of "invisible characters" being
used manually in the original source: finding raw bidi marks in ASS
subtitles for RTL languages is not that unusual (because VSFilters,
and libass in the interest of compatibility, assume LTR by default,
among other reasons).

Even if I thought removing all word-joiners when converting from ASS
was a good idea, I still wouldn't know where to do this (or where to
look to remove possibly lingering attempts to recollapse \\ into \).
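To illustrate two of the points above (the word-joiner, U+2060, sits in
the BMP and therefore takes at most 3 bytes in UTF-8; and invisible
format characters are easy to miss unless a tool renders them
explicitly), here is a minimal Python sketch. The helper names
utf8_len, in_bmp and reveal_invisible are made up for this example and
are not part of ffmpeg or libass; the <200f>-style marker merely
mimics what my viewer happened to display.

```python
import unicodedata

# U+2060 WORD JOINER, the character discussed in this thread.
WORD_JOINER = "\u2060"


def utf8_len(ch: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def in_bmp(ch: str) -> bool:
    """True if the single codepoint lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def reveal_invisible(text: str) -> str:
    """Replace Unicode format characters (general category Cf, which
    includes the word-joiner and the LRM/RLM bidi marks) with a visible
    <hex> marker. Hypothetical helper, for illustration only."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append(f"<{ord(ch):04x}>")
        else:
            out.append(ch)
    return "".join(out)
```

For instance, reveal_invisible("a\u2060b") gives "a<2060>b", and
utf8_len(WORD_JOINER) is 3 (the bytes E2 81 A0).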
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".