From: Oneric <oneric@oneric.de>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
Date: Sat, 5 Feb 2022 02:20:16 +0100
Message-ID: <Yf3Q0JOl3dEsY0Li@oneric.de>
In-Reply-To: <DM8P223MB0365A113498D1AF8CD5EDEBFBA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>

On Fri, Feb 04, 2022 at 23:24:58 +0000, Soft Works wrote:
> You want to "pollute" gazillions of subtitle streams in the
> world from multiple subtitle formats with invisible
> characters in order to solve an escaping problem in ffmpeg?

I do not consider using characters that are explicitly recommended
by Unicode to be “polluting”. Further consider that, as mentioned,
invisible characters in ASS are already not uncommon, and conversions
from ASS to something else are rare because they are generally lossy.
Lossy with regards to typesetting, that is; removing breaking hints in
the form of plain Unicode characters would be a new form of lossiness.

> [From the other mail:]
> I'm not into changing ffmpeg's ass output, it's all
> about the internally used ass format and the escaping is
> a central problem there.

I’m not interested in reworking ffmpeg’s internal subtitle handling.
The proposed patch is a clear improvement over the status quo, which
is plain incorrect. Given sound arguments, adjustments to the patch
can be made within reasonable effort; reworking ffmpeg internals is
imo not “reasonable” effort to correct an uncontestedly wrong escape.

You have two options:
Either finally tell me what I asked about: where (as in which file and
function) removing wordjoiners should even happen, and where possible
lingering “\\ → \” conversions presumably are. If it’s simple enough,
I can add a removal accompanied by a comment pointing out that this
can go wrong.
Or go ahead and create your own patch.
~~~~~~

> > > I'm not sure whether all ffmpeg text-sub encoders can handle
> > > those chars - which could be verified of course.
> >
> > Since it's in the BMP and ffmpeg already seems happy to assume some
> > UTF-8 support by converting everything to it, I'm not worried about
> > this until proven wrong.
>
> Proven wrong: https://github.com/libass/libass/issues/507

This issue is not at all wordjoiner specific despite the name.
As far as I recall this never led to wrong rendering. With HarfBuzz,
the only fully featured shaping backend of libass, control characters
were and are handled by HarfBuzz. And even with FriBiDi, U+2060 was
ignored since long before (2012) the linked issue was opened.

What that issue really is about is a combination of two more general
issues. First, libass currently does not cache failures to look up a
glyph, leading to multiple messages and at worst a perf degradation if
no font in the font pool contains a glyph for a particular character.
Second, the realisation that libass’ font-fallback strategy is not
ideal for prefix-type control characters, characters which visibly
affect both neighbours, and a few others.
The word joiner is only highlighted here because, due to its usage as
a backslash escape, it’s commonly passed to libass, and a high enough
percentage of fonts doesn’t contain it to create reports about it.

For further reference: U+2060 was added in Unicode 3.2, released in
2002. If you want to strip it because it might not render correctly,
you should also strip most emoji, the uppercase eszett ẞ and several
actively used writing systems in their entirety.