From: Soft Works <softworkz@hotmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
Date: Fri, 4 Feb 2022 06:48:40 +0000
Message-ID: <DM8P223MB036528BBFFF0FD2A10844DC9BA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>
In-Reply-To: <DM8P223MB0365CF3B6A60A4A07158B5F5BA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft
> Works
> Sent: Friday, February 4, 2022 6:34 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix
> handling of backslashes
>
>
>
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Soft
> > Works
> > Sent: Friday, February 4, 2022 2:58 AM
> > To: FFmpeg development discussions and patches <ffmpeg-
> > devel@ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}:
> fix
> > handling of backslashes
> >
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > > Oneric
> > > Sent: Friday, February 4, 2022 2:01 AM
> > > To: FFmpeg development discussions and patches <ffmpeg-
> > > devel@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}:
> > fix
> > > handling of backslashes
> > >
> > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote:
> > > > I think when you inject that word-joiner as a workaround for ass
> > > > parsing, you'll also need to make sure that it gets removed
> > > > when encoding to other formats.
> > >
> > > There's no way of knowing whether the word-joiner comes from
> > > a conversion performed by ffmpeg in the past or already existed
> > > in the original source.
> >
> > That might be true, but I think it's valid to say that such characters
> > are very unusual in "original" subtitle sources, and that's why I don't
> > think it's a good idea for ffmpeg to start injecting them.
> >
> > Subtitle implementations are often rather minimal, especially in
> > hardware devices and might not always cover the full range of
> > UTF-8 specifics.
> >
> > > However, the word-joiner does not alter the visual appearance and
> > > is unlikely to change line-breaking properties; that's why I chose
> > > a word-joiner. Therefore I don't think removing (only) the inserted
> > > word-joiners is possible,
> >
> > Why not? As it seems to be required for ASS encoding only, all other
> > output formats should remain unaffected.
> >
> > > but also not necessary.
> >
> > I'm not sure whether all ffmpeg text-sub encoders can handle
> > those chars - which could be verified of course.
> >
> > But what remains is the question about the effect on end devices
> > which are consuming that output.
> >
> > Finally, those chars are a pest. I'm using them myself for a
> > specific use case, but when you don't know they are there, it can
> > drive you totally mad, eventually even thinking your system or
> > software is faulty.
> >
> > Example:
> >
> > Open your patch file [2/2] and search for the string
> > "123456\NAscending". You can see the string in two lines, but search
> > will only find one of them.
> >
> > Or just look at the two lines directly. They are preceded by + and -
> > even though both appear identical.
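To illustrate the search problem in isolation, here is a small sketch. It assumes the word-joiner (U+2060, UTF-8 bytes E2 81 A0) sits between the backslash and the 'N'; the exact position in the patch is my assumption, and contains(), plain_line and joined_line are made-up names for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only.
 * Returns 1 if needle occurs byte-for-byte in haystack, 0 otherwise. */
static int contains(const char *haystack, const char *needle)
{
    assert(haystack && needle);
    return strstr(haystack, needle) != NULL;
}

/* The line as a user would type it into a search box. */
static const char plain_line[] = "123456\\NAscending";

/* Same visible text, but with U+2060 WORD JOINER after the backslash
 * (assumed position). It renders as nothing, yet adds three bytes. */
static const char joined_line[] = "123456\\\xE2\x81\xA0NAscending";
```

A byte-wise search for the visible text matches plain_line but not joined_line, although both look the same in most editors and diff viewers.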
> >
> >
> > So, this also needs consideration of the consequences, like how
> > many developers (inside and outside of ffmpeg) this would be driving
> > nuts over the years and make them start hating ffmpeg for doing so
> > once they've found out.
>
> As I really hate how many devs on this ML keep saying 'no' to submitted
> code without having a better suggestion, assuming that this is all that
> it takes, I don't want to assimilate in this regard.
>
> Hence I want to propose the following solution:
>
> First of all, the existing code in ff_ass_bprint_text_event() is already
> totally wrong. Not only with regard to the backslash escaping (as you
> have already pointed out); the curly brace escaping is invalid as well.
> There is no curly-brace escaping in ASS either.
>
> In fact, it is impossible in ASS to display an opening curly brace
> followed by a closing curly brace at a subsequent position (each one
> alone may work, depending on the implementation).
>
> If it was about ASS alone, we might just drop those braces, so we could
> at least keep the text in between from being hidden (when outputting
> ASS), but ASS is also the internal ("uncompressed/raw") subtitle format
> in ffmpeg that is used for conversion (and subtitle filtering).
> So it would be a hard sell if curly braces got lost when
> converting from one text-sub format to another with none of them
> even being ASS.
>
> What we need is to stop creating invalid ASS and at the same time
> ensure proper conversion of curly braces. How? We substitute them!
>
> And still, UTF-8 can come to the rescue. There are two suitable
> candidate pairs for that:
>
> SMALL LEFT CURLY BRACKET (U+FE5B, Ps): ﹛
> SMALL RIGHT CURLY BRACKET (U+FE5C, Pe): ﹜
> FULLWIDTH LEFT CURLY BRACKET (U+FF5B, Ps): ｛
> FULLWIDTH RIGHT CURLY BRACKET (U+FF5D, Pe): ｝
>
> Substitution of curly braces with one of those will prevent ASS from
> treating any possible subtitle content as override code.
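As a rough sketch of that substitution step, assuming the fullwidth pair is chosen: sub_braces() is a hypothetical standalone helper, not an existing FFmpeg function, and a real patch would presumably write into an AVBPrint inside ff_ass_bprint_text_event() instead of a fixed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg.
 * Copies src to dst, replacing '{' with U+FF5B and '}' with U+FF5D
 * (UTF-8: EF BD 9B and EF BD 9D). Returns the number of bytes written,
 * excluding the terminating NUL, or (size_t)-1 if dst is too small. */
static size_t sub_braces(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;

    assert(dst && src);
    for (; *src; src++) {
        const char *rep = NULL;
        size_t len;

        if (*src == '{')
            rep = "\xEF\xBD\x9B";   /* FULLWIDTH LEFT CURLY BRACKET */
        else if (*src == '}')
            rep = "\xEF\xBD\x9D";   /* FULLWIDTH RIGHT CURLY BRACKET */

        len = rep ? 3 : 1;
        if (n + len + 1 > dst_size)  /* keep room for the NUL */
            return (size_t)-1;
        if (rep)
            memcpy(dst + n, rep, 3);
        else
            dst[n] = *src;
        n += len;
    }
    if (n + 1 > dst_size)            /* empty src with a zero-sized dst */
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

After this pass the text contains no ASCII curly braces, so nothing from the source can open or close an override block.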
>
> What remains to be handled now is the backslash case. Now that we can be
> sure that we are never inside a sequence that ASS would consider an
> override code, only three cases remain where the backslash has a meaning
> in ASS dialogue text: '\n', '\N' and '\h'.
>
> We can simply escape those sequences by inserting a (no-op) override code
> between the backslash and the char. Suitable for this is: {\r}
> This code resets inline styles, but since we are coming from plain-text
> subs in ff_ass_bprint_text_event(), we know that we don't have any inline
> styles, so resetting the style is a no-op.
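A minimal sketch of that escaping, assuming the input has already had its curly braces substituted, so the only ASCII braces in the output are the inserted ones. escape_backslash() is a hypothetical name; this is not the existing FFmpeg code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg.
 * Copies src to dst and inserts the override code "{\r}" between a
 * backslash and a following 'n', 'N' or 'h'. All other bytes are copied
 * unchanged. Returns the number of bytes written, excluding the
 * terminating NUL, or (size_t)-1 if dst is too small. */
static size_t escape_backslash(char *dst, size_t dst_size, const char *src)
{
    static const char noop[] = "{\\r}";
    const size_t noop_len = sizeof(noop) - 1;
    size_t n = 0;

    assert(dst && src);
    for (; *src; src++) {
        /* src[1] is at worst the terminating NUL, so this read is safe. */
        int special = src[0] == '\\' &&
                      (src[1] == 'n' || src[1] == 'N' || src[1] == 'h');
        size_t len = special ? 1 + noop_len : 1;

        if (n + len + 1 > dst_size)  /* keep room for the NUL */
            return (size_t)-1;
        dst[n++] = *src;
        if (special) {
            memcpy(dst + n, noop, noop_len);
            n += noop_len;
        }
    }
    if (n + 1 > dst_size)            /* empty src with a zero-sized dst */
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

For example, the plain text C:\new would become C:\{\r}new. This relies on a renderer that treats the brace after the backslash as the start of an override block.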
>
> Needless to say, we will of course change the substituted curly braces
> back to the regular ones on the encoding side for all formats but ASS.
> What remains is the question of what to do when encoding to ASS: we can
> either keep the alternate brace characters or just remove them (or maybe
> replace them with square brackets).
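The reverse step for non-ASS encoders could look roughly like this, again assuming the fullwidth pair was used for the substitution. restore_braces() is a hypothetical helper and not an existing FFmpeg function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg.
 * Rewrites s in place, turning U+FF5B / U+FF5D (UTF-8: EF BD 9B and
 * EF BD 9D) back into ASCII '{' and '}'. The result is never longer
 * than the input. Returns the new length in bytes. */
static size_t restore_braces(char *s)
{
    char *out = s;
    const char *in;

    assert(s);
    for (in = s; *in; ) {
        /* strncmp stops at the NUL, so a short tail is never overread. */
        if (!strncmp(in, "\xEF\xBD\x9B", 3)) {
            *out++ = '{';
            in += 3;
        } else if (!strncmp(in, "\xEF\xBD\x9D", 3)) {
            *out++ = '}';
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

One caveat: like the word-joiner, a fullwidth brace that was already present in the original source cannot be told apart from a substituted one, so it would be converted as well.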
>
> I'm not sure about that last point, but in total, this will be a clean
> solution without injecting any weird chars into the subtitle output, and
> it will fix multiple incorrect behaviors in the current implementation.

I've found out where the \{ and \} escaping comes from: libass.
At some point they decided to introduce this kind of escaping, which
is incompatible with normal ASS syntax and specific to libass:

https://github.com/libass/libass/issues/194#issuecomment-352213210

This doesn't mean, though, that the ffmpeg-internal ASS format needs
to follow the libass route in this regard. It only matters for the
libass output encoder, because \{\r}N is broken by that libass
decision, so for this case we'll need a different way.

sw