Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: Soft Works <softworkz@hotmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
Date: Fri, 4 Feb 2022 06:48:40 +0000
Message-ID: <DM8P223MB036528BBFFF0FD2A10844DC9BA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>
In-Reply-To: <DM8P223MB0365CF3B6A60A4A07158B5F5BA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM>



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft
> Works
> Sent: Friday, February 4, 2022 6:34 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix
> handling of backslashes
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Soft
> > Works
> > Sent: Friday, February 4, 2022 2:58 AM
> > To: FFmpeg development discussions and patches <ffmpeg-
> > devel@ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}:
> fix
> > handling of backslashes
> >
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > > Oneric
> > > Sent: Friday, February 4, 2022 2:01 AM
> > > To: FFmpeg development discussions and patches <ffmpeg-
> > > devel@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}:
> > fix
> > > handling of backslashes
> > >
> > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote:
> > > > I think when you inject that word-joiner as a workaround for ass
> > > > parsing, you'll also need to make sure that it gets removed
> > > > when encoding to other formats.
> > >
> > > There's no way of knowing whether the word-joiner comes from
> > > a conversion performed by ffmpeg in the past or already existed
> > > in the original source.
> >
> > That might be true, but I think it's valid to say that such
> > characters are very unusual in "original" subtitle sources, and
> > that's why I don't think it's a good idea for ffmpeg to start
> > injecting them.
> >
> > Subtitle implementations are often rather minimal, especially in
> > hardware devices and might not always cover the full range of
> > UTF-8 specifics.
> >
> > > However, the word-joiner does not alter the visual appearance and
> > > is unlikely to change line-breaking properties; that's why I chose
> > > a word-joiner. Therefore I don't think removing (only) the inserted
> > > word-joiners is possible,
> >
> > Why not? As it seems to be required for ASS encoding only, all other
> > output formats should remain unaffected.
> >
> > > but also not necessary.
> >
> > I'm not sure whether all ffmpeg text-sub encoders can handle
> > those chars - which could be verified of course.
> >
> > But what remains is the question about the effect on end devices
> > which are consuming that output.
> >
> > Finally, those chars are a pest. I'm using them myself for a
> > specific use case, but when you don't know they are there, it can
> > drive you totally mad, eventually even thinking your system or
> > software is faulty.
> >
> > Example:
> >
> > Open your patch file [2/2] and search for the string
> > "123456\NAscending". You can see the string in two lines, but search
> > will only find one of them.
> >
> > Or just look at the two lines directly. They are preceded by + and -
> > even though both appear identical.
> >
> >
> > So, this also needs consideration of the consequences, like how
> > many developers (inside and outside of ffmpeg) this would drive
> > nuts over the years, making them hate ffmpeg for doing so once
> > they've found out.
> 
> As I really hate how many devs on this ML keep saying 'no' to
> submitted code without having a better suggestion, assuming that
> this is all that it takes, I don't want to assimilate in this regard.
> 
> Hence I want to propose the following solution:
> 
> First of all, the existing code in ff_ass_bprint_text_event() is
> already totally wrong. It is wrong not only in the backslash escaping
> (as you have already pointed out); the curly-brace escaping is invalid
> as well, because ASS has no curly-brace escaping either.
> 
> In fact, it is impossible in ASS to display an opening curly brace
> followed by a closing curly brace at a subsequent position (each one
> alone may work, depending on the implementation).
> 
> If it was about ASS alone, we might just drop those braces, so we
> could at least avoid the text in between being hidden (when
> outputting ASS). But ASS is also the internal ("uncompressed/raw")
> subtitle format in ffmpeg that is used for conversion (and subtitle
> filtering). So it would be a hard sell if curly braces got lost when
> converting from one text-sub format to another with neither of them
> even being ASS.
> 
> What we need is to stop creating invalid ASS and at the same time
> ensure proper conversion of curly braces. How? We substitute them!
> 
> Here again, UTF-8 can come to the rescue. There are two suitable
> candidate pairs for that:
> 
> SMALL LEFT CURLY BRACKET (U+FE5B, Ps): ﹛
> SMALL RIGHT CURLY BRACKET (U+FE5C, Pe): ﹜
> FULLWIDTH LEFT CURLY BRACKET (U+FF5B, Ps): ｛
> FULLWIDTH RIGHT CURLY BRACKET (U+FF5D, Pe): ｝
> 
> Substituting curly braces with one of those pairs will prevent ASS
> from treating any subtitle content as an override code.
> 
> What remains to be handled now is the backslash case. Now that we can
> be sure that we are never inside a sequence that ASS would consider an
> override code, only 3 cases remain where the backslash has a meaning
> in ASS dialogue text: '\n', '\N' and '\h'.
> 
> We can simply escape those sequences by inserting a (no-op) override
> code between the backslash and the char. Suitable for this is: {\r}
> This code resets inline styles, but since we are coming from plain
> text subs in ff_ass_bprint_text_event(), we know that we don't have
> any inline styles, so resetting the style is a no-op.
> 
> Needless to say, we will of course change the substituted curly
> braces back to the regular ones on the encoding side for all formats
> but ASS. What remains is the question of what to do when encoding to
> ASS: we can either keep the alternate brace characters or just remove
> them (or maybe replace them with square brackets).
> 
> I'm not sure about that last point, but overall, this will be a clean
> solution without injecting any weird chars into the subtitle output,
> and it will fix multiple incorrect behaviors in the current
> implementation.
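
To make the two steps of the quoted proposal concrete, here is a minimal
sketch of the decoding side. It is written in Python for brevity, not in
the C of libavcodec, and it is not the actual ff_ass_bprint_text_event()
code. The names plain_text_to_ass, BRACE_SUBSTITUTES and NOOP_OVERRIDE are
hypothetical, and choosing the fullwidth pair over the small pair is just
an assumption for the example.

```python
# Sketch only: the function and constant names below are hypothetical.

# Fullwidth curly brackets (U+FF5B / U+FF5D) stand in for ASCII braces.
BRACE_SUBSTITUTES = str.maketrans({"{": "\uFF5B", "}": "\uFF5D"})

# The no-op override code from the proposal: {\r}
NOOP_OVERRIDE = "{\\r}"


def plain_text_to_ass(text: str) -> str:
    """Turn plain subtitle text into ASS dialogue text (proposal sketch)."""
    # Step 1: substitute braces so plain text can never open an override block.
    text = text.translate(BRACE_SUBSTITUTES)
    # Step 2: split '\n', '\N' and '\h' by putting {\r} right after the
    # backslash. The inserted code never creates a new '\n', '\N' or '\h',
    # so the order of the three replacements does not matter.
    for special in ("n", "N", "h"):
        text = text.replace("\\" + special, "\\" + NOOP_OVERRIDE + special)
    return text
```

With this sketch, the plain text C:\new becomes C:\{\r}new, and a{b}
becomes a｛b｝. A backslash followed by any other character is left alone.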


I've found out where the \{ and \} escaping comes from: libass.
At some point they decided to introduce this kind of escaping, which
is libass-specific and actually incompatible with normal ASS syntax:
https://github.com/libass/libass/issues/194#issuecomment-352213210


This doesn't mean, though, that the ffmpeg-internal ASS format needs
to follow the libass route in this regard. It only matters for the
libass output encoder, because \{\r}N is broken by that libass
decision, so for this case we'll need a different way.
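
For the encoders other than ASS, the reverse step from the quoted proposal
could look roughly like the sketch below (Python for brevity; the names
ass_text_to_plain and BRACES_BACK are hypothetical and not taken from
FFmpeg). It assumes the fullwidth pair was used for substitution and only
undoes the two transformations discussed here. A real encoder would still
have to handle genuine line breaks and override codes.

```python
# Sketch only: names are hypothetical, not taken from FFmpeg.
BRACES_BACK = str.maketrans({"\uFF5B": "{", "\uFF5D": "}"})


def ass_text_to_plain(ass_text: str) -> str:
    """Undo the brace substitution and backslash escaping for non-ASS output."""
    # Remove the no-op {\r} that was inserted directly after a backslash.
    text = ass_text.replace("\\{\\r}", "\\")
    # Map the substitute braces back to ASCII braces. A fullwidth brace that
    # was already in the source is indistinguishable and gets converted too.
    return text.translate(BRACES_BACK)
```

Applied to C:\{\r}new this gives back C:\new, and a｛b｝ gives back a{b}.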

sw



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
