Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: Oneric <oneric@oneric.de>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
Date: Fri, 4 Feb 2022 22:52:08 +0100
Message-ID: <Yf2gCH+2EzYdhUqy@dismail.de> (raw)
In-Reply-To: <DM8P223MB03658E1710E3005D39CE5BC9BA299@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM> <AM7PR03MB6660AA28DB9F9B2895314CCE8F299@AM7PR03MB6660.eurprd03.prod.outlook.com>

On Fri, Feb 04, 2022 at 02:30:37 +0100, Andreas Rheinhardt wrote:
> All text-based subtitles are supposed to be UTF-8 when they reach the
> decoder; if it isn't, the user has to set the appropriate -sub_charenc
> and -sub_charenc_mode.
> 
> - Andreas

Thanks for the info! Then at least the UTF-8 assumption
is no problem after all.


On Fri, Feb 04, 2022 at 01:57:48 +0000, Soft Works wrote:
> > There's no way of knowing whether the word-joiner comes from
> > a conversion performed by ffmpeg in the past or already existed
> > in the original source.
> 
> That might be true, but I think it's valid to say that such characters
> are very unusual "original" subtitle sources and that's why I don't
> think it's a good idea for ffmpeg to start injecting them.

Don't underestimate what subtitle authors can come up with :)

> Subtitle implementations are often rather minimal, especially in
> hardware devices and might not always cover the full range of 
> UTF-8 specifics.

The word joiner (U+2060) lies in the Basic Multilingual Plane, so even
ancient UTF-8 implementations that assume all of Unicode’s codepoints fit
in 16 bits (i.e. at most 3 bytes per codepoint in UTF-8) will be able to
understand it.
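
To make the byte-count claim concrete, here is a minimal Python sketch
(the helper name utf8_info is made up for illustration and has nothing
to do with ffmpeg) that prints the codepoint, its UTF-8 encoding and
whether it falls inside the BMP:

```python
# Illustrative sketch only: utf8_info is a hypothetical helper, not ffmpeg code.
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

def utf8_info(ch):
    """Return (codepoint, UTF-8 bytes, is_in_bmp) for a single character."""
    cp = ord(ch)
    # The BMP covers U+0000..U+FFFF; UTF-8 needs at most 3 bytes for it.
    return cp, ch.encode("utf-8"), cp <= 0xFFFF

cp, encoded, in_bmp = utf8_info(WORD_JOINER)
print(f"U+{cp:04X} -> {encoded.hex(' ')} ({len(encoded)} bytes), BMP: {in_bmp}")
# prints: U+2060 -> e2 81 a0 (3 bytes), BMP: True
```

A decoder that only handles sequences of up to 3 bytes would therefore
still get through this character; whether it then renders or ignores it
is a separate question.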

> > However, the wordjoiner does not alter the visually appearance and
> > is unlikely to change line-breaking properties; that's why I chose
> > a word-joiner. Therefore I don't think removing (only) the inserted
> > word-joiners is possible,
> 
> Why not? As it seems to be required for ASS encoding only, all other
> output formats should remain unaffected. 

Because, as written before, it can exist in the original source.
Unicode recommends the word joiner e.g. for preventing line breaks
between two characters without the additional side effects that e.g.
the combining grapheme joiner would cause.

> > but also not necessary.
> 
> I'm not sure whether all ffmpeg text-sub encoders can handle 
> those chars - which could be verified of course.

Since it's in the BMP and ffmpeg already seems happy to assume some UTF-8 
support by converting everything to it, I'm not worried about this until
proven wrong.


> Finally, those chars are a pest. I'm using them myself for a 
> specific use case, but when you don't know they are there, it can
> drive you totally mad, eventually even thinking your system or
> software is faulty.
> 
> Example: 
> 
> Open your patch file [2/2] and search for the string
> "123456\NAscending". You can see the string in two lines, but search
> will only find one of them.
> 
> Or just look at the two lines directly. They are preceded by + and -
> even though both appear identical. 

Actually, this is what I see (the helpful colouring is lost here):

  -Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456^M
  +Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: <200f>123456<200e>\NAscending: 123456^M

More plain-text oriented editors likely won't show them though, yes.
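
For editors that hide them, something along these lines can make such
characters visible. This is only a rough Python sketch: the function
name reveal_invisible is made up, and treating everything in Unicode
general category Cf (format characters) as "invisible" is my assumption;
it covers the bidi marks and the word joiner but is not an exhaustive
definition.

```python
import unicodedata

def reveal_invisible(text):
    """Replace format characters (general category Cf) with <xxxx> markers.

    Hypothetical helper for illustration; Cf includes U+200E/U+200F
    (bidi marks) and U+2060 (word joiner), among others.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append(f"<{ord(ch):04x}>")
        else:
            out.append(ch)
    return "".join(out)

# The added line from the patch, with the bidi marks written as escapes.
line = "Descending: \u200f123456\u200e\\NAscending: 123456"
print(reveal_invisible(line))
# prints: Descending: <200f>123456<200e>\NAscending: 123456
```

Running a file through this before searching or diffing should explain
why two lines that look identical do not compare equal.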

On this topic: finding raw bidi marks in ASS subtitles for RTL languages
is not that unusual, to give an example of "invisible characters"
being used manually in the original source
(among other reasons, because VSFilters, and libass in the interest of
compatibility, assume LTR by default).

Even if I thought removing all word joiners when converting from ASS
was a good idea, I still wouldn't know where to do this (or where to
look for possibly lingering attempts to recollapse \\ into \ that would
need removing).
