* [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes @ 2022-01-16 18:16 Oneric 2022-01-16 18:16 ` [FFmpeg-devel] [PATCH 2/2] avcodec/webvttdec: honour bidi marks Oneric 2022-02-01 17:38 ` [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric 0 siblings, 2 replies; 28+ messages in thread From: Oneric @ 2022-01-16 18:16 UTC (permalink / raw) To: ffmpeg-devel Backslashes cannot be escaped by backslashes in any ASS renderer, but unless followed by a few specific characters it is just printed as a regular character. Insert a word-joiner character after a backslash to break up the active sequences without changing the visual output. Also the existing \{ and \} escapes are specific to libass only. --- The patch assumes UTF-8 encoding in ff_ass_bprint_text_event (WebVTT requires UTF-8 per sepc). If we cannot assume a particular encoding, please advise how to best insert a word-joiner character in the correct encoding. --- libavcodec/ass.c | 5 ++++- libavcodec/webvttdec.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/libavcodec/ass.c b/libavcodec/ass.c index 725e4d42ba..461e110ca4 100644 --- a/libavcodec/ass.c +++ b/libavcodec/ass.c @@ -157,8 +157,11 @@ void ff_ass_bprint_text_event(AVBPrint *buf, const char *p, int size, /* standard ASS escaping so random characters don't get mis-interpreted * as ASS */ - } else if (!keep_ass_markup && strchr("{}\\", *p)) { + } else if (!keep_ass_markup && strchr("{}", *p)) { av_bprintf(buf, "\\%c", *p); + } else if (!keep_ass_markup && *p == '\\') { + // append word-joiner U+2060 as UTF-8 to break up sequences like \N + av_bprintf(buf, "\\\xe2\x81\xa0"); /* some packets might end abruptly (no \0 at the end, like for example * in some cases of demuxing from a classic video container), some diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c index 0093f328fa..8cb739697a 100644 --- a/libavcodec/webvttdec.c +++ b/libavcodec/webvttdec.c @@ -37,7 +37,7 @@ static const struct { {"<i>", "{\\i1}"}, {"</i>", "{\\i0}"}, {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"}, {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"}, - {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts + {"{", "\\{"}, {"}", "\\}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts {">", ">"}, {"<", "<"}, {"‎", ""}, {"‏", ""}, // FIXME: properly honor bidi marks {"&", "&"}, {" ", "\\h"}, -- 2.30.2 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* [FFmpeg-devel] [PATCH 2/2] avcodec/webvttdec: honour bidi marks 2022-01-16 18:16 [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric @ 2022-01-16 18:16 ` Oneric 2022-02-01 17:38 ` [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric 1 sibling, 0 replies; 28+ messages in thread From: Oneric @ 2022-01-16 18:16 UTC (permalink / raw) To: ffmpeg-devel WebVTT files are required to be encoded as UTF-8 by its spec, so just insert the bytes for UTF-8 encoded bidi-marks. --- libavcodec/webvttdec.c | 2 +- tests/ref/fate/sub-webvtt2 | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c index 8cb739697a..7d996928eb 100644 --- a/libavcodec/webvttdec.c +++ b/libavcodec/webvttdec.c @@ -39,7 +39,7 @@ static const struct { {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"}, {"{", "\\{"}, {"}", "\\}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts {">", ">"}, {"<", "<"}, - {"‎", ""}, {"‏", ""}, // FIXME: properly honor bidi marks + {"‎", "\xe2\x80\x8e"}, {"‏", "\xe2\x80\x8f"}, {"&", "&"}, {" ", "\\h"}, }; diff --git a/tests/ref/fate/sub-webvtt2 b/tests/ref/fate/sub-webvtt2 index 357b8178ea..4cd1d86a9a 100644 --- a/tests/ref/fate/sub-webvtt2 +++ b/tests/ref/fate/sub-webvtt2 @@ -20,6 +20,6 @@ Dialogue: 0,0:00:12.50,0:00:32.50,Default,,0,0,0,,OK, let’s go. Dialogue: 0,0:00:38.00,0:00:43.00,Default,,0,0,0,,I want to 愛あい love you\NThat's not proper English! Dialogue: 0,0:00:43.00,0:00:46.00,Default,,0,0,0,,{\i1}キツネ{\i0}じゃない キツネじゃない\N乙女おとめは Dialogue: 0,0:00:50.00,0:00:55.00,Default,,0,0,0,,Some time ago in a rather distant place.... -Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456 +Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456 Dialogue: 0,0:01:00.00,0:01:05.00,Default,,0,0,0,,>> Never gonna give you up Never gonna let you down\NNever\hgonna\hrun\haround & desert\hyou Dialogue: 0,0:55:00.00,1:00:00.00,Default,,0,0,0,,Transcrit par Célestes™ -- 2.30.2 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-01-16 18:16 [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric 2022-01-16 18:16 ` [FFmpeg-devel] [PATCH 2/2] avcodec/webvttdec: honour bidi marks Oneric @ 2022-02-01 17:38 ` Oneric 2022-02-01 19:44 ` Soft Works 1 sibling, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-01 17:38 UTC (permalink / raw) To: FFmpeg development discussions and patches On Sun, Jan 16, 2022 at 19:16:54 +0100, Oneric wrote: > [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes > libavcodec/ass.c | 5 ++++- > libavcodec/webvttdec.c | 2 +- > 2 files changed, 5 insertions(+), 2 deletions(-) > [PATCH 2/2] avcodec/webvttdec: honour bidi marks > libavcodec/webvttdec.c | 2 +- > tests/ref/fate/sub-webvtt2 | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) ping. In case anyone is wondering why patchwork fails to apply the second patch, this is probably once again because the patch updates one of FATE's ASS reference files which use CRLF line-endings. Locally git am applies both without a hitch for me on top of current master (and FATE passes after applying each patch). _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-01 17:38 ` [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric @ 2022-02-01 19:44 ` Soft Works 2022-02-01 20:06 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-01 19:44 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Oneric > Sent: Tuesday, February 1, 2022 6:39 PM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Sun, Jan 16, 2022 at 19:16:54 +0100, Oneric wrote: > > [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes > > libavcodec/ass.c | 5 ++++- > > libavcodec/webvttdec.c | 2 +- > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > [PATCH 2/2] avcodec/webvttdec: honour bidi marks > > libavcodec/webvttdec.c | 2 +- > > tests/ref/fate/sub-webvtt2 | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > ping. > > In case anyone is wondering why patchwork fails to apply the second patch, > this is probably once again because the patch updates one of FATE's ASS > reference files which use CRLF line-endings. > Locally git am applies both without a hitch for me on top of current master > (and FATE passes after applying each patch). You can add a .gitattributes file to tests/ref/fate/ which includes the line sub-webvtt2 -diff Then your local git format-patch will create a binary diff for the file. softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-01 19:44 ` Soft Works @ 2022-02-01 20:06 ` Oneric 2022-02-01 20:41 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-01 20:06 UTC (permalink / raw) To: FFmpeg development discussions and patches On Tue, Feb 01, 2022 at 19:44:24 +0000, Soft Works wrote: > > On Sun, Jan 16, 2022 at 19:16:54 +0100, Oneric wrote: > > > > In case anyone is wondering why patchwork fails to apply the second patch, > > this is probably once again because the patch updates one of FATE's ASS > > reference files which use CRLF line-endings. > > Locally git am applies both without a hitch for me on top of current master > > (and FATE passes after applying each patch). > > > You can add a .gitattributes file to tests/ref/fate/ which includes the line > > sub-webvtt2 -diff > > Then your local git format-patch will create a binary diff for the file. Thanks for your suggestion. However, a binary diff would look like this which isn't great for seeing what's going on during review: diff --git a/tests/ref/fate/sub-webvtt2 b/tests/ref/fate/sub-webvtt2 index 357b8178ea1cf224ad47dcf78b24f1948ece6665..4cd1d86a9a58ccf65812131bf84a17531c2c6cfa 100644 GIT binary patch delta 24 gcmeys^NnXiIV;bjhJHgMV-r)eM-6?GYgs=70DwpeRR910 delta 18 Zcmeyy^MPkWIV+o?k+F%X+2m%{&j3Hw26O-b Also as noted, locally plain `git am` has no issues applying the regular (non-binary) patch; iirc patchwork uses some additional flags to make it more restrictive because regular codefiles shouldn't use CRLF line-endings. I recall using some .gitattribute settings to force crlf (without making the file binary?) were discussed last time this happened, but deemed to be not worth it because of some other issues with it. > > softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-01 20:06 ` Oneric @ 2022-02-01 20:41 ` Soft Works 2022-02-01 23:25 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-01 20:41 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Oneric > Sent: Tuesday, February 1, 2022 9:07 PM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Tue, Feb 01, 2022 at 19:44:24 +0000, Soft Works wrote: > > > On Sun, Jan 16, 2022 at 19:16:54 +0100, Oneric wrote: > > > > > > In case anyone is wondering why patchwork fails to apply the second > patch, > > > this is probably once again because the patch updates one of FATE's ASS > > > reference files which use CRLF line-endings. > > > Locally git am applies both without a hitch for me on top of current > master > > > (and FATE passes after applying each patch). > > > > > > You can add a .gitattributes file to tests/ref/fate/ which includes the > line > > > > sub-webvtt2 -diff > > > > Then your local git format-patch will create a binary diff for the file. > > Thanks for your suggestion. However, a binary diff would look like this which > isn't great for seeing what's going on during review: > > diff --git a/tests/ref/fate/sub-webvtt2 b/tests/ref/fate/sub-webvtt2 > index > 357b8178ea1cf224ad47dcf78b24f1948ece6665..4cd1d86a9a58ccf65812131bf84a17531c2 > c6cfa 100644 > GIT binary patch > delta 24 > gcmeys^NnXiIV;bjhJHgMV-r)eM-6?GYgs=70DwpeRR910 > > delta 18 > Zcmeyy^MPkWIV+o?k+F%X+2m%{&j3Hw26O-b Yes, I know, but the question is probably what's more important.. ..that it can be applied or that the text is viewable? I assume that a reviewer would often apply the patch locally anyway, then the text changes become visible again as this doesn't affect git log. Given that it's just a handful files from a few thousands, this should be acceptable. > Also as noted, locally plain `git am` has no issues applying the regular > (non-binary) patch; Yes, because it hasn't been sent around via e-mail yet. SMTP requires CRLF line endings, so essentially it depends on the receiving e-mail client whether it outputs LF or CRLF. It's then up to git whether it ignores or adjust the line ending when applying a patch. Maybe that's what is configured @patchwork, I can't tell. The most extreme example is sub-textenc (and there was another one iirc). It has mixed line endings - some LF and some CRLF. At least at that point there's no way out, because those endings will get unrecoverably lost as soon as it is sent around via e-mail as text-diff. That's how I came to the binary diff as only working option. (or you just don't send patches around via e-mail at all ;-) > I recall using some .gitattribute settings to force crlf (without making > the file binary?) were discussed last time this happened, but deemed to be > not worth it because of some other issues with it. Doesn't work with mixed endings. BTW, I don't find it wrong to make such files binary because logically these are not text files, but test reference results, which are expected to precisely match (rather than "have the same text"). If you don't want to mark the files as binary, you could probably use .gitattributes without commiting, so it would only affect format-patch generation. Best, softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-01 20:41 ` Soft Works @ 2022-02-01 23:25 ` Oneric 2022-02-02 4:44 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-01 23:25 UTC (permalink / raw) To: FFmpeg development discussions and patches On Tue, Feb 01, 2022 at 20:41:37 +0000, Soft Works wrote: > > On Tue, Feb 01, 2022 at 19:44:24 +0000, Soft Works wrote: > > > > On Sun, Jan 16, 2022 at 19:16:54 +0100, Oneric wrote: > > > > > > > > In case anyone is wondering why patchwork fails to apply the second patch, > > > > this is probably once again because the patch updates one of FATE's ASS > > > > reference files which use CRLF line-endings. > > > > Locally git am applies both without a hitch for me on top of current master > > > > (and FATE passes after applying each patch). > > > > > > > > > You can add a .gitattributes file to tests/ref/fate/ which includes the line > > > > > > sub-webvtt2 -diff > > > > > > Then your local git format-patch will create a binary diff for the file. > > > > Thanks for your suggestion. However, a binary diff would look like this which > > isn't great for seeing what's going on during review: > > > > [...] > > Yes, I know, but the question is probably what's more important.. > > ..that it can be applied or that the text is viewable? It CAN be applied (as I've now written twice) and of course I verified this with the mail received from the list. > > Also as noted, locally plain `git am` has no issues applying the regular > > (non-binary) patch; > > Yes, because it hasn't been sent around via e-mail yet. SMTP requires > CRLF line endings, so essentially it depends on the receiving e-mail client > whether it outputs LF or CRLF. [...] The mail content is base64 encoded during client-server and server-server transfer. Perhaps the raw base64 text is using all CRLF line-endings, but this doesn't matter. Once decoded it will byte-for-byte match the format-patch I created locally except for the ffmpeg-devel footer and additional headers in the mail version — but git am will just ignore those. See the headers of the second patch's mail: Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 For comparison, here are the same headers for your first reply: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit If you tried to send a patch with your mail client by pasting in the diff, then yes, presumably the line-endings would indeed get mangled. That's why doc/developer.texi tells you not to do this (instead recommending git send-email or binary (i.e. base64-encoded) attachments). > The most extreme example is sub-textenc (and there was another one iirc). > It has mixed line endings - some LF and some CRLF. At least at that point > there's no way out, because those endings will get unrecoverably lost > as soon as it is sent around via e-mail as text-diff. > > That's how I came to the binary diff as only working option. > (or you just don't send patches around via e-mail at all ;-) Mixed line-endings also aren't a problem with a base64-encoded mail body as eg git send-email will automatically use. There's one caveat specific to (ffmpeg's) patchwork I didn't know about before: https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220116181655.6407-2-oneric@oneric.de/ If I download the “diff” from the above page, CRLF and LF line-endings are preserved as they should be. However, the files served from the “mbox” and “series” buttons only contain LF line-endings. As everyone can test, the mail I sent evidently does use mixed CRLF and LF line-endings matching the file being patched (just like the result of the “diff” button). I believe this is an issue separate from patchwork using --no-keep-cr and should be fixed on patchwork's side. Tl;dr: Mixed line-endings in a diff are no technical issue, provided the patch is sent correctly. The issue is purely with patchwork, which: a) uses --no-keep-cr or similar presumably to protect against CRLF sneaking into regular source files. (This is a minor annoyance when CRLF is actually required, but otherwise harmless) b) mangles the line-endings for the “mbox” and “series” options but does the correct thing for the “diff” option. This is harmful and imho should be fixed by patchwork. Nonetheless, the patch can still be correctly applied from the _mail_ on the list. The patch on the list can easily be applied from the received mail by a plain `git am` or `git am --keep-cr` if the value of am.keepcr is set to false. (In my case am.keepcr is not set at all and plain git am does the right thing.) If someone wants to apply the patches and has trouble with getting the patch out of the mailclient I can on request resend the patch with a binary diff for the reference file. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-01 23:25 ` Oneric @ 2022-02-02 4:44 ` Soft Works 2022-02-02 17:03 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-02 4:44 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Oneric > Sent: Wednesday, February 2, 2022 12:26 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Tue, Feb 01, 2022 at 20:41:37 +0000, Soft Works wrote: > > > On Tue, Feb 01, 2022 at 19:44:24 +0000, Soft Works wrote: > > > > > On Sun, Jan 16, 2022 at 19:16:54 +0100, Oneric wrote: > > > > > > > > > > In case anyone is wondering why patchwork fails to apply the second > patch, > > > > > this is probably once again because the patch updates one of FATE's > ASS > > > > > reference files which use CRLF line-endings. > > > > > Locally git am applies both without a hitch for me on top of current > master > > > > > (and FATE passes after applying each patch). > > > > > > > > > > > > You can add a .gitattributes file to tests/ref/fate/ which includes the > line > > > > > > > > sub-webvtt2 -diff > > > > > > > > Then your local git format-patch will create a binary diff for the > file. > > > > > > Thanks for your suggestion. However, a binary diff would look like this > which > > > isn't great for seeing what's going on during review: > > > > > > [...] > > > > Yes, I know, but the question is probably what's more important.. > > > > ..that it can be applied or that the text is viewable? > > It CAN be applied (as I've now written twice) and > of course I verified this with the mail received from the list. I meant that it can be applied by everybody, including Patchwork and Michael. > If you tried to send a patch with your mail client by pasting in the diff, > then yes, presumably the line-endings would indeed get mangled. I use git format-patch, just like many others and afaik it can't create base64 encoded content. BTW. BASE64 doesn't seem to be widely used here, just checked the frequent committers among the recent submissions. just quickly checked Michael, Andreas,Gyan,Marton,James,Anton,Derek - all of > [..] That's why > doc/developer.texi tells you not to do this (instead recommending > git send-email or binary (i.e. base64-encoded) attachments). Interestingly, even the author of those lines is sending patches with Content-Transfer-Encoding: 7bit :-) > Tl;dr: > Mixed line-endings in a diff are no technical issue, > provided the patch is sent correctly.^ LOL - so the majority on this ML is sending patches incorrectly? I think, marking such files as binary is a bit more practical than expecting everybody to change ones working process. Best, softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-02 4:44 ` Soft Works @ 2022-02-02 17:03 ` Oneric 2022-02-02 22:18 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-02 17:03 UTC (permalink / raw) To: FFmpeg development discussions and patches It appears my previous mail has been interpreted as some sort of attack against you; this is not the case. I am sorry for creating such a misunderstanding. The mail was merely meant to clear up the misconception that it would be impossible to send mixed line-endings via mail due to some SMTP property using some illustrative examples from the current thread. (I have no idea how SMTP works in detail) I'll respond to the new misunderstandings below in more detail if you're interested, but I see little point in continuing to discuss how mails are transferred or encoded. Can we now start to discuss the patches themselves instead? If you want to apply them and have trouble with getting the correct patch out of your client (and don't want to use Patchwork's diff), let me repeat my previous offer: > > If someone wants to apply the patches and has trouble with getting the patch > > out of the mailclient I can on request resend the patch with a binary diff > > for the reference file. ~~~~~~ On Wed, Feb 02, 2022 at 04:44:54 +0000, Soft Works wrote: > > It CAN be applied (as I've now written twice) and > > of course I verified this with the mail received from the list. > > I meant that it can be applied by everybody, including Patchwork > and Michael. Everybody but Patchwork can apply the mail and Patchwork's _diff_ works too. In case any mail client acts up I already offered to resend the patch with a binary flag for the reference file on request. > I use git format-patch, just like many others and afaik it can't create base64 > encoded content. It doesn't need to. When attaching a format-patch (as ffmpeg's documentation recommends when not using send-email) the _mail client_ is supposed to choose and perform a suitable transfer encoding. Depending on the mail client this can also work with the main message, but some clients have trouble preserving everything in the main message. (Because they assume the main body to be plain text and eg "normalise" line-ending, automatically break long lines, remove or replace special characters, etc) > BTW. BASE64 doesn't seem to be widely used here, [...] Note, that I wrote send-email uses the appropriate transfer-encoding “automatically” not “always base64”; if there are no characters in the message beyond US-ASCII (and mayhaps only LF line-endings), using base64 or quoted-printable is not necessary. Whether or not mixed line-endings _require_ one of the latter two encodings due to some SMTP limitation I do not know. It's certainly _possible_ to preserve them this way though. (And git send-email choose to use base64 for the second patch.) > Interestingly, even the author of those lines is sending patches with > Content-Transfer-Encoding: 7bit :-) see previous. (second patch contains bidi-marks and mixed line-endings, first neither) > LOL - so the majority on this ML is sending patches incorrectly? No, see second last. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-02 17:03 ` Oneric @ 2022-02-02 22:18 ` Soft Works 2022-02-02 22:44 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-02 22:18 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Wednesday, February 2, 2022 6:03 PM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > It appears my previous mail has been interpreted as some sort of > attack > against you; this is not the case. I am sorry for creating such a > misunderstanding. Don't know who said that, but I don't feel attacked. It's about technical details and opinions about it. All good. > The mail was merely meant to clear up the misconception that it would > be > impossible to send mixed line-endings via mail due to some SMTP > property > using some illustrative examples from the current thread. > (I have no idea how SMTP works in detail) Where did you see such misconception? I think almost everybody knows that this is doable over SMTP in some way, and there are in fact many ways to do this, even when it would be as stupid like attaching as zip archive. It's never been a matter of SMTP. It's the GIT tooling in the context of "patch-over-SMTP" transport and the way it is using plain-ascii-text e-mails for this purpose. IMO, this is a design flaw or just a bug when git-send-email and format-patch are creating output with transfer-encodings 7bit/8bit but mixed EOLs. I suppose this is due to the tendency of GIT to consider text file basically as files with lines of text where the line endings are exchangeable and rather a platform detail than part of the text file's content. And text files with mixed line endings are an accidental side effect of multi-platform work and the endings can be re-unified at any time. That's also the reason why format-patch cannot warn, because bazillions of code files exist which have mixed EOLs (accidentally and exchangeable). Yes, yes yes - don't need to respond that GIT can handle those cases flawlessly and precisely when configured appropriately (and when not using the e-mail features). That goes without saying. I'm just speculating about history, why it might have happened that it doesn't work flawlessly when using e-mail. Via e-mail, it only works when using base64 encoding like you said, but this had only been added at a later time and wasn't part of the original functionality. Further - I could find no evidence for your claims that base64 encoding would be chosen automatically depending on the content, neither is it on by default. There's an 'auto' mode which chooses between 7bit, 8bit and quoted-printable, but not base64, which always needs to be specified explicitly (or set in .gitconfig). This is the way it is documented and I'm seeing exactly that documented behavior on both Linux and Windows when testing with a mocked smtp server. You had contradicted my statement that it would be desirable that format-patch would be able to do base64 encoding as well, saying that this would be the task of a mail-client. This is not necessarily the case, though. format-patch can generate patch e-mails with multipart-mime content, where the patch content is created as an attachment mime part. format-patch does that with a transfer-encoding of 8bit, but it can't do it with base64 encoding. (I'm stripping the rest of your reply as it is all covered above already) Regards, softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-02 22:18 ` Soft Works @ 2022-02-02 22:44 ` Soft Works 2022-02-03 2:11 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-02 22:44 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft > Works > Sent: Wednesday, February 2, 2022 11:19 PM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > > Oneric > > Sent: Wednesday, February 2, 2022 6:03 PM > > To: FFmpeg development discussions and patches <ffmpeg- > > devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > fix > > handling of backslashes > > > > It appears my previous mail has been interpreted as some sort of > > attack > > against you; this is not the case. I am sorry for creating such a > > misunderstanding. > > Don't know who said that, but I don't feel attacked. It's about > technical > details and opinions about it. All good. > > > The mail was merely meant to clear up the misconception that it > would > > be > > impossible to send mixed line-endings via mail due to some SMTP > > property > > using some illustrative examples from the current thread. > > (I have no idea how SMTP works in detail) > > Where did you see such misconception? I think almost everybody knows > that this is doable over SMTP in some way, and there are in > fact many ways to do this, even when it would be as stupid like > attaching > as zip archive. > > It's never been a matter of SMTP. It's the GIT tooling in the context > of > "patch-over-SMTP" transport and the way it is using plain-ascii-text > e-mails for this purpose. IMO, this is a design flaw or just a bug > when git-send-email and format-patch are creating output with > transfer-encodings > 7bit/8bit but mixed EOLs. > > I suppose this is due to the tendency of GIT to consider text file > basically > as files with lines of text where the line endings are exchangeable > and > rather a platform detail than part of the text file's content. > And text files with mixed line endings are an accidental side effect > of > multi-platform work and the endings can be re-unified at any time. > That's also the reason why format-patch cannot warn, because > bazillions > of code files exist which have mixed EOLs (accidentally and > exchangeable). > > Yes, yes yes - don't need to respond that GIT can handle those cases > flawlessly and precisely when configured appropriately (and when not > using the e-mail features). That goes without saying. > I'm just speculating about history, why it might have happened that > it doesn't work flawlessly when using e-mail. > > Via e-mail, it only works when using base64 encoding like you said, > but this had only been added at a later time and wasn't part of the > original functionality. Haha, I almost forgot to get to my own point.. From that situation above, I would conclude that at least at an earlier time, the git developers (when asked about a case like the ffmpeg ref files) might have argued that such files with mixed EOL where the individual EOLs also form a relevant part of the content - are not meant to be handled as text, but as binary files (either) or at to use binary diffs when being send via e-mail. . sw _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-02 22:44 ` Soft Works @ 2022-02-03 2:11 ` Oneric 2022-02-03 20:51 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-03 2:11 UTC (permalink / raw) To: FFmpeg development discussions and patches On Wed, Feb 02, 2022 at 22:44:18 +0000, Soft Works wrote: > > Sent: Wednesday, February 2, 2022 11:19 PM > > To: FFmpeg development discussions and patches <ffmpeg- > > devel@ffmpeg.org> > > > > [claims about git and smtp: part 1] > > > > [claims about git and smtp: part 2] > None of this relates to the topic of the patches, which is backslash-escaping in ASS and converting BIDI-marks from WebVTT. As written before, I see no point in advancing this offtopic talk; please actually discuss the content of the patches in this thread. If you are convinced that the reference files should be treated as binary, then submit a suitable gitattributes patch to the list and discuss it with whom it may interest. I’ll ignore all further offtopic messages here. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-03 2:11 ` Oneric @ 2022-02-03 20:51 ` Soft Works 2022-02-04 1:01 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-03 20:51 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Thursday, February 3, 2022 3:11 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Wed, Feb 02, 2022 at 22:44:18 +0000, Soft Works wrote: > > > Sent: Wednesday, February 2, 2022 11:19 PM > > > To: FFmpeg development discussions and patches <ffmpeg- > > > devel@ffmpeg.org> > > > > > > [claims about git and smtp: part 1] > > > > > > > [claims about git and smtp: part 2] > > > I’ll ignore all further offtopic messages here. Aha, suddenly it's off-topic, but ok. > please actually discuss the content of the patches in this thread. I think when you inject that word-joiner as a workaround for ass parsing, you'll also need to make sure that it gets removed when encoding to other formats. softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-03 20:51 ` Soft Works @ 2022-02-04 1:01 ` Oneric 2022-02-04 1:30 ` Andreas Rheinhardt 2022-02-04 1:57 ` Soft Works 0 siblings, 2 replies; 28+ messages in thread From: Oneric @ 2022-02-04 1:01 UTC (permalink / raw) To: FFmpeg development discussions and patches On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote: > I think when you inject that word-joiner as a workaround for ass > parsing, you'll also need to make sure that it gets removed > when encoding to other formats. There's no way of knowing whether the word-joiner comes from a conversion performed by ffmpeg in the past or already existed in the original source. However, the wordjoiner does not alter the visually appearance and is unlikely to change line-breaking properties; that's why I chose a word-joiner. Therefore I don't think removing (only) the inserted word-joiners is possible, but also not necessary. Also, was the wrong \\ recollapsed into a single \ somehwere? If yes, then this code can be dropped as well (but I don't know where it is). My biggest concern with this is the unconditional assumption of UTF-8 in ff_ass_bprint_text_event? Is there a way to improve on this or does ffmpeg convert all text to UTF-8 before it's passed to this function? _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 1:01 ` Oneric @ 2022-02-04 1:30 ` Andreas Rheinhardt 2022-02-04 21:52 ` Oneric 2022-02-04 1:57 ` Soft Works 1 sibling, 1 reply; 28+ messages in thread From: Andreas Rheinhardt @ 2022-02-04 1:30 UTC (permalink / raw) To: ffmpeg-devel Oneric: > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote: >> I think when you inject that word-joiner as a workaround for ass >> parsing, you'll also need to make sure that it gets removed >> when encoding to other formats. > > There's no way of knowing whether the word-joiner comes from > a conversion performed by ffmpeg in the past or already existed > in the original source. > However, the wordjoiner does not alter the visually appearance and > is unlikely to change line-breaking properties; that's why I chose > a word-joiner. Therefore I don't think removing (only) the inserted > word-joiners is possible, but also not necessary. > > Also, was the wrong \\ recollapsed into a single \ somehwere? If yes, > then this code can be dropped as well (but I don't know where it is). > > My biggest concern with this is the unconditional assumption of UTF-8 in > ff_ass_bprint_text_event? Is there a way to improve on this or does ffmpeg > convert all text to UTF-8 before it's passed to this function? All text-based subtitles are supposed to be UTF-8 when they reach the decoder; if it isn't, the user has to set the appropriate -sub_charenc and -sub_charenc_mode. - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 1:30 ` Andreas Rheinhardt @ 2022-02-04 21:52 ` Oneric 2022-02-04 23:24 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-04 21:52 UTC (permalink / raw) To: FFmpeg development discussions and patches On Fri, Feb 04, 2022 at 02:30:37 +0100, Andreas Rheinhardt wrote: > All text-based subtitles are supposed to be UTF-8 when they reach the > decoder; if it isn't, the user has to set the appropriate -sub_charenc > and -sub_charenc_mode. > > - Andreas Thanks for the info! Then at least the UTF-8 assumption is no problem after all. On Fri, Feb 04, 2022 at 01:57:48 +0000, Soft Works wrote: > > There's no way of knowing whether the word-joiner comes from > > a conversion performed by ffmpeg in the past or already existed > > in the original source. > > That might be true, but I think it's valid to say that such characters > are very unusual "original" subtitle sources and that's why I don't > think it's a good idea for ffmpeg to start injecting them. Don't underestimate what subtitle authors can come up with :) > Subtitle implementations are often rather minimal, especially in > hardware devices and might not always cover the full range of > UTF-8 specifics. The wordjoiner lies in the Basic Multilingual Plane, so even ancient UTF-8 implementations assuming all of Unicode’s codepoints fit in 16bits (i.e. 3-bytes max per codepoint in UTF-8) will be able to understand it. > > However, the wordjoiner does not alter the visually appearance and > > is unlikely to change line-breaking properties; that's why I chose > > a word-joiner. Therefore I don't think removing (only) the inserted > > word-joiners is possible, > > Why not? As it seems to be required for ASS encoding only, all other > output formats should remain unaffected. Because — as written before — it can exist in the original source. Unicode recommends using the wordjoiner eg to prevent linebreaks between two characters without any additional side-effects as eg the combining-grapheme-joiner would cause. > > but also not necessary. > > I'm not sure whether all ffmpeg text-sub encoders can handle > those chars - which could be verified of course. Since it's in the BMP and ffmpeg already seems happy to assume some UTF-8 support by converting everything to it, I'm not worried about this until proven wrong. > Finally, those chars are a pest. I'm using them myself for a > specific use case, but when you don't know they are there, it can > drive you totally mad, eventually even thinking your system or > software is faulty. > > Example: > > Open your patch file [2/2] and search for the string > "123456\NAscending". You can see the string in two lines, but search > will only find one of them. > > Or just look at the two lines directly. They are preceded by + and - > even though both appear identical. Actually, I see this with helpful colouring lost here: -Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456^M +Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: <200f>123456<200e>\NAscending: 123456^M More plain-text oriented editors likely won't show them though, yes. On this topic, finding raw bidi-marks in ASS subtitles for RTL-languages is not that unusual though, to give an example for "invisible characters" being used manually in the original source. (Because VSFilters (and libass in the interest of compatibility) assumes LTR by default and other things) Even if I thought removing all wordjoiners when converting from ASS was a good idea, I still wouldn't know where to do this (or where to look to remove possibly lingering attempts to recollapse \\ into \). _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 21:52 ` Oneric @ 2022-02-04 23:24 ` Soft Works 2022-02-05 1:20 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-04 23:24 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Friday, February 4, 2022 10:52 PM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Fri, Feb 04, 2022 at 02:30:37 +0100, Andreas Rheinhardt wrote: > > All text-based subtitles are supposed to be UTF-8 when they reach > the > > decoder; if it isn't, the user has to set the appropriate - > sub_charenc > > and -sub_charenc_mode. > > > > - Andreas > > Thanks for the info! Then at least the UTF-8 assumption > is no problem after all. > > [..] > > > > I'm not sure whether all ffmpeg text-sub encoders can handle > > those chars - which could be verified of course. > > Since it's in the BMP and ffmpeg already seems happy to assume some > UTF-8 > support by converting everything to it, I'm not worried about this > until > proven wrong. Proven wrong: https://github.com/libass/libass/issues/507 > > Finally, those chars are a pest. I'm using them myself for a > > specific use case, but when you don't know they are there, it can > > drive you totally mad, eventually even thinking your system or > > software is faulty. > > > > Example: > > > > Open your patch file [2/2] and search for the string > > "123456\NAscending". You can see the string in two lines, but search > > will only find one of them. > > > > Or just look at the two lines directly. They are preceded by + and - > > even though both appear identical. > > Actually, I see this with helpful colouring lost here: > > -Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: > 123456\NAscending: 123456^M > +Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: > <200f>123456<200e>\NAscending: 123456^M I didn't say you won't be a able to find a viewer that can display them. :-) > More plain-text oriented editors likely won't show them though, yes. Yes => pest > > That might be true, but I think it's valid to say that such > characters > > are very unusual "original" subtitle sources and that's why I don't > > think it's a good idea for ffmpeg to start injecting them. > > Don't underestimate what subtitle authors can come up with :) Sure. But a subtitle author is responsible for their authored subtitles while ffmpeg is responsible for encoding of large part of the world's subtitles. And from that same perspective I find the relation of this proposal somewhat insane: You want to "pollute" gazillions of subtitle streams in the world from multiple subtitle formats with invisible characters in order to solve an escaping problem in ffmpeg? softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 23:24 ` Soft Works @ 2022-02-05 1:20 ` Oneric 2022-02-05 2:08 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-05 1:20 UTC (permalink / raw) To: FFmpeg development discussions and patches On Fri, Feb 04, 2022 at 23:24:58 +0000, Soft Works wrote: > You want to "pollute" gazillions of subtitle streams in the > world from multiple subtitle formats with invisible > characters in order to solve an escaping problem in ffmpeg? I do not consider using characters that are explicitly recommended to be used by Unicode to be “polluting”. Further consider that as mentioned invisible characters in ASS are not uncommon anyway already and conversion from ASS to something else are rare due to being generally lossy. Lossy with regards to typesetting that is, removing breaking hints in form of plain Unicode characters would be a new form of lossyness. > [From the other mail:] > I'm not into changing ffmpeg's ass output, it's all > about the internally used ass format and the escaping is > a central problem there. I’m not interested in reworking ffmpeg’s internal subtitle handling. The proposed patch is a clear improvement over the status quo which is plain incorrect. Within reasonable effort and sound arguments for it adjustments to the patch can be made; reworking ffmpeg internals is imo not “reasonable” effort to correct an uncontestedly wrong escape. You have two options: Either finally tell me what I asked about: where (as in which file and function) removing wordjoiners should even happen and where possible lingering “\\ → \” conversions presumably are and if it’s simple enough I can add a removal accompanied by a comment pointing out that this can go wrong. Or go ahead and create your own patch. ~~~~~~ > > > I'm not sure whether all ffmpeg text-sub encoders can handle > > > those chars - which could be verified of course. > > > > Since it's in the BMP and ffmpeg already seems happy to assume some > > UTF-8 > > support by converting everything to it, I'm not worried about this > > until > > proven wrong. > > Proven wrong: https://github.com/libass/libass/issues/507 This issue is not at all wordjoiner specific despite the name. As far as I recall this never lead to wrong rendering. With HarfBuzz, the only fully featured shaping backend of libass, control characters were and are handled by HarfBuzz. And even with FriBiDi U+2060 was ignored since long before (2012) the linked issue was opened. What that issue really is about is a combination of two more general issues. libass is currently not caching failure to lookup a glyph leading to multiple messages and at worst a perf degradation if no font on the font pool contained a glyph for a particular glyph. And the realisation that libass’ font-fallback strategy is not ideal for prefix-type control characters, characters which visibly affect both neighbours and a few others. The word-joiner is only highlighted here as due to its usage as an backslash escape its commonly passed to libass and a high enough percentage of fonts doesn’t contain it to create reports about it. For further reference: U+2060 was added in Unicode 3.2 released 2002. If you want to strip it because it might not render correctly you should also strip most emoji, the uppercase eszett ẞ and several actively used writing systems in their entirety. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-05 1:20 ` Oneric @ 2022-02-05 2:08 ` Soft Works 2022-02-05 21:59 ` Oneric 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-05 2:08 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Saturday, February 5, 2022 2:20 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Fri, Feb 04, 2022 at 23:24:58 +0000, Soft Works wrote: > > You want to "pollute" gazillions of subtitle streams in the > > world from multiple subtitle formats with invisible > > characters in order to solve an escaping problem in ffmpeg? > > I do not consider using characters that are explicitly recommended to > be > used by Unicode to be “polluting”. Further consider that as mentioned > invisible characters in ASS are not uncommon anyway already and > conversion > from ASS to something else are rare due to being generally lossy. > Lossy > with regards to typesetting that is, removing breaking hints in form > of > plain Unicode characters would be a new form of lossyness. > > > [From the other mail:] > > I'm not into changing ffmpeg's ass output, it's all > > about the internally used ass format and the escaping is > > a central problem there. > > I’m not interested in reworking ffmpeg’s internal subtitle handling. > The proposed patch is a clear improvement over the status quo which > is plain incorrect. Within reasonable effort and sound arguments for > it adjustments to the patch can be made; reworking ffmpeg internals is > imo not “reasonable” effort to correct an uncontestedly wrong escape. > > You have two options: > Either finally tell me what I asked about: > where (as in which file and function) removing wordjoiners should > even happen and where possible lingering “\\ → \” conversions > presumably > are and if it’s simple enough I can add a removal accompanied by a > comment > pointing out that this can go wrong. > Or go ahead and create your own patch. > > ~~~~~~ > > > > > I'm not sure whether all ffmpeg text-sub encoders can handle > > > > those chars - which could be verified of course. > > > > > > Since it's in the BMP and ffmpeg already seems happy to assume > some > > > UTF-8 > > > support by converting everything to it, I'm not worried about this > > > until > > > proven wrong. > > > > Proven wrong: https://github.com/libass/libass/issues/507 > > This issue is not at all wordjoiner specific despite the name. > As far as I recall this never lead to wrong rendering. > With HarfBuzz, the only fully featured shaping backend of libass, > control characters were and are handled by HarfBuzz. > And even with FriBiDi U+2060 was ignored since long before (2012) > the linked issue was opened. > > What that issue really is about is a combination of two more general > issues. libass is currently not caching failure to lookup a glyph > leading > to multiple messages and at worst a perf degradation if no font on the > font pool contained a glyph for a particular glyph. And the > realisation > that libass’ font-fallback strategy is not ideal for prefix-type > control > characters, characters which visibly affect both neighbours and a few > others. > The word-joiner is only highlighted here as due to its usage as an > backslash escape its commonly passed to libass and a high enough > percentage of fonts doesn’t contain it to create reports about it. > > > For further reference: U+2060 was added in Unicode 3.2 released 2002. > If you want to strip it because it might not render correctly you > should > also strip most emoji, the uppercase eszett ẞ and several actively > used writing systems in their entirety. Let's try to approach this from a different side. Which case is your [1/2] commit actually supposed to fix? How did you test your patch? Can we please go over an example? Thanks, sw _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-05 2:08 ` Soft Works @ 2022-02-05 21:59 ` Oneric 2022-02-06 1:08 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-05 21:59 UTC (permalink / raw) To: FFmpeg development discussions and patches [-- Attachment #1: Type: text/plain, Size: 2068 bytes --] On Sat, Feb 05, 2022 at 02:08:48 +0000, Soft Works wrote: > Let's try to approach this from a different side. Which case is > your [1/2] commit actually supposed to fix? Escape backslashes when converting from WebVTT to not accidentally introduce active ASS sequences and replace the wrong backslash-escape in ff_ass_bprint_text_event with an escape that actually works. > How did you test your patch? > Can we please go over an example? Take a look at the attached WebVTT file. We expect the second event to be rendered like this, as from WebVTT’s point of view it’s just normal text: our final \h approach \N into \ Coruscant. What we currently get after conversion to ASS is like this though (pay attention to the number of spaces): our final approach into \ Coruscant. If we escape \ as \\ like ff_ass_bprint_text_event currently does, we instead get the following rendering which is also wrong: our final \ approach \ into \\ Coruscant. If instead the word-joiner is appended as in my patch, the visual output matches the expectation (mail does not contain U+2060): our final \h approach \N into \ Coruscant. You can confirm this for yourself by feeding the original WebVTT to a native WebVTT renderer, e.g. https://zcorpan.github.io/live-webvtt-viewer/ and using ./ffmpeg -i test2.vtt -f ass tmp_conv.ass on master, master + my patch and master + a modified patch to use \\ then watching how the converted file gets rendered by mpv, VLC or so. I was not able to create a sample using ff_ass_bprint_text_event as the only users seem to be teletext and some rawtext(?) subtitles and I don't know how to create such files. However, as nothing special happened to the \\ for WebVTT after this sequence was inserted for the internal representation, and since the preceding comment (wrongly) declares \\ to be an ASS rather than an internal escape, it seems highly unlikely to behave any different. You are very welcome to provide a sample using the ff_ass_bprint_text_event codepath and keep_ass_markup=false to verify or debunk this. [-- Attachment #2: test2.vtt --] [-- Type: text/vtt, Size: 129 bytes --] WEBVTT 00:00.000 --> 00:01.000 Senator, we're {A} making 00:01.000 --> 01:01.000 our final \h approach \N into \ Coruscant. [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-05 21:59 ` Oneric @ 2022-02-06 1:08 ` Soft Works 2022-02-06 1:37 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-06 1:08 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Saturday, February 5, 2022 11:00 PM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Sat, Feb 05, 2022 at 02:08:48 +0000, Soft Works wrote: > > Let's try to approach this from a different side. Which case is > > your [1/2] commit actually supposed to fix? > > Escape backslashes when converting from WebVTT to not accidentally > introduce active ASS sequences and replace the wrong backslash-escape > in ff_ass_bprint_text_event with an escape that actually works. > > > How did you test your patch? > > Can we please go over an example? > > Take a look at the attached WebVTT file. Thanks a lot for the test file! > We expect the second event to be rendered like this, > as from WebVTT’s point of view it’s just normal text: > > our final \h approach \N into \ Coruscant. > > What we currently get after conversion to ASS is like this though > (pay attention to the number of spaces): > > our final approach > into \ Coruscant. Yup, no doubt that this is wrong. > If instead the word-joiner is appended as in my patch, the > visual output matches the expectation (mail does not contain U+2060): > > our final \h approach \N into \ Coruscant. I can also confirm that your patch would "work" with regards to ass output when trying with both "old" libass and new libass. I haven't tried with other implementations yet, but when this would turn out to be working with all usual implementations, I might even be OK with the word-joiner. But this is where the agreement ends. - If at all, the word-joiner insertion would need to be limited to ASS output ONLY - it would need to be controllable through an option in the ASS encoder - The word joiners should not be used in internal processing and only be (optionally) added when encoding to ass - Unfortunately, the FATE tests for the other subtitle formats do not include these sequences in the test source files, and that means, before making such change that might potentially affect all other text subtitle encoders, those sequences would need to be added to make sure these conversion won't be affected - Generally, those changes (also the BIDI mark insertion) should be evaluated with regards to all text subtitle encoders, making sure there's no side effect. You said: > I’m not interested in reworking ffmpeg’s internal subtitle handling. > The proposed patch is a clear improvement over the status quo which > is plain incorrect. Within reasonable effort and sound arguments for > it adjustments to the patch can be made; reworking ffmpeg internals is > imo not “reasonable” effort to correct an uncontestedly wrong escape. And that is a problem. The changes you are proposing are making changes to ffmpeg’s internal subtitle handling, so you need to decide whether you want to work on it or not. > You have two options: [..] > Or go ahead and create your own patch. I will do this, but "only" on top of my subtitle filtering patchset because that's my current focus area and just two weeks ago I had to add a temporary hack for a related case which is about ASS dialog lines like: Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,{comment text: \....} Currently, ffmpeg does not recognize this as override code, even though it's valid in ASS that a backslash with the actual code doesn't appear immediately after the opening curly brace. What comes on top of this is that other subtitle decoders do NOT escape the curly braces like WebVTTdec and ass_bprint_text_event(). When you look at SubRip_capability_tester.srt from the FATE suite, you can see that it contains ASS codes that are expected to be recognized and applied, but when there's normal text in curly braces without a backslash, it should be treated at normal text. This is quite a mess that needs to be cleaned up with a plan and it's all related. Like I said already: A central point to this is the escaping and what's needed is a solution that can cover all of those things. I had put this subject aside as I've been unsure about how to do it, but this discussion has been very helpful to see the issues more clearly. How about separating the BIDI part from your patch? I'd see only two things remaining: - Go through all text subtitle encoders and think/research whether those marks would be acceptable in those formats or whether they would need to be removed (like now) - Think about whether the Unicode bidi marks should be replaced back to the html-style codes It's not these wouldn't work, but it's again about visibility and I tend to think that it would be preferable to have them visible in the output softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-06 1:08 ` Soft Works @ 2022-02-06 1:37 ` Soft Works 0 siblings, 0 replies; 28+ messages in thread From: Soft Works @ 2022-02-06 1:37 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft > Works > Sent: Sunday, February 6, 2022 2:09 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > > Oneric > > Sent: Saturday, February 5, 2022 11:00 PM > > To: FFmpeg development discussions and patches <ffmpeg- > > devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > fix > > handling of backslashes > > > > On Sat, Feb 05, 2022 at 02:08:48 +0000, Soft Works wrote: > > > Let's try to approach this from a different side. Which case is > > > your [1/2] commit actually supposed to fix? > > > > Escape backslashes when converting from WebVTT to not accidentally > > introduce active ASS sequences and replace the wrong backslash- > escape > > in ff_ass_bprint_text_event with an escape that actually works. > > > > > How did you test your patch? > > > Can we please go over an example? > > > > Take a look at the attached WebVTT file. > > Thanks a lot for the test file! > > > > We expect the second event to be rendered like this, > > as from WebVTT’s point of view it’s just normal text: > > > > our final \h approach \N into \ Coruscant. > > > > What we currently get after conversion to ASS is like this though > > (pay attention to the number of spaces): > > > > our final approach > > into \ Coruscant. > > Yup, no doubt that this is wrong. > > > > If instead the word-joiner is appended as in my patch, the > > visual output matches the expectation (mail does not contain > U+2060): > > > > our final \h approach \N into \ Coruscant. > > I can also confirm that your patch would "work" with regards > to ass output when trying with both "old" libass and new libass. > I haven't tried with other implementations yet, but when this > would turn out to be working with all usual implementations, > I might even be OK with the word-joiner. > > But this is where the agreement ends. > > - If at all, the word-joiner insertion would need to be > limited to ASS output ONLY > - it would need to be controllable through an option in the ASS > encoder > - The word joiners should not be used in internal processing and > only be (optionally) added when encoding to ass > - Unfortunately, the FATE tests for the other subtitle formats > do not include these sequences in the test source files, and > that means, before making such change that might potentially > affect all other text subtitle encoders, those sequences would > need to be added to make sure these conversion won't be affected > - Generally, those changes (also the BIDI mark insertion) should > be evaluated with regards to all text subtitle encoders, > making sure there's no side effect. > > You said: > > > I’m not interested in reworking ffmpeg’s internal subtitle handling. > > The proposed patch is a clear improvement over the status quo which > > is plain incorrect. Within reasonable effort and sound arguments for > > it adjustments to the patch can be made; reworking ffmpeg internals > is > > imo not “reasonable” effort to correct an uncontestedly wrong > escape. > > And that is a problem. The changes you are proposing are making > changes to ffmpeg’s internal subtitle handling, so you need to decide > whether you want to work on it or not. > > > > You have two options: > [..] > > Or go ahead and create your own patch. > > I will do this, but "only" on top of my subtitle filtering patchset > because that's my current focus area and just two weeks ago I had to > add a temporary hack for a related case which is about ASS dialog > lines > like: > > Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,{comment text: > \....} > > Currently, ffmpeg does not recognize this as override code, even > though > it's valid in ASS that a backslash with the actual code doesn't appear > immediately after the opening curly brace. > What comes on top of this is that other subtitle decoders do NOT > escape > the curly braces like WebVTTdec and ass_bprint_text_event(). > When you look at SubRip_capability_tester.srt from the FATE suite, you > can see that it contains ASS codes that are expected to be recognized > and applied, but when there's normal text in curly braces without > a backslash, it should be treated at normal text. > > This is quite a mess that needs to be cleaned up with a plan and it's > all related. Like I said already: A central point to this is the > escaping > and what's needed is a solution that can cover all of those things. > > I had put this subject aside as I've been unsure about how to do it, > but this discussion has been very helpful to see the issues more > clearly. > > How about separating the BIDI part from your patch? I'd see only two > things remaining: > > - Go through all text subtitle encoders and think/research whether > those marks would be acceptable in those formats or whether they > would need to be removed (like now) > - Think about whether the Unicode bidi marks should be replaced back > to the html-style codes > It's not these wouldn't work, but it's again about visibility and > I tend to think that it would be preferable to have them visible > in the output > > softworkz What I would also find acceptable is to just remove the escaping of backslashes in ff_ass_bprint_text_event() without adding a word-joiner, as this is clearly nonsense and it would still be an improvement. sw _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 1:01 ` Oneric 2022-02-04 1:30 ` Andreas Rheinhardt @ 2022-02-04 1:57 ` Soft Works 2022-02-04 5:34 ` Soft Works 1 sibling, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-04 1:57 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Friday, February 4, 2022 2:01 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote: > > I think when you inject that word-joiner as a workaround for ass > > parsing, you'll also need to make sure that it gets removed > > when encoding to other formats. > > There's no way of knowing whether the word-joiner comes from > a conversion performed by ffmpeg in the past or already existed > in the original source. That might be true, but I think it's valid to say that such characters are very unusual "original" subtitle sources and that's why I don't think it's a good idea for ffmpeg to start injecting them. Subtitle implementations are often rather minimal, especially in hardware devices and might not always cover the full range of UTF-8 specifics. > However, the wordjoiner does not alter the visually appearance and > is unlikely to change line-breaking properties; that's why I chose > a word-joiner. Therefore I don't think removing (only) the inserted > word-joiners is possible, Why not? As it seems to be required for ASS encoding only, all other output formats should remain unaffected. > but also not necessary. I'm not sure whether all ffmpeg text-sub encoders can handle those chars - which could be verified of course. But what remains is the question about the effect on end devices which are consuming that output. Finally, those chars are a pest. I'm using them myself for a specific use case, but when you don't know they are there, it can drive you totally mad, eventually even thinking your system or software is faulty. Example: Open your patch file [2/2] and search for the string "123456\NAscending". You can see the string in two lines, but search will only find one of them. Or just look at the two lines directly. They are preceded by + and - even though both appear identical. So, this also needs consideration of the consequences, like how many developers (inside and outside of ffmpeg) this would be driving nuts over the years and make them start hating ffmpeg for doing so once they've found out. softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 1:57 ` Soft Works @ 2022-02-04 5:34 ` Soft Works 2022-02-04 5:59 ` Soft Works 2022-02-04 6:48 ` Soft Works 0 siblings, 2 replies; 28+ messages in thread From: Soft Works @ 2022-02-04 5:34 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft > Works > Sent: Friday, February 4, 2022 2:58 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > > Oneric > > Sent: Friday, February 4, 2022 2:01 AM > > To: FFmpeg development discussions and patches <ffmpeg- > > devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > fix > > handling of backslashes > > > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote: > > > I think when you inject that word-joiner as a workaround for ass > > > parsing, you'll also need to make sure that it gets removed > > > when encoding to other formats. > > > > There's no way of knowing whether the word-joiner comes from > > a conversion performed by ffmpeg in the past or already existed > > in the original source. > > That might be true, but I think it's valid to say that such characters > are very unusual "original" subtitle sources and that's why I don't > think it's a good idea for ffmpeg to start injecting them. > > Subtitle implementations are often rather minimal, especially in > hardware devices and might not always cover the full range of > UTF-8 specifics. > > > However, the wordjoiner does not alter the visually appearance and > > is unlikely to change line-breaking properties; that's why I chose > > a word-joiner. Therefore I don't think removing (only) the inserted > > word-joiners is possible, > > Why not? As it seems to be required for ASS encoding only, all other > output formats should remain unaffected. > > > but also not necessary. > > I'm not sure whether all ffmpeg text-sub encoders can handle > those chars - which could be verified of course. > > But what remains is the question about the effect on end devices > which are consuming that output. > > Finally, those chars are a pest. I'm using them myself for a > specific use case, but when you don't know they are there, it can > drive you totally mad, eventually even thinking your system or > software is faulty. > > Example: > > Open your patch file [2/2] and search for the string > "123456\NAscending". You can see the string in two lines, but search > will only find one of them. > > Or just look at the two lines directly. They are preceded by + and - > even though both appear identical. > > > So, this also needs consideration of the consequences, like how > many developers (inside and outside of ffmpeg) this would be driving > nuts over the years and make them start hating ffmpeg for doing so > once they've found out. As I really hate how many devs on this ML keep saying 'no' to submitted code without having a better suggestion, assuming that this is all that it takes, I don't want to assimilate in this regard. Hence I want to propose the following solution: First of all, the existing code in ff_ass_bprint_text_event() is totally wrong already. Not only with regard to the backslash escaping (as you have already pointed out), but also the curly brace escaping is invalid. There is no curly-brace escaping in ASS either. In fact it is impossible with ASS to display an opening curly brace followed by a closing curly brace at a subsequent position (each one alone may work depending on implementation). If it was about ASS alone, we might just drop those braces, so we could at least avoid the text in-between from being hidden (when outputting ASS), but ASS is also the internal ("uncompressed/raw") subtitle format in ffmpeg that is used for conversion (and subtitle filtering). So it would be hard-to-sell when curly braces would get lost when converting from one text-sub format to another with none of them even being ASS. What we need is to stop creating invalid ASS and at the same time ensure proper conversion of curly braces. How? We substitute them! And still, UTF-8 can come to the rescue. There are two suitable candidates for that: SMALL LEFT CURLY BRACKET (U+FE5B, Ps): ﹛ SMALL RIGHT CURLY BRACKET (U+FE5C, Pe): ﹜ FULLWIDTH LEFT CURLY BRACKET (U+FF5B, Ps): { FULLWIDTH RIGHT CURLY BRACKET (U+FF5D, Pe): } Substitution of curly braces with one of those will prevent ASS from treating any possible subtitle content as override code. What remains to be handled now is the backslash case. Now that we can be sure that we are never inside a sequence that ASS would consider an override code, only 3 cases are remaining where the backslash has a meaning in ASS dialog text: '\n', '\N' and '\h'. We can simply escape those sequences by inserting a (no-op) override code between the backslash and the char. Suitable for this is: {\r} This code resets inline styles, but since we are coming from plain text subs in ff_ass_bprint_text_event(), we know that we don't have any inline styles and it's a no-op to reset the style. Needless to say that we will of course change the substituted curly braces back to the regular ones at the encoding side for all but ASS. Remains the question what to do when encoding to ASS: We can either keep the alternate brace characters or just remove them (or maybe replace with square brackets). I'm not sure about that last point, but in total, this will be a clean solution without injecting any weird chars into the subtitle output, and it will fix multiple incorrect behaviors in the current implementation. Regards, softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 5:34 ` Soft Works @ 2022-02-04 5:59 ` Soft Works 2022-02-04 6:48 ` Soft Works 1 sibling, 0 replies; 28+ messages in thread From: Soft Works @ 2022-02-04 5:59 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft > Works > Sent: Friday, February 4, 2022 6:34 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Soft > > Works > > Sent: Friday, February 4, 2022 2:58 AM > > To: FFmpeg development discussions and patches <ffmpeg- > > devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > fix > > handling of backslashes > > > > > > > > > -----Original Message----- > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > > > Oneric > > > Sent: Friday, February 4, 2022 2:01 AM > > > To: FFmpeg development discussions and patches <ffmpeg- > > > devel@ffmpeg.org> > > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > > fix > > > handling of backslashes > > > > > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote: [..] > Needless to say that we will of course change the substituted curly > braces > back to the regular ones at the encoding side for all but ASS. > Remains the question what to do when encoding to ASS: We can either > keep the alternate brace characters or just remove them (or maybe > replace > with square brackets). > > I'm not sure about that last point The reason why I'm saying this is because we can't be sure that the font being used at the target would include glyphs for those alternate curly braces, and when it doesn't it would typically cause ugly replacement glyphs (usually a square outline) to be rendered. But to set the record straight: we're not talking about a degradation to current here. Right now this doesn't work at all (text inside curly braces being invisible) sw _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 5:34 ` Soft Works 2022-02-04 5:59 ` Soft Works @ 2022-02-04 6:48 ` Soft Works 2022-02-04 21:19 ` Oneric 1 sibling, 1 reply; 28+ messages in thread From: Soft Works @ 2022-02-04 6:48 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft > Works > Sent: Friday, February 4, 2022 6:34 AM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Soft > > Works > > Sent: Friday, February 4, 2022 2:58 AM > > To: FFmpeg development discussions and patches <ffmpeg- > > devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > fix > > handling of backslashes > > > > > > > > > -----Original Message----- > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > > > Oneric > > > Sent: Friday, February 4, 2022 2:01 AM > > > To: FFmpeg development discussions and patches <ffmpeg- > > > devel@ffmpeg.org> > > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: > > fix > > > handling of backslashes > > > > > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote: > > > > I think when you inject that word-joiner as a workaround for ass > > > > parsing, you'll also need to make sure that it gets removed > > > > when encoding to other formats. > > > > > > There's no way of knowing whether the word-joiner comes from > > > a conversion performed by ffmpeg in the past or already existed > > > in the original source. > > > > That might be true, but I think it's valid to say that such > characters > > are very unusual "original" subtitle sources and that's why I don't > > think it's a good idea for ffmpeg to start injecting them. > > > > Subtitle implementations are often rather minimal, especially in > > hardware devices and might not always cover the full range of > > UTF-8 specifics. > > > > > However, the wordjoiner does not alter the visually appearance and > > > is unlikely to change line-breaking properties; that's why I chose > > > a word-joiner. Therefore I don't think removing (only) the > inserted > > > word-joiners is possible, > > > > Why not? As it seems to be required for ASS encoding only, all other > > output formats should remain unaffected. > > > > > but also not necessary. > > > > I'm not sure whether all ffmpeg text-sub encoders can handle > > those chars - which could be verified of course. > > > > But what remains is the question about the effect on end devices > > which are consuming that output. > > > > Finally, those chars are a pest. I'm using them myself for a > > specific use case, but when you don't know they are there, it can > > drive you totally mad, eventually even thinking your system or > > software is faulty. > > > > Example: > > > > Open your patch file [2/2] and search for the string > > "123456\NAscending". You can see the string in two lines, but search > > will only find one of them. > > > > Or just look at the two lines directly. They are preceded by + and - > > even though both appear identical. > > > > > > So, this also needs consideration of the consequences, like how > > many developers (inside and outside of ffmpeg) this would be driving > > nuts over the years and make them start hating ffmpeg for doing so > > once they've found out. > > As I really hate how many devs on this ML keep saying 'no' to > submitted > code without having a better suggestion, assuming that this is all > that > it takes, I don't want to assimilate in this regard. > > Hence I want to propose the following solution: > > First of all, the existing code in ff_ass_bprint_text_event() is > totally > wrong already. Not only with regard to the backslash escaping (as you > have already pointed out), but also the curly brace escaping is > invalid. > There is no curly-brace escaping in ASS either. > > In fact it is impossible with ASS to display an opening curly brace > followed > by a closing curly brace at a subsequent position (each one alone may > work > depending on implementation). > > If it was about ASS alone, we might just drop those braces, so we > could > at least avoid the text in-between from being hidden (when outputting > ASS), but ASS is also the internal ("uncompressed/raw") subtitle > format > in ffmpeg that is used for conversion (and subtitle filtering). > So it would be hard-to-sell when curly braces would get lost when > converting from one text-sub format to another with none of them > even being ASS. > > What we need is to stop creating invalid ASS and at the same time > ensure proper conversion of curly braces. How? We substitute them! > > And still, UTF-8 can come to the rescue. There are two suitable > candidates for that: > > SMALL LEFT CURLY BRACKET (U+FE5B, Ps): ﹛ > SMALL RIGHT CURLY BRACKET (U+FE5C, Pe): ﹜ > FULLWIDTH LEFT CURLY BRACKET (U+FF5B, Ps): { > FULLWIDTH RIGHT CURLY BRACKET (U+FF5D, Pe): } > > Substitution of curly braces with one of those will prevent ASS from > treating > any possible subtitle content as override code. > > What remains to be handled now is the backslash case. Now that we can > be sure > that we are never inside a sequence that ASS would consider an > override code, > only 3 cases are remaining where the backslash has a meaning in ASS > dialog > text: '\n', '\N' and '\h'. > > We can simply escape those sequences by inserting a (no-op) override > code > between the backslash and the char. Suitable for this is: {\r} > This code resets inline styles, but since we are coming from plain > text subs > in ff_ass_bprint_text_event(), we know that we don't have any inline > styles > and it's a no-op to reset the style. > > Needless to say that we will of course change the substituted curly > braces > back to the regular ones at the encoding side for all but ASS. > Remains the question what to do when encoding to ASS: We can either > keep the alternate brace characters or just remove them (or maybe > replace > with square brackets). > > I'm not sure about that last point, but in total, this will be a clean > solution > without injecting any weird chars into the subtitle output, and it > will fix > multiple incorrect behaviors in the current implementation. I've found out where the \{ and \} escaping has come from: libass They decided at some time to introduce this kind of escaping which is actually incompatible with normal ASS syntax and libass specific: https://github.com/libass/libass/issues/194#issuecomment-352213210 This doesn't mean though, that the ffmpeg internal ASS format needs to follow the libass route in this regard. It only matters for the libass output encoder, because \{\r}N is broken by that libass decision, so for this case, we'll need a different way. sw _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 6:48 ` Soft Works @ 2022-02-04 21:19 ` Oneric 2022-02-04 22:23 ` Soft Works 0 siblings, 1 reply; 28+ messages in thread From: Oneric @ 2022-02-04 21:19 UTC (permalink / raw) To: FFmpeg development discussions and patches On Fri, Feb 04, 2022 at 06:48:40 +0000, Soft Works wrote: > > [two-part message removed for brevity] > > I've found out where the \{ and \} escaping has come from: libass As already written in the commit-message of the first patch.. You already noticed your proposal only works with VSFilters, but even without this it's a terrible approach. Consider: - fullwidth characters have different metrics then the "regular" ones - fullwidth and small characters have a different visual appearance - support for fullwidth and small characters in fonts is much rarer than support for plain {} - fullwidth characters are commonly used _as fullwidth characters_ e.g. in text using one of the CJK writing systems. Replacing them with non-fullwidth variants when transforming away from ASS is guaranteed to be disastrous. - Not sure if applies, but something to keep in mind: {\r} is not a noop if the source-format had any sort of per-event styling which got prepended to the ASS event text before using the plain-text conversion for the rest of the event. As noted in the discussion of the libass issue you linked, it’s not unusual for ASS subtitle authors to employ fullwidth curly braces for displaying curly braces in all ASS-renderers. However, they have tight control over the fonts used and can carefully select them to match the visual appearance and compensate differing metrics with bespoke local adjustments to \fs and negative \fsp. ffmpeg does not have tight control over the fonts and it'd be silly to require users to pass in special fonts just to render curly braces. If you want to make the rendering in VSFilters not complettly broken, try to do what the libass-wiki recommends and add an empty command block after an escaped opening curly brace. This way VSFilters will display a lone backslash instead of a opening curly brace but otherwise work fine without swallowing up text. If done consistently closing curly braces won't need to be escaped at all anymore. However, such a VSFilter-compatibility change is unrelated to fixing the broken \\ escape which doesn't work anywhere. (Note that the wordjoiner doesn't have font or spacing issues as it’s defined to be invisible and zero-width. If a user supplies a special font like FontoXMLWhitespace¹ showing the word-joiner that's not a problem anymore but a deliberate choice) [1]: https://documentation.fontoxml.com/latest/whitespace-visualization-5b5c6b154c78#id-18b2211e-79cb-4b85-8390-1e54d0f45466 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes 2022-02-04 21:19 ` Oneric @ 2022-02-04 22:23 ` Soft Works 0 siblings, 0 replies; 28+ messages in thread From: Soft Works @ 2022-02-04 22:23 UTC (permalink / raw) To: FFmpeg development discussions and patches > -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of > Oneric > Sent: Friday, February 4, 2022 10:19 PM > To: FFmpeg development discussions and patches <ffmpeg- > devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix > handling of backslashes > > On Fri, Feb 04, 2022 at 06:48:40 +0000, Soft Works wrote: > > > [two-part message removed for brevity] > > > > I've found out where the \{ and \} escaping has come from: libass > > As already written in the commit-message of the first patch.. > > > You already noticed your proposal only works with VSFilters, > but even without this it's a terrible approach. Consider: > - fullwidth characters have different metrics then the "regular" ones > - fullwidth and small characters have a different visual appearance > - support for fullwidth and small characters in fonts is much rarer > than support for plain {} > - fullwidth characters are commonly used _as fullwidth characters_ > e.g. in text using one of the CJK writing systems. > Replacing them with non-fullwidth variants when transforming > away from ASS is guaranteed to be disastrous. > - Not sure if applies, but something to keep in mind: > {\r} is not a noop if the source-format had any sort of per-event > styling which got prepended to the ASS event text before > using the plain-text conversion for the rest of the event. No, this doesn't apply from what I've seen, but {} might still be preferable. > As noted in the discussion of the libass issue you linked, > it’s not unusual for ASS subtitle authors to employ > fullwidth curly braces for displaying curly braces > in all ASS-renderers. However, they have tight control over the > fonts used and can carefully select them to match the visual > appearance and compensate differing metrics with bespoke > local adjustments to \fs and negative \fsp. > ffmpeg does not have tight control over the fonts and it'd be silly > to require users to pass in special fonts just to render curly braces. I basically agree to everything you say - except that there's a little misunderstanding. Maybe I haven't explained well enough. We use ASS as the internal format to which all text subtitles are decoded and from which all text subtitles are encoded for output and for the upcoming subtitle filtering it's also the one and only format for text subtitles. BUT: That does not necessarily mean that the internally used ASS must be exactly the same that we're outputting when encoding to ASS. And that's why we need to consider this as two separate steps. It would also be possible to have options at the ass encoder to control the compatibility of the encoded ASS output. That internal ASS format already has some quirks that some had introduced in order to achieve certain things when other subtitle formats are involved at the input and at the output. That's why we should not continue adding one workaround on top of another but try to clean those things up instead. With your submission, you are actually pointing at a core point of evil: the escaping of braces in combination with the backslash logic introduces an unresolvable ambiguity. And when we don't clean that up, it won't be possible to get on a sane path. > If you want to make the rendering in VSFilters not complettly broken, > try to do what the libass-wiki recommends and add an empty command > block after an escaped opening curly brace. This way VSFilters > will display a lone backslash instead of a opening curly brace > but otherwise work fine without swallowing up text. > If done consistently closing curly braces won't need to be > escaped at all anymore. > However, such a VSFilter-compatibility change is unrelated to > fixing the broken \\ escape which doesn't work anywhere. see above, I'm not into changing ffmpeg's ass output, it's all about the internally used ass format and the escaping is a central problem there. > (Note that the wordjoiner doesn't have font or spacing issues as > it’s defined to be invisible and zero-width. Yes, and the use of this has already created issues, even in libass: https://github.com/libass/libass/issues/507 So it's very likely to cause issues in other implementations as well, and not many are developed as actively as libass. (and even that doesn't help when you don't get an update for your device/software). I wouldn't want to close like that, but I'm getting distracted right now. Will get back later. softworkz _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2022-02-06 1:38 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-16 18:16 [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric 2022-01-16 18:16 ` [FFmpeg-devel] [PATCH 2/2] avcodec/webvttdec: honour bidi marks Oneric 2022-02-01 17:38 ` [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes Oneric 2022-02-01 19:44 ` Soft Works 2022-02-01 20:06 ` Oneric 2022-02-01 20:41 ` Soft Works 2022-02-01 23:25 ` Oneric 2022-02-02 4:44 ` Soft Works 2022-02-02 17:03 ` Oneric 2022-02-02 22:18 ` Soft Works 2022-02-02 22:44 ` Soft Works 2022-02-03 2:11 ` Oneric 2022-02-03 20:51 ` Soft Works 2022-02-04 1:01 ` Oneric 2022-02-04 1:30 ` Andreas Rheinhardt 2022-02-04 21:52 ` Oneric 2022-02-04 23:24 ` Soft Works 2022-02-05 1:20 ` Oneric 2022-02-05 2:08 ` Soft Works 2022-02-05 21:59 ` Oneric 2022-02-06 1:08 ` Soft Works 2022-02-06 1:37 ` Soft Works 2022-02-04 1:57 ` Soft Works 2022-02-04 5:34 ` Soft Works 2022-02-04 5:59 ` Soft Works 2022-02-04 6:48 ` Soft Works 2022-02-04 21:19 ` Oneric 2022-02-04 22:23 ` Soft Works
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git