* Re: [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists
2025-05-23 14:53 [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists Timothy Allen via ffmpeg-devel
@ 2025-05-23 20:31 ` Marton Balint
2025-05-23 21:05 ` softworkz .
2025-05-23 21:36 ` softworkz .
2025-05-24 16:32 ` Rémi Denis-Courmont
2 siblings, 1 reply; 5+ messages in thread
From: Marton Balint @ 2025-05-23 20:31 UTC (permalink / raw)
To: Timothy Allen via ffmpeg-devel
On Fri, 23 May 2025, Timothy Allen via ffmpeg-devel wrote:
> This commit closes trac ticket 10679.
And what if the M3U8 is referencing absolute URLs with scheme? Or is using
a data URI? Or if it is a local DOS path? What if the URL is already
percent encoded?
This issue is not trivial to fix, so I really meant it when I wrote that
its worth checking what heuristics other software is using for this issue.
Thanks,
Marton
>
> Signed-off-by: Timothy Allen <tim@treehouse.org.za>
> ---
> libavformat/hls.c | 9 +++++++++
> libavformat/url.c | 35 +++++++++++++++++++++++++++++++++++
> libavformat/url.h | 9 +++++++++
> 3 files changed, 53 insertions(+)
>
> diff --git a/libavformat/hls.c b/libavformat/hls.c
> index c7b655c83c..49436d8184 100644
> --- a/libavformat/hls.c
> +++ b/libavformat/hls.c
> @@ -1021,6 +1021,15 @@ static int parse_playlist(HLSContext *c, const char *url,
> seg->key = NULL;
> }
>
> + ff_percent_encode_url(tmp_str, sizeof(tmp_str), line);
> + if (!tmp_str[0]) {
> + ret = AVERROR_INVALIDDATA;
> + if (seg->key)
> + av_free(seg->key);
> + av_free(seg);
> + goto fail;
> + }
> + strcpy(line, av_strdup(tmp_str));
> ff_make_absolute_url(tmp_str, sizeof(tmp_str), url, line);
> if (!tmp_str[0]) {
> ret = AVERROR_INVALIDDATA;
> diff --git a/libavformat/url.c b/libavformat/url.c
> index d5dd6a4666..69d68d4248 100644
> --- a/libavformat/url.c
> +++ b/libavformat/url.c
> @@ -324,6 +324,41 @@ int ff_make_absolute_url(char *buf, int size, const char *base,
> return ff_make_absolute_url2(buf, size, base, rel, HAVE_DOS_PATHS);
> }
>
> +int ff_percent_encode_url(char *buf, int size, const char *url)
> +{
> + const char *hex = "0123456789abcdef";
> +
> + av_assert0(url);
> +
> + int p = 0;
> + int len = strlen(url);
> + for (int i = 0; i < len; i++) {
> + if (i + 1 < len && ':' == url[i] && '/' == url[i+1])
> + buf[p++] = url[i];
> + else if (('a' <= url[i] && url[i] <= 'z')
> + || ('A' <= url[i] && url[i] <= 'Z')
> + || ('0' <= url[i] && url[i] <= '9')
> + || '/' == url[i] || '.' == url[i] || '~' == url[i]
> + || '-' == url[i] || '_' == url[i] || '+' == url[i]
> + || '?' == url[i] || '=' == url[i] || '&' == url[i]
> + || '#' == url[i])
> + {
> + buf[p++] = url[i];
> + } else if (' ' == url[i])
> + {
> + buf[p++] = '+';
> + } else
> + {
> + buf[p++] = '%';
> + buf[p++] = hex[url[i] >> 4];
> + buf[p++] = hex[url[i] & 15];
> + }
> + }
> + buf[p] = '\0';
> +
> + return 0;
> +}
> +
> AVIODirEntry *ff_alloc_dir_entry(void)
> {
> AVIODirEntry *entry = av_mallocz(sizeof(AVIODirEntry));
> diff --git a/libavformat/url.h b/libavformat/url.h
> index 0784d77b64..85f94f06ce 100644
> --- a/libavformat/url.h
> +++ b/libavformat/url.h
> @@ -331,6 +331,15 @@ int ff_make_absolute_url2(char *buf, int size, const char *base,
> int ff_make_absolute_url(char *buf, int size, const char *base,
> const char *rel);
>
> +/**
> + * Percent-encode a URL, to avoid it being incorrectly parsed.
> + *
> + * @param buf the buffer where output absolute url is written
> + * @param size the size of buf
> + * @param url the url to encode
> + */
> +int ff_percent_encode_url(char *buf, int size, const char *url);
> +
> /**
> * Allocate directory entry with default values.
> *
> --
> 2.43.0
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists
2025-05-23 20:31 ` Marton Balint
@ 2025-05-23 21:05 ` softworkz .
0 siblings, 0 replies; 5+ messages in thread
From: softworkz . @ 2025-05-23 21:05 UTC (permalink / raw)
To: FFmpeg development discussions and patches
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Marton
> Balint
> Sent: Freitag, 23. Mai 2025 22:31
> To: Timothy Allen via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists
>
>
>
> On Fri, 23 May 2025, Timothy Allen via ffmpeg-devel wrote:
>
> > This commit closes trac ticket 10679.
>
> And what if the M3U8 is referencing absolute URLs with scheme? Or is using
> a data URI? Or if it is a local DOS path? What if the URL is already
> percent encoded?
>
> This issue is not trivial to fix, so I really meant it when I wrote that
> its worth checking what heuristics other software is using for this issue.
>
> Thanks,
> Marton
I agree with Marton, it cannot be done that way.
What I had actually in mind was a more targeted approach:
1. If the playlist has absolute URLs
=> leave them alone - we can assume they must be valid
2. If the playlist has relative URLs
=> don't touch the base URL. It is already working+
(otherwise we couldn't have loaded the playlist)
=> only percent-encode the url fragment from the playlist
For Marton's additional point about the relative URL being URL-encoded
already, you could check for a list of chars that are disallowed in URLs
(excepting percent sign). Then URL-encode only when such char is present,
as that means it's not URL-encoded yet.
The edge case of a percent-sign needing to be escaped is probably
negligible.
Best,
sw
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists
2025-05-23 14:53 [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists Timothy Allen via ffmpeg-devel
2025-05-23 20:31 ` Marton Balint
@ 2025-05-23 21:36 ` softworkz .
2025-05-24 16:32 ` Rémi Denis-Courmont
2 siblings, 0 replies; 5+ messages in thread
From: softworkz . @ 2025-05-23 21:36 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Timothy Allen
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Timothy
> Allen via ffmpeg-devel
> Sent: Freitag, 23. Mai 2025 16:53
> To: ffmpeg-devel@ffmpeg.org
> Cc: Timothy Allen <tim@treehouse.org.za>
> Subject: [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists
>
> This commit closes trac ticket 10679.
>
> Signed-off-by: Timothy Allen <tim@treehouse.org.za>
> ---
> libavformat/hls.c | 9 +++++++++
> libavformat/url.c | 35 +++++++++++++++++++++++++++++++++++
> libavformat/url.h | 9 +++++++++
> 3 files changed, 53 insertions(+)
>
> diff --git a/libavformat/hls.c b/libavformat/hls.c
> index c7b655c83c..49436d8184 100644
> --- a/libavformat/hls.c
> +++ b/libavformat/hls.c
> @@ -1021,6 +1021,15 @@ static int parse_playlist(HLSContext *c, const char
> *url,
> seg->key = NULL;
> }
>
> + ff_percent_encode_url(tmp_str, sizeof(tmp_str), line);
> + if (!tmp_str[0]) {
> + ret = AVERROR_INVALIDDATA;
> + if (seg->key)
> + av_free(seg->key);
> + av_free(seg);
> + goto fail;
> + }
> + strcpy(line, av_strdup(tmp_str));
> ff_make_absolute_url(tmp_str, sizeof(tmp_str), url, line);
> if (!tmp_str[0]) {
> ret = AVERROR_INVALIDDATA;
> diff --git a/libavformat/url.c b/libavformat/url.c
> index d5dd6a4666..69d68d4248 100644
> --- a/libavformat/url.c
> +++ b/libavformat/url.c
> @@ -324,6 +324,41 @@ int ff_make_absolute_url(char *buf, int size, const char
> *base,
> return ff_make_absolute_url2(buf, size, base, rel, HAVE_DOS_PATHS);
> }
>
> +int ff_percent_encode_url(char *buf, int size, const char *url)
> +{
> + const char *hex = "0123456789abcdef";
> +
> + av_assert0(url);
> +
> + int p = 0;
> + int len = strlen(url);
> + for (int i = 0; i < len; i++) {
What about the slash:
> + if (i + 1 < len && ':' == url[i] && '/' == url[i+1])
> + buf[p++] = url[i];
> + else if (('a' <= url[i] && url[i] <= 'z')
> + || ('A' <= url[i] && url[i] <= 'Z')
> + || ('0' <= url[i] && url[i] <= '9')
> + || '/' == url[i] || '.' == url[i] || '~' == url[i]
> + || '-' == url[i] || '_' == url[i] || '+' == url[i]
> + || '?' == url[i] || '=' == url[i] || '&' == url[i]
> + || '#' == url[i])
> + {
> + buf[p++] = url[i];
> + } else if (' ' == url[i])
> + {
Space must be %20. The plus sign is only used for encoding form fields.
> + buf[p++] = '+';
> + } else
> + {
Check buffer size:
> + buf[p++] = '%';
Use HEX for upper case:
> + buf[p++] = hex[url[i] >> 4];
> + buf[p++] = hex[url[i] & 15];
> + }
> + }
Check buffer size:
> + buf[p] = '\0';
> +
> + return 0;
> +}
> +
Please also check behavior with signed/unsigned char, I'm not sure
whether the shifts are always right.
As a tip: using AVBPrint for building the string might be easier.
HLS URLs can be pretty long sometimes.
Regards
sw
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists
2025-05-23 14:53 [FFmpeg-devel] [PATCH] Percent-encode URL paths in HLS playlists Timothy Allen via ffmpeg-devel
2025-05-23 20:31 ` Marton Balint
2025-05-23 21:36 ` softworkz .
@ 2025-05-24 16:32 ` Rémi Denis-Courmont
2 siblings, 0 replies; 5+ messages in thread
From: Rémi Denis-Courmont @ 2025-05-24 16:32 UTC (permalink / raw)
To: ffmpeg-devel
Le perjantaina 23. toukokuuta 2025, 17.53.02 Itä-Euroopan kesäaika Timothy
Allen via ffmpeg-devel a écrit :
> This commit closes trac ticket 10679.
>
> Signed-off-by: Timothy Allen <tim@treehouse.org.za>
> ---
> libavformat/hls.c | 9 +++++++++
> libavformat/url.c | 35 +++++++++++++++++++++++++++++++++++
> libavformat/url.h | 9 +++++++++
> 3 files changed, 53 insertions(+)
>
> diff --git a/libavformat/hls.c b/libavformat/hls.c
> index c7b655c83c..49436d8184 100644
> --- a/libavformat/hls.c
> +++ b/libavformat/hls.c
> @@ -1021,6 +1021,15 @@ static int parse_playlist(HLSContext *c, const char
> *url, seg->key = NULL;
> }
>
> + ff_percent_encode_url(tmp_str, sizeof(tmp_str), line);
> + if (!tmp_str[0]) {
> + ret = AVERROR_INVALIDDATA;
> + if (seg->key)
> + av_free(seg->key);
> + av_free(seg);
> + goto fail;
> + }
> + strcpy(line, av_strdup(tmp_str));
> ff_make_absolute_url(tmp_str, sizeof(tmp_str), url, line);
> if (!tmp_str[0]) {
> ret = AVERROR_INVALIDDATA;
> diff --git a/libavformat/url.c b/libavformat/url.c
> index d5dd6a4666..69d68d4248 100644
> --- a/libavformat/url.c
> +++ b/libavformat/url.c
> @@ -324,6 +324,41 @@ int ff_make_absolute_url(char *buf, int size, const
> char *base, return ff_make_absolute_url2(buf, size, base, rel,
> HAVE_DOS_PATHS); }
>
> +int ff_percent_encode_url(char *buf, int size, const char *url)
> +{
> + const char *hex = "0123456789abcdef";
> +
> + av_assert0(url);
> +
> + int p = 0;
> + int len = strlen(url);
> + for (int i = 0; i < len; i++) {
> + if (i + 1 < len && ':' == url[i] && '/' == url[i+1])
> + buf[p++] = url[i];
> + else if (('a' <= url[i] && url[i] <= 'z')
> + || ('A' <= url[i] && url[i] <= 'Z')
> + || ('0' <= url[i] && url[i] <= '9')
> + || '/' == url[i] || '.' == url[i] || '~' == url[i]
> + || '-' == url[i] || '_' == url[i] || '+' == url[i]
> + || '?' == url[i] || '=' == url[i] || '&' == url[i]
> + || '#' == url[i])
> + {
> + buf[p++] = url[i];
> + } else if (' ' == url[i])
> + {
> + buf[p++] = '+';
> + } else
> + {
> + buf[p++] = '%';
> + buf[p++] = hex[url[i] >> 4];
> + buf[p++] = hex[url[i] & 15];
> + }
> + }
> + buf[p] = '\0';
> +
> + return 0;
> +}
> +
Percent-encoding is an algorithm that applies to individual URL fragments, and
then it only makes sense for some types of fragments (path fragments and
request parameters). You cannot apply it a whole URL. A whole URL is by
*definition* already encoded. A URL simply cannot exist in a not encoded form,
be it relative or absolute.
Applying an encoding algorithm meant for fragment on the URL is bound to
corrupt some legal URLs. An obvious example is if the URL already contains
percentage signs.
If you need to deal with corrupt URLs or with file paths mistaken for URLs,
then you first need to ensure that the input is not a valid URL. Only then, you
can try to apply some heuristics to "fix" it, but systematic percent-encoding
only makes sense for host-relative and path-relative URLs, not for absolute
and scheme-relative URls.
So this systematic algorithm is just plain wrong. And for that matter,
strictly speaking, percent-encoding also encodes colons, brackets, slashes,
hashes, etc (which goes back to the point about URL fragments vs URLs).
--
ヅニ-クーモン・レミ
Tapio's place new town, former Finnish Republic of Uusimaa
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread