Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v2] avformat/dhav: fix backward scanning for get_duration and optimize seeking
Date: Tue, 20 May 2025 21:18:22 +0200
Message-ID: <AS8P250MB07449F0EB8A33C927B9F5B228F9FA@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <20250520152717.15343-1-derek.buitenhuis@gmail.com>

Derek Buitenhuis:
> From: Justin Ruggles <justinr@vimeo.com>
> 
> The backwards scanning done for incomplete final packets should not
> assume a specific alignment at the end of the file. Truncated files
> result in hundreds of thousands of seeks if the final packet does not
> fall on a specific byte boundary, which can be extremely slow.
> For example, with HTTP, each backwards seek results in a separate
> HTTP request.
> 
> This changes the scanning to check for the end tag 1 byte at a time
> and buffers the last 1 MiB using ffio_ensure_seekback to avoid additional
> seek operations.
> 
> Signed-off-by: Justin Ruggles <justinr@vimeo.com>
> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
> ---
> Addresses Andreas' comments from the previous round.
> ---
>  libavformat/dhav.c | 48 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/libavformat/dhav.c b/libavformat/dhav.c
> index b2ead99609..27e2f4db23 100644
> --- a/libavformat/dhav.c
> +++ b/libavformat/dhav.c
> @@ -22,6 +22,7 @@
>  
>  #include <time.h>
>  
> +#include "libavutil/intreadwrite.h"
>  #include "libavutil/mem.h"
>  #include "libavutil/parseutils.h"
>  #include "avio_internal.h"
> @@ -232,34 +233,59 @@ static void get_timeinfo(unsigned date, struct tm *timeinfo)
>      timeinfo->tm_sec  = sec;
>  }
>  
> +#define MAX_DURATION_BUFFER_SIZE (1024*1024)
> +
>  static int64_t get_duration(AVFormatContext *s)
>  {
>      DHAVContext *dhav = s->priv_data;
>      int64_t start_pos = avio_tell(s->pb);
> +    int64_t end_pos = -1;
>      int64_t start = 0, end = 0;
>      struct tm timeinfo;
> -    int max_interations = 100000;
> +    uint8_t *end_buffer;
> +    int64_t end_buffer_size;
> +    int64_t end_buffer_pos;
> +    int64_t offset;
> +    int ret;
>  
>      if (!s->pb->seekable)
>          return 0;
>  
> -    avio_seek(s->pb, avio_size(s->pb) - 8, SEEK_SET);
> -    while (avio_tell(s->pb) > 12 && max_interations--) {
> -        if (avio_rl32(s->pb) == MKTAG('d','h','a','v')) {
> -            int64_t seek_back = avio_rl32(s->pb);
> +    end_buffer_size = FFMIN(MAX_DURATION_BUFFER_SIZE, avio_size(s->pb));
> +    end_buffer = av_malloc(end_buffer_size);
> +    if (!end_buffer)
> +        return 0;
> +    end_buffer_pos = avio_size(s->pb) - end_buffer_size;
> +    avio_seek(s->pb, end_buffer_pos, SEEK_SET);
> +    ret = ffio_ensure_seekback(s->pb, end_buffer_size);
> +    if (ret < 0)
> +        return 0;
> +    avio_read(s->pb, end_buffer, end_buffer_size);
>  
> -            avio_seek(s->pb, -seek_back, SEEK_CUR);
> -            read_chunk(s);
> -            get_timeinfo(dhav->date, &timeinfo);
> -            end = av_timegm(&timeinfo) * 1000LL;
> +    offset = end_buffer_size - 8;
> +    while (offset > 0) {
> +        if (AV_RL32(end_buffer + offset) == MKTAG('d','h','a','v')) {
> +            int64_t seek_back = AV_RL32(end_buffer + offset + 4);
> +            end_pos = end_buffer_pos + offset - seek_back + 8;
>              break;
>          } else {
> -            avio_seek(s->pb, -12, SEEK_CUR);
> +            offset -= 9;
>          }
>      }
>  
> -    avio_seek(s->pb, start_pos, SEEK_SET);
> +    av_freep(&end_buffer);
> +
> +    if (end_pos < 0) {
> +        avio_seek(s->pb, start_pos, SEEK_SET);
> +        return 0;
> +    }
> +
> +    avio_seek(s->pb, end_pos, SEEK_SET);
> +    read_chunk(s);
> +    get_timeinfo(dhav->date, &timeinfo);
> +    end = av_timegm(&timeinfo) * 1000LL;

The only thing that you want from parsing the end is the end timestamp
and this is given by data alone. date is a 32bit le number at offset 16
from the start of the dhav tag; it can be read directly from the buffer,
so you do not need to seek again and use read_chunk to get the end.
It also means that your ffio_ensure_seekback() is unnecessary (unless
you wanted to ensure that the seek back to start_pos works (which is
currently not guaranteed at all and not ensured by your code and would
probably also be bad in general given that this would cache the whole
file in memory).
(The above is based on the presumption that we are not really interested
in what parse_ext() may parse in the chunk at the end; I don't know
whether this is true or not.)

- Andreas

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

  reply	other threads:[~2025-05-20 19:18 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-20 15:27 Derek Buitenhuis
2025-05-20 19:18 ` Andreas Rheinhardt [this message]
2025-05-21 13:23   ` Derek Buitenhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AS8P250MB07449F0EB8A33C927B9F5B228F9FA@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM \
    --to=andreas.rheinhardt@outlook.com \
    --cc=ffmpeg-devel@ffmpeg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git