From: Pavel Koshevoy <pkoshevoy@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avformat/mov: (v4) fix get_eia608_packet
Date: Fri, 14 Feb 2025 05:11:15 -0700
Message-ID: <CAJgjuozbQHeF3yBGhXPSt33WWQ6hONZnsa67NKhin7T6J2a-Ug@mail.gmail.com>
In-Reply-To: <AS8P250MB07444AE485058304292FE77D8FFE2@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM>
On Thu, Feb 13, 2025, 22:04 Andreas Rheinhardt <andreas.rheinhardt@outlook.com> wrote:
> Pavel Koshevoy:
> > The problem is reproducible with "Test for Quicktime 608 CC file.mov"
> > from https://samples.ffmpeg.org/MPEG2/subcc/
> >
> > ffmpeg -i "Test for Quicktime 608 CC file.mov" -map 0 -c copy -y remuxed.mov
> >
> > Prior to the fix, QuickTime Player playback of remuxed.mov would
> > render garbage text for "English CC" subtitles.
>
> Is remuxing necessary for the garbage to appear?
>
> > ---
> > libavformat/mov.c | 70 +++++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 59 insertions(+), 11 deletions(-)
> >
> > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > index 85aef33b19..5a91ef5b8c 100644
> > --- a/libavformat/mov.c
> > +++ b/libavformat/mov.c
> > @@ -10788,25 +10788,73 @@ static int mov_change_extradata(AVStream *st, AVPacket *pkt)
> >      return 0;
> >  }
> >
> > -static int get_eia608_packet(AVIOContext *pb, AVPacket *pkt, int size)
> > +static int get_eia608_packet(AVIOContext *pb, AVPacket *pkt, int src_size)
> >  {
> > -    int new_size, ret;
> > +    /* We can't make assumptions about the structure of the payload,
> > +       because it may include multiple cdat and cdt2 samples. */
> > +    const uint32_t cdat = AV_RB32("cdat");
> > +    const uint32_t cdt2 = AV_RB32("cdt2");
>
> I don't think that using (non-variable) variables for these improves
> clarity (e.g. it means that the definition of the actual values used for
> the comparisons below is now further away from its use). Why not simply
> use MKBETAG('c','d','a','t') below?
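A minimal sketch of the MKBETAG() suggestion, in plain C so it compiles outside FFmpeg. The macro is a local copy assumed to match the one in libavutil/macros.h, and eia608_field_for_type() and read_be32() are hypothetical helpers. It shows the tag constants written right at the comparisons, and that MKBETAG('c','d','a','t') equals a big-endian read of the same four bytes (what AV_RB32("cdat") computes).

```c
#include <assert.h>
#include <stdint.h>

/* Local copy, assumed to mirror FFmpeg's MKBETAG() from libavutil/macros.h. */
#define MKBETAG(a, b, c, d) \
    ((d) | ((c) << 8) | ((b) << 16) | ((unsigned)(a) << 24))

/* Hypothetical helper: map an atom type to a 608 field number
 * (1 for 'cdat', 2 for 'cdt2', 0 for anything else). The tag values
 * sit next to the comparisons instead of in separate const variables. */
static int eia608_field_for_type(uint32_t atom_type)
{
    return atom_type == MKBETAG('c','d','a','t') ? 1 :
           atom_type == MKBETAG('c','d','t','2') ? 2 :
           0;
}

/* Hypothetical reference: big-endian 32-bit read of four bytes,
 * equivalent to what AV_RB32() does. */
static uint32_t read_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}
```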
>
> > +    int ret, out_size = 0;
> >
> > -    if (size <= 8)
> > +    /* a valid payload must have size, 4cc, and at least 1 byte pair: */
> > +    if (src_size < 10)
> >          return AVERROR_INVALIDDATA;
> > -    new_size = ((size - 8) / 2) * 3;
> > -    ret = av_new_packet(pkt, new_size);
> > +
> > +    /* avoid an int overflow: */
> > +    if ((src_size - 8) / 2 >= INT_MAX / 3)
> > +        return AVERROR_INVALIDDATA;
> > +
> > +    ret = av_new_packet(pkt, ((src_size - 8) / 2) * 3);
> >      if (ret < 0)
> >          return ret;
> >
> > -    avio_skip(pb, 8);
> > -    for (int j = 0; j < new_size; j += 3) {
> > -        pkt->data[j] = 0xFC;
> > -        pkt->data[j+1] = avio_r8(pb);
> > -        pkt->data[j+2] = avio_r8(pb);
> > +    /* parse and re-format the c608 payload in one pass. */
> > +    while (src_size >= 10) {
> > +        const uint32_t atom_size = avio_rb32(pb);
> > +        const uint32_t atom_type = avio_rb32(pb);
> > +        const uint32_t data_size = atom_size - 8;
>
> This may wrap around (if atom_size is < 8). If int is 32 bits, then the
> data_size > src_size check will catch this, but with 64-bit ints it
> may not. Relying on (unsigned, defined) integer wraparound should be
> avoided unless it is advantageous to use it; in this case, it is not:
> just compare atom_size to 10 below.
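The check described above can be sketched as follows, assuming a hypothetical helper atom_size_is_valid() that receives the raw 32-bit atom size and the number of payload bytes still unread, with this atom's 8-byte header counted in both. Testing atom_size against 10 first means the later `atom_size - 8` can never wrap, whatever the width of int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: validate a cdat/cdt2 atom header before using it.
 * atom_size is the atom's size field (header included); remaining is the
 * number of unread payload bytes, also counting this atom's header.
 * Because the lower bound is checked on atom_size itself, the
 * subtraction in the last test cannot wrap around. */
static int atom_size_is_valid(uint32_t atom_size, uint32_t remaining)
{
    if (atom_size < 10)          /* 8-byte header + at least one byte pair */
        return 0;
    if (atom_size > remaining)   /* must stay inside the payload */
        return 0;
    if ((atom_size - 8) % 2)     /* data must be whole byte pairs */
        return 0;
    return 1;
}
```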
>
> > +        const uint8_t cc_field =
> > +            atom_type == cdat ? 1 :
> > +            atom_type == cdt2 ? 2 :
> > +            0;
> > +
> > +        /* account for bytes consumed for atom size and type. */
> > +        src_size -= 8;
> > +
> > +        /* make sure the data size stays within the buffer boundaries. */
> > +        if (data_size < 2 || data_size > src_size) {
> > +            ret = AVERROR_INVALIDDATA;
> > +            break;
> > +        }
> > +
> > +        /* make sure the data size is consistent with N byte pairs. */
> > +        if (data_size % 2 != 0) {
>
> We typically try to avoid redundant "!= 0".
>
> > +            ret = AVERROR_INVALIDDATA;
> > +            break;
> > +        }
> > +
> > +        if (!cc_field) {
> > +            /* neither cdat or cdt2 ... skip it */
> > +            avio_skip(pb, data_size);
> > +            src_size -= data_size;
> > +            continue;
> > +        }
> > +
> > +        for (int32_t i = 0; i < data_size; i += 2) {
>
> int32_t? Why signed? (And why use a separate loop counter at all? Simply
> decrement data_size by 2 in each iteration.)
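A minimal sketch of that loop shape, with the AVIOContext reads replaced by a plain source buffer so it runs standalone (expand_pairs() is a hypothetical name). data_size itself is counted down by 2, so no separate signed counter is needed. The first byte of each triplet is the same value the patch computes: five marker bits, the cc_valid bit, and cc_type 0 or 1 for field 1 or 2, i.e. 0xFC or 0xFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand data_size bytes of 608 byte pairs from src
 * into 3-byte cc_data triplets in dst, counting data_size down by 2
 * instead of keeping a separate index. cc_field is 1 ('cdat') or
 * 2 ('cdt2'). Returns the number of bytes written to dst. */
static size_t expand_pairs(uint8_t *dst, const uint8_t *src,
                           uint32_t data_size, int cc_field)
{
    /* marker bits (0x1F << 3), cc_valid (1 << 2), cc_type (field - 1) */
    const uint8_t header = (0x1F << 3) | (1 << 2) | (cc_field - 1);
    size_t out_size = 0;

    for (; data_size >= 2; data_size -= 2) {
        dst[out_size++] = header;
        dst[out_size++] = *src++;
        dst[out_size++] = *src++;
    }
    return out_size;
}
```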
>
> > +            pkt->data[out_size] = (0x1F << 3) | (1 << 2) | (cc_field - 1);
> > +            pkt->data[out_size + 1] = avio_r8(pb);
> > +            pkt->data[out_size + 2] = avio_r8(pb);
> > +            out_size += 3;
> > +            src_size -= 2;
> > +        }
> >      }
> >
> > -    return 0;
> > +    if (src_size > 0)
> > +        /* skip any remaining unread portion of the input payload */
> > +        avio_skip(pb, src_size);
> > +
> > +    av_shrink_packet(pkt, out_size);
> > +    return ret;
> >  }
> >
> >  static int mov_finalize_packet(AVFormatContext *s, AVStream *st, AVIndexEntry *sample,
>
> Generally, I believe that reading the input into pkt->data[size / 2]
> would be advantageous: It would make it simple to check for EOF and I/O
> errors (notice that the avio_r* reads above are unchecked) and would
> read the data in one go, avoiding all the avio_skip().
>
> - Andreas
>
Then perhaps you would find v2 of the patch more agreeable to your taste.
Could you review that instead?
This function has been corrupting closed captions since 2020. A
different fix was posted in 2023 (mentioned by Devin in reply to the first
version of this patch); perhaps that should be merged instead, since it
also solves the problem.
Pavel.
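Andreas's closing suggestion, reading the whole payload into the packet at pkt->data[size / 2] and converting in place, might look roughly like the sketch below. It is not the v2 patch or the 2023 fix: a calloc()'d buffer stands in for the AVPacket, PADDING is an assumed stand-in for AV_INPUT_BUFFER_PADDING_SIZE, and load_payload() and convert_c608_in_place() are hypothetical names. Each 2-byte input pair grows into 3 output bytes, and because the input starts src_size / 2 bytes into the buffer, the write position always stays behind the unread input; the tail of the input extends into the padding area, which is why the padding matters here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PADDING 64 /* assumed stand-in for AV_INPUT_BUFFER_PADDING_SIZE */

static uint32_t rb32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Hypothetical stand-in for av_new_packet() plus one read: allocate the
 * output size ((src_size - 8) / 2) * 3 plus padding and place the raw
 * payload at src_size / 2. Requires src_size >= 10. Caller frees. */
static uint8_t *load_payload(const uint8_t *payload, int src_size)
{
    uint8_t *buf = calloc((size_t)((src_size - 8) / 2) * 3 + PADDING, 1);
    if (buf)
        memcpy(buf + src_size / 2, payload, (size_t)src_size);
    return buf;
}

/* Hypothetical helper: convert cdat/cdt2 byte pairs found at
 * buf + src_size / 2 into cc_data triplets written from buf[0].
 * Returns the number of output bytes, or -1 on a malformed atom. */
static int convert_c608_in_place(uint8_t *buf, int src_size)
{
    const uint8_t *in  = buf + src_size / 2;
    const uint8_t *end = in + src_size;
    int out_size = 0;

    while (end - in >= 10) {
        uint32_t atom_size = rb32(in);
        uint32_t atom_type = rb32(in + 4);
        int field = atom_type == 0x63646174u ? 1 :     /* 'cdat' */
                    atom_type == 0x63647432u ? 2 : 0;  /* 'cdt2' */

        if (atom_size < 10 || atom_size > (uint32_t)(end - in) || atom_size % 2)
            return -1;
        in += 8;
        if (!field) {                      /* unknown atom: skip its data */
            in += atom_size - 8;
            continue;
        }
        for (uint32_t left = atom_size - 8; left >= 2; left -= 2, in += 2) {
            uint8_t b0 = in[0], b1 = in[1]; /* read the pair before writing */
            buf[out_size++] = (0x1F << 3) | (1 << 2) | (field - 1);
            buf[out_size++] = b0;
            buf[out_size++] = b1;
        }
    }
    return out_size;
}
```

With a single read, one return-value check covers short reads and I/O errors for the whole payload, instead of leaving per-byte avio_r8() calls unchecked.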
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".