Re: [FFmpeg-devel] [PATCH] avformat/mov: (v4) fix get_eia608_packet

From: Pavel Koshevoy <pkoshevoy@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avformat/mov: (v4) fix get_eia608_packet
Date: Sat, 15 Feb 2025 10:25:44 -0700
Message-ID: <CAJgjuoyekkiHtf4y1EYLHpDgpFh-kKH55k+KbHSDM_A_cnoOnQ@mail.gmail.com>
In-Reply-To: <CAJgjuoyrjHH6SE0YXLnUM1O19B7yQ2eAeV5qz+imOHLP0C-9vg@mail.gmail.com>

On Fri, Feb 14, 2025 at 5:30 AM Pavel Koshevoy <pkoshevoy@gmail.com> wrote:

>
>
> On Thu, Feb 13, 2025, 22:04 Andreas Rheinhardt <
> andreas.rheinhardt@outlook.com> wrote:
>
>> Pavel Koshevoy:
>> > The problem is reproducible with "Test for Quicktime 608 CC file.mov"
>> > from https://samples.ffmpeg.org/MPEG2/subcc/
>> >
>> > ffmpeg -i "Test for Quicktime 608 CC file.mov" -map 0 -c copy -y remuxed.mov
>> >
>> > Prior to the fix QuickTime Player playback of remuxed.mov would
>> > render garbage text for "English CC" subtitles.
>>
>> Is remuxing necessary for there being garbage?
>>
>
> The original file displays correct English CC text in QuickTime Player,
> and the remuxed file (prior to the fix) does not.
>
>
>
>> > ---
>> >  libavformat/mov.c | 70 +++++++++++++++++++++++++++++++++++++++--------
>> >  1 file changed, 59 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/libavformat/mov.c b/libavformat/mov.c
>> > index 85aef33b19..5a91ef5b8c 100644
>> > --- a/libavformat/mov.c
>> > +++ b/libavformat/mov.c
>> > @@ -10788,25 +10788,73 @@ static int mov_change_extradata(AVStream *st, AVPacket *pkt)
>> >      return 0;
>> >  }
>> >
>> > -static int get_eia608_packet(AVIOContext *pb, AVPacket *pkt, int size)
>> > +static int get_eia608_packet(AVIOContext *pb, AVPacket *pkt, int src_size)
>> >  {
>> > -    int new_size, ret;
>> > +    /* We can't make assumptions about the structure of the payload,
>> > +       because it may include multiple cdat and cdt2 samples. */
>> > +    const uint32_t cdat = AV_RB32("cdat");
>> > +    const uint32_t cdt2 = AV_RB32("cdt2");
>>
>> I don't think that using (non-variable) variables for these improves
>> clarity (e.g. it means that the definition of the actual values used for
>> the comparisons below is now further away from its use). Why not simply
>> use MKBETAG('c','d','a','t') below?
>>
>
>
> That is a matter of personal preference.  I personally find "cdat" more
> readable (and searchable) than any MKBETAG.
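
For reference, the two spellings discussed above produce the same 32-bit
value. A minimal sketch, assuming local stand-ins (EX_MKBETAG and ex_rb32
are names made up for this example; they mirror the idea behind MKBETAG and
AV_RB32 but are not the FFmpeg definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: build a big-endian 4cc from four characters. */
#define EX_MKBETAG(a, b, c, d)                        \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |  \
     ((uint32_t)(c) <<  8) |  (uint32_t)(d))

/* Hypothetical stand-in: read 4 bytes from memory as a big-endian uint32_t. */
static uint32_t ex_rb32(const void *p)
{
    const uint8_t *b = p;

    assert(p);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
}

/* Either spelling can be used for the comparison against an atom type. */
static int ex_is_cdat(uint32_t atom_type)
{
    return atom_type == EX_MKBETAG('c', 'd', 'a', 't');
}
```

With these definitions ex_is_cdat(ex_rb32("cdat")) is true, so the choice
between the two forms affects readability only, not behavior.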
>
>
>
>> > +    int ret, out_size = 0;
>> >
>> > -    if (size <= 8)
>> > +    /* a valid payload must have size, 4cc, and at least 1 byte pair: */
>> > +    if (src_size < 10)
>> >          return AVERROR_INVALIDDATA;
>> > -    new_size = ((size - 8) / 2) * 3;
>> > -    ret = av_new_packet(pkt, new_size);
>> > +
>> > +    /* avoid an int overflow: */
>> > +    if ((src_size - 8) / 2 >= INT_MAX / 3)
>> > +        return AVERROR_INVALIDDATA;
>> > +
>> > +    ret = av_new_packet(pkt, ((src_size - 8) / 2) * 3);
>> >      if (ret < 0)
>> >          return ret;
>> >
>> > -    avio_skip(pb, 8);
>> > -    for (int j = 0; j < new_size; j += 3) {
>> > -        pkt->data[j] = 0xFC;
>> > -        pkt->data[j+1] = avio_r8(pb);
>> > -        pkt->data[j+2] = avio_r8(pb);
>> > +    /* parse and re-format the c608 payload in one pass. */
>> > +    while (src_size >= 10) {
>> > +        const uint32_t atom_size = avio_rb32(pb);
>> > +        const uint32_t atom_type = avio_rb32(pb);
>> > +        const uint32_t data_size = atom_size - 8;
>>
>> This may wrap around (if atom_size is < 8). If int is 32 bits, then the
>> data_size > src_size check will catch this, but in case of 64 bit ints
>> it may not. Relying on (unsigned, defined) integer wraparound should be
>> avoided unless it is advantageous to use it; in this case, this is just
>> not true: Just compare atom_size to 10 below.
>>
>
> I fully expect the size of uint32_t to be 32 bits, on any platform.  It
> should be a compile-time assertion, but that is outside the scope of this
> fix.  The name of the data type says it's 32 bits long, so it must be so.
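
The two positions above can be sketched side by side: a compile-time check
on the width of uint32_t, and a size check that compares atom_size before
subtracting the header, so no wraparound is involved. This is only a sketch
under stated assumptions; ex_atom_data_size is a hypothetical helper, not
code from the patch or from libavformat:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Compile-time check that uint32_t is 32 bits wide (C11). */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits");

/*
 * Hypothetical helper: validate an atom size read from the payload.
 * 'remaining' counts the bytes left in the sample, including the 8-byte
 * atom header. Returns 0 and stores the payload size in *data_size on
 * success, -1 if the atom is malformed.
 */
static int ex_atom_data_size(uint32_t atom_size, int remaining,
                             uint32_t *data_size)
{
    assert(data_size);

    /* 8 bytes of size + 4cc, plus at least one byte pair */
    if (atom_size < 10)
        return -1;

    /* the atom must fit in what is left of the sample */
    if (remaining < 10 || atom_size > (uint32_t)remaining)
        return -1;

    /* the payload must be a whole number of byte pairs */
    if ((atom_size - 8) % 2)
        return -1;

    *data_size = atom_size - 8;
    return 0;
}
```

Because atom_size is checked against 10 first, atom_size - 8 is computed
only when it cannot wrap, regardless of the width of int.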
>
>
>
>> > +        const uint8_t cc_field =
>> > +            atom_type == cdat ? 1 :
>> > +            atom_type == cdt2 ? 2 :
>> > +            0;
>> > +
>> > +        /* account for bytes consumed for atom size and type. */
>> > +        src_size -= 8;
>> > +
>> > +        /* make sure the data size stays within the buffer boundaries. */
>> > +        if (data_size < 2 || data_size > src_size) {
>> > +            ret = AVERROR_INVALIDDATA;
>> > +            break;
>> > +        }
>> > +
>> > +        /* make sure the data size is consistent with N byte pairs. */
>> > +        if (data_size % 2 != 0) {
>>
>> We typically try to avoid redundant "!= 0".
>>
>
> Again, this is a matter of personal preference.  If you would prefer to
> tweak the patch to suit your personal preference before merging -- you are
> free to do so, but I don't think it's a valid reason to delay a fix for a
> parser that has been mis-parsing well-formed files for the past 5 years.
>
>
>
>> > +            ret = AVERROR_INVALIDDATA;
>> > +            break;
>> > +        }
>> > +
>> > +        if (!cc_field) {
>> > +            /* neither cdat or cdt2 ... skip it */
>> > +            avio_skip(pb, data_size);
>> > +            src_size -= data_size;
>> > +            continue;
>> > +        }
>> > +
>> > +        for (int32_t i = 0; i < data_size; i += 2) {
>>
>> int32_t? Why signed? (And why use a separate loop counter at all? Simply
>> decrement data_size by 2 in each iteration.)
>>
>
> Please feel free to make additional improvements to whatever fix you
> decide to merge.
>
>
>
>> > +            pkt->data[out_size] = (0x1F << 3) | (1 << 2) | (cc_field - 1);
>> > +            pkt->data[out_size + 1] = avio_r8(pb);
>> > +            pkt->data[out_size + 2] = avio_r8(pb);
>> > +            out_size += 3;
>> > +            src_size -= 2;
>> > +        }
>> >      }
>> >
>> > -    return 0;
>> > +    if (src_size > 0)
>> > +        /* skip any remaining unread portion of the input payload */
>> > +        avio_skip(pb, src_size);
>> > +
>> > +    av_shrink_packet(pkt, out_size);
>> > +    return ret;
>> >  }
>> >
>> >  static int mov_finalize_packet(AVFormatContext *s, AVStream *st, AVIndexEntry *sample,
>>
>> Generally, I believe that reading the input into pkt->data[size / 2]
>> would be advantageous: It would make it simple to check for EOF and I/O
>> errors (notice that the avio_r* reads above are unchecked) and would
>> read the data in one go, avoiding all the avio_skip().
>>
>> - Andreas
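
To illustrate the read-in-one-go idea above, here is a rough sketch that
converts a c608 sample already held in memory, so that a single checked read
could replace the per-byte avio_r8() calls. It is a simplification under
stated assumptions: it uses separate input and output buffers rather than
the in-place layout suggested above, it does not use the avio API, and
ex_c608_to_cc_data is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t ex_rb32(const uint8_t *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
}

/*
 * Hypothetical helper: walk the atoms in src[0..src_size) and emit one
 * 3-byte triplet per byte pair found in cdat (field 1) and cdt2 (field 2)
 * atoms. Other atoms are skipped. Returns 0 and stores the number of bytes
 * written in *out_size, or -1 if an atom is malformed or dst is too small.
 */
static int ex_c608_to_cc_data(const uint8_t *src, size_t src_size,
                              uint8_t *dst, size_t dst_cap, size_t *out_size)
{
    size_t pos = 0, out = 0;

    assert(src && dst && out_size);

    while (src_size - pos >= 10) {
        const uint32_t atom_size = ex_rb32(src + pos);
        const uint8_t *type = src + pos + 4;
        const int field = !memcmp(type, "cdat", 4) ? 1 :
                          !memcmp(type, "cdt2", 4) ? 2 : 0;

        /* header + at least one pair, within bounds, whole pairs only */
        if (atom_size < 10 || atom_size > src_size - pos ||
            (atom_size - 8) % 2)
            return -1;

        if (field) {
            for (size_t i = 8; i < atom_size; i += 2) {
                if (dst_cap - out < 3)
                    return -1;
                /* (0x1F << 3) | (1 << 2) is 0xFC; the low bit selects the field */
                dst[out++] = 0xFC | (field - 1);
                dst[out++] = src[pos + i];
                dst[out++] = src[pos + i + 1];
            }
        }
        pos += atom_size;
    }

    *out_size = out;
    return 0;
}
```

Any trailing bytes that are too short to form an atom are ignored here,
which matches the patch skipping the unread remainder of the payload.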
>>
>>
>>
I've created a bug report for this issue, with screenshots demonstrating
the problem and the fix:
https://trac.ffmpeg.org/ticket/11470

Pavel.