From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
From: =?UTF-8?B?0JHQvtC70YzRiNC+0Lkg0KfQtdC70L7QstC10Lo=?=
 <cheloveck2.0@gmail.com>
Date: Thu, 2 Jun 2022 16:29:40 +0600
Message-ID: <CAN1V-9AqoYbgOvfpOmE38ou6oVOK6HAf5kRvSKhosrURDXozYw@mail.gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] How to upsample and then encode audio
Archived-At: <https://master.gitmailbox.com/ffmpegdev/CAN1V-9AqoYbgOvfpOmE38ou6oVOK6HAf5kRvSKhosrURDXozYw@mail.gmail.com/>

 After transcoding pcm_alaw 8 kHz to mp3 44.1 kHz, I only hear a brief,
unrecognizable burst of sound in the first 1-2 seconds. So something is
wrong with pts/dts, the packed-to-planar conversion, or the upsampling.

 My application transcodes an RTSP camera stream (video and audio) to a
file. Video works fine, and so does audio remuxing. Now I have a pcm_alaw
8 kHz audio stream and want to transcode it into an mp4 file along with
the video.

  It is hard to extract a minimal reproducible example from the code, so
first I want to know whether my logic is right. Here is my draft process
(assume all errors are checked and handled):

create encoder:
```
    codec_ = avcodec_find_encoder(AV_CODEC_ID_MP3);

    enc_ctx_ = avcodec_alloc_context3(codec_);

    enc_ctx_->bit_rate = 64000;
    enc_ctx_->codec_type = AVMEDIA_TYPE_AUDIO;

    enc_ctx_->sample_fmt   = codec_->sample_fmts ? codec_->sample_fmts[0]
                                                 : AV_SAMPLE_FMT_S32P;

    // select_sample_rate() and select_channel_layout() are taken from
    // https://www.ffmpeg.org/doxygen/4.1/encode_audio_8c-example.html
    enc_ctx_->sample_rate    = select_sample_rate(codec_);
    enc_ctx_->channel_layout = select_channel_layout(codec_);
    enc_ctx_->channels       =
        av_get_channel_layout_nb_channels(enc_ctx_->channel_layout);
    enc_ctx_->time_base = (AVRational){1, enc_ctx_->sample_rate};
    enc_ctx_->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;

    if (is_global_header) {
        enc_ctx_->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    }

    avcodec_open2(enc_ctx_, codec_, nullptr);
```

create resampler (in_frame):
```
    audio_fifo_ = av_audio_fifo_alloc(enc_ctx_->sample_fmt,
                                      enc_ctx_->channels, 1);

    in_ch_layout_ = in_frame->channel_layout;
    in_sample_fmt_ = (AVSampleFormat)in_frame->format;
    in_sample_rate_ = in_frame->sample_rate;

    swr_ctx_ = swr_alloc_set_opts(
        NULL,                              // allocate a new context
        enc_ctx_->channel_layout,          // out_ch_layout
        enc_ctx_->sample_fmt,              // out_sample_fmt
        enc_ctx_->sample_rate,             // out_sample_rate
        in_frame->channel_layout,          // in_ch_layout
        (AVSampleFormat)in_frame->format,  // in_sample_fmt
        in_frame->sample_rate,             // in_sample_rate
        0,                                 // log_offset
        NULL);                             // log_ctx

    swr_init(swr_ctx_);
```

resample (in_frame, start_pts, start_dts):
```
    auto resampled_frame = av_frame_alloc();

    auto dst_nb_samples = av_rescale_rnd(
        swr_get_delay(swr_ctx_, in_frame->sample_rate) + in_frame->nb_samples,
        enc_ctx_->sample_rate, in_frame->sample_rate, AV_ROUND_UP);

    // resampled_frame->nb_samples     = dst_nb_samples;
    resampled_frame->format         = enc_ctx_->sample_fmt;
    resampled_frame->channel_layout = enc_ctx_->channel_layout;
    // resampled_frame->channels       = enc_ctx_->channels;
    resampled_frame->sample_rate    = enc_ctx_->sample_rate;

    error = swr_convert_frame(swr_ctx_, resampled_frame, in_frame);

    /* Make the FIFO as large as it needs to be to hold both,
     * the old and the new samples. */
    if (av_audio_fifo_size(audio_fifo_) < dst_nb_samples) {
        av_audio_fifo_realloc(audio_fifo_, dst_nb_samples);
    }

    /* Store the new samples in the FIFO buffer. */
    auto nb_samples = av_audio_fifo_write(
        audio_fifo_,
        reinterpret_cast<void **>(resampled_frame->extended_data),
        resampled_frame->nb_samples);


    int delay = 0;
    // trying to split resampled frame to desired chunks
    while (av_audio_fifo_size(audio_fifo_) > 0) {
        const int frame_size = FFMIN(av_audio_fifo_size(audio_fifo_),
                                     enc_ctx_->frame_size);

        auto out_frame = av_frame_alloc();


        out_frame->nb_samples       = frame_size;
        out_frame->format           = enc_ctx_->sample_fmt;
        out_frame->channel_layout   = enc_ctx_->channel_layout;
        out_frame->channels         = enc_ctx_->channels;
        out_frame->sample_rate      = enc_ctx_->sample_rate;

        av_frame_get_buffer(out_frame, 0);

        av_audio_fifo_read(audio_fifo_, (void **)out_frame->data,
                           frame_size);

// ***** tried both cases
        out_frame->pts = in_frame->pts + delay;
        out_frame->pkt_dts = in_frame->pkt_dts + delay;
        // swr_next_pts(swr_ctx_, in_frame->pts) + delay;
        // swr_next_pts(swr_ctx_, in_frame->pkt_dts) + delay;

        result.push_back(out_frame);

        delay += frame_size;
    }

    return result;
```
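
One alternative to the loop above, as a minimal standalone sketch: only
full frames of frame_size samples leave the buffer, the remainder waits for
the next call, and pts is a running count of emitted samples in
1/sample_rate units instead of being derived from in_frame->pts. It uses no
FFmpeg calls. Chunk, FrameSplitter, Push and Buffered are hypothetical
names, and mono int16_t samples are an assumption made only to keep it
short.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for an encoder-ready frame: samples plus a pts
// expressed in 1/sample_rate units.
struct Chunk {
    int64_t pts;
    std::vector<int16_t> samples;
};

// Hypothetical helper: queues resampled samples and hands out only full
// frames of frame_size samples; the remainder stays queued.
class FrameSplitter {
public:
    explicit FrameSplitter(int frame_size) : frame_size_(frame_size) {}

    std::vector<Chunk> Push(const std::vector<int16_t>& resampled) {
        fifo_.insert(fifo_.end(), resampled.begin(), resampled.end());

        std::vector<Chunk> out;
        while (static_cast<int>(fifo_.size()) >= frame_size_) {
            Chunk c;
            c.pts = next_pts_;  // running sample counter, not the input pts
            c.samples.assign(fifo_.begin(), fifo_.begin() + frame_size_);
            fifo_.erase(fifo_.begin(), fifo_.begin() + frame_size_);
            next_pts_ += frame_size_;
            out.push_back(std::move(c));
        }
        return out;
    }

    // Samples still waiting for a full frame.
    std::size_t Buffered() const { return fifo_.size(); }

private:
    int frame_size_;
    int64_t next_pts_ = 0;
    std::deque<int16_t> fifo_;
};
```

With 1764 resampled samples per call and a frame_size of 1152, the first
call yields one frame with pts 0 and leaves 612 samples buffered.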


encoding and muxing (in_frame):
```
bool DoesNeedResample(const AVFrame * in_frame) {
   assert(("DoesNeedResample: in_frame is empty", in_frame));
   assert(("DoesNeedResample: encoder is not started", is_init_));

   if (in_frame->sample_rate != enc_ctx_->sample_rate ||
       in_frame->channel_layout != enc_ctx_->channel_layout ||
       in_frame->channels != enc_ctx_->channels ||
       in_frame->format != enc_ctx_->sample_fmt) {
       return true;
   }

   return false;
}

    av_frame_make_writable(in_frame);


    streamserver::AVFrames encoding_frames;
    if (DoesNeedResample(in_frame)) {
        encoding_frames = Resample(in_frame,
            av_rescale_q(in_frame->pts, in_audio_stream_timebase_,
                         out_audio_stream_->time_base),
            av_rescale_q(in_frame->pkt_dts, in_audio_stream_timebase_,
                         out_audio_stream_->time_base));
    } else {
        encoding_frames.push_back(av_frame_clone(in_frame));
    }


    for (auto frame : encoding_frames) {
        if ((err = avcodec_send_frame(encoder_ctx, frame)) < 0) {
            AVFrameFree(&frame);
        }

        while (err >= 0) {
            pkt_->data = NULL;
            pkt_->size = 0;
            av_init_packet(pkt_);

            err = avcodec_receive_packet(encoder_ctx, pkt_);
            if (err == AVERROR(EAGAIN) || err == AVERROR_EOF) {
                break;
            } else if (err < 0) {
                break;
            }

            pkt_->stream_index = out_audio_stream_->index;

            av_interleaved_write_frame(ofmt_ctx_, pkt_);
        }

        av_packet_unref(pkt_);
    }
```
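
The packets coming out of the encoder carry timestamps in
enc_ctx_->time_base (1/sample_rate here), while the muxer expects them in
out_audio_stream_->time_base. In FFmpeg this conversion is normally done
with av_packet_rescale_ts() before av_interleaved_write_frame(). The
standalone sketch below only illustrates the arithmetic involved. Rational
and RescaleTs are hypothetical stand-ins, not FFmpeg types or functions,
and the 1/90000 stream time base is an arbitrary value chosen for
illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for AVRational; not the FFmpeg type.
struct Rational {
    int num;
    int den;
};

// Convert a timestamp from time base `from` to time base `to`, rounding to
// nearest. Assumes a non-negative ts and no 64-bit overflow.
int64_t RescaleTs(int64_t ts, Rational from, Rational to) {
    const int64_t num = static_cast<int64_t>(from.num) * to.den;
    const int64_t den = static_cast<int64_t>(from.den) * to.num;
    return (ts * num + den / 2) / den;
}
```

For example, a packet with pts 1152 in a 1/44100 time base lands at 2351 in
a 1/90000 time base, and one second (pts 44100) lands at 90000.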

The sound in the resulting video is corrupted; see the first paragraph for
a description.

The example at
https://www.ffmpeg.org/doxygen/4.1/transcode_aac_8c-example.html
contains these lines:
```
        /*
        * Perform a sanity check so that the number of converted samples is
        * not greater than the number of samples to be converted.
        * If the sample rates differ, this case has to be handled differently
        */
        av_assert0(output_codec_context->sample_rate ==
                   input_codec_context->sample_rate);
```
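
To make the mismatch concrete: when upsampling, the number of converted
samples is larger than the number of input samples, so that sanity check
cannot hold. The standalone sketch below mirrors the dst_nb_samples
computation from my resample step without calling FFmpeg. UpsampledCount is
a hypothetical helper, and the delay of 16 samples used in the example is a
made-up value.

```cpp
#include <cstdint>

// Output sample count for `in_samples` input samples plus `delay` samples
// still buffered in the resampler (both at in_rate), rounded up.
// Assumes non-negative inputs and in_rate > 0.
int64_t UpsampledCount(int64_t delay, int64_t in_samples,
                       int in_rate, int out_rate) {
    return ((delay + in_samples) * out_rate + in_rate - 1) / in_rate;
}
```

A 160-sample frame at 8000 Hz becomes 882 samples at 44100 Hz, and with 16
delayed samples it becomes 971.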

How should such cases be handled? In the code above I tried to split the
resampled frames via a FIFO.