From mboxrd@z Thu Jan 1 00:00:00 1970
From: Большой Человек
Date: Thu, 2 Jun 2022 16:29:40 +0600
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] How to upsample and then encode audio

After transcoding pcm_alaw at 8 kHz to MP3 at 44.1 kHz, I hear only a brief, unrecognizable burst of sound in the first 1-2 seconds. So something is wrong with the pts/dts, the packed-to-planar conversion, or the upsampling.

My application transcodes an RTSP camera stream (video and audio) to a file. Video works fine, and so does audio remuxing. Now I have a pcm_alaw 8 kHz audio stream and want to transcode it into an MP4 file along with the video.

The code is too cumbersome to reduce to a reproducible example, so first I want to know whether my logic is right. Here is my draft process (assume all errors are checked and handled):

create encoder:

```
codec_ = avcodec_find_encoder(AV_CODEC_ID_MP3);
enc_ctx_ = avcodec_alloc_context3(codec_);
enc_ctx_->bit_rate = 64000;
enc_ctx_->codec_type = AVMEDIA_TYPE_AUDIO;
enc_ctx_->sample_fmt = codec_->sample_fmts ?
    codec_->sample_fmts[0] : AV_SAMPLE_FMT_S32P;
// functions from here https://www.ffmpeg.org/doxygen/4.1/encode_audio_8c-example.html
enc_ctx_->sample_rate = select_sample_rate(codec_);
enc_ctx_->channel_layout = select_channel_layout(codec_);
enc_ctx_->channels = av_get_channel_layout_nb_channels(enc_ctx_->channel_layout);
enc_ctx_->time_base = (AVRational){1, enc_ctx_->sample_rate};
enc_ctx_->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
if (is_global_header) {
    enc_ctx_->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
avcodec_open2(enc_ctx_, codec_, nullptr);
```

create resampler (in_frame):

```
audio_fifo_ = av_audio_fifo_alloc(enc_ctx_->sample_fmt, enc_ctx_->channels, 1);
in_ch_layout_ = in_frame->channel_layout;
in_sample_fmt = in_frame->format;
in_sample_rate_ = in_frame->sample_rate;
swr_ctx_ = swr_alloc_set_opts(NULL,                               // we're allocating a new context
                              enc_ctx_->channel_layout,           // out_ch_layout
                              enc_ctx_->sample_fmt,               // out_sample_fmt
                              enc_ctx_->sample_rate,              // out_sample_rate
                              in_frame->channel_layout,           // in_ch_layout
                              (AVSampleFormat)in_frame->format,   // in_sample_fmt
                              in_frame->sample_rate,              // in_sample_rate
                              0,                                  // log_offset
                              NULL);                              // log_ctx
swr_init(swr_ctx_);
```

resample (in_frame, start_pts, start_dts):

```
auto resampled_frame = av_frame_alloc();
auto dst_nb_samples = av_rescale_rnd(swr_get_delay(swr_ctx_, in_frame->sample_rate) + in_frame->nb_samples,
                                     enc_ctx_->sample_rate, in_frame->sample_rate, AV_ROUND_UP);
// resampled_frame->nb_samples = dst_nb_samples;
resampled_frame->format = enc_ctx_->sample_fmt;
resampled_frame->channel_layout = enc_ctx_->channel_layout;
// resampled_frame->channels = enc_ctx_->channels;
resampled_frame->sample_rate = enc_ctx_->sample_rate;
error = swr_convert_frame(swr_ctx_, resampled_frame, in_frame);

/* Make the FIFO as large as it needs to be to hold both,
 * the old and the new samples.
 */
if (av_audio_fifo_size(audio_fifo_) < dst_nb_samples) {
    av_audio_fifo_realloc(audio_fifo_, dst_nb_samples);
}

/* Store the new samples in the FIFO buffer. */
auto nb_samples = av_audio_fifo_write(audio_fifo_,
                                      reinterpret_cast<void **>(resampled_frame->extended_data),
                                      resampled_frame->nb_samples);

int delay = 0;
// trying to split resampled frame to desired chunks
while (av_audio_fifo_size(audio_fifo_) > 0) {
    const int frame_size = FFMIN(av_audio_fifo_size(audio_fifo_), enc_ctx_->frame_size);
    auto out_frame = av_frame_alloc();
    out_frame->nb_samples = frame_size;
    out_frame->format = enc_ctx_->sample_fmt;
    out_frame->channel_layout = enc_ctx_->channel_layout;
    out_frame->channels = enc_ctx_->channels;
    out_frame->sample_rate = enc_ctx_->sample_rate;
    av_frame_get_buffer(out_frame, 0);
    av_audio_fifo_read(audio_fifo_, (void **)out_frame->data, frame_size);

    // ***** tried both cases
    out_frame->pts = in_frame->pts + delay;
    out_frame->pkt_dts = in_frame->pkt_dts + delay;
    // swr_next_pts(swr_ctx_, in_frame->pts) + delay;
    // swr_next_pts(swr_ctx_, in_frame->pkt_dts) + delay;

    result.push_back(out_frame);
    delay += frame_size;
}
return result;
```

encoding and muxing (in_frame):

```
bool DoesNeedResample(const AVFrame * in_frame) {
    assert(("DoesNeedResample: in_frame is empty", in_frame));
    assert(("DoesNeedResample: encoder is not started", is_init_));
    if (in_frame->sample_rate != enc_ctx_->sample_rate ||
        in_frame->channel_layout != enc_ctx_->channel_layout ||
        in_frame->channels != enc_ctx_->channels ||
        in_frame->format != enc_ctx_->sample_fmt) {
        return true;
    }
    return false;
}

av_frame_make_writable(in_frame);
streamserver::AVFrames encoding_frames;
if (DoesNeedResample(in_frame)) {
    encoding_frames = Resample(in_frame,
        av_rescale_q(in_frame->pts, in_audio_stream_timebase_, out_audio_stream_->time_base),
        av_rescale_q(in_frame->pkt_dts, in_audio_stream_timebase_, out_audio_stream_->time_base));
} else {
    encoding_frames.push_back(av_frame_clone(in_frame));
}

for (auto frame :
        encoding_frames) {
    if ((err = avcodec_send_frame(encoder_ctx, frame)) < 0) {
        AVFrameFree(&frame);
    }
    while (err >= 0) {
        pkt_->data = NULL;
        pkt_->size = 0;
        av_init_packet(pkt_);
        err = avcodec_receive_packet(encoder_ctx, pkt_);
        if (err == AVERROR(EAGAIN) || err == AVERROR_EOF) {
            break;
        } else if (err < 0) {
            break;
        }
        pkt_->stream_index = out_audio_stream_->index;
        av_interleaved_write_frame(ofmt_ctx_, pkt_);
    }
    av_packet_unref(pkt_);
}
```

The sound in the resulting video is corrupted; see the first paragraph for a description.

The example at https://www.ffmpeg.org/doxygen/4.1/transcode_aac_8c-example.html contains these lines:

```
/*
 * Perform a sanity check so that the number of converted samples is
 * not greater than the number of samples to be converted.
 * If the sample rates differ, this case has to be handled differently
 */
av_assert0(output_codec_context->sample_rate == input_codec_context->sample_rate);
```

How should such cases be handled? I tried splitting the resampled frames through a FIFO, as in the code above.
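
To make the chunking question concrete, here is a minimal, library-free sketch of the sample bookkeeping that may be relevant. It rests on two assumptions that are not confirmed by the post: that the encoder reports a fixed frame_size (1152 is used below purely as an illustrative value; real code would read enc_ctx_->frame_size), and that output timestamps are counted in units of 1/out_sample_rate. FFmpeg documents that, for fixed-frame-size encoders, every submitted frame except the last must contain exactly frame_size samples, so the sketch keeps leftover samples buffered across calls rather than draining the FIFO to empty each time, and it derives pts from a running output-sample counter rather than from the input frame's pts. `FrameChunker` and `Chunk` are hypothetical names, not FFmpeg API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FFmpeg): models only the counting that a
// persistent AVAudioFifo plus a running PTS counter would perform.
struct Chunk {
    int64_t pts;     // assumed unit: 1/out_sample_rate (the encoder time_base)
    int nb_samples;  // samples per channel in this output frame
};

class FrameChunker {
public:
    explicit FrameChunker(int frame_size) : frame_size_(frame_size) {
        assert(frame_size > 0);
    }

    // Account for newly resampled samples and return only the FULL frames
    // that can be cut now; any remainder stays buffered for the next call.
    std::vector<Chunk> Push(int nb_resampled) {
        buffered_ += nb_resampled;
        std::vector<Chunk> out;
        while (buffered_ >= frame_size_) {
            out.push_back({next_pts_, frame_size_});
            next_pts_ += frame_size_;  // pts advances by samples emitted
            buffered_ -= frame_size_;
        }
        return out;
    }

    // At end of stream only: emit the remainder as one short final frame.
    std::vector<Chunk> Flush() {
        std::vector<Chunk> out;
        if (buffered_ > 0) {
            out.push_back({next_pts_, static_cast<int>(buffered_)});
            next_pts_ += buffered_;
            buffered_ = 0;
        }
        return out;
    }

    int64_t buffered() const { return buffered_; }

private:
    int frame_size_;
    int64_t buffered_ = 0;  // samples waiting in the (modelled) FIFO
    int64_t next_pts_ = 0;  // running count of samples already emitted
};
```

As a usage illustration, a hypothetical 320-sample input frame at 8 kHz corresponds to 1764 samples at 44.1 kHz. Pushing 1764 samples into a chunker with frame size 1152 yields one full frame and leaves 612 samples buffered; the next push yields two more frames. In real code the same pattern would apply to av_audio_fifo_size, av_audio_fifo_read, and the pts assigned to each output AVFrame.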