From: Wang Cao
Date: Tue, 12 Apr 2022 21:11:13 -0700
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avfilter/alimiter: Add "flush_buffer" option to flush the remaining valid data to the output
References: <20220327060800.3732289-1-wangcao@google.com> <2ec3971d-f698-1369-1ace-eac613afc980@passwd.hu>

On Tue, Apr 12, 2022 at 12:40 PM Paul B Mahol wrote:
> On Mon, Apr 11, 2022 at 10:59 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > On Sat, Apr 9, 2022 at 5:35 AM Paul B Mahol wrote:
> > > On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > > > On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol wrote:
> > > > > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > > > > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol wrote:
> > > > > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol wrote:
> > > > > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > > > > > > > > On Mon, Apr 4, 2022 at 3:28 PM Marton Balint wrote:
> > > > > > > > > > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > > > > > > > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <cus@passwd.hu> wrote:
> > > > > > > > > > > > On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > The change in the commit will add some samples to the end of the
> > > > > > > > > > > > > audio stream. The intention is to add a "zero_delay" option
> > > > > > > > > > > > > eventually to not have the delay in the beginning of the output
> > > > > > > > > > > > > from alimiter due to lookahead.
> > > > > > > > > > > >
> > > > > > > > > > > > I was very much surprised to see that the alimiter filter actually
> > > > > > > > > > > > delays the audio - as in extra samples are inserted in the beginning
> > > > > > > > > > > > and some samples are cut in the end. This trashes A-V sync, so it is
> > > > > > > > > > > > a bug IMHO.
> > > > > > > > > > > >
> > > > > > > > > > > > So unless somebody has some valid usecase for the legacy way of
> > > > > > > > > > > > operation I'd just simply change it to be "zero delay" without any
> > > > > > > > > > > > additional user option, in a single patch.
> > > > > > > > > > >
> > > > > > > > > > > This is done by this patch in a very complicated way, and it really
> > > > > > > > > > > should also be optional.
> > > > > > > > > >
> > > > > > > > > > But why does it make sense to keep the current (IMHO buggy)
> > > > > > > > > > operational mode, which adds silence in the beginning and trims the
> > > > > > > > > > end? I understand that the original implementation worked like this,
> > > > > > > > > > but libavfilter has packet timestamps and N:M filtering, so there is
> > > > > > > > > > absolutely no reason to use a 1:1 implementation and live with its
> > > > > > > > > > limitations.
> > > > > > > > >
> > > > > > > > > Hello Paul and Marton, thank you so much for taking the time to review
> > > > > > > > > my patch. I totally understand that my patch may seem a little bit
> > > > > > > > > complicated, but I can show with a FATE test that if we set the
> > > > > > > > > alimiter to behave as a passthrough filter, the output frames will be
> > > > > > > > > the same from "framecrc" with my patch. The existing behavior will not
> > > > > > > > > work for all gapless audio processing.
> > > > > > > > >
> > > > > > > > > The complete patch to fix this issue is at
> > > > > > > > > https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > > > > > >
> > > > > > > > > Regarding Paul's concern, I personally don't have any preference on
> > > > > > > > > whether to put the patch behind an extra option or not. With respect
> > > > > > > > > to the implementation, the patch is the best I can think of for
> > > > > > > > > preserving as much information as possible from input frames.
> > > > > > > > > I also understand it may break the concept that "filter_frame"
> > > > > > > > > outputs one frame at a time. For alimiter with my patch, depending on
> > > > > > > > > the size of the lookahead buffer, it may take a few frames before one
> > > > > > > > > output frame can be generated. This is inevitable to compensate for
> > > > > > > > > the delay of the lookahead buffer.
> > > > > > > > >
> > > > > > > > > Thanks again for reviewing my patch and I'm looking forward to
> > > > > > > > > hearing from you :)
> > > > > > > >
> > > > > > > > Better (because it's no more 1 frame X nb_samples in, 1 frame X
> > > > > > > > nb_samples out) to replace .filter_frame/.request_frame with .activate
> > > > > > > > logic.
> > > > > > > >
> > > > > > > > And make this output delay compensation filtering optional.
> > > > > > > >
> > > > > > > > In this process make sure that output PTS frame timestamps are
> > > > > > > > unchanged from the input ones, by keeping references to the needed
> > > > > > > > frames in a filter queue.
> > > > > > > >
> > > > > > > > Look at how speechnorm/dynaudnorm does it.
> > > > > > >
> > > > > > > Alternatively, use the current logic in the ladspa filter (also add an
> > > > > > > option with the same name).
> > > > > > >
> > > > > > > This would need less code, and is probably the better approach, as
> > > > > > > there is no extra latency introduced.
> > > > > > >
> > > > > > > Then the 1:1 mapping between the same number of samples per frame no
> > > > > > > longer holds, but I think that is not very important any more.
> > > > > >
> > > > > > Thank you for replying to me with your valuable feedback!
> > > > > > I have checked af_ladspa, and the "request_frame" function in
> > > > > > af_ladspa looks similar to what I'm doing. The difference comes from
> > > > > > the fact that I need an internal frame buffer to keep track of output
> > > > > > frames. Essentially I add a frame to the internal buffer when an input
> > > > > > frame comes in. The frames in this buffer will be the future output
> > > > > > frames. We start writing these output frame data buffers only when the
> > > > > > internal lookahead buffer has been filled. When there are no more input
> > > > > > frames, we flush all the remaining data in the internal frame buffers
> > > > > > and lookahead buffers. Can you advise on my approach here? Thank you!
> > > > > >
> > > > > > I can put my current implementation of "filter_frame" and
> > > > > > "request_frame" into the "activate" approach as you suggested, like
> > > > > > speechnorm/dynaudnorm.
> > > > >
> > > > > No need to wait for all buffers to fill up, only the lookahead buffer.
> > > > >
> > > > > Just trim the initial samples, the size of the lookahead, and then start
> > > > > outputting samples once you get whatever modulo of the current frame's
> > > > > number of samples.
> > > > >
> > > > > So there should be no need for extra buffers to keep audio samples,
> > > > > just buffers to keep the input pts and number of samples of previous
> > > > > input frames, like in the ladspa filter.
> > > >
> > > > Thank you for the suggestion! From my understanding, we have two ways to
> > > > achieve the "zero_delay" functionality here.
> > > >
> > > > Option 1: as you mentioned, we can trim the initial samples of lookahead
> > > > size. The size of the lookahead buffer can be multiple frames.
> > > > For example, when the attack is 0.08 seconds, the sample rate is 44100
> > > > and the frame size is 1024, the lookahead buffer spans about 3 frames,
> > > > so the filter needs to see at least 3 input audio frames before it can
> > > > output one audio frame. We also need to make assumptions about the size
> > > > of the audio frame (meaning the number of audio samples per frame) when
> > > > flushing. The frame size is probably 1024 conventionally, but I think
> > > > it's better to make as few assumptions as possible to allow the filter
> > > > to be used as flexibly as possible.
> > > >
> > > > Option 2: this is what I proposed before. We basically map the same
> > > > number of input frames to the output, and we also make sure everything
> > > > about the frame is the same as the input except for the audio signal
> > > > data itself, which will be changed by whatever processing alimiter has
> > > > to do with it. I think it is safer to make the filter only work on the
> > > > signal itself. It can help other people who use this filter without
> > > > worrying about any unexpected change to the frame. The downside is that
> > > > the filter internally needs to store some empty frames, which will be
> > > > written as the lookahead buffer is filled.
> > > >
> > > > I don't see any performance difference between these two options, but
> > > > option 2 looks better to me because it works solely on the signal
> > > > without any changes to the frame.
> > >
> > > Option 1 does not add extra delay to the processing chain at all, and
> > > option 2 adds extra delay.
> > >
> > > Just look at the latest version of the af_ladspa.c filter code.
> >
> > Hi Paul, I have checked the af_ladspa filter. Option 1 certainly doesn't
> > introduce any delay. Option 2 will at most introduce one audio frame of
> > delay in the processing chain.
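To make the lookahead arithmetic above concrete, here is a small sketch (the helper name is made up for illustration; it is not actual alimiter code):

```python
import math

def lookahead_size(attack_s, sample_rate, frame_size):
    # Lookahead length in samples, and how many whole input frames
    # it spans before the filter can emit its first output frame.
    samples = round(attack_s * sample_rate)
    return samples, math.ceil(samples / frame_size)

samples, frames = lookahead_size(0.08, 44100, 1024)
# 0.08 s at 44100 Hz is 3528 samples, which spans a bit under 4 frames
# of 1024 samples ("about 3 frames" in the discussion above).
```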
> > Option 2 needs to fill the lookahead buffer plus some frames of samples
> > to push the data in the lookahead buffer out for one output audio frame.
> > The processing delay is probably not a big concern.
> >
> > The reason I proposed option 2 is the sync of metadata for the output
> > frames:
> >
> > It appears to me that the only metadata we need to worry about in
> > af_ladspa is the PTS and the number of samples. However, AVFrame has
> > other fields that also seem to specify some metadata for the frame:
> > 1. AVDictionary *metadata
> > 2. void *opaque (this seems to be client-specific)
> > 3. AVFrameSideData **side_data (this is not handled by
> >    av_frame_copy_props)
> > If we go for option 1, it is likely these fields in the input frames will
> > not be mapped to the corresponding output frames. I believe this is also
> > unexpected behavior for other users who rely on these fields. I
> > understand other filters may not consider this important, but
> > conceptually I believe it is better to make the filter focus on changing
> > only what it is supposed to, which is the audio signal itself.
> >
> > Looking forward to hearing your opinion on this, thanks!
>
> AVFrame metadata is mostly useful in cases where an input frame, after
> filtering with a filter that adds metadata, no longer changes output frames
> later in the filter graph.
> Thus using something like astats=metadata=1,alimiter is not very useful or
> logical.
>
> Why I am lightly insisting on introducing no extra delay/latency, and
> prefer option 1 for the alimiter filter, is mostly because in the audio
> domain latency is a relevant and important subject for most audio
> processing, especially for limiters.
> So it is very important to keep it at a minimum if possible.

Thank you so much for giving me the context. Now that I understand what the
possible metadata can be, I think I have fully understood what I can do.
Thanks!
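Paul's "option 1" bookkeeping (trim an initial lookahead-sized chunk of processed output, and reuse the queued input pts/nb_samples so output timestamps match input, as af_ladspa does) can be modeled with a toy sketch like this; all names are hypothetical and the limiter core itself is omitted:

```python
from collections import deque

class ZeroDelayTrimmer:
    """Toy model of ladspa-style delay compensation: the processing core's
    output lags the input by `lookahead` samples, so we discard the first
    `lookahead` output samples and reattach the queued input (pts,
    nb_samples) pairs, keeping output timestamps identical to the input."""

    def __init__(self, lookahead):
        self.to_trim = lookahead
        self.meta = deque()   # (pts, nb_samples) of each input frame
        self.pending = []     # processed samples not yet framed

    def push(self, pts, samples):
        # `samples` stands in for the limiter core's (delayed) output.
        self.meta.append((pts, len(samples)))
        self.pending.extend(samples)
        return self._drain()

    def flush(self, tail):
        # At EOF the core would be fed `lookahead` zeros to push out its tail.
        self.pending.extend(tail)
        return self._drain()

    def _drain(self):
        out = []
        if self.to_trim:                      # drop the initial delay once
            cut = min(self.to_trim, len(self.pending))
            del self.pending[:cut]
            self.to_trim -= cut
        while self.meta and len(self.pending) >= self.meta[0][1]:
            pts, n = self.meta.popleft()      # reuse input pts/nb_samples
            out.append((pts, self.pending[:n]))
            del self.pending[:n]
        return out

t = ZeroDelayTrimmer(lookahead=3)
a = t.push(0, [1, 2, 3, 4])   # not enough output yet after trimming
b = t.push(4, [5, 6, 7, 8])   # first frame out, with the first input's pts
c = t.flush([0, 0, 0])        # EOF: remaining samples out, pts preserved
```

With frames of 4 samples and a lookahead of 3, the first push yields nothing, the second yields a frame stamped pts=0, and the flush yields a frame stamped pts=4: the same pts/nb_samples sequence as the input, with no leading silence.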
--
Wang Cao | Software Engineer | wangcao@google.com | 650-203-7807

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".