From: Wang Cao
Date: Tue, 12 Apr 2022 21:11:13 -0700
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avfilter/alimiter: Add "flush_buffer" option to flush the remaining valid data to the output
References: <20220327060800.3732289-1-wangcao@google.com> <2ec3971d-f698-1369-1ace-eac613afc980@passwd.hu>

On Tue, Apr 12, 2022 at 12:40 PM Paul B Mahol wrote:
> On Mon, Apr 11, 2022 at 10:59 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > On Sat, Apr 9, 2022 at 5:35 AM Paul B Mahol wrote:
> > > On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > > > On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol wrote:
> > > > > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > > > > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol wrote:
> > > > > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol wrote:
> > > > > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <wangcao-at-google.com@ffmpeg.org> wrote:
> > > > > > > > > On Mon, Apr 4, 2022 at 3:28 PM Marton Balint wrote:
> > > > > > > > > > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > > > > > > > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <cus@passwd.hu> wrote:
> > > > > > > > > > > > On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > The change in the commit will add some samples to the end of the
> > > > > > > > > > > > > audio stream. The intention is to add a "zero_delay" option
> > > > > > > > > > > > > eventually to not have the delay in the beginning of the output
> > > > > > > > > > > > > from alimiter due to lookahead.
> > > > > > > > > > > >
> > > > > > > > > > > > I was very much surprised to see that the alimiter filter actually
> > > > > > > > > > > > delays the audio - as in extra samples are inserted in the beginning
> > > > > > > > > > > > and some samples are cut in the end. This trashes A-V sync, so it is
> > > > > > > > > > > > a bug IMHO.
> > > > > > > > > > > >
> > > > > > > > > > > > So unless somebody has some valid usecase for the legacy way of
> > > > > > > > > > > > operation I'd just simply change it to be "zero delay" without any
> > > > > > > > > > > > additional user option, in a single patch.
> > > > > > > > > > >
> > > > > > > > > > > This is done by this patch in a very complicated way, and it really
> > > > > > > > > > > should also be optional.
> > > > > > > > > >
> > > > > > > > > > But why does it make sense to keep the current (IMHO buggy)
> > > > > > > > > > operational mode, which adds silence in the beginning and trims the
> > > > > > > > > > end? I understand that the original implementation worked like this,
> > > > > > > > > > but libavfilter has packet timestamps and N:M filtering, so there is
> > > > > > > > > > absolutely no reason to use a 1:1 implementation and live with its
> > > > > > > > > > limitations.
> > > > > > > > >
> > > > > > > > > Hello Paul and Marton, thank you so much for taking the time to review
> > > > > > > > > my patch. I totally understand that my patch may seem a little bit
> > > > > > > > > complicated, but I can show with a FATE test that if we set the
> > > > > > > > > alimiter to behave as a passthrough filter, the output frames will be
> > > > > > > > > the same from "framecrc" with my patch. The existing behavior will not
> > > > > > > > > work for all gapless audio processing.
> > > > > > > > >
> > > > > > > > > The complete patch to fix this issue is at
> > > > > > > > > https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > > > > > >
> > > > > > > > > Regarding Paul's concern, I personally don't have any preference on
> > > > > > > > > whether to put the patch behind an extra option or not. With respect
> > > > > > > > > to the implementation, the patch is the best I can think of for
> > > > > > > > > preserving as much information as possible from input frames.
> > > > > > > > > I also understand it may break the concept that "filter_frame"
> > > > > > > > > outputs one frame at a time. For alimiter with my patch, depending on
> > > > > > > > > the size of the lookahead buffer, it may take a few frames before one
> > > > > > > > > output frame can be generated. This is inevitable to compensate for
> > > > > > > > > the delay of the lookahead buffer.
> > > > > > > > >
> > > > > > > > > Thanks again for reviewing my patch and I'm looking forward to
> > > > > > > > > hearing from you :)
> > > > > > > >
> > > > > > > > Better (because it's no more 1 frame X nb_samples in, 1 frame X
> > > > > > > > nb_samples out) to replace .filter_frame/.request_frame with .activate
> > > > > > > > logic.
> > > > > > > >
> > > > > > > > And make this output delay compensation filtering optional.
> > > > > > > >
> > > > > > > > In this process make sure that output PTS frame timestamps are
> > > > > > > > unchanged from the input ones, by keeping references to the needed
> > > > > > > > frames in a filter queue.
> > > > > > > >
> > > > > > > > Look at how speechnorm/dynaudnorm does it.
> > > > > > >
> > > > > > > Alternatively, use the current logic in the ladspa filter (also add an
> > > > > > > option with the same name).
> > > > > > >
> > > > > > > This would need less code, and is probably the better approach, as
> > > > > > > there is no extra latency introduced.
> > > > > > >
> > > > > > > Then the 1:1 mapping between the same number of samples per frame no
> > > > > > > longer holds, but I think that is not very important any more.
> > > > > >
> > > > > > Thank you for replying to me with your valuable feedback!
> > > > > > I have checked af_ladspa, and the "request_frame" function in
> > > > > > af_ladspa looks similar to what I'm doing. The difference comes from
> > > > > > the fact that I need an internal frame buffer to keep track of output
> > > > > > frames. Essentially I add a frame to the internal buffer when an input
> > > > > > frame comes in. The frames in this buffer will be the future output
> > > > > > frames. We start writing these output frame data buffers only when the
> > > > > > internal lookahead buffer has been filled. When there are no more input
> > > > > > frames, we flush all the remaining data in the internal frame buffers
> > > > > > and lookahead buffers. Can you advise on my approach here? Thank you!
> > > > > >
> > > > > > I can put my current implementation of "filter_frame" and
> > > > > > "request_frame" into the "activate" approach as you suggested, like
> > > > > > speechnorm/dynaudnorm.
> > > > >
> > > > > No need to wait for all buffers to fill up, only the lookahead buffer.
> > > > >
> > > > > Just trim the initial samples, the size of the lookahead, and then start
> > > > > outputting samples once you get whatever modulo of the current frame's
> > > > > number of samples.
> > > > >
> > > > > So there should be no need for extra buffers to keep audio samples,
> > > > > just buffers to keep the input pts and number of samples of previous
> > > > > input frames, like in the ladspa filter.
> > > >
> > > > Thank you for the suggestion! From my understanding, we have two ways to
> > > > achieve the "zero_delay" functionality here.
> > > >
> > > > Option 1: as you mentioned, we can trim the initial samples of lookahead
> > > > size. The size of the lookahead buffer can be multiple frames.
> > > > For example, when the attack is 0.08 seconds, the sample rate is 44100
> > > > and the frame size is 1024, the lookahead buffer spans about 3 frames,
> > > > so the filter needs to see at least 3 input audio frames before it can
> > > > output one audio frame. We also need to make assumptions about the size
> > > > of the audio frame (meaning the number of audio samples per frame) when
> > > > flushing. The frame size is probably 1024 conventionally, but I think
> > > > it's better to make as few assumptions as possible to allow the filter
> > > > to be used as flexibly as possible.
> > > >
> > > > Option 2: this is what I proposed before. We basically map the same
> > > > number of input frames to the output, and we also make sure everything
> > > > about the frame is the same as the input except for the audio signal
> > > > data itself, which will be changed by whatever processing alimiter has
> > > > to do with it. I think it is safer to make the filter only work on the
> > > > signal itself. It can help other people who use this filter without
> > > > worrying about any unexpected change to the frame. The downside is that
> > > > the filter internally needs to store some empty frames, which will be
> > > > written as the lookahead buffer is filled.
> > > >
> > > > I don't see any performance difference between these two options, but
> > > > option 2 looks better to me because it works solely on the signal
> > > > without any changes to the frame.
> > >
> > > Option 1 does not add extra delay to the processing chain at all, and
> > > option 2 adds extra delay.
> > >
> > > Just look at the latest version of the af_ladspa.c filter code.
> >
> > Hi Paul, I have checked the af_ladspa filter. Option 1 certainly doesn't
> > introduce any delay. Option 2 will at most introduce one audio frame of
> > delay in the processing chain.
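To make the lookahead arithmetic above concrete, here is a small sketch (the helper name is made up for illustration; it is not actual alimiter code):

```python
import math

def lookahead_size(attack_s, sample_rate, frame_size):
    # Lookahead length in samples, and how many whole input frames
    # it spans before the filter can emit its first output frame.
    samples = round(attack_s * sample_rate)
    return samples, math.ceil(samples / frame_size)

samples, frames = lookahead_size(0.08, 44100, 1024)
# 0.08 s at 44100 Hz is 3528 samples, which spans a bit under 4 frames
# of 1024 samples ("about 3 frames" in the discussion above).
```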
> > Option 2 needs to fill the lookahead buffer plus some frames of samples
> > to push the data in the lookahead buffer out for one output audio frame.
> > The processing delay is probably not a big concern.
> >
> > The reason I proposed option 2 is the sync of metadata for the output
> > frames:
> >
> > It appears to me that the only metadata we need to worry about in
> > af_ladspa is the PTS and the number of samples. However, AVFrame has
> > other fields that also seem to specify some metadata for the frame:
> > 1. AVDictionary *metadata
> > 2. void *opaque (this seems to be client-specific)
> > 3. AVFrameSideData **side_data (this is not handled by
> >    av_frame_copy_props)
> > If we go for option 1, it is likely these fields in the input frames will
> > not be mapped to the corresponding output frames. I believe this is also
> > unexpected behavior for other users who rely on these fields. I
> > understand other filters may not consider this important, but
> > conceptually I believe it is better to make the filter focus on changing
> > only what it is supposed to, which is the audio signal itself.
> >
> > Looking forward to hearing your opinion on this, thanks!
>
> AVFrame metadata is mostly useful in cases where an input frame, after
> filtering with a filter that adds metadata, no longer changes output frames
> later in the filter graph.
> Thus using something like astats=metadata=1,alimiter is not very useful or
> logical.
>
> Why I am lightly insisting on introducing no extra delay/latency, and
> prefer option 1 for the alimiter filter, is mostly because in the audio
> domain latency is a relevant and important subject for most audio
> processing, especially for limiters.
> So it is very important to keep it at a minimum if possible.

Thank you so much for giving me the context. Now that I understand what the
possible metadata can be, I think I have fully understood what I can do.
Thanks!
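Paul's "option 1" bookkeeping (trim an initial lookahead-sized chunk of processed output, and reuse the queued input pts/nb_samples so output timestamps match input, as af_ladspa does) can be modeled with a toy sketch like this; all names are hypothetical and the limiter core itself is omitted:

```python
from collections import deque

class ZeroDelayTrimmer:
    """Toy model of ladspa-style delay compensation: the processing core's
    output lags the input by `lookahead` samples, so we discard the first
    `lookahead` output samples and reattach the queued input (pts,
    nb_samples) pairs, keeping output timestamps identical to the input."""

    def __init__(self, lookahead):
        self.to_trim = lookahead
        self.meta = deque()   # (pts, nb_samples) of each input frame
        self.pending = []     # processed samples not yet framed

    def push(self, pts, samples):
        # `samples` stands in for the limiter core's (delayed) output.
        self.meta.append((pts, len(samples)))
        self.pending.extend(samples)
        return self._drain()

    def flush(self, tail):
        # At EOF the core would be fed `lookahead` zeros to push out its tail.
        self.pending.extend(tail)
        return self._drain()

    def _drain(self):
        out = []
        if self.to_trim:                      # drop the initial delay once
            cut = min(self.to_trim, len(self.pending))
            del self.pending[:cut]
            self.to_trim -= cut
        while self.meta and len(self.pending) >= self.meta[0][1]:
            pts, n = self.meta.popleft()      # reuse input pts/nb_samples
            out.append((pts, self.pending[:n]))
            del self.pending[:n]
        return out

t = ZeroDelayTrimmer(lookahead=3)
a = t.push(0, [1, 2, 3, 4])   # not enough output yet after trimming
b = t.push(4, [5, 6, 7, 8])   # first frame out, with the first input's pts
c = t.flush([0, 0, 0])        # EOF: remaining samples out, pts preserved
```

With frames of 4 samples and a lookahead of 3, the first push yields nothing, the second yields a frame stamped pts=0, and the flush yields a frame stamped pts=4: the same pts/nb_samples sequence as the input, with no leading silence.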
--
Wang Cao | Software Engineer | wangcao@google.com | 650-203-7807

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".