From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id 24E2547BFE for ; Sat, 4 Nov 2023 16:22:23 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D0A1568CE6F; Sat, 4 Nov 2023 18:22:20 +0200 (EET) Received: from mail-lf1-f51.google.com (mail-lf1-f51.google.com [209.85.167.51]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 20DEB68CE64 for ; Sat, 4 Nov 2023 18:22:14 +0200 (EET) Received: by mail-lf1-f51.google.com with SMTP id 2adb3069b0e04-507c78d258fso4061206e87.2 for ; Sat, 04 Nov 2023 09:22:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699114933; x=1699719733; darn=ffmpeg.org; h=content-transfer-encoding:in-reply-to:cc:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=OWuRSqOh/wCOQiq1DWJSHxwQ9aSVU60jPnFPXBrKulg=; b=btzXxoAfQhWjX0XglG5ZxhapOQzZJjparrcsbC5W1F6eX2umNNpBhz7QJ+46gvxFza LqK8v9ASpIQRVeY/UrbFQmMneiOmPJs7pe9VxmCblmnJsqzYbJX1y6auS6pGIgeAjDQw 7QSWbumOPsvGnmLAHXfvck+Hkp8x3GJmbNlztwU6JcQ+hOa1HbPyFdLdSMrt/qoUIgjI FZCnUOZqHTBhp7UhjweBuI0l97dnRDfSSO9VHGikS9dDFXyaopWw+9bgBNohdarwNMwU aboKpYdj6CCHLVGgMgQw6D0ZePi1ZfqbJeO+aI8O20QTVu4d6gqv91e2DbDWBcMU1Goi KfDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699114933; x=1699719733; h=content-transfer-encoding:in-reply-to:cc:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OWuRSqOh/wCOQiq1DWJSHxwQ9aSVU60jPnFPXBrKulg=; b=twHVB3g6AyvWTQo/3+DIt8kKHqOaq70GsviaN9hhiDv9iolVH0va9Rj9cKo+hbQPVy sHb/pCH6QsO/O8WXTaS/G7Dp0yWbfj6qBtQD977nZfL/m0maqZRA1RQMg7GqMzJbyncn 5IjvHylo+jgUX4ckqDD2AICvuw07m4gU9LFTZLA6m6lewktYY+i/zFr5Qiwm/2mefJSZ eiuWSuzJuMEj1eku9GnLZt7/5ox10anbAaX9xqHH3HmecZeSb19idlY8COAiQJpkUnrC +a/A+Th8tVaudLMQE36RfzrzRfIowsrZTqmn+J3qKNTOwroHNPFWdBDgsQAMZl27HY96 H2YQ== X-Gm-Message-State: AOJu0YxTmaNGyTR3m3MAWGKQtOkgSbpzkbVIEo6UlFr8DlpEan8fMd5S 5kC27JbHTQBBtTtGdWaoaYCeEyyNUvw= X-Google-Smtp-Source: AGHT+IES8iRIIJDwOippalgdh7B/GRywkP1hqVlgp1Tk+a3nmIcgbzm5BDqBgKIYGXieFIRpkFLU7g== X-Received: by 2002:a05:6512:b87:b0:509:4424:2e13 with SMTP id b7-20020a0565120b8700b0050944242e13mr9935883lfv.65.1699114932197; Sat, 04 Nov 2023 09:22:12 -0700 (PDT) Received: from [192.168.1.106] (33bf2829.skybroadband.com. [51.191.40.41]) by smtp.googlemail.com with ESMTPSA id v3-20020a5d4a43000000b003232380ffd7sm4711456wrs.102.2023.11.04.09.22.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 04 Nov 2023 09:22:11 -0700 (PDT) Message-ID: <1e243762-a5bf-4374-a39f-3ed846caa4d5@gmail.com> Date: Sat, 4 Nov 2023 16:22:11 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-GB To: ffmpeg-devel@ffmpeg.org References: From: Derek Buitenhuis In-Reply-To: Subject: Re: [FFmpeg-devel] [PATCH 1/2] decode: add ff_decode_skip_samples function X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: =?UTF-8?Q?Martin_Storsj=C3=B6?= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: Hi, I'm going to opine a bit here, and also comment on the mov/MP4 patch[0] that accompanies this set. This is for both historical purposes, and to distill IRC logs into something more digestible for others on the list to gain context on the issue, so apologies for re-treading ground. On 10/30/2023 5:09 AM, Lynne wrote: > This is a convenience function, which is required to be called by decoders > needing to skip samples every time. > It automatically creates and increments side data. > > The idea is to get rid of skip_samples eventually and replace it with this > function. So there is a lot to cover here, and lot of nuance to how things should be handled, that I want to list out, before discussing the specific changes of these two sets of patches. A bit of of a 'state of the world'. The goal of this patchset seems to be to aid in gapless playback and correct seeking in AAC streams (or later, more generally MDCT-styl audio codecs in general), but properly skipping initial priming samples (which include pre-roll), pre-roll (both normal, and extra incurred due to SBR) when seeking, and and, though not covered in these sets, I'll mention, end padding. First a note on terminology: 'Algorithmic delay' as it is being used here is not quite correct, for two reasons: * Latency and pre-roll (or roll distance) are separate things. Opus, for example, can have a latency as low as 2.5ms, but pre-roll is always at least 80ms - they are different things which serve different purposes, and I confirmed this with people who definitely know more about audio than me[1]. Pre-roll is often larger than latency, and the values stored in file metadata reflect this. * Pre-roll, or roll distance, are the industry standard terms. Making up out own terms because we disagree is silly and stubborn, and makes it harder on API users trying to use the API correctly, or understnd our code. Next, a quick breakdown of the AAC situation, in terms of both how this it is stored, what we support, and the state of the ecosystem and types of files that exist: * 'Raw' ADTS streams have no way to store any of this. The best we can do is guess the pre-roll. We should not guess priming or end padding, as no matter what we do, it'll be wrong, and any value will be a cargo culted hack value. * MP4 - there are two places to store this metadata - one standard, and one proprietary Apple way. There are, separately, two ways to signal priming length when SBR is present. * MP4s may contain a user data box with 'iTunSMPB' which contains priming, pre-roll, and end padding data. We support reading only priming data from this at the moment, and we set skip samples based on this. This is 'iTunes style' metadata. * The standards compliant (read: non-iTunes) way is to use an edit list to trim the priming samples, and, opionally end padding. End padding may also be trimmed by reducing the sample duration of the last packet in the stts box. Pre-roll is store in the sgpb box with the 'roll', type, which signals the roll distance as a number of packets; for example, -1 indicates you should decode an discard the samples of 1 packet before beginning plaback. Notably, this allows the sgpd box to also be use for video like periodic intra refresh H.264. libavformat does not current parse or export this info, but even if we did, converting number of packets to audio samples can get hairy. * Notably, since in MP4, the edit list represents the exact presentation-level info, when no edit list, or an edit list startiing at 0 is present, no samples, not even pre-roll should be trimmed - all players in the wild handle this properly, and it has been standard practice among streaming services for >10 years to not output the AAC frames representing priming samples at all (even if there is a small hit quality). This is what the patch at [0] is addressing. * My personal opinion is that since priming samples include any inherent delay already, that if we do not know how many priming samples there are, we should not trim anything from the start of the file, regardless of format. I am keen on hearing others Opinions(TM) here, particularily Anton and Martin (sorry for name dropping :)). * Further complicating matters is the fact that, again thanks to Apple, there are a lot of broken files around, since iTunes expects files to *not* include addition delay incurred by SBR in their edit list / priming info, even though, by spec, you are suppose to. This leads to the unfortunate case where you have tons of files in the wild that both do, and do not include SBR delay in their edit lists, and there is no way of detecting when this is the case. I do not have a good solution to this other than proposing a flag somewhere to switch between behaviors. Aside, for Opus, we incorrectly read this info from the codec specific box instead of the sgpd box... but we only ever put it in a write-only field. Now, on to the patches / implementation (opinions warning): * As noted above, I don't think we should apply any guessed priming to initial samples (pre-roll, or 'algorithmic delay, included). No other decoders or players do this, in the world, to my knowledge, and violating the principal of least surpise because we think we're slightly more correct isn't great. I also think trying to 'fix' raw ADTS is destined to always be a hack, an we shouldn't. YMMV. I'd like to hear views from others here. This would make the patch in [0] redundant. * I don't think hardcoding a value of 1024 (or 960) is wise, or necessarily even correct. For example, all HE-AAC MP4s I have signal a seek pre-roll of 2048 samples. Ideally we would pass through seek pre-roll somehow - but I am not sure of a good way in our existing setup - the write-only codecpar field for seek pre-roll is in samples, which is kind of incompatible with the way the sgpd box stores pre-roll as number of packets. We could set it based one the duration of the first packet(s), assuming all audio codecs will have only 1 packet duration (frame size), or we could add a new field. Possibly, we could leave the hardcoded values in place, only for seeking, inside e.g. ADTS. * This kind of brings me to: I don't really think using the same side data for both priming and pre-roll is a great idea. Clearly this set of problems is already confusing enough to many, and conflating the two in the API is a sure way to only make it worse. * If we are going to discard at seek, we need to take this into account in the demuxer's seek code, or we will always be $preroll number of samples off where the user actually wanted to seek and decode. I am almost certain I missed even more nuance, and hopefully Martin or Anton can chime in, or I remember more. But also, given all of this, I think we need to deeply consider how we approach this, so we don't end up with something that only covers certain cases (and I am sure I forgot more cases). To that end, I do not think rushing to get a patchset that can change sync on all AAC files in existence into 6.1 is wise. Even when this does go in, it should be able to sit in master for a good long time before being in a release. As I understand it, FATE is already unhappy, and it shouldn't be treated as being a problem with FATE vs the set. Lastly, some time this weekend/week, I will labour to create a more extensive set of AAC files we can use to test, and use in FATE. Hope that all made sense and I didn't forget any details in my Covid-induced haze. Cheers, - Derek [0] http://ffmpeg.org/pipermail/ffmpeg-devel/2023-October/316389.html [1] https://gist.github.com/dwbuiten/18745d6cb253a2304f776581c9f68b30 [2] https://github.com/nu774/fdkaac/blob/master/README#L165-L177 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".