Re: [FFmpeg-devel] [PATCH] avcodec/mlpdec: Add decoding of object audio data

From: Massimo Eynard <eynard.massimo@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/mlpdec: Add decoding of object audio data
Date: Mon, 24 Mar 2025 20:07:38 +0100
Message-ID: <fb1381fc-346f-44ba-964d-2b45668b30a9@gmail.com>
In-Reply-To: <bde807f1-bff3-49c3-acdd-28459811ad7a@gmail.com>

On 24/03/2025 00:00, James Almer wrote:
> On 3/23/2025 6:47 PM, Hendrik Leppkes wrote:
>> On Sun, Mar 23, 2025 at 9:35 PM James Almer <jamrial@gmail.com> wrote:
>>>
>>> On 3/23/2025 4:33 PM, Massimo Eynard wrote:
>>>> On 23/03/2025 20:01, James Almer wrote:
>>>>> On 3/22/2025 2:49 PM, Massimo Eynard wrote:
>>>>>> This patch adds support for decoding the fourth MLP substream
>>>>>> which contains the 16-channel presentation used for Atmos
>>>>>> audio objects.
>>>>>>
>>>>>> By default only the first three substreams are decoded
>>>>>> unless the new extract_objects flag is enabled as the resulting
>>>>>> presentation contains audio object feeds instead of classic
>>>>>> loudspeaker feeds.
>>>>>>
>>>>>> As this introduces interpolation of primitive matrices, precision
>>>>>> has been increased to 2.18 fixed point. Therefore this requires
>>>>>> DSP code upgrade which has been done for C and x86 implementations
>>>>>> but not the ARM implementation.
>>>>>>
>>>>>> Adds two FATE tests using existing atmos.thd sample to reflect
>>>>>> changes.
>>>>>>
>>>>>> Signed-off-by: Massimo Eynard <eynard.massimo@gmail.com>
>>>>>> ---
>>>>>>     libavcodec/arm/mlpdsp_armv5te.S  |   2 +-
>>>>>>     libavcodec/arm/mlpdsp_init_arm.c |   3 +-
>>>>>>     libavcodec/mlp.h                 |  10 +-
>>>>>>     libavcodec/mlp_parse.c           |  31 ++-
>>>>>>     libavcodec/mlp_parse.h           |   1 +
>>>>>>     libavcodec/mlp_parser.c          |  11 +-
>>>>>>     libavcodec/mlpdec.c              | 389 +++++++++++++++++++++++++++----
>>>>>>     libavcodec/mlpdsp.c              |  50 +++-
>>>>>>     libavcodec/mlpdsp.h              |  25 ++
>>>>>>     libavcodec/x86/mlpdsp.asm        |  19 +-
>>>>>>     tests/fate/truehd.mak            |  10 +
>>>>>>     11 files changed, 476 insertions(+), 75 deletions(-)
>>>>>
>>>>> With atmos.thd i get:
>>>>>
>>>>>> [aist#0:0/truehd @ 00000209caf3ee00] Guessed Channel Layout: 7.1.4
>>>>>> Input #0, truehd, from '../samples/truehd/atmos.thd':
>>>>>>     Duration: N/A, start: 0.000000, bitrate: N/A
>>>>>>     Stream #0:0: Audio: truehd (Dolby TrueHD + Dolby Atmos), 48000 Hz, 7.1.4, s32 (24 bit)
>>>>>
>>>>> Which is unlikely to be correct. The file has 11 (or 12) objects, which are exported as 12 channels in an unspecified layout that is then automatically assumed to be a fixed 7.1.4 layout.
>>>>>
>>>>
>>>> This is caused by `guess_input_channel_layout` (in `ffmpeg_demux.c`), which tries to guess a layout when none is specified.
>>>> Would using `AV_CHANNEL_ORDER_CUSTOM` with all channels set to `AV_CHAN_UNKNOWN` (for unknown position, except LFE if present) be a better solution?
>>>
>>> Possibly, but it may make the stream undecodable unless you remap the
>>> channels (probably with a filter in the filterchain).
>>>
>>> Is there no better representation for the output? What are these 12
>>> channels the sample exports? 16 channels (as you say the MLP substream
>>> contains) would match Ambisonics 3rd order, but i assume that doesn't
>>> apply here, unless you should also be outputting something else.
>>>
>>
>> It's object-based audio. Every extra "channel" represents an audio
>> object at any arbitrary position in space, as defined by separate
>> metadata, which you are then supposed to mix together for your final
>> speaker configuration.
>> Typically, the "bed" channels (eg. the base 7.1) will contain audio
>> that doesn't require much localization information, music, background
>> noises, and the objects will contain audio which is more relevant to
>> have full spatial localization. A mixer is then tasked with using the
>> spatial metadata and knowledge of the physical speaker configuration
>> to mix the objects for ideal spatial representation.
>>
>> We don't have a channel layout that would identify this sort of setup
>> as of yet, never mind a mixer that could actually deal with it, or even
>> exporting the metadata from the TrueHD stream, but baby steps I
>> suppose.
> 
> So we'd need a new layout (or pseudo-channel) where you set arbitrary coordinates? Sort of like what Apple defined in https://developer.apple.com/documentation/coreaudiotypes/audio-channel-coordinates
> 

That would be the best approach, I guess. Atmos in TrueHD works the same way as in E-AC-3 (apart from the audio coding itself, of course), and the E-AC-3 case is described in section 4 of ETSI TS 103 420.
In that specification, the audio "channels" for objects are called "audio object essences", which are supplied to a mixer/renderer alongside the metadata.
Section 4.4 describes the metadata interface.

However, the purpose of this patch is only to decode the essences. What should I do for now?
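
For reference, the interim option raised earlier in the thread (a custom channel order with every channel marked as unknown, except LFE if present) could look roughly like the sketch below. It is not code from the patch. To keep it compilable without FFmpeg headers it uses hypothetical stand-in types (`sketch_layout`, `sketch_chan_id`); in libavutil the corresponding names would be `AVChannelLayout` with `AV_CHANNEL_ORDER_CUSTOM`, and the channel IDs `AV_CHAN_UNKNOWN` and `AV_CHAN_LOW_FREQUENCY`. The position of the LFE channel is passed in as a parameter because where it sits in the decoded presentation is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the libavutil definitions. */
enum sketch_chan_id {
    SKETCH_CHAN_LFE,      /* stands in for AV_CHAN_LOW_FREQUENCY */
    SKETCH_CHAN_UNKNOWN,  /* stands in for AV_CHAN_UNKNOWN       */
};

/* The fourth substream is described as a 16-channel presentation. */
#define SKETCH_MAX_CHANNELS 16

struct sketch_layout {
    int nb_channels;
    enum sketch_chan_id map[SKETCH_MAX_CHANNELS];
};

/*
 * Mark every channel as having no known position, except the LFE.
 * lfe_index < 0 means the presentation carries no LFE channel.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int sketch_layout_init(struct sketch_layout *l, int nb_channels,
                              int lfe_index)
{
    assert(l != NULL);

    if (nb_channels <= 0 || nb_channels > SKETCH_MAX_CHANNELS ||
        lfe_index >= nb_channels)
        return -1;

    l->nb_channels = nb_channels;
    for (int i = 0; i < nb_channels; i++)
        l->map[i] = (i == lfe_index) ? SKETCH_CHAN_LFE : SKETCH_CHAN_UNKNOWN;

    return 0;
}
```

A caller would fill the layout once the number of decoded essences is known, e.g. `sketch_layout_init(&layout, 12, 3)` for the 12 channels of atmos.thd (the LFE index 3 is only an illustrative guess). As noted earlier in the thread, a layout like this may still need remapping before other components can consume the stream.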

>>
>> FWIW, taking all this into account, I fully agree that it should by
>> default output the 7.1 representation that everyone can actually
>> process, because the bed+objects representation is rather unexpected
>> and unhandleable at this time.
> Agree.
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
