On 3/23/2025 6:47 PM, Hendrik Leppkes wrote:
> On Sun, Mar 23, 2025 at 9:35 PM James Almer <jamrial@gmail.com> wrote:
>>
>> On 3/23/2025 4:33 PM, Massimo Eynard wrote:
>>> On 23/03/2025 20:01, James Almer wrote:
>>>> On 3/22/2025 2:49 PM, Massimo Eynard wrote:
>>>>> This patch adds support for decoding the fourth MLP substream
>>>>> which contains the 16-channel presentation used for Atmos
>>>>> audio objects.
>>>>>
>>>>> By default only the first three substreams are decoded
>>>>> unless the new extract_objects flag is enabled as the resulting
>>>>> presentation contains audio object feeds instead of classic
>>>>> loudspeaker feeds.
>>>>>
>>>>> As this introduces interpolation of primitive matrices, precision
>>>>> has been increased to 2.18 fixed point. Therefore this requires
>>>>> DSP code upgrade which has been done for C and x86 implementations
>>>>> but not the ARM implementation.
>>>>>
>>>>> Adds two FATE tests using existing atmos.thd sample to reflect
>>>>> changes.
>>>>>
>>>>> Signed-off-by: Massimo Eynard <eynard.massimo@gmail.com>
>>>>> ---
>>>>>  libavcodec/arm/mlpdsp_armv5te.S  |   2 +-
>>>>>  libavcodec/arm/mlpdsp_init_arm.c |   3 +-
>>>>>  libavcodec/mlp.h                 |  10 +-
>>>>>  libavcodec/mlp_parse.c           |  31 ++-
>>>>>  libavcodec/mlp_parse.h           |   1 +
>>>>>  libavcodec/mlp_parser.c          |  11 +-
>>>>>  libavcodec/mlpdec.c              | 389 +++++++++++++++++++++++++++----
>>>>>  libavcodec/mlpdsp.c              |  50 +++-
>>>>>  libavcodec/mlpdsp.h              |  25 ++
>>>>>  libavcodec/x86/mlpdsp.asm        |  19 +-
>>>>>  tests/fate/truehd.mak            |  10 +
>>>>>  11 files changed, 476 insertions(+), 75 deletions(-)
>>>>
>>>> With atmos.thd i get:
>>>>
>>>>> [aist#0:0/truehd @ 00000209caf3ee00] Guessed Channel Layout: 7.1.4
>>>>> Input #0, truehd, from '../samples/truehd/atmos.thd':
>>>>> Duration: N/A, start: 0.000000, bitrate: N/A
>>>>> Stream #0:0: Audio: truehd (Dolby TrueHD + Dolby Atmos), 48000 Hz, 7.1.4, s32 (24 bit)
>>>>
>>>> Which is unlikely to be correct. The file has 11 (or 12) objects, which is exported as 12 channels in an unspecified layout, and automatically assumed to be a 7.1.4 fixed layout.
>>>>
>>>
>>> This is caused by `guess_input_channel_layout` (in `ffmpeg_demux.c`) which tries to assume a layout.
>>> Would using `AV_CHANNEL_ORDER_CUSTOM` with all channels set to `AV_CHAN_UNKNOWN` (for unknown position, except LFE if present) be a better solution?
>>
>> Possibly, but it may make the stream undecodable unless you remap the
>> channels (probably with a filter in the filterchain).
>>
>> Is there no better representation for the output? What are these 12
>> channels the sample exports? 16 channels (as you say the MLP substream
>> contains) would match Ambisonics 3rd order, but i assume that doesn't
>> apply here, unless you should also be outputting something else.
>>
>
> It's object-based audio. Every extra "channel" represents an audio
> object at any arbitrary position in space, as defined by separate
> metadata, which you are then supposed to mix together for your final
> speaker configuration.
> Typically, the "bed" channels (e.g. the base 7.1) will contain audio
> that doesn't require much localization information, music, background
> noises, and the objects will contain audio which is more relevant to
> have full spatial localization. A mixer is then tasked based on the
> spatial metadata and knowledge of the physical speaker configuration
> to mix the objects for ideal spatial representation.
>
> We don't have a channel layout that would identify this sort of setup
> as of yet, never mind a mixer that could actually deal with it, or even
> exporting the metadata from the TrueHD stream, but baby steps I
> suppose.

So we'd need a new layout (or pseudo-channel) where you set arbitrary
coordinates? Sort of like what Apple defined in
https://developer.apple.com/documentation/coreaudiotypes/audio-channel-coordinates

>
> FWIW, taking all this into account, I fully agree that it should by
> default output the 7.1 representation that everyone can actually
> process, because the bed+objects representation is rather unexpected
> and unhandleable at this time.

Agree.
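
To make the "pseudo-channel with arbitrary coordinates" idea from this thread concrete, here is a minimal C sketch of what such a description might look like. Every name in it (SketchChanKind, SketchChannel, sketch_layout_init, sketch_set_position) is hypothetical and is not part of libavutil or the patch. The coordinate convention and the split into one bed channel plus eleven objects used in the note below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these exist in libavutil. */
enum SketchChanKind {
    SKETCH_CHAN_BED,    /* classic loudspeaker feed */
    SKETCH_CHAN_OBJECT  /* audio object positioned by separate metadata */
};

typedef struct SketchChannel {
    enum SketchChanKind kind;
    float x, y, z; /* assumed room coordinates; only meaningful for objects */
} SketchChannel;

/*
 * Describe nb_bed bed channels followed by nb_obj object channels.
 * Object coordinates start at the origin until metadata updates them.
 * Returns the channel count, or -1 if the array is too small.
 */
static int sketch_layout_init(SketchChannel *ch, size_t cap,
                              int nb_bed, int nb_obj)
{
    assert(ch != NULL);
    if (nb_bed < 0 || nb_obj < 0 || (size_t)nb_bed + (size_t)nb_obj > cap)
        return -1;
    for (int i = 0; i < nb_bed + nb_obj; i++) {
        ch[i].kind = i < nb_bed ? SKETCH_CHAN_BED : SKETCH_CHAN_OBJECT;
        ch[i].x = ch[i].y = ch[i].z = 0.0f;
    }
    return nb_bed + nb_obj;
}

/* Move one object; returns 0 on success, -1 if idx is not an object. */
static int sketch_set_position(SketchChannel *ch, int nb_ch, int idx,
                               float x, float y, float z)
{
    if (idx < 0 || idx >= nb_ch || ch[idx].kind != SKETCH_CHAN_OBJECT)
        return -1;
    ch[idx].x = x;
    ch[idx].y = y;
    ch[idx].z = z;
    return 0;
}
```

Calling sketch_layout_init(ch, 16, 1, 11) would describe a 12-channel stream with one bed channel and eleven objects, and a mixer could then read each object's coordinates after sketch_set_position() has applied the per-object metadata. Bed channels keep a fixed meaning and reject position updates, which mirrors the bed versus object distinction described above.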