* [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup
@ 2023-12-14 20:14 James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API James Almer
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Addressed Anton's comments, plus some extra fixes for issues I found
while testing.
James Almer (8):
  avutil: introduce an Immersive Audio Model and Formats API
  avformat: introduce AVStreamGroup
  ffmpeg: add support for muxing AVStreamGroups
  avcodec/packet: add IAMF Parameters side data types
  avcodec/get_bits: add get_leb()
  avformat/aviobuf: add ffio_read_leb() and ffio_write_leb()
  avformat: Immersive Audio Model and Formats demuxer
  avformat: Immersive Audio Model and Formats muxer
 doc/ffmpeg.texi                 |  200 ++++++
 doc/fftools-common-opts.texi    |   17 +-
 fftools/ffmpeg.h                |    2 +
 fftools/ffmpeg_mux_init.c       |  341 ++++++++++
 fftools/ffmpeg_opt.c            |    2 +
 libavcodec/avpacket.c           |    3 +
 libavcodec/bitstream.h          |    2 +
 libavcodec/bitstream_template.h |   23 +
 libavcodec/get_bits.h           |   24 +
 libavcodec/packet.h             |   24 +
 libavformat/Makefile            |    2 +
 libavformat/allformats.c        |    2 +
 libavformat/avformat.c          |   91 ++-
 libavformat/avformat.h          |  153 +++++
 libavformat/avio_internal.h     |   10 +
 libavformat/aviobuf.c           |   33 +
 libavformat/dump.c              |  147 +++-
 libavformat/iamf.c              |  125 ++++
 libavformat/iamf.h              |  163 +++++
 libavformat/iamf_parse.c        | 1106 +++++++++++++++++++++++++++++++
 libavformat/iamf_parse.h        |   38 ++
 libavformat/iamf_writer.c       |  860 ++++++++++++++++++++++++
 libavformat/iamf_writer.h       |   51 ++
 libavformat/iamfdec.c           |  503 ++++++++++++++
 libavformat/iamfenc.c           |  387 +++++++++++
 libavformat/internal.h          |   33 +
 libavformat/options.c           |  139 ++++
 libavutil/Makefile              |    2 +
 libavutil/iamf.c                |  563 ++++++++++++++++
 libavutil/iamf.h                |  620 +++++++++++++++++
 30 files changed, 5632 insertions(+), 34 deletions(-)
 create mode 100644 libavformat/iamf.c
 create mode 100644 libavformat/iamf.h
 create mode 100644 libavformat/iamf_parse.c
 create mode 100644 libavformat/iamf_parse.h
 create mode 100644 libavformat/iamf_writer.c
 create mode 100644 libavformat/iamf_writer.h
 create mode 100644 libavformat/iamfdec.c
 create mode 100644 libavformat/iamfenc.c
 create mode 100644 libavutil/iamf.c
 create mode 100644 libavutil/iamf.h
-- 
2.43.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-18 11:04   ` Anton Khirnov
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 2/8] avformat: introduce AVStreamGroup James Almer
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
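For readers following the allocation scheme in av_iamf_param_definition_alloc(),
here is a minimal standalone sketch of the same idea: one zeroed buffer holding a
header followed by an array of subblocks, with the array offset taken from
offsetof() on a helper struct and an overflow check before the size is computed.
ParamHeader, GainSubblock, param_alloc() and param_get_subblock() are
hypothetical names used only for illustration; they are not part of this patch
and use plain calloc() instead of av_mallocz().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parameter definition header. */
typedef struct ParamHeader {
    size_t       subblocks_offset; /* bytes from the header start to the array */
    size_t       subblock_size;    /* bytes per array element */
    unsigned int nb_subblocks;     /* number of array elements */
} ParamHeader;

/* Hypothetical stand-in for one subblock type. */
typedef struct GainSubblock {
    unsigned int duration;
    double       gain;
} GainSubblock;

/* Allocate the header and its subblocks as a single zeroed buffer. */
static ParamHeader *param_alloc(unsigned int nb_subblocks, size_t *out_size)
{
    /* Helper struct, used only so that offsetof() yields a correctly
     * aligned offset for the first subblock. */
    struct Layout {
        ParamHeader  h;
        GainSubblock s;
    };
    size_t offset = offsetof(struct Layout, s);
    size_t size   = offset;
    ParamHeader *par;

    /* Reject counts whose total size would overflow size_t. */
    if (nb_subblocks > (SIZE_MAX - size) / sizeof(GainSubblock))
        return NULL;
    size += sizeof(GainSubblock) * nb_subblocks;

    par = calloc(1, size);
    if (!par)
        return NULL;

    par->subblocks_offset = offset;
    par->subblock_size    = sizeof(GainSubblock);
    par->nb_subblocks     = nb_subblocks;
    if (out_size)
        *out_size = size;
    return par;
}

/* Locate subblock idx using the offset and size stored in the header. */
static void *param_get_subblock(const ParamHeader *par, unsigned int idx)
{
    assert(idx < par->nb_subblocks);
    return (uint8_t *)par + par->subblocks_offset + idx * par->subblock_size;
}
```

Storing the offset and element size in the header lets a caller walk the array
without knowing the concrete subblock type at compile time, and the whole object
can be released with a single free().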
 libavutil/Makefile |   2 +
 libavutil/iamf.c   | 563 ++++++++++++++++++++++++++++++++++++++++
 libavutil/iamf.h   | 620 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1185 insertions(+)
 create mode 100644 libavutil/iamf.c
 create mode 100644 libavutil/iamf.h
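The child_class_iterate callbacks in iamf.c follow the AVClass iteration
contract: the caller starts with the opaque state set to NULL, and each call
returns the next child class or NULL once the list is exhausted. Below is a
standalone sketch of that pattern, assuming a simplified Class type in place of
AVClass; Class, child_iterate() and count_children() are hypothetical names, not
FFmpeg API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AVClass. */
typedef struct Class {
    const char *name;
} Class;

static const Class layer_class_demo  = { "Layer"  };
static const Class layout_class_demo = { "Layout" };

/* Return the next child class, or NULL when there are no more.
 * *opaque carries the iteration index and must be NULL on the first call. */
static const Class *child_iterate(void **opaque)
{
    static const Class *const children[] = {
        &layer_class_demo,
        &layout_class_demo,
    };
    uintptr_t i = (uintptr_t)*opaque;
    const Class *ret = NULL;

    if (i < sizeof(children) / sizeof(children[0]))
        ret = children[i];

    /* Advance the state only when something was returned, so repeated
     * calls after the end keep returning NULL. */
    if (ret)
        *opaque = (void *)(i + 1);
    return ret;
}

/* Count the children by walking the iterator from a NULL state. */
static unsigned count_children(void)
{
    void *iter = NULL;
    unsigned n = 0;

    while (child_iterate(&iter))
        n++;
    return n;
}
```

Because the state starts at zero, an index of zero has to yield the first child;
a callback that returns nothing for a zero state reports no children at all.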
diff --git a/libavutil/Makefile b/libavutil/Makefile
index 4711f8cde8..62cc1a1831 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -51,6 +51,7 @@ HEADERS = adler32.h                                                     \
           hwcontext_videotoolbox.h                                      \
           hwcontext_vdpau.h                                             \
           hwcontext_vulkan.h                                            \
+          iamf.h                                                        \
           imgutils.h                                                    \
           intfloat.h                                                    \
           intreadwrite.h                                                \
@@ -140,6 +141,7 @@ OBJS = adler32.o                                                        \
        hdr_dynamic_vivid_metadata.o                                     \
        hmac.o                                                           \
        hwcontext.o                                                      \
+       iamf.o                                                           \
        imgutils.o                                                       \
        integer.o                                                        \
        intmath.o                                                        \
diff --git a/libavutil/iamf.c b/libavutil/iamf.c
new file mode 100644
index 0000000000..62b6051049
--- /dev/null
+++ b/libavutil/iamf.c
@@ -0,0 +1,563 @@
+/*
+ * Immersive Audio Model and Formats helper functions and defines
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <limits.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include "avassert.h"
+#include "error.h"
+#include "iamf.h"
+#include "log.h"
+#include "mem.h"
+#include "opt.h"
+
+#define IAMF_ADD_FUNC_TEMPLATE(parent_type, parent_name, child_type, child_name, suffix)                   \
+child_type *av_iamf_ ## parent_name ## _add_ ## child_name(parent_type *parent_name)                       \
+{                                                                                                          \
+    child_type **child_name ## suffix, *child_name;                                                        \
+                                                                                                           \
+    if (parent_name->nb_## child_name ## suffix == UINT_MAX)                                               \
+        return NULL;                                                                                       \
+                                                                                                           \
+    child_name ## suffix = av_realloc_array(parent_name->child_name ## suffix,                             \
+                                            parent_name->nb_## child_name ## suffix + 1,                   \
+                                            sizeof(*parent_name->child_name ## suffix));                   \
+    if (!child_name ## suffix)                                                                             \
+        return NULL;                                                                                       \
+                                                                                                           \
+    parent_name->child_name ## suffix = child_name ## suffix;                                              \
+                                                                                                           \
+    child_name = parent_name->child_name ## suffix[parent_name->nb_## child_name ## suffix]                \
+               = av_mallocz(sizeof(*child_name));                                                          \
+    if (!child_name)                                                                                       \
+        return NULL;                                                                                       \
+                                                                                                           \
+    child_name->av_class = &child_name ## _class;                                                          \
+    av_opt_set_defaults(child_name);                                                                       \
+    parent_name->nb_## child_name ## suffix++;                                                             \
+                                                                                                           \
+    return child_name;                                                                                     \
+}
+
+#define FLAGS AV_OPT_FLAG_ENCODING_PARAM
+
+//
+// Param Definition
+//
+#define OFFSET(x) offsetof(AVIAMFMixGain, x)
+static const AVOption mix_gain_options[] = {
+    { "subblock_duration", "set subblock_duration", OFFSET(subblock_duration), AV_OPT_TYPE_INT64, {.i64 = 1 }, 1, UINT_MAX, FLAGS },
+    { "animation_type", "set animation_type", OFFSET(animation_type), AV_OPT_TYPE_INT, {.i64 = 0 }, 0, 2, FLAGS },
+    { "start_point_value", "set start_point_value", OFFSET(start_point_value), AV_OPT_TYPE_RATIONAL, {.dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "end_point_value", "set end_point_value", OFFSET(end_point_value), AV_OPT_TYPE_RATIONAL, {.dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "control_point_value", "set control_point_value", OFFSET(control_point_value), AV_OPT_TYPE_RATIONAL, {.dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "control_point_relative_time", "set control_point_relative_time", OFFSET(control_point_relative_time), AV_OPT_TYPE_RATIONAL, {.dbl = 0 }, 0.0, 1.0, FLAGS },
+    { NULL },
+};
+
+static const AVClass mix_gain_class = {
+    .class_name     = "AVIAMFMixGain",
+    .item_name      = av_default_item_name,
+    .version        = LIBAVUTIL_VERSION_INT,
+    .option         = mix_gain_options,
+};
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFDemixingInfo, x)
+static const AVOption demixing_info_options[] = {
+    { "subblock_duration", "set subblock_duration", OFFSET(subblock_duration), AV_OPT_TYPE_INT64, {.i64 = 1 }, 1, UINT_MAX, FLAGS },
+    { "dmixp_mode", "set dmixp_mode", OFFSET(dmixp_mode), AV_OPT_TYPE_INT, {.i64 = 0 }, 0, 6, FLAGS },
+    { NULL },
+};
+
+static const AVClass demixing_info_class = {
+    .class_name     = "AVIAMFDemixingInfo",
+    .item_name      = av_default_item_name,
+    .version        = LIBAVUTIL_VERSION_INT,
+    .option         = demixing_info_options,
+};
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFReconGain, x)
+static const AVOption recon_gain_options[] = {
+    { "subblock_duration", "set subblock_duration", OFFSET(subblock_duration), AV_OPT_TYPE_INT64, {.i64 = 1 }, 1, UINT_MAX, FLAGS },
+    { NULL },
+};
+
+static const AVClass recon_gain_class = {
+    .class_name     = "AVIAMFReconGain",
+    .item_name      = av_default_item_name,
+    .version        = LIBAVUTIL_VERSION_INT,
+    .option         = recon_gain_options,
+};
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFParamDefinition, x)
+static const AVOption param_definition_options[] = {
+    { "parameter_id", "set parameter_id", OFFSET(parameter_id), AV_OPT_TYPE_INT64, {.i64 = 0 }, 0, UINT_MAX, FLAGS },
+    { "parameter_rate", "set parameter_rate", OFFSET(parameter_rate), AV_OPT_TYPE_INT64, {.i64 = 0 }, 0, UINT_MAX, FLAGS },
+    { "duration", "set duration", OFFSET(duration), AV_OPT_TYPE_INT64, {.i64 = 0 }, 0, UINT_MAX, FLAGS },
+    { "constant_subblock_duration", "set constant_subblock_duration", OFFSET(constant_subblock_duration), AV_OPT_TYPE_INT64, {.i64 = 0 }, 0, UINT_MAX, FLAGS },
+    { NULL },
+};
+
+static const AVClass *param_definition_child_iterate(void **opaque)
+{
+    uintptr_t i = (uintptr_t)*opaque;
+    const AVClass *ret = NULL;
+
+    switch(i) {
+    case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN:
+        ret = &mix_gain_class;
+        break;
+    case AV_IAMF_PARAMETER_DEFINITION_DEMIXING:
+        ret = &demixing_info_class;
+        break;
+    case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN:
+        ret = &recon_gain_class;
+        break;
+    default:
+        break;
+    }
+
+    if (ret)
+        *opaque = (void*)(i + 1);
+    return ret;
+}
+
+static const AVClass param_definition_class = {
+    .class_name          = "AVIAMFParamDefinition",
+    .item_name           = av_default_item_name,
+    .version             = LIBAVUTIL_VERSION_INT,
+    .option              = param_definition_options,
+    .child_class_iterate = param_definition_child_iterate,
+};
+
+const AVClass *av_iamf_param_definition_get_class(void)
+{
+    return &param_definition_class;
+}
+
+AVIAMFParamDefinition *av_iamf_param_definition_alloc(enum AVIAMFParamDefinitionType type,
+                                                      unsigned int nb_subblocks, size_t *out_size)
+{
+
+    struct MixGainStruct {
+        AVIAMFParamDefinition p;
+        AVIAMFMixGain m;
+    };
+    struct DemixStruct {
+        AVIAMFParamDefinition p;
+        AVIAMFDemixingInfo d;
+    };
+    struct ReconGainStruct {
+        AVIAMFParamDefinition p;
+        AVIAMFReconGain r;
+    };
+    size_t subblocks_offset, subblock_size;
+    size_t size;
+    AVIAMFParamDefinition *par;
+
+    switch (type) {
+    case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN:
+        subblocks_offset = offsetof(struct MixGainStruct, m);
+        subblock_size = sizeof(AVIAMFMixGain);
+        break;
+    case AV_IAMF_PARAMETER_DEFINITION_DEMIXING:
+        subblocks_offset = offsetof(struct DemixStruct, d);
+        subblock_size = sizeof(AVIAMFDemixingInfo);
+        break;
+    case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN:
+        subblocks_offset = offsetof(struct ReconGainStruct, r);
+        subblock_size = sizeof(AVIAMFReconGain);
+        break;
+    default:
+        return NULL;
+    }
+
+    size = subblocks_offset;
+    if (nb_subblocks > (SIZE_MAX - size) / subblock_size)
+        return NULL;
+    size += subblock_size * nb_subblocks;
+
+    par = av_mallocz(size);
+    if (!par)
+        return NULL;
+
+    par->av_class = &param_definition_class;
+    av_opt_set_defaults(par);
+
+    par->type = type;
+    par->nb_subblocks = nb_subblocks;
+    par->subblock_size = subblock_size;
+    par->subblocks_offset = subblocks_offset;
+
+    for (int i = 0; i < nb_subblocks; i++) {
+        void *subblock = av_iamf_param_definition_get_subblock(par, i);
+
+        switch (type) {
+        case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN:
+            ((AVIAMFMixGain *)subblock)->av_class = &mix_gain_class;
+            break;
+        case AV_IAMF_PARAMETER_DEFINITION_DEMIXING:
+            ((AVIAMFDemixingInfo *)subblock)->av_class = &demixing_info_class;
+            break;
+        case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN:
+            ((AVIAMFReconGain *)subblock)->av_class = &recon_gain_class;
+            break;
+        default:
+            av_assert0(0);
+        }
+
+        av_opt_set_defaults(subblock);
+    }
+
+    if (out_size)
+        *out_size = size;
+
+    return par;
+}
+
+//
+// Audio Element
+//
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFLayer, x)
+static const AVOption layer_options[] = {
+    { "ch_layout", "set ch_layout", OFFSET(ch_layout), AV_OPT_TYPE_CHLAYOUT, {.str = NULL }, 0, 0, FLAGS },
+    { "flags", "set flags", OFFSET(flags), AV_OPT_TYPE_FLAGS,
+        {.i64 = 0 }, 0, AV_IAMF_LAYER_FLAG_RECON_GAIN, FLAGS, "flags" },
+            {"recon_gain",  "Recon gain is present", 0, AV_OPT_TYPE_CONST,
+                {.i64 = AV_IAMF_LAYER_FLAG_RECON_GAIN }, INT_MIN, INT_MAX, FLAGS, "flags"},
+    { "output_gain_flags", "set output_gain_flags", OFFSET(output_gain_flags), AV_OPT_TYPE_FLAGS,
+        {.i64 = 0 }, 0, (1 << 6) - 1, FLAGS, "output_gain_flags" },
+            {"FL",  "Left channel",            0, AV_OPT_TYPE_CONST,
+                {.i64 = 1 << 5 }, INT_MIN, INT_MAX, FLAGS, "output_gain_flags"},
+            {"FR",  "Right channel",           0, AV_OPT_TYPE_CONST,
+                {.i64 = 1 << 4 }, INT_MIN, INT_MAX, FLAGS, "output_gain_flags"},
+            {"BL",  "Left surround channel",   0, AV_OPT_TYPE_CONST,
+                {.i64 = 1 << 3 }, INT_MIN, INT_MAX, FLAGS, "output_gain_flags"},
+            {"BR",  "Right surround channel",  0, AV_OPT_TYPE_CONST,
+                {.i64 = 1 << 2 }, INT_MIN, INT_MAX, FLAGS, "output_gain_flags"},
+            {"TFL", "Left top front channel",  0, AV_OPT_TYPE_CONST,
+                {.i64 = 1 << 1 }, INT_MIN, INT_MAX, FLAGS, "output_gain_flags"},
+            {"TFR", "Right top front channel", 0, AV_OPT_TYPE_CONST,
+                {.i64 = 1 << 0 }, INT_MIN, INT_MAX, FLAGS, "output_gain_flags"},
+    { "output_gain", "set output_gain", OFFSET(output_gain), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "ambisonics_mode", "set ambisonics_mode", OFFSET(ambisonics_mode), AV_OPT_TYPE_INT,
+            { .i64 = AV_IAMF_AMBISONICS_MODE_MONO },
+            AV_IAMF_AMBISONICS_MODE_MONO, AV_IAMF_AMBISONICS_MODE_PROJECTION, FLAGS, "ambisonics_mode" },
+        { "mono",       NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_AMBISONICS_MODE_MONO },       .unit = "ambisonics_mode" },
+        { "projection", NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_AMBISONICS_MODE_PROJECTION }, .unit = "ambisonics_mode" },
+    { NULL },
+};
+
+static const AVClass layer_class = {
+    .class_name     = "AVIAMFLayer",
+    .item_name      = av_default_item_name,
+    .version        = LIBAVUTIL_VERSION_INT,
+    .option         = layer_options,
+};
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFAudioElement, x)
+static const AVOption audio_element_options[] = {
+    { "audio_element_type", "set audio_element_type", OFFSET(audio_element_type), AV_OPT_TYPE_INT,
+            {.i64 = AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL },
+            AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL, AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE, FLAGS, "audio_element_type" },
+        { "channel", NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL }, .unit = "audio_element_type" },
+        { "scene",   NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE },   .unit = "audio_element_type" },
+    { "default_w", "set default_w", OFFSET(default_w), AV_OPT_TYPE_INT, {.i64 = 0 }, 0, 10, FLAGS },
+    { NULL },
+};
+
+static const AVClass *audio_element_child_iterate(void **opaque)
+{
+    uintptr_t i = (uintptr_t)*opaque;
+    const AVClass *ret = NULL;
+
+    if (!i)
+        ret = &layer_class;
+
+    if (ret)
+        *opaque = (void*)(i + 1);
+    return ret;
+}
+
+static const AVClass audio_element_class = {
+    .class_name          = "AVIAMFAudioElement",
+    .item_name           = av_default_item_name,
+    .version             = LIBAVUTIL_VERSION_INT,
+    .option              = audio_element_options,
+    .child_class_iterate = audio_element_child_iterate,
+};
+
+const AVClass *av_iamf_audio_element_get_class(void)
+{
+    return &audio_element_class;
+}
+
+AVIAMFAudioElement *av_iamf_audio_element_alloc(void)
+{
+    AVIAMFAudioElement *audio_element = av_mallocz(sizeof(*audio_element));
+
+    if (audio_element) {
+        audio_element->av_class = &audio_element_class;
+        av_opt_set_defaults(audio_element);
+    }
+
+    return audio_element;
+}
+
+IAMF_ADD_FUNC_TEMPLATE(AVIAMFAudioElement, audio_element, AVIAMFLayer, layer, s)
+
+void av_iamf_audio_element_free(AVIAMFAudioElement **paudio_element)
+{
+    AVIAMFAudioElement *audio_element = *paudio_element;
+
+    if (!audio_element)
+        return;
+
+    for (int i = 0; i < audio_element->nb_layers; i++) {
+        AVIAMFLayer *layer = audio_element->layers[i];
+        av_opt_free(layer);
+        av_free(layer->demixing_matrix);
+        av_free(layer);
+    }
+    av_free(audio_element->layers);
+
+    av_free(audio_element->demixing_info);
+    av_free(audio_element->recon_gain_info);
+    av_freep(paudio_element);
+}
+
+//
+// Mix Presentation
+//
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFSubmixElement, x)
+static const AVOption submix_element_options[] = {
+    { "headphones_rendering_mode", "Headphones rendering mode", OFFSET(headphones_rendering_mode), AV_OPT_TYPE_INT,
+            { .i64 = AV_IAMF_HEADPHONES_MODE_STEREO },
+            AV_IAMF_HEADPHONES_MODE_STEREO, AV_IAMF_HEADPHONES_MODE_BINAURAL, FLAGS, "headphones_rendering_mode" },
+        { "stereo",   NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_HEADPHONES_MODE_STEREO },   .unit = "headphones_rendering_mode" },
+        { "binaural", NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_HEADPHONES_MODE_BINAURAL }, .unit = "headphones_rendering_mode" },
+    { "default_mix_gain", "Default mix gain", OFFSET(default_mix_gain), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "annotations", "Annotations", OFFSET(annotations), AV_OPT_TYPE_DICT, { .str = NULL }, 0, 0, FLAGS },
+    { NULL },
+};
+
+static void *submix_element_child_next(void *obj, void *prev)
+{
+    AVIAMFSubmixElement *submix_element = obj;
+    if (!prev)
+        return submix_element->element_mix_config;
+
+    return NULL;
+}
+
+static const AVClass *submix_element_child_iterate(void **opaque)
+{
+    uintptr_t i = (uintptr_t)*opaque;
+    const AVClass *ret = NULL;
+
+    if (!i)
+        ret = &param_definition_class;
+
+    if (ret)
+        *opaque = (void*)(i + 1);
+    return ret;
+}
+
+static const AVClass element_class = {
+    .class_name          = "AVIAMFSubmixElement",
+    .item_name           = av_default_item_name,
+    .version             = LIBAVUTIL_VERSION_INT,
+    .option              = submix_element_options,
+    .child_next          = submix_element_child_next,
+    .child_class_iterate = submix_element_child_iterate,
+};
+
+IAMF_ADD_FUNC_TEMPLATE(AVIAMFSubmix, submix, AVIAMFSubmixElement, element, s)
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFSubmixLayout, x)
+static const AVOption submix_layout_options[] = {
+    { "layout_type", "Layout type", OFFSET(layout_type), AV_OPT_TYPE_INT,
+            { .i64 = AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS },
+            AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS, AV_IAMF_SUBMIX_LAYOUT_TYPE_BINAURAL, FLAGS, "layout_type" },
+        { "loudspeakers", NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS }, .unit = "layout_type" },
+        { "binaural",     NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_IAMF_SUBMIX_LAYOUT_TYPE_BINAURAL },     .unit = "layout_type" },
+    { "sound_system", "Sound System", OFFSET(sound_system), AV_OPT_TYPE_CHLAYOUT, { .str = NULL }, 0, 0, FLAGS },
+    { "integrated_loudness", "Integrated loudness", OFFSET(integrated_loudness), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "digital_peak", "Digital peak", OFFSET(digital_peak), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "true_peak", "True peak", OFFSET(true_peak), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "dialog_anchored_loudness", "Anchored loudness (Dialog)", OFFSET(dialogue_anchored_loudness), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { "album_anchored_loudness", "Anchored loudness (Album)", OFFSET(album_anchored_loudness), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { NULL },
+};
+
+static const AVClass layout_class = {
+    .class_name     = "AVIAMFSubmixLayout",
+    .item_name      = av_default_item_name,
+    .version        = LIBAVUTIL_VERSION_INT,
+    .option         = submix_layout_options,
+};
+
+IAMF_ADD_FUNC_TEMPLATE(AVIAMFSubmix, submix, AVIAMFSubmixLayout, layout, s)
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFSubmix, x)
+static const AVOption submix_presentation_options[] = {
+    { "default_mix_gain", "Default mix gain", OFFSET(default_mix_gain), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, -128.0, 128.0, FLAGS },
+    { NULL },
+};
+
+static void *submix_presentation_child_next(void *obj, void *prev)
+{
+    AVIAMFSubmix *sub_mix = obj;
+    if (!prev)
+        return sub_mix->output_mix_config;
+
+    return NULL;
+}
+
+static const AVClass *submix_presentation_child_iterate(void **opaque)
+{
+    uintptr_t i = (uintptr_t)*opaque;
+    const AVClass *ret = NULL;
+
+    switch(i) {
+    case 0:
+        ret = &element_class;
+        break;
+    case 1:
+        ret = &layout_class;
+        break;
+    case 2:
+        ret = &param_definition_class;
+        break;
+    default:
+        break;
+    }
+
+    if (ret)
+        *opaque = (void*)(i + 1);
+    return ret;
+}
+
+static const AVClass submix_class = {
+    .class_name          = "AVIAMFSubmix",
+    .item_name           = av_default_item_name,
+    .version             = LIBAVUTIL_VERSION_INT,
+    .option              = submix_presentation_options,
+    .child_next          = submix_presentation_child_next,
+    .child_class_iterate = submix_presentation_child_iterate,
+};
+
+#undef OFFSET
+#define OFFSET(x) offsetof(AVIAMFMixPresentation, x)
+static const AVOption mix_presentation_options[] = {
+    { "annotations", "set annotations", OFFSET(annotations), AV_OPT_TYPE_DICT, {.str = NULL }, 0, 0, FLAGS },
+    { NULL },
+};
+
+#undef OFFSET
+#undef FLAGS
+
+static const AVClass *mix_presentation_child_iterate(void **opaque)
+{
+    uintptr_t i = (uintptr_t)*opaque;
+    const AVClass *ret = NULL;
+
+    if (!i)
+        ret = &submix_class;
+
+    if (ret)
+        *opaque = (void*)(i + 1);
+    return ret;
+}
+
+static const AVClass mix_presentation_class = {
+    .class_name          = "AVIAMFMixPresentation",
+    .item_name           = av_default_item_name,
+    .version             = LIBAVUTIL_VERSION_INT,
+    .option              = mix_presentation_options,
+    .child_class_iterate = mix_presentation_child_iterate,
+};
+
+const AVClass *av_iamf_mix_presentation_get_class(void)
+{
+    return &mix_presentation_class;
+}
+
+AVIAMFMixPresentation *av_iamf_mix_presentation_alloc(void)
+{
+    AVIAMFMixPresentation *mix_presentation = av_mallocz(sizeof(*mix_presentation));
+
+    if (mix_presentation) {
+        mix_presentation->av_class = &mix_presentation_class;
+        av_opt_set_defaults(mix_presentation);
+    }
+
+    return mix_presentation;
+}
+
+IAMF_ADD_FUNC_TEMPLATE(AVIAMFMixPresentation, mix_presentation, AVIAMFSubmix, submix, es)
+
+void av_iamf_mix_presentation_free(AVIAMFMixPresentation **pmix_presentation)
+{
+    AVIAMFMixPresentation *mix_presentation = *pmix_presentation;
+
+    if (!mix_presentation)
+        return;
+
+    for (int i = 0; i < mix_presentation->nb_submixes; i++) {
+        AVIAMFSubmix *sub_mix = mix_presentation->submixes[i];
+        for (int j = 0; j < sub_mix->nb_elements; j++) {
+            AVIAMFSubmixElement *submix_element = sub_mix->elements[j];
+            av_opt_free(submix_element);
+            av_free(submix_element->element_mix_config);
+            av_free(submix_element);
+        }
+        av_free(sub_mix->elements);
+        for (int j = 0; j < sub_mix->nb_layouts; j++) {
+            AVIAMFSubmixLayout *submix_layout = sub_mix->layouts[j];
+            av_opt_free(submix_layout);
+            av_free(submix_layout);
+        }
+        av_free(sub_mix->layouts);
+        av_free(sub_mix->output_mix_config);
+        av_free(sub_mix);
+    }
+    av_opt_free(mix_presentation);
+    av_free(mix_presentation->submixes);
+
+    av_freep(pmix_presentation);
+}
diff --git a/libavutil/iamf.h b/libavutil/iamf.h
new file mode 100644
index 0000000000..7038b71a27
--- /dev/null
+++ b/libavutil/iamf.h
@@ -0,0 +1,620 @@
+/*
+ * Immersive Audio Model and Formats helper functions and defines
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_IAMF_H
+#define AVUTIL_IAMF_H
+
+/**
+ * @file
+ * Immersive Audio Model and Formats API header
+ * @see <a href="https://aomediacodec.github.io/iamf/">Immersive Audio Model and Formats</a>
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+#include "attributes.h"
+#include "avassert.h"
+#include "channel_layout.h"
+#include "dict.h"
+#include "rational.h"
+
+/**
+ * @defgroup lavf_iamf_params Parameter Definition
+ * @{
+ * Parameters as defined in sections 3.6.1 and 3.8 of IAMF.
+ * @}
+ * @defgroup lavf_iamf_audio Audio Element
+ * @{
+ * Audio Elements as defined in section 3.6 of IAMF.
+ * @}
+ * @defgroup lavf_iamf_mix Mix Presentation
+ * @{
+ * Mix Presentations as defined in section 3.7 of IAMF.
+ * @}
+ *
+ * @}
+ * @addtogroup lavf_iamf_params
+ * @{
+ */
+enum AVIAMFAnimationType {
+    AV_IAMF_ANIMATION_TYPE_STEP,
+    AV_IAMF_ANIMATION_TYPE_LINEAR,
+    AV_IAMF_ANIMATION_TYPE_BEZIER,
+};
+
+/**
+ * Mix Gain Parameter Data as defined in section 3.8.1 of IAMF.
+ */
+typedef struct AVIAMFMixGain {
+    const AVClass *av_class;
+
+    /**
+     * Duration for the given subblock. It must not be 0.
+     */
+    unsigned int subblock_duration;
+    /**
+     * The type of animation applied to the parameter values.
+     */
+    enum AVIAMFAnimationType animation_type;
+    /**
+     * Parameter value that is applied at the start of the subblock.
+     * Applies to all defined Animation Types.
+     *
+     * Valid range of values is -128.0 to 128.0
+     */
+    AVRational start_point_value;
+    /**
+     * Parameter value that is applied at the end of the subblock.
+     * Applies only to AV_IAMF_ANIMATION_TYPE_LINEAR and
+     * AV_IAMF_ANIMATION_TYPE_BEZIER Animation Types.
+     *
+     * Valid range of values is -128.0 to 128.0
+     */
+    AVRational end_point_value;
+    /**
+     * Parameter value of the middle control point of a quadratic Bezier
+     * curve, i.e., its y-axis value.
+     * Applies only to AV_IAMF_ANIMATION_TYPE_BEZIER Animation Type.
+     *
+     * Valid range of values is -128.0 to 128.0
+     */
+    AVRational control_point_value;
+    /**
+     * Parameter value of the time of the middle control point of a
+     * quadratic Bezier curve, i.e., its x-axis value.
+     * Applies only to AV_IAMF_ANIMATION_TYPE_BEZIER Animation Type.
+     *
+     * Valid range of values is 0.0 to 1.0
+     */
+    AVRational control_point_relative_time;
+} AVIAMFMixGain;
+
+/**
+ * Demixing Info Parameter Data as defined in section 3.8.2 of IAMF.
+ */
+typedef struct AVIAMFDemixingInfo {
+    const AVClass *av_class;
+
+    /**
+     * Duration for the given subblock. It must not be 0.
+     */
+    unsigned int subblock_duration;
+    /**
+     * Pre-defined combination of demixing parameters.
+     */
+    unsigned int dmixp_mode;
+} AVIAMFDemixingInfo;
+
+/**
+ * Recon Gain Info Parameter Data as defined in section 3.8.3 of IAMF.
+ */
+typedef struct AVIAMFReconGain {
+    const AVClass *av_class;
+
+    /**
+     * Duration for the given subblock. It must not be 0.
+     */
+    unsigned int subblock_duration;
+
+    /**
+     * Array of gain values to be applied to each channel for each layer
+     * defined in the Audio Element referencing the parent Parameter Definition.
+     * Values for layers where the AV_IAMF_LAYER_FLAG_RECON_GAIN flag is not set
+     * are undefined.
+     *
+     * Channel order is: FL, C, FR, SL, SR, TFL, TFR, BL, BR, TBL, TBR, LFE
+     */
+    uint8_t recon_gain[6][12];
+} AVIAMFReconGain;
+
+enum AVIAMFParamDefinitionType {
+   /**
+    * Subblocks are of struct type AVIAMFMixGain
+    */
+    AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN,
+   /**
+    * Subblocks are of struct type AVIAMFDemixingInfo
+    */
+    AV_IAMF_PARAMETER_DEFINITION_DEMIXING,
+   /**
+    * Subblocks are of struct type AVIAMFReconGain
+    */
+    AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN,
+};
+
+/**
+ * Parameters as defined in section 3.6.1 of IAMF.
+ *
+ * The struct is allocated by av_iamf_param_definition_alloc() along with an
+ * array of subblocks, its type depending on the value of type.
+ * This array is placed subblocks_offset bytes after the start of this struct.
+ */
+typedef struct AVIAMFParamDefinition {
+    const AVClass *av_class;
+
+    /**
+     * Offset in bytes from the start of this struct, at which the subblocks
+     * array is located.
+     */
+    size_t subblocks_offset;
+    /**
+     * Size in bytes of each element in the subblocks array.
+     */
+    size_t subblock_size;
+    /**
+     * Number of subblocks in the array.
+     *
+     * Must be 0 if @ref constant_subblock_duration is not 0.
+     */
+    unsigned int nb_subblocks;
+
+    /**
+     * Parameters type. Determines the type of the subblock elements.
+     */
+    enum AVIAMFParamDefinitionType type;
+
+    /**
+     * Identifier for the parameter substream.
+     */
+    unsigned int parameter_id;
+    /**
+     * Sample rate for the parameter substream. It must not be 0.
+     */
+    unsigned int parameter_rate;
+
+    /**
+     * The duration of all subblocks in this parameter definition.
+     *
+     * May be 0, in which case all duration values should be specified in
+     * another parameter definition referencing the same parameter_id.
+     */
+    unsigned int duration;
+    /**
+     * The duration of every subblock in the case where all subblocks, with
+     * the optional exception of the last subblock, have equal durations.
+     *
+     * Must be 0 if subblocks have different durations.
+     */
+    unsigned int constant_subblock_duration;
+} AVIAMFParamDefinition;
+
+const AVClass *av_iamf_param_definition_get_class(void);
+
+/**
+ * Allocates memory for an AVIAMFParamDefinition, plus an array of @p nb_subblocks
+ * subblocks of the given type, and initializes the variables. Can be
+ * freed with a normal av_free() call.
+ *
+ * @param size if non-NULL, the size in bytes of the resulting data array is written here.
+ */
+AVIAMFParamDefinition *av_iamf_param_definition_alloc(enum AVIAMFParamDefinitionType type,
+                                                      unsigned int nb_subblocks, size_t *size);
+
+/**
+ * Get the subblock at the specified @p idx, which must be between 0 and nb_subblocks - 1.
+ *
+ * The @ref AVIAMFParamDefinition.type "param definition type" defines
+ * the struct type of the returned pointer.
+ */
+static av_always_inline void*
+av_iamf_param_definition_get_subblock(const AVIAMFParamDefinition *par, unsigned int idx)
+{
+    av_assert0(idx < par->nb_subblocks);
+    return (void *)((uint8_t *)par + par->subblocks_offset + idx * par->subblock_size);
+}
+
+/**
+ * @}
+ * @addtogroup lavf_iamf_audio
+ * @{
+ */
+
+enum AVIAMFAmbisonicsMode {
+    AV_IAMF_AMBISONICS_MODE_MONO,
+    AV_IAMF_AMBISONICS_MODE_PROJECTION,
+};
+
+/**
+ * Recon gain information for the layer is present in AVIAMFReconGain
+ */
+#define AV_IAMF_LAYER_FLAG_RECON_GAIN (1 << 0)
+
+/**
+ * A layer defining a Channel Layout in the Audio Element.
+ *
+ * When @ref AVIAMFAudioElement.audio_element_type "the parent's Audio Element type"
+ * is AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL, this corresponds to a Scalable Channel
+ * Layout layer as defined in section 3.6.2 of IAMF.
+ * For AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE, it is an Ambisonics channel
+ * layout as defined in section 3.6.3 of IAMF.
+ */
+typedef struct AVIAMFLayer {
+    const AVClass *av_class;
+
+    AVChannelLayout ch_layout;
+
+    /**
+     * A bitmask which may contain a combination of AV_IAMF_LAYER_FLAG_* flags.
+     */
+    unsigned int flags;
+    /**
+     * Output gain channel flags as defined in section 3.6.2 of IAMF.
+     *
+     * This field is defined only if @ref AVIAMFAudioElement.audio_element_type
+     * "the parent's Audio Element type" is AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL,
+     * must be 0 otherwise.
+     */
+    unsigned int output_gain_flags;
+    /**
+     * Output gain as defined in section 3.6.2 of IAMF.
+     *
+     * Must be 0 if @ref output_gain_flags is 0.
+     */
+    AVRational output_gain;
+    /**
+     * Ambisonics mode as defined in section 3.6.3 of IAMF.
+     *
+     * This field is defined only if @ref AVIAMFAudioElement.audio_element_type
+     * "the parent's Audio Element type" is AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE.
+     *
+     * If AV_IAMF_AMBISONICS_MODE_MONO, channel_mapping is defined implicitly
+     * (Ambisonic Order) or explicitly (Custom Order with ambi channels) in
+     * @ref ch_layout.
+     * If AV_IAMF_AMBISONICS_MODE_PROJECTION, @ref demixing_matrix must be set.
+     */
+    enum AVIAMFAmbisonicsMode ambisonics_mode;
+
+    /**
+     * Demixing matrix as defined in section 3.6.3 of IAMF.
+     *
+     * The length of the array is ch_layout.nb_channels multiplied by the sum of
+     * the number of streams in the group and the number of those streams that
+     * are stereo.
+     *
+     * May be set only if @ref ambisonics_mode == AV_IAMF_AMBISONICS_MODE_PROJECTION,
+     * must be NULL otherwise.
+     */
+    AVRational *demixing_matrix;
+} AVIAMFLayer;
+
+
+enum AVIAMFAudioElementType {
+    AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL,
+    AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE,
+};
+
+typedef struct AVIAMFAudioElement {
+    const AVClass *av_class;
+
+    AVIAMFLayer **layers;
+    /**
+     * Number of layers, or channel groups, in the Audio Element.
+     * There may be 6 layers at most, and for @ref audio_element_type
+     * AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE, there must be exactly 1.
+     *
+     * Set by av_iamf_audio_element_add_layer(), must not be
+     * modified by any other code.
+     */
+    unsigned int nb_layers;
+
+    /**
+     * Demixing information used to reconstruct a scalable channel audio
+     * representation.
+     * The @ref AVIAMFParamDefinition.type "type" must be
+     * AV_IAMF_PARAMETER_DEFINITION_DEMIXING.
+     */
+    AVIAMFParamDefinition *demixing_info;
+    /**
+     * Recon gain information used to reconstruct a scalable channel audio
+     * representation.
+     * The @ref AVIAMFParamDefinition.type "type" must be
+     * AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN.
+     */
+    AVIAMFParamDefinition *recon_gain_info;
+
+    /**
+     * Audio element type as defined in section 3.6 of IAMF.
+     */
+    enum AVIAMFAudioElementType audio_element_type;
+
+    /**
+     * Default weight value as defined in section 3.6 of IAMF.
+     */
+    unsigned int default_w;
+} AVIAMFAudioElement;
+
+const AVClass *av_iamf_audio_element_get_class(void);
+
+/**
+ * Allocates an AVIAMFAudioElement, and initializes its fields with default values.
+ * No layers are allocated. Must be freed with av_iamf_audio_element_free().
+ *
+ * @see av_iamf_audio_element_add_layer()
+ */
+AVIAMFAudioElement *av_iamf_audio_element_alloc(void);
+
+/**
+ * Allocate a layer and add it to a given AVIAMFAudioElement.
+ * It is freed by av_iamf_audio_element_free() alongside the rest of the parent
+ * AVIAMFAudioElement.
+ *
+ * @return a pointer to the allocated layer.
+ */
+AVIAMFLayer *av_iamf_audio_element_add_layer(AVIAMFAudioElement *audio_element);
+
+void av_iamf_audio_element_free(AVIAMFAudioElement **audio_element);
+
+/**
+ * @}
+ * @addtogroup lavf_iamf_mix
+ * @{
+ */
+
+enum AVIAMFHeadphonesMode {
+    /**
+     * The referenced Audio Element shall be rendered to stereo loudspeakers.
+     */
+    AV_IAMF_HEADPHONES_MODE_STEREO,
+    /**
+     * The referenced Audio Element shall be rendered with a binaural renderer.
+     */
+    AV_IAMF_HEADPHONES_MODE_BINAURAL,
+};
+
+typedef struct AVIAMFSubmixElement {
+    const AVClass *av_class;
+
+    /**
+     * The id of the Audio Element this submix element references.
+     */
+    unsigned int audio_element_id;
+
+    /**
+     * Information required for applying any processing to the
+     * referenced and rendered Audio Element before being summed with other
+     * processed Audio Elements.
+     * The @ref AVIAMFParamDefinition.type "type" must be
+     * AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN.
+     */
+    AVIAMFParamDefinition *element_mix_config;
+
+    /**
+     * Default mix gain value to apply when there are no AVIAMFParamDefinition
+     * with @ref element_mix_config "element_mix_config's"
+     * @ref AVIAMFParamDefinition.parameter_id "parameter_id" available for a
+     * given audio frame.
+     */
+    AVRational default_mix_gain;
+
+    /**
+     * A value that indicates whether the referenced channel-based Audio Element
+     * shall be rendered to stereo loudspeakers or spatialized with a binaural
+     * renderer when played back on headphones.
+     * If the Audio Element is not of @ref AVIAMFAudioElement.audio_element_type
+     * "type" AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL, then this field is undefined.
+     */
+    enum AVIAMFHeadphonesMode headphones_rendering_mode;
+
+    /**
+     * A dictionary of strings describing the submix in different languages.
+     * Must have the same amount of entries as
+     * @ref AVIAMFMixPresentation.annotations "the mix's annotations", stored
+     * in the same order, and with the same key strings.
+     *
+     * @ref AVDictionaryEntry.key "key" is a string conforming to BCP-47 that
+     * specifies the language for the string stored in
+     * @ref AVDictionaryEntry.value "value".
+     */
+    AVDictionary *annotations;
+} AVIAMFSubmixElement;
+
+enum AVIAMFSubmixLayoutType {
+    /**
+     * The layout follows the loudspeaker sound system convention of ITU-R BS.2051-3.
+     */
+    AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS = 2,
+    /**
+     * The layout is binaural.
+     */
+    AV_IAMF_SUBMIX_LAYOUT_TYPE_BINAURAL = 3,
+};
+
+typedef struct AVIAMFSubmixLayout {
+    const AVClass *av_class;
+
+    enum AVIAMFSubmixLayoutType layout_type;
+
+    /**
+     * Channel layout matching one of Sound Systems A to J of ITU-R BS.2051-3,
+     * plus 7.1.2ch and 3.1.2ch.
+     * If layout_type is not AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS, this field
+     * is undefined.
+     */
+    AVChannelLayout sound_system;
+    /**
+     * The program integrated loudness information, as defined in
+     * ITU-R BS.1770-4.
+     */
+    AVRational integrated_loudness;
+    /**
+     * The digital (sampled) peak value of the audio signal, as defined
+     * in ITU-R BS.1770-4.
+     */
+    AVRational digital_peak;
+    /**
+     * The true peak of the audio signal, as defined in ITU-R BS.1770-4.
+     */
+    AVRational true_peak;
+    /**
+     * The Dialogue loudness information, as defined in ITU-R BS.1770-4.
+     */
+    AVRational dialogue_anchored_loudness;
+    /**
+     * The Album loudness information, as defined in ITU-R BS.1770-4.
+     */
+    AVRational album_anchored_loudness;
+} AVIAMFSubmixLayout;
+
+typedef struct AVIAMFSubmix {
+    const AVClass *av_class;
+
+    /**
+     * Array of submix elements.
+     *
+     * Set by av_iamf_submix_add_element(), must not be modified by any
+     * other code.
+     */
+    AVIAMFSubmixElement **elements;
+    /**
+     * Number of elements in the submix.
+     *
+     * Set by av_iamf_submix_add_element(), must not be modified by any
+     * other code.
+     */
+    unsigned int nb_elements;
+
+    /**
+     * Array of submix layouts.
+     *
+     * Set by av_iamf_submix_add_layout(), must not be modified by any
+     * other code.
+     */
+    AVIAMFSubmixLayout **layouts;
+    /**
+     * Number of layouts in the submix.
+     *
+     * Set by av_iamf_submix_add_layout(), must not be modified by any
+     * other code.
+     */
+    unsigned int nb_layouts;
+
+    /**
+     * Information required for post-processing the mixed audio signal to
+     * generate the audio signal for playback.
+     * The @ref AVIAMFParamDefinition.type "type" must be
+     * AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN.
+     */
+    AVIAMFParamDefinition *output_mix_config;
+
+    /**
+     * Default mix gain value to apply when there are no AVIAMFParamDefinition
+     * with @ref output_mix_config "output_mix_config's"
+     * @ref AVIAMFParamDefinition.parameter_id "parameter_id" available for a
+     * given audio frame.
+     */
+    AVRational default_mix_gain;
+} AVIAMFSubmix;
+
+typedef struct AVIAMFMixPresentation {
+    const AVClass *av_class;
+
+    /**
+     * Array of submixes.
+     *
+     * Set by av_iamf_mix_presentation_add_submix(), must not be modified
+     * by any other code.
+     */
+    AVIAMFSubmix **submixes;
+    /**
+     * Number of submixes in the presentation.
+     *
+     * Set by av_iamf_mix_presentation_add_submix(), must not be modified
+     * by any other code.
+     */
+    unsigned int nb_submixes;
+
+    /**
+     * A dictionary of strings describing the mix in different languages.
+     * Must have the same amount of entries as every
+     * @ref AVIAMFSubmixElement.annotations "Submix element annotations",
+     * stored in the same order, and with the same key strings.
+     *
+     * @ref AVDictionaryEntry.key "key" is a string conforming to BCP-47
+     * that specifies the language for the string stored in
+     * @ref AVDictionaryEntry.value "value".
+     */
+    AVDictionary *annotations;
+} AVIAMFMixPresentation;
+
+const AVClass *av_iamf_mix_presentation_get_class(void);
+
+/**
+ * Allocates an AVIAMFMixPresentation, and initializes its fields with default
+ * values. No submixes are allocated.
+ * Must be freed with av_iamf_mix_presentation_free().
+ *
+ * @see av_iamf_mix_presentation_add_submix()
+ */
+AVIAMFMixPresentation *av_iamf_mix_presentation_alloc(void);
+
+/**
+ * Allocate a submix and add it to a given AVIAMFMixPresentation.
+ * It is freed by av_iamf_mix_presentation_free() alongside the rest of the
+ * parent AVIAMFMixPresentation.
+ *
+ * @return a pointer to the allocated submix.
+ */
+AVIAMFSubmix *av_iamf_mix_presentation_add_submix(AVIAMFMixPresentation *mix_presentation);
+
+/**
+ * Allocate a submix element and add it to a given AVIAMFSubmix.
+ * It is freed by av_iamf_mix_presentation_free() alongside the rest of the
+ * parent AVIAMFSubmix.
+ *
+ * @return a pointer to the allocated submix element.
+ */
+AVIAMFSubmixElement *av_iamf_submix_add_element(AVIAMFSubmix *submix);
+
+/**
+ * Allocate a submix layout and add it to a given AVIAMFSubmix.
+ * It is freed by av_iamf_mix_presentation_free() alongside the rest of the
+ * parent AVIAMFSubmix.
+ *
+ * @return a pointer to the allocated submix layout.
+ */
+AVIAMFSubmixLayout *av_iamf_submix_add_layout(AVIAMFSubmix *submix);
+
+void av_iamf_mix_presentation_free(AVIAMFMixPresentation **mix_presentation);
+/**
+ * @}
+ */
+
+#endif /* AVUTIL_IAMF_H */
-- 
2.43.0
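For readers following the AVIAMFParamDefinition layout documented above (a header struct followed in the same allocation by an array of subblocks, reached through subblocks_offset and subblock_size), here is a minimal standalone sketch of that pattern. It is not the code from this patch: DemoParamDefinition, DemoSubblock, demo_param_alloc() and demo_param_get_subblock() are hypothetical stand-ins, the alignment rounding is an assumption about how such an allocator could be written, and overflow checking on the size computation is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a subblock type such as AVIAMFDemixingInfo. */
typedef struct DemoSubblock {
    unsigned int subblock_duration;
    unsigned int dmixp_mode;
} DemoSubblock;

/* Hypothetical stand-in for the header part of AVIAMFParamDefinition. */
typedef struct DemoParamDefinition {
    size_t subblocks_offset;   /* bytes from the start of the struct to the array */
    size_t subblock_size;      /* size in bytes of each array element */
    unsigned int nb_subblocks; /* number of elements in the array */
} DemoParamDefinition;

/* Allocate header and subblock array as one zeroed block, so that a single
 * free() releases everything. If size is non-NULL, the total allocation
 * size in bytes is written there. */
static DemoParamDefinition *demo_param_alloc(unsigned int nb_subblocks, size_t *size)
{
    /* Assumption: round the header size up to the subblock alignment. */
    size_t align  = _Alignof(DemoSubblock);
    size_t offset = (sizeof(DemoParamDefinition) + align - 1) / align * align;
    size_t total  = offset + (size_t)nb_subblocks * sizeof(DemoSubblock);
    DemoParamDefinition *par = calloc(1, total);

    if (!par)
        return NULL;
    par->subblocks_offset = offset;
    par->subblock_size    = sizeof(DemoSubblock);
    par->nb_subblocks     = nb_subblocks;
    if (size)
        *size = total;
    return par;
}

/* Same address arithmetic as the inline accessor in the header:
 * base + subblocks_offset + idx * subblock_size. */
static void *demo_param_get_subblock(const DemoParamDefinition *par, unsigned int idx)
{
    assert(idx < par->nb_subblocks);
    return (void *)((uint8_t *)par + par->subblocks_offset + idx * par->subblock_size);
}
```

Storing the offset and element size in the header is what lets a generic accessor work without knowing the subblock type; the caller casts the returned pointer according to the definition's type field.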
* [FFmpeg-devel] [PATCH 2/8] avformat: introduce AVStreamGroup
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups James Almer
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
 doc/fftools-common-opts.texi |  17 +++-
 libavformat/avformat.c       |  91 +++++++++++++++++++--
 libavformat/avformat.h       | 153 +++++++++++++++++++++++++++++++++++
 libavformat/dump.c           | 147 +++++++++++++++++++++++++++------
 libavformat/internal.h       |  33 ++++++++
 libavformat/options.c        | 139 +++++++++++++++++++++++++++++++
 6 files changed, 546 insertions(+), 34 deletions(-)
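A note on the "g:" stream specifier added by this patch: the group part may be a plain index, "#id", or "i:id", optionally followed by ":" and a further specifier. The standalone sketch below mirrors that parsing step only, as an illustration. demo_parse_group_spec() is a hypothetical helper, not a function in libavformat, and it uses strtoll() instead of the strtol() in the patch (an assumption made here so that the full int64_t id range is kept on platforms where long is 32 bits).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse the text that follows "g:" in a stream specifier.
 * On success returns 0, sets exactly one of *group_idx / *group_id (the
 * other stays -1), and points *rest at whatever follows the optional ':'.
 * Returns -EINVAL for an empty number or trailing garbage. */
static int demo_parse_group_spec(const char *spec, int64_t *group_idx,
                                 int64_t *group_id, const char **rest)
{
    char *endptr;
    long long val;
    int by_id = 0;

    assert(spec && group_idx && group_id && rest);
    *group_idx = *group_id = -1;

    if (*spec == '#') {                          /* "#<id>"  */
        by_id = 1;
        spec += 1;
    } else if (spec[0] == 'i' && spec[1] == ':') { /* "i:<id>" */
        by_id = 1;
        spec += 2;
    }

    val = strtoll(spec, &endptr, 0);
    /* Disallow an empty number; if we are not at the end of the string,
     * a ':' introducing another specifier must follow. */
    if (spec == endptr || (*endptr && *endptr++ != ':'))
        return -EINVAL;

    if (by_id)
        *group_id = val;
    else
        *group_idx = val;
    *rest = endptr;
    return 0;
}
```

With this helper, "2:v:0" yields index 2 with "v:0" left over, while "#7:a" and "i:7:a" both yield id 7 with "a" left over. Mapping an id to a group index and checking that the index is in range remain separate steps, as in the patch.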
diff --git a/doc/fftools-common-opts.texi b/doc/fftools-common-opts.texi
index d9145704d6..f459bfdc1d 100644
--- a/doc/fftools-common-opts.texi
+++ b/doc/fftools-common-opts.texi
@@ -37,9 +37,9 @@ Matches the stream with this index. E.g. @code{-threads:1 4} would set the
 thread count for the second stream to 4. If @var{stream_index} is used as an
 additional stream specifier (see below), then it selects stream number
 @var{stream_index} from the matching streams. Stream numbering is based on the
-order of the streams as detected by libavformat except when a program ID is
-also specified. In this case it is based on the ordering of the streams in the
-program.
+order of the streams as detected by libavformat except when a stream group
+specifier or program ID is also specified. In this case it is based on the
+ordering of the streams in the group or program.
 @item @var{stream_type}[:@var{additional_stream_specifier}]
 @var{stream_type} is one of following: 'v' or 'V' for video, 'a' for audio, 's'
 for subtitle, 'd' for data, and 't' for attachments. 'v' matches all video
@@ -48,6 +48,17 @@ thumbnails or cover arts. If @var{additional_stream_specifier} is used, then
 it matches streams which both have this type and match the
 @var{additional_stream_specifier}. Otherwise, it matches all streams of the
 specified type.
+@item g:@var{group_specifier}[:@var{additional_stream_specifier}]
+Matches streams which are in the group with the specifier @var{group_specifier}.
+If @var{additional_stream_specifier} is used, then it matches streams which both
+are part of the group and match the @var{additional_stream_specifier}.
+@var{group_specifier} may be one of the following:
+@table @option
+@item @var{group_index}
+Match the streams in the group with this index.
+@item #@var{group_id} or i:@var{group_id}
+Match the streams in the group with this id.
+@end table
 @item p:@var{program_id}[:@var{additional_stream_specifier}]
 Matches streams which are in the program with the id @var{program_id}. If
 @var{additional_stream_specifier} is used, then it matches streams which both
diff --git a/libavformat/avformat.c b/libavformat/avformat.c
index 5b8bb7879e..7e747c43d5 100644
--- a/libavformat/avformat.c
+++ b/libavformat/avformat.c
@@ -24,6 +24,7 @@
 #include "libavutil/avstring.h"
 #include "libavutil/channel_layout.h"
 #include "libavutil/frame.h"
+#include "libavutil/iamf.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/mem.h"
 #include "libavutil/opt.h"
@@ -80,6 +81,32 @@ FF_ENABLE_DEPRECATION_WARNINGS
     av_freep(pst);
 }
 
+void ff_free_stream_group(AVStreamGroup **pstg)
+{
+    AVStreamGroup *stg = *pstg;
+
+    if (!stg)
+        return;
+
+    av_freep(&stg->streams);
+    av_dict_free(&stg->metadata);
+    av_freep(&stg->priv_data);
+    switch (stg->type) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT: {
+        av_iamf_audio_element_free(&stg->params.iamf_audio_element);
+        break;
+    }
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION: {
+        av_iamf_mix_presentation_free(&stg->params.iamf_mix_presentation);
+        break;
+    }
+    default:
+        break;
+    }
+
+    av_freep(pstg);
+}
+
 void ff_remove_stream(AVFormatContext *s, AVStream *st)
 {
     av_assert0(s->nb_streams>0);
@@ -88,6 +115,14 @@ void ff_remove_stream(AVFormatContext *s, AVStream *st)
     ff_free_stream(&s->streams[ --s->nb_streams ]);
 }
 
+void ff_remove_stream_group(AVFormatContext *s, AVStreamGroup *stg)
+{
+    av_assert0(s->nb_stream_groups > 0);
+    av_assert0(s->stream_groups[ s->nb_stream_groups - 1 ] == stg);
+
+    ff_free_stream_group(&s->stream_groups[ --s->nb_stream_groups ]);
+}
+
 /* XXX: suppress the packet queue */
 void ff_flush_packet_queue(AVFormatContext *s)
 {
@@ -118,6 +153,9 @@ void avformat_free_context(AVFormatContext *s)
 
     for (unsigned i = 0; i < s->nb_streams; i++)
         ff_free_stream(&s->streams[i]);
+    for (unsigned i = 0; i < s->nb_stream_groups; i++)
+        ff_free_stream_group(&s->stream_groups[i]);
+    s->nb_stream_groups = 0;
     s->nb_streams = 0;
 
     for (unsigned i = 0; i < s->nb_programs; i++) {
@@ -139,6 +177,7 @@ void avformat_free_context(AVFormatContext *s)
     av_packet_free(&si->pkt);
     av_packet_free(&si->parse_pkt);
     av_freep(&s->streams);
+    av_freep(&s->stream_groups);
     ff_flush_packet_queue(s);
     av_freep(&s->url);
     av_free(s);
@@ -464,7 +503,7 @@ int av_find_best_stream(AVFormatContext *ic, enum AVMediaType type,
  */
 static int match_stream_specifier(const AVFormatContext *s, const AVStream *st,
                                   const char *spec, const char **indexptr,
-                                  const AVProgram **p)
+                                  const AVStreamGroup **g, const AVProgram **p)
 {
     int match = 1;                      /* Stores if the specifier matches so far. */
     while (*spec) {
@@ -493,6 +532,46 @@ static int match_stream_specifier(const AVFormatContext *s, const AVStream *st,
                 match = 0;
             if (nopic && (st->disposition & AV_DISPOSITION_ATTACHED_PIC))
                 match = 0;
+        } else if (*spec == 'g' && *(spec + 1) == ':') {
+            int64_t group_idx = -1, group_id = -1;
+            int found = 0;
+            char *endptr;
+            spec += 2;
+            if (*spec == '#' || (*spec == 'i' && *(spec + 1) == ':')) {
+                spec += 1 + (*spec == 'i');
+                group_id = strtol(spec, &endptr, 0);
+                if (spec == endptr || (*endptr && *endptr++ != ':'))
+                    return AVERROR(EINVAL);
+                spec = endptr;
+            } else {
+                group_idx = strtol(spec, &endptr, 0);
+                /* Disallow empty id and make sure that if we are not at the end, then another specifier must follow. */
+                if (spec == endptr || (*endptr && *endptr++ != ':'))
+                    return AVERROR(EINVAL);
+                spec = endptr;
+            }
+            if (match) {
+                if (group_id > 0) {
+                    for (unsigned i = 0; i < s->nb_stream_groups; i++) {
+                        if (group_id == s->stream_groups[i]->id) {
+                            group_idx = i;
+                            break;
+                        }
+                    }
+                }
+                if (group_idx < 0 || group_idx >= s->nb_stream_groups)
+                    return AVERROR(EINVAL);
+                for (unsigned j = 0; j < s->stream_groups[group_idx]->nb_streams; j++) {
+                    if (st->index == s->stream_groups[group_idx]->streams[j]->index) {
+                        found = 1;
+                        if (g)
+                            *g = s->stream_groups[group_idx];
+                        break;
+                    }
+                }
+            }
+            if (!found)
+                match = 0;
         } else if (*spec == 'p' && *(spec + 1) == ':') {
             int prog_id;
             int found = 0;
@@ -591,10 +670,11 @@ int avformat_match_stream_specifier(AVFormatContext *s, AVStream *st,
     int ret, index;
     char *endptr;
     const char *indexptr = NULL;
+    const AVStreamGroup *g = NULL;
     const AVProgram *p = NULL;
     int nb_streams;
 
-    ret = match_stream_specifier(s, st, spec, &indexptr, &p);
+    ret = match_stream_specifier(s, st, spec, &indexptr, &g, &p);
     if (ret < 0)
         goto error;
 
@@ -612,10 +692,11 @@ int avformat_match_stream_specifier(AVFormatContext *s, AVStream *st,
         return (index == st->index);
 
     /* If we requested a matching stream index, we have to ensure st is that. */
-    nb_streams = p ? p->nb_stream_indexes : s->nb_streams;
+    nb_streams = g ? g->nb_streams : (p ? p->nb_stream_indexes : s->nb_streams);
     for (int i = 0; i < nb_streams && index >= 0; i++) {
-        const AVStream *candidate = s->streams[p ? p->stream_index[i] : i];
-        ret = match_stream_specifier(s, candidate, spec, NULL, NULL);
+        unsigned idx = g ? g->streams[i]->index : (p ? p->stream_index[i] : i);
+        const AVStream *candidate = s->streams[idx];
+        ret = match_stream_specifier(s, candidate, spec, NULL, NULL, NULL);
         if (ret < 0)
             goto error;
         if (ret > 0 && index-- == 0 && st == candidate)
diff --git a/libavformat/avformat.h b/libavformat/avformat.h
index 9e7eca007e..5d0fe82250 100644
--- a/libavformat/avformat.h
+++ b/libavformat/avformat.h
@@ -1018,6 +1018,83 @@ typedef struct AVStream {
     int pts_wrap_bits;
 } AVStream;
 
+enum AVStreamGroupParamsType {
+    AV_STREAM_GROUP_PARAMS_NONE,
+    AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT,
+    AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION,
+};
+
+struct AVIAMFAudioElement;
+struct AVIAMFMixPresentation;
+
+typedef struct AVStreamGroup {
+    /**
+     * A class for @ref avoptions. Set by avformat_stream_group_create().
+     */
+    const AVClass *av_class;
+
+    void *priv_data;
+
+    /**
+     * Group index in AVFormatContext.
+     */
+    unsigned int index;
+
+    /**
+     * Group type-specific group ID.
+     *
+     * decoding: set by libavformat
+     * encoding: may be set by the user
+     */
+    int64_t id;
+
+    /**
+     * Group type
+     *
+     * decoding: set by libavformat on group creation
+     * encoding: set by avformat_stream_group_create()
+     */
+    enum AVStreamGroupParamsType type;
+
+    /**
+     * Group type-specific parameters
+     */
+    union {
+        struct AVIAMFAudioElement *iamf_audio_element;
+        struct AVIAMFMixPresentation *iamf_mix_presentation;
+    } params;
+
+    /**
+     * Metadata that applies to the whole group.
+     *
+     * - demuxing: set by libavformat on group creation
+     * - muxing: may be set by the caller before avformat_write_header()
+     *
+     * Freed by libavformat in avformat_free_context().
+     */
+    AVDictionary *metadata;
+
+    /**
+     * Number of elements in AVStreamGroup.streams.
+     *
+     * Set by avformat_stream_group_add_stream(), must not be modified by any other code.
+     */
+    unsigned int nb_streams;
+
+    /**
+     * A list of streams in the group. New entries are created with
+     * avformat_stream_group_add_stream().
+     *
+     * - demuxing: entries are created by libavformat on group creation.
+     *             If AVFMTCTX_NOHEADER is set in ctx_flags, then new entries may also
+     *             appear in av_read_frame().
+     * - muxing: entries are created by the user before avformat_write_header().
+     *
+     * Freed by libavformat in avformat_free_context().
+     */
+    AVStream **streams;
+} AVStreamGroup;
+
 struct AVCodecParserContext *av_stream_get_parser(const AVStream *s);
 
 #if FF_API_GET_END_PTS
@@ -1726,6 +1803,26 @@ typedef struct AVFormatContext {
      * @return 0 on success, a negative AVERROR code on failure
      */
     int (*io_close2)(struct AVFormatContext *s, AVIOContext *pb);
+
+    /**
+     * Number of elements in AVFormatContext.stream_groups.
+     *
+     * Set by avformat_stream_group_create(), must not be modified by any other code.
+     */
+    unsigned int nb_stream_groups;
+
+    /**
+     * A list of all stream groups in the file. New groups are created with
+     * avformat_stream_group_create(), and filled with avformat_stream_group_add_stream().
+     *
+     * - demuxing: groups may be created by libavformat in avformat_open_input().
+     *             If AVFMTCTX_NOHEADER is set in ctx_flags, then new groups may also
+     *             appear in av_read_frame().
+     * - muxing: groups may be created by the user before avformat_write_header().
+     *
+     * Freed by libavformat in avformat_free_context().
+     */
+    AVStreamGroup **stream_groups;
 } AVFormatContext;
 
 /**
@@ -1844,6 +1941,37 @@ const AVClass *avformat_get_class(void);
  */
 const AVClass *av_stream_get_class(void);
 
+/**
+ * Get the AVClass for AVStreamGroup. It can be used in combination with
+ * AV_OPT_SEARCH_FAKE_OBJ for examining options.
+ *
+ * @see av_opt_find().
+ */
+const AVClass *av_stream_group_get_class(void);
+
+/**
+ * Add a new empty stream group to a media file.
+ *
+ * When demuxing, it may be called by the demuxer in read_header(). If the
+ * flag AVFMTCTX_NOHEADER is set in s.ctx_flags, then it may also
+ * be called in read_packet().
+ *
+ * When muxing, may be called by the user before avformat_write_header().
+ *
+ * User is required to call avformat_free_context() to clean up the allocation
+ * by avformat_stream_group_create().
+ *
+ * New streams can be added to the group with avformat_stream_group_add_stream().
+ *
+ * @param s media file handle
+ *
+ * @return newly created group or NULL on error.
+ * @see avformat_new_stream, avformat_stream_group_add_stream.
+ */
+AVStreamGroup *avformat_stream_group_create(AVFormatContext *s,
+                                            enum AVStreamGroupParamsType type,
+                                            AVDictionary **options);
+
 /**
  * Add a new stream to a media file.
  *
@@ -1863,6 +1991,31 @@ const AVClass *av_stream_get_class(void);
  */
 AVStream *avformat_new_stream(AVFormatContext *s, const struct AVCodec *c);
 
+/**
+ * Add an already allocated stream to a stream group.
+ *
+ * When demuxing, it may be called by the demuxer in read_header(). If the
+ * flag AVFMTCTX_NOHEADER is set in s.ctx_flags, then it may also
+ * be called in read_packet().
+ *
+ * When muxing, may be called by the user before avformat_write_header() after
+ * having allocated a new group with avformat_stream_group_create() and stream with
+ * avformat_new_stream().
+ *
+ * User is required to call avformat_free_context() to clean up the allocation
+ * by avformat_stream_group_add_stream().
+ *
+ * @param stg stream group belonging to a media file.
+ * @param st  stream in the media file to add to the group.
+ *
+ * @retval 0                 success
+ * @retval AVERROR(EEXIST)   the stream was already in the group
+ * @retval "another negative error code" legitimate errors
+ *
+ * @see avformat_new_stream, avformat_stream_group_create.
+ */
+int avformat_stream_group_add_stream(AVStreamGroup *stg, AVStream *st);
+
 #if FF_API_AVSTREAM_SIDE_DATA
 /**
  * Wrap an existing array as stream side data.
diff --git a/libavformat/dump.c b/libavformat/dump.c
index c0868a1bb3..cc179f284f 100644
--- a/libavformat/dump.c
+++ b/libavformat/dump.c
@@ -24,6 +24,7 @@
 
 #include "libavutil/channel_layout.h"
 #include "libavutil/display.h"
+#include "libavutil/iamf.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/log.h"
 #include "libavutil/mastering_display_metadata.h"
@@ -134,28 +135,36 @@ static void print_fps(double d, const char *postfix)
         av_log(NULL, AV_LOG_INFO, "%1.0fk %s", d / 1000, postfix);
 }
 
-static void dump_metadata(void *ctx, const AVDictionary *m, const char *indent)
+static void dump_dictionary(void *ctx, const AVDictionary *m,
+                            const char *name, const char *indent)
 {
-    if (m && !(av_dict_count(m) == 1 && av_dict_get(m, "language", NULL, 0))) {
-        const AVDictionaryEntry *tag = NULL;
-
-        av_log(ctx, AV_LOG_INFO, "%sMetadata:\n", indent);
-        while ((tag = av_dict_iterate(m, tag)))
-            if (strcmp("language", tag->key)) {
-                const char *p = tag->value;
-                av_log(ctx, AV_LOG_INFO,
-                       "%s  %-16s: ", indent, tag->key);
-                while (*p) {
-                    size_t len = strcspn(p, "\x8\xa\xb\xc\xd");
-                    av_log(ctx, AV_LOG_INFO, "%.*s", (int)(FFMIN(255, len)), p);
-                    p += len;
-                    if (*p == 0xd) av_log(ctx, AV_LOG_INFO, " ");
-                    if (*p == 0xa) av_log(ctx, AV_LOG_INFO, "\n%s  %-16s: ", indent, "");
-                    if (*p) p++;
-                }
-                av_log(ctx, AV_LOG_INFO, "\n");
+    const AVDictionaryEntry *tag = NULL;
+
+    if (!m)
+        return;
+
+    av_log(ctx, AV_LOG_INFO, "%s%s:\n", indent, name);
+    while ((tag = av_dict_iterate(m, tag)))
+        if (strcmp("language", tag->key)) {
+            const char *p = tag->value;
+            av_log(ctx, AV_LOG_INFO,
+                   "%s  %-16s: ", indent, tag->key);
+            while (*p) {
+                size_t len = strcspn(p, "\x8\xa\xb\xc\xd");
+                av_log(ctx, AV_LOG_INFO, "%.*s", (int)(FFMIN(255, len)), p);
+                p += len;
+                if (*p == 0xd) av_log(ctx, AV_LOG_INFO, " ");
+                if (*p == 0xa) av_log(ctx, AV_LOG_INFO, "\n%s  %-16s: ", indent, "");
+                if (*p) p++;
             }
-    }
+            av_log(ctx, AV_LOG_INFO, "\n");
+        }
+}
+
+static void dump_metadata(void *ctx, const AVDictionary *m, const char *indent)
+{
+    if (m && !(av_dict_count(m) == 1 && av_dict_get(m, "language", NULL, 0)))
+        dump_dictionary(ctx, m, "Metadata", indent);
 }
 
 /* param change side data*/
@@ -509,7 +518,7 @@ static void dump_sidedata(void *ctx, const AVStream *st, const char *indent)
 
 /* "user interface" functions */
 static void dump_stream_format(const AVFormatContext *ic, int i,
-                               int index, int is_output)
+                               int group_index, int index, int is_output)
 {
     char buf[256];
     int flags = (is_output ? ic->oformat->flags : ic->iformat->flags);
@@ -517,6 +526,8 @@ static void dump_stream_format(const AVFormatContext *ic, int i,
     const FFStream *const sti = cffstream(st);
     const AVDictionaryEntry *lang = av_dict_get(st->metadata, "language", NULL, 0);
     const char *separator = ic->dump_separator;
+    const char *group_indent = group_index >= 0 ? "    " : "";
+    const char *extra_indent = group_index >= 0 ? "        " : "      ";
     AVCodecContext *avctx;
     int ret;
 
@@ -543,7 +554,8 @@ static void dump_stream_format(const AVFormatContext *ic, int i,
     avcodec_string(buf, sizeof(buf), avctx, is_output);
     avcodec_free_context(&avctx);
 
-    av_log(NULL, AV_LOG_INFO, "  Stream #%d:%d", index, i);
+    av_log(NULL, AV_LOG_INFO, "%s  Stream #%d", group_indent, index);
+    av_log(NULL, AV_LOG_INFO, ":%d", i);
 
     /* the pid is an important information, so we display it */
     /* XXX: add a generic system */
@@ -621,9 +633,89 @@ static void dump_stream_format(const AVFormatContext *ic, int i,
         av_log(NULL, AV_LOG_INFO, " (non-diegetic)");
     av_log(NULL, AV_LOG_INFO, "\n");
 
-    dump_metadata(NULL, st->metadata, "    ");
+    dump_metadata(NULL, st->metadata, extra_indent);
+
+    dump_sidedata(NULL, st, extra_indent);
+}
+
+static void dump_stream_group(const AVFormatContext *ic, uint8_t *printed,
+                              int i, int index, int is_output)
+{
+    const AVStreamGroup *stg = ic->stream_groups[i];
+    int flags = (is_output ? ic->oformat->flags : ic->iformat->flags);
+    char buf[512];
+    int ret;
 
-    dump_sidedata(NULL, st, "    ");
+    av_log(NULL, AV_LOG_INFO, "  Stream group #%d:%d", index, i);
+    if (flags & AVFMT_SHOW_IDS)
+        av_log(NULL, AV_LOG_INFO, "[0x%"PRIx64"]", stg->id);
+    av_log(NULL, AV_LOG_INFO, ":");
+
+    switch (stg->type) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT: {
+        const AVIAMFAudioElement *audio_element = stg->params.iamf_audio_element;
+        av_log(NULL, AV_LOG_INFO, " IAMF Audio Element\n");
+        dump_metadata(NULL, stg->metadata, "    ");
+        for (int j = 0; j < audio_element->nb_layers; j++) {
+            const AVIAMFLayer *layer = audio_element->layers[j];
+            int channel_count = layer->ch_layout.nb_channels;
+            av_log(NULL, AV_LOG_INFO, "    Layer %d:", j);
+            ret = av_channel_layout_describe(&layer->ch_layout, buf, sizeof(buf));
+            if (ret >= 0)
+                av_log(NULL, AV_LOG_INFO, " %s", buf);
+            av_log(NULL, AV_LOG_INFO, "\n");
+            for (int k = 0; channel_count > 0 && k < stg->nb_streams; k++) {
+                AVStream *st = stg->streams[k];
+                dump_stream_format(ic, st->index, i, index, is_output);
+                printed[st->index] = 1;
+                channel_count -= st->codecpar->ch_layout.nb_channels;
+            }
+        }
+        break;
+    }
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION: {
+        const AVIAMFMixPresentation *mix_presentation = stg->params.iamf_mix_presentation;
+        av_log(NULL, AV_LOG_INFO, " IAMF Mix Presentation\n");
+        dump_metadata(NULL, stg->metadata, "    ");
+        dump_dictionary(NULL, mix_presentation->annotations, "Annotations", "    ");
+        for (int j = 0; j < mix_presentation->nb_submixes; j++) {
+            AVIAMFSubmix *sub_mix = mix_presentation->submixes[j];
+            av_log(NULL, AV_LOG_INFO, "    Submix %d:\n", j);
+            for (int k = 0; k < sub_mix->nb_elements; k++) {
+                const AVIAMFSubmixElement *submix_element = sub_mix->elements[k];
+                const AVStreamGroup *audio_element = NULL;
+                for (int l = 0; l < ic->nb_stream_groups; l++)
+                    if (ic->stream_groups[l]->type == AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT &&
+                        ic->stream_groups[l]->id   == submix_element->audio_element_id) {
+                        audio_element = ic->stream_groups[l];
+                        break;
+                    }
+                if (audio_element) {
+                    av_log(NULL, AV_LOG_INFO, "      IAMF Audio Element #%d:%d",
+                           index, audio_element->index);
+                    if (flags & AVFMT_SHOW_IDS)
+                        av_log(NULL, AV_LOG_INFO, "[0x%"PRIx64"]", audio_element->id);
+                    av_log(NULL, AV_LOG_INFO, "\n");
+                    dump_dictionary(NULL, submix_element->annotations, "Annotations", "        ");
+                }
+            }
+            for (int k = 0; k < sub_mix->nb_layouts; k++) {
+                const AVIAMFSubmixLayout *submix_layout = sub_mix->layouts[k];
+                av_log(NULL, AV_LOG_INFO, "      Layout #%d:", k);
+                if (submix_layout->layout_type == 2) {
+                    ret = av_channel_layout_describe(&submix_layout->sound_system, buf, sizeof(buf));
+                    if (ret >= 0)
+                        av_log(NULL, AV_LOG_INFO, " %s", buf);
+                } else if (submix_layout->layout_type == 3)
+                    av_log(NULL, AV_LOG_INFO, " Binaural");
+                av_log(NULL, AV_LOG_INFO, "\n");
+            }
+        }
+        break;
+    }
+    default:
+        break;
+    }
 }
 
 void av_dump_format(AVFormatContext *ic, int index,
@@ -699,7 +791,7 @@ void av_dump_format(AVFormatContext *ic, int index,
             dump_metadata(NULL, program->metadata, "    ");
             for (k = 0; k < program->nb_stream_indexes; k++) {
                 dump_stream_format(ic, program->stream_index[k],
-                                   index, is_output);
+                                   -1, index, is_output);
                 printed[program->stream_index[k]] = 1;
             }
             total += program->nb_stream_indexes;
@@ -708,9 +800,12 @@ void av_dump_format(AVFormatContext *ic, int index,
             av_log(NULL, AV_LOG_INFO, "  No Program\n");
     }
 
+    for (i = 0; i < ic->nb_stream_groups; i++)
+        dump_stream_group(ic, printed, i, index, is_output);
+
     for (i = 0; i < ic->nb_streams; i++)
         if (!printed[i])
-            dump_stream_format(ic, i, index, is_output);
+            dump_stream_format(ic, i, -1, index, is_output);
 
     av_free(printed);
 }
diff --git a/libavformat/internal.h b/libavformat/internal.h
index 7702986c9c..c6181683ef 100644
--- a/libavformat/internal.h
+++ b/libavformat/internal.h
@@ -202,6 +202,7 @@ typedef struct FFStream {
      */
     AVStream pub;
 
+    AVFormatContext *fmtctx;
     /**
      * Set to 1 if the codec allows reordering, so pts can be different
      * from dts.
@@ -427,6 +428,26 @@ static av_always_inline const FFStream *cffstream(const AVStream *st)
     return (const FFStream*)st;
 }
 
+typedef struct FFStreamGroup {
+    /**
+     * The public context.
+     */
+    AVStreamGroup pub;
+
+    AVFormatContext *fmtctx;
+} FFStreamGroup;
+
+
+static av_always_inline FFStreamGroup *ffstreamgroup(AVStreamGroup *stg)
+{
+    return (FFStreamGroup*)stg;
+}
+
+static av_always_inline const FFStreamGroup *cffstreamgroup(const AVStreamGroup *stg)
+{
+    return (const FFStreamGroup*)stg;
+}
+
 #ifdef __GNUC__
 #define dynarray_add(tab, nb_ptr, elem)\
 do {\
@@ -608,6 +629,18 @@ void ff_free_stream(AVStream **st);
  */
 void ff_remove_stream(AVFormatContext *s, AVStream *st);
 
+/**
+ * Frees a stream group without modifying the corresponding AVFormatContext.
+ * Must only be called if the latter doesn't matter or if the stream group
+ * is not yet attached to an AVFormatContext.
+ */
+void ff_free_stream_group(AVStreamGroup **pstg);
+/**
+ * Remove a stream group from its AVFormatContext and free it.
+ * The group must be the last stream group of the AVFormatContext.
+ */
+void ff_remove_stream_group(AVFormatContext *s, AVStreamGroup *stg);
+
 unsigned int ff_codec_get_tag(const AVCodecTag *tags, enum AVCodecID id);
 
 enum AVCodecID ff_codec_get_id(const AVCodecTag *tags, unsigned int tag);
diff --git a/libavformat/options.c b/libavformat/options.c
index 1d8c52246b..bf6113ca95 100644
--- a/libavformat/options.c
+++ b/libavformat/options.c
@@ -26,6 +26,7 @@
 #include "libavcodec/codec_par.h"
 
 #include "libavutil/avassert.h"
+#include "libavutil/iamf.h"
 #include "libavutil/internal.h"
 #include "libavutil/intmath.h"
 #include "libavutil/opt.h"
@@ -271,6 +272,7 @@ AVStream *avformat_new_stream(AVFormatContext *s, const AVCodec *c)
     if (!st->codecpar)
         goto fail;
 
+    sti->fmtctx = s;
     sti->avctx = avcodec_alloc_context3(NULL);
     if (!sti->avctx)
         goto fail;
@@ -325,6 +327,143 @@ fail:
     return NULL;
 }
 
+static void *stream_group_child_next(void *obj, void *prev)
+{
+    AVStreamGroup *stg = obj;
+    if (!prev) {
+        switch(stg->type) {
+        case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+            return stg->params.iamf_audio_element;
+        case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+            return stg->params.iamf_mix_presentation;
+        default:
+            break;
+        }
+    }
+    return NULL;
+}
+
+static const AVClass *stream_group_child_iterate(void **opaque)
+{
+    uintptr_t i = (uintptr_t)*opaque;
+    const AVClass *ret = NULL;
+
+    switch(i) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+        ret = av_iamf_audio_element_get_class();
+        break;
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+        ret = av_iamf_mix_presentation_get_class();
+        break;
+    default:
+        break;
+    }
+
+    if (ret)
+        *opaque = (void*)(i + 1);
+    return ret;
+}
+
+static const AVOption stream_group_options[] = {
+    {"id", "Set group id", offsetof(AVStreamGroup, id), AV_OPT_TYPE_INT64, {.i64 = 0}, 0, INT64_MAX, AV_OPT_FLAG_ENCODING_PARAM },
+    { NULL }
+};
+
+static const AVClass stream_group_class = {
+    .class_name     = "AVStreamGroup",
+    .item_name      = av_default_item_name,
+    .version        = LIBAVUTIL_VERSION_INT,
+    .option         = stream_group_options,
+    .child_next     = stream_group_child_next,
+    .child_class_iterate = stream_group_child_iterate,
+};
+
+const AVClass *av_stream_group_get_class(void)
+{
+    return &stream_group_class;
+}
+
+AVStreamGroup *avformat_stream_group_create(AVFormatContext *s,
+                                            enum AVStreamGroupParamsType type,
+                                            AVDictionary **options)
+{
+    AVStreamGroup **stream_groups;
+    AVStreamGroup *stg;
+    FFStreamGroup *stgi;
+
+    stream_groups = av_realloc_array(s->stream_groups, s->nb_stream_groups + 1,
+                                     sizeof(*stream_groups));
+    if (!stream_groups)
+        return NULL;
+    s->stream_groups = stream_groups;
+
+    stgi = av_mallocz(sizeof(*stgi));
+    if (!stgi)
+        return NULL;
+    stg = &stgi->pub;
+
+    stg->av_class = &stream_group_class;
+    av_opt_set_defaults(stg);
+    stg->type = type;
+    switch (type) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+        stg->params.iamf_audio_element = av_iamf_audio_element_alloc();
+        if (!stg->params.iamf_audio_element)
+            goto fail;
+        break;
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+        stg->params.iamf_mix_presentation = av_iamf_mix_presentation_alloc();
+        if (!stg->params.iamf_mix_presentation)
+            goto fail;
+        break;
+    default:
+        goto fail;
+    }
+
+    if (options) {
+        if (av_opt_set_dict2(stg, options, AV_OPT_SEARCH_CHILDREN))
+            goto fail;
+    }
+
+    stgi->fmtctx = s;
+    stg->index   = s->nb_stream_groups;
+
+    s->stream_groups[s->nb_stream_groups++] = stg;
+
+    return stg;
+fail:
+    ff_free_stream_group(&stg);
+    return NULL;
+}
+
+static int stream_group_add_stream(AVStreamGroup *stg, AVStream *st)
+{
+    AVStream **streams = av_realloc_array(stg->streams, stg->nb_streams + 1,
+                                          sizeof(*stg->streams));
+    if (!streams)
+        return AVERROR(ENOMEM);
+
+    stg->streams = streams;
+    stg->streams[stg->nb_streams++] = st;
+
+    return 0;
+}
+
+int avformat_stream_group_add_stream(AVStreamGroup *stg, AVStream *st)
+{
+    const FFStreamGroup *stgi = cffstreamgroup(stg);
+    const FFStream *sti = cffstream(st);
+
+    if (stgi->fmtctx != sti->fmtctx)
+        return AVERROR(EINVAL);
+
+    for (int i = 0; i < stg->nb_streams; i++)
+        if (stg->streams[i]->index == st->index)
+            return AVERROR(EEXIST);
+
+    return stream_group_add_stream(stg, st);
+}
+
 static int option_is_disposition(const AVOption *opt)
 {
     return opt->type == AV_OPT_TYPE_CONST &&
-- 
2.43.0
* [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 2/8] avformat: introduce AVStreamGroup James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-15 21:28   ` James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 4/8] avcodec/packet: add IAMF Parameters side data types James Almer
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Starting with IAMF support.
Signed-off-by: James Almer <jamrial@gmail.com>
---
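A minimal sketch, not part of this patch, of how a -stream_group argument is
layered according to the documentation added below: ',' separates the leading
group token from the layer/demixing/recon_gain (or submix) entries, and ':'
separates the key=value pairs inside each entry. The helper names
split_pairs() and count_entries(), the KeyValue type and the sample strings
are hypothetical and only illustrate the syntax; the patch itself relies on
av_strtok() and av_dict_parse_string() rather than this hand-rolled splitting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one key=value pair pointing into the input. */
typedef struct KeyValue {
    const char *key;
    const char *value;
} KeyValue;

/*
 * Split "k1=v1:k2=v2" in place into at most max pairs.
 * Returns the number of pairs, or -1 if an entry has no '=' or
 * there are more than max entries. Repeated keys are kept as-is.
 */
static int split_pairs(char *entry, KeyValue *out, int max)
{
    int n = 0;

    assert(entry && out);
    while (*entry) {
        char *end = strchr(entry, ':');   /* end of this pair, if any */
        char *eq;

        if (end)
            *end = '\0';
        eq = strchr(entry, '=');          /* first '=' splits key from value */
        if (!eq || n >= max)
            return -1;
        *eq = '\0';
        out[n].key   = entry;
        out[n].value = eq + 1;
        n++;
        if (!end)
            break;
        entry = end + 1;
    }
    return n;
}

/*
 * Count the ','-separated entries after the leading group token
 * that start with the given prefix (e.g. "layer=").
 */
static int count_entries(const char *spec, const char *prefix)
{
    size_t plen = strlen(prefix);
    int n = 0;
    const char *p = strchr(spec, ',');    /* skip the group token */

    while (p) {
        p++;                              /* step past the ',' */
        if (strncmp(p, prefix, plen) == 0)
            n++;
        p = strchr(p, ',');
    }
    return n;
}
```

For instance, calling split_pairs() on a writable copy of
"type=iamf_audio_element:id=1:st=0:st=1" yields four pairs with "st" appearing
twice, which is why the group token is parsed with AV_DICT_MULTIKEY in
of_parse_group_token().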
 doc/ffmpeg.texi           | 200 ++++++++++++++++++++++
 fftools/ffmpeg.h          |   2 +
 fftools/ffmpeg_mux_init.c | 341 ++++++++++++++++++++++++++++++++++++++
 fftools/ffmpeg_opt.c      |   2 +
 4 files changed, 545 insertions(+)
diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index c503963941..1fadb20686 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -623,6 +623,206 @@ Not all muxers support embedded thumbnails, and those who do, only support a few
 Creates a program with the specified @var{title}, @var{program_num} and adds the specified
 @var{stream}(s) to it.
 
+@item -stream_group type=@var{type}:st=@var{stream}[:st=@var{stream}][:stg=@var{stream_group}][:id=@var{stream_group_id}...] (@emph{output})
+
+Creates a stream group of the specified @var{type}, @var{stream_group_id} and adds the specified
+@var{stream}(s) and/or previously defined @var{stream_group}(s) to it.
+
+@var{type} can be one of the following:
+@table @option
+
+@item iamf_audio_element
+Groups @var{stream}s that belong to the same IAMF Audio Element
+
+For this group @var{type}, the following options are available
+@table @option
+@item audio_element_type
+The Audio Element type. The following values are supported:
+
+@table @option
+@item channel
+Scalable channel audio representation
+@item scene
+Ambisonics representation
+@end table
+
+@item demixing
+Demixing information used to reconstruct a scalable channel audio representation.
+This option must be separated from the rest with a ',', and takes the following
+key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to
+@item dmixp_mode
+A pre-defined combination of demixing parameters
+@end table
+
+@item recon_gain
+Recon gain information used to reconstruct a scalable channel audio representation.
+This option must be separated from the rest with a ',', and takes the following
+key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to
+@end table
+
+@item layer
+A layer defining a Channel Layout in the Audio Element.
+This option must be separated from the rest with a ','. Several ',' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options
+
+@table @option
+@item ch_layout
+The layer's channel layout
+@item flags
+The following flags are available:
+
+@table @option
+@item recon_gain
+Whether to signal if recon_gain is present as metadata in parameter blocks within frames
+@end table
+
+@item output_gain
+@item output_gain_flags
+Which channels output_gain applies to. The following flags are available:
+
+@table @option
+@item FL
+@item FR
+@item BL
+@item BR
+@item TFL
+@item TFR
+@end table
+
+@item ambisonics_mode
+The ambisonics mode. This has no effect if audio_element_type is set to channel.
+
+The following values are supported:
+
+@table @option
+@item mono
+Each ambisonics channel is coded as an individual mono stream in the group
+@end table
+
+@end table
+
+@item default_w
+Default weight value
+
+@end table
+
+@item iamf_mix_presentation
+Groups @var{stream}s that belong to all IAMF Audio Elements the same
+IAMF Mix Presentation references
+
+For this group @var{type}, the following options are available
+
+@table @option
+@item submix
+A sub-mix within the Mix Presentation.
+This option must be separated from the rest with a ','. Several ',' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to, for post-processing the mixed
+audio signal to generate the audio signal for playback
+@item parameter_rate
+The sample rate in which the duration fields of parameter blocks in frames that refer to
+this @var{parameter_id} are expressed
+@item default_mix_gain
+Default mix gain value to apply when there are no parameter blocks sharing the same
+@var{parameter_id} for a given frame
+
+@item element
+References an Audio Element used in this Mix Presentation to generate the final output
+audio signal for playback.
+This option must be separated from the rest with a '|'. Several '|' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options:
+
+@table @option
+@item stg
+The @var{stream_group_id} for an Audio Element which this sub-mix refers to
+@item parameter_id
+An identifier that parameter blocks in frames may refer to, for applying any processing to
+the referenced and rendered Audio Element before being summed with other processed Audio
+Elements
+@item parameter_rate
+The sample rate in which the duration fields of parameter blocks in frames that refer to
+this @var{parameter_id} are expressed
+@item default_mix_gain
+Default mix gain value to apply when there are no parameter blocks sharing the same
+@var{parameter_id} for a given frame
+@item annotations
+A key=value string describing the sub-mix element where "key" is a string conforming to
+BCP-47 that specifies the language for the "value" string. "key" must be the same as the
+one in the mix's @var{annotations}
+@item headphones_rendering_mode
+Indicates whether the input channel-based Audio Element is rendered to stereo loudspeakers
+or spatialized with a binaural renderer when played back on headphones.
+This has no effect if the referenced Audio Element's @var{audio_element_type} is set to
+channel.
+
+The following values are supported:
+
+@table @option
+@item stereo
+@item binaural
+@end table
+
+@end table
+
+@item layout
+Specifies the layouts for this sub-mix on which the loudness information was measured.
+This option must be separated from the rest with a '|'. Several '|' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options:
+
+@table @option
+@item layout_type
+
+@table @option
+@item loudspeakers
+The layout follows the loudspeaker sound system convention of ITU-2051-3.
+@item binaural
+The layout is binaural.
+@end table
+
+@item sound_system
+Channel layout matching one of Sound Systems A to J of ITU-2051-3, plus 7.1.2 and 3.1.2.
+This has no effect if @var{layout_type} is set to binaural.
+@item integrated_loudness
+The program integrated loudness information, as defined in ITU-1770-4.
+@item digital_peak
+The digital (sampled) peak value of the audio signal, as defined in ITU-1770-4.
+@item true_peak
+The true peak of the audio signal, as defined in ITU-1770-4.
+@item dialog_anchored_loudness
+The Dialogue loudness information, as defined in ITU-1770-4.
+@item album_anchored_loudness
+The Album loudness information, as defined in ITU-1770-4.
+@end table
+
+@end table
+
+@item annotations
+A key=value string describing the mix where "key" is a string conforming to BCP-47
+that specifies the language for the "value" string. "key" must be the same as the ones in
+all sub-mix elements' @var{annotations}
+@end table
+
+@end table
+
 @item -target @var{type} (@emph{output})
 Specify target file type (@code{vcd}, @code{svcd}, @code{dvd}, @code{dv},
 @code{dv50}). @var{type} may be prefixed with @code{pal-}, @code{ntsc-} or
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index affa80856a..1169f723d1 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -281,6 +281,8 @@ typedef struct OptionsContext {
     int        nb_disposition;
     SpecifierOpt *program;
     int        nb_program;
+    SpecifierOpt *stream_groups;
+    int        nb_stream_groups;
     SpecifierOpt *time_bases;
     int        nb_time_bases;
     SpecifierOpt *enc_time_bases;
diff --git a/fftools/ffmpeg_mux_init.c b/fftools/ffmpeg_mux_init.c
index f527a083db..0f03ee092e 100644
--- a/fftools/ffmpeg_mux_init.c
+++ b/fftools/ffmpeg_mux_init.c
@@ -40,6 +40,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/display.h"
 #include "libavutil/getenv_utf8.h"
+#include "libavutil/iamf.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/log.h"
 #include "libavutil/mem.h"
@@ -2008,6 +2009,342 @@ static int setup_sync_queues(Muxer *mux, AVFormatContext *oc, int64_t buf_size_u
     return 0;
 }
 
+static int of_parse_iamf_audio_element_layers(Muxer *mux, AVStreamGroup *stg, char *ptr)
+{
+    AVIAMFAudioElement *audio_element = stg->params.iamf_audio_element;
+    AVDictionary *dict = NULL;
+    const char *token;
+    int ret = 0;
+
+    audio_element->demixing_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_DEMIXING, 1, NULL);
+    audio_element->recon_gain_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN, 1, NULL);
+
+    if (!audio_element->demixing_info ||
+        !audio_element->recon_gain_info)
+        return AVERROR(ENOMEM);
+
+    /* process manually set layers and parameters */
+    token = av_strtok(NULL, ",", &ptr);
+    while (token) {
+        const AVDictionaryEntry *e;
+        int demixing = 0, recon_gain = 0;
+        int layer = 0;
+
+        if (av_strstart(token, "layer=", &token))
+            layer = 1;
+        else if (av_strstart(token, "demixing=", &token))
+            demixing = 1;
+        else if (av_strstart(token, "recon_gain=", &token))
+            recon_gain = 1;
+
+        av_dict_free(&dict);
+        ret = av_dict_parse_string(&dict, token, "=", ":", 0);
+        if (ret < 0) {
+            av_log(mux, AV_LOG_ERROR, "Error parsing audio element specification %s\n", token);
+            goto fail;
+        }
+
+        if (layer) {
+            AVIAMFLayer *audio_layer = av_iamf_audio_element_add_layer(audio_element);
+            if (!audio_layer) {
+                av_log(mux, AV_LOG_ERROR, "Error adding layer to stream group %d\n", stg->index);
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+            av_opt_set_dict(audio_layer, &dict);
+        } else if (demixing || recon_gain) {
+            AVIAMFParamDefinition *param = demixing ? audio_element->demixing_info
+                                                    : audio_element->recon_gain_info;
+            void *subblock = av_iamf_param_definition_get_subblock(param, 0);
+
+            av_opt_set_dict(param, &dict);
+            av_opt_set_dict(subblock, &dict);
+        }
+
+        // make sure that no entries are left in the dict
+        e = NULL;
+        if (e = av_dict_iterate(dict, e)) {
+            av_log(mux, AV_LOG_FATAL, "Unknown layer key %s.\n", e->key);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        token = av_strtok(NULL, ",", &ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    if (!ret && !audio_element->nb_layers) {
+        av_log(mux, AV_LOG_ERROR, "No layer in audio element specification\n");
+        ret = AVERROR(EINVAL);
+    }
+
+    return ret;
+}
+
+static int of_parse_iamf_submixes(Muxer *mux, AVStreamGroup *stg, char *ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVIAMFMixPresentation *mix = stg->params.iamf_mix_presentation;
+    AVDictionary *dict = NULL;
+    const char *token;
+    char *submix_str = NULL;
+    int ret = 0;
+
+    /* process manually set submixes */
+    token = av_strtok(NULL, ",", &ptr);
+    while (token) {
+        AVIAMFSubmix *submix = NULL;
+        const char *subtoken;
+        char *subptr = NULL;
+
+        if (!av_strstart(token, "submix=", &token)) {
+            av_log(mux, AV_LOG_ERROR, "No submix in mix presentation specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+
+        submix_str = av_strdup(token);
+        if (!submix_str) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        submix = av_iamf_mix_presentation_add_submix(mix);
+        if (!submix) {
+            av_log(mux, AV_LOG_ERROR, "Error adding submix to stream group %d\n", stg->index);
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+        submix->output_mix_config =
+            av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+        if (!submix->output_mix_config) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        subptr = NULL;
+        subtoken = av_strtok(submix_str, "|", &subptr);
+        while (subtoken) {
+            const AVDictionaryEntry *e;
+            int element = 0, layout = 0;
+
+            if (av_strstart(subtoken, "element=", &subtoken))
+                element = 1;
+            else if (av_strstart(subtoken, "layout=", &subtoken))
+                layout = 1;
+
+            av_dict_free(&dict);
+            ret = av_dict_parse_string(&dict, subtoken, "=", ":", 0);
+            if (ret < 0) {
+                av_log(mux, AV_LOG_ERROR, "Error parsing submix specification \"%s\"\n", subtoken);
+                goto fail;
+            }
+
+            if (element) {
+                AVIAMFSubmixElement *submix_element;
+                int idx = -1;
+
+                if (e = av_dict_get(dict, "stg", NULL, 0))
+                    idx = strtol(e->value, NULL, 0);
+                av_dict_set(&dict, "stg", NULL, 0);
+                if (idx < 0 || idx >= oc->nb_stream_groups) {
+                    av_log(mux, AV_LOG_ERROR, "Invalid or missing stream group index in "
+                                              "submix element specification \"%s\"\n", subtoken);
+                    ret = AVERROR(EINVAL);
+                    goto fail;
+                }
+                submix_element = av_iamf_submix_add_element(submix);
+                if (!submix_element) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding element to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+
+                submix_element->audio_element_id = oc->stream_groups[idx]->id;
+
+                submix_element->element_mix_config =
+                    av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+                if (!submix_element->element_mix_config)
+                    ret = AVERROR(ENOMEM);
+                av_opt_set_dict2(submix_element, &dict, AV_OPT_SEARCH_CHILDREN);
+            } else if (layout) {
+                AVIAMFSubmixLayout *submix_layout = av_iamf_submix_add_layout(submix);
+                if (!submix_layout) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding layout to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+                av_opt_set_dict(submix_layout, &dict);
+            } else
+                av_opt_set_dict2(submix, &dict, AV_OPT_SEARCH_CHILDREN);
+
+            if (ret < 0) {
+                goto fail;
+            }
+
+            // make sure that no entries are left in the dict
+            e = NULL;
+            while (e = av_dict_iterate(dict, e)) {
+                av_log(mux, AV_LOG_FATAL, "Unknown submix key %s.\n", e->key);
+                ret = AVERROR(EINVAL);
+                goto fail;
+            }
+            subtoken = av_strtok(NULL, "|", &subptr);
+        }
+        av_freep(&submix_str);
+
+        if (!submix->nb_elements) {
+            av_log(mux, AV_LOG_ERROR, "No audio elements in submix specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        token = av_strtok(NULL, ",", &ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    av_free(submix_str);
+
+    return ret;
+}
+
+static int of_parse_group_token(Muxer *mux, const char *token, char *ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVStreamGroup *stg;
+    AVDictionary *dict = NULL, *tmp = NULL;
+    const AVDictionaryEntry *e;
+    const AVOption opts[] = {
+        { "type", "Set group type", offsetof(AVStreamGroup, type), AV_OPT_TYPE_INT,
+                { .i64 = 0 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "type" },
+            { "iamf_audio_element",    NULL, 0, AV_OPT_TYPE_CONST,
+                { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT },    .unit = "type" },
+            { "iamf_mix_presentation", NULL, 0, AV_OPT_TYPE_CONST,
+                { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION }, .unit = "type" },
+        { NULL },
+    };
+    const AVClass class = {
+        .class_name = "StreamGroupType",
+        .item_name  = av_default_item_name,
+        .option     = opts,
+        .version    = LIBAVUTIL_VERSION_INT,
+    };
+    const AVClass *pclass = &class;
+    int type, ret;
+
+    ret = av_dict_parse_string(&dict, token, "=", ":", AV_DICT_MULTIKEY);
+    if (ret < 0) {
+        av_log(mux, AV_LOG_ERROR, "Error parsing group specification %s\n", token);
+        return ret;
+    }
+
+    // "type" is not a user settable AVOption in AVStreamGroup, so handle it here
+    e = av_dict_get(dict, "type", NULL, 0);
+    if (!e) {
+        av_log(mux, AV_LOG_ERROR, "No type specified for Stream Group in \"%s\"\n", token);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ret = av_opt_eval_int(&pclass, opts, e->value, &type);
+    if (!ret && type == AV_STREAM_GROUP_PARAMS_NONE)
+        ret = AVERROR(EINVAL);
+    if (ret < 0) {
+        av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
+        goto end;
+    }
+
+    av_dict_copy(&tmp, dict, 0);
+    stg = avformat_stream_group_create(oc, type, &tmp);
+    if (!stg) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    e = NULL;
+    while (e = av_dict_get(dict, "st", e, 0)) {
+        unsigned int idx = strtol(e->value, NULL, 0);
+        if (idx >= oc->nb_streams) {
+            av_log(mux, AV_LOG_ERROR, "Invalid stream index %u\n", idx);
+            ret = AVERROR(EINVAL);
+            goto end;
+        }
+        ret = avformat_stream_group_add_stream(stg, oc->streams[idx]);
+        if (ret < 0)
+            goto end;
+    }
+    while (e = av_dict_get(dict, "stg", e, 0)) {
+        unsigned int idx = strtol(e->value, NULL, 0);
+        if (idx >= oc->nb_stream_groups || idx == stg->index) {
+            av_log(mux, AV_LOG_ERROR, "Invalid stream group index %u\n", idx);
+            ret = AVERROR(EINVAL);
+            goto end;
+        }
+        for (int i = 0; i < oc->stream_groups[idx]->nb_streams; i++) {
+            ret = avformat_stream_group_add_stream(stg, oc->stream_groups[idx]->streams[i]);
+            if (ret < 0)
+                goto end;
+        }
+    }
+
+    switch(type) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+        ret = of_parse_iamf_audio_element_layers(mux, stg, ptr);
+        break;
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+        ret = of_parse_iamf_submixes(mux, stg, ptr);
+        break;
+    default:
+        av_log(mux, AV_LOG_FATAL, "Unknown group type %d.\n", type);
+        ret = AVERROR(EINVAL);
+        break;
+    }
+
+    if (ret < 0)
+        goto end;
+
+    // make sure that nothing but "st" and "stg" entries are left in the dict
+    e = NULL;
+    av_dict_set(&tmp, "type", NULL, 0);
+    while (e = av_dict_iterate(tmp, e)) {
+        if (!strcmp(e->key, "st") || !strcmp(e->key, "stg"))
+            continue;
+
+        av_log(mux, AV_LOG_FATAL, "Unknown group key %s.\n", e->key);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ret = 0;
+end:
+    av_dict_free(&dict);
+    av_dict_free(&tmp);
+
+    return ret;
+}
+
+static int of_add_groups(Muxer *mux, const OptionsContext *o)
+{
+    /* process manually set groups */
+    for (int i = 0; i < o->nb_stream_groups; i++) {
+        const char *token;
+        char *str, *ptr = NULL;
+        int ret = 0;
+
+        str = av_strdup(o->stream_groups[i].u.str);
+        if (!str)
+            return AVERROR(ENOMEM);
+
+        token = av_strtok(str, ",", &ptr);
+        if (token)
+            ret = of_parse_group_token(mux, token, ptr);
+
+        av_free(str);
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
 static int of_add_programs(Muxer *mux, const OptionsContext *o)
 {
     AVFormatContext *oc = mux->fc;
@@ -2793,6 +3130,10 @@ int of_open(const OptionsContext *o, const char *filename, Scheduler *sch)
     if (err < 0)
         return err;
 
+    err = of_add_groups(mux, o);
+    if (err < 0)
+        return err;
+
     err = of_add_programs(mux, o);
     if (err < 0)
         return err;
diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
index 6177a96a4e..915f8e3ea0 100644
--- a/fftools/ffmpeg_opt.c
+++ b/fftools/ffmpeg_opt.c
@@ -1493,6 +1493,8 @@ const OptionDef options[] = {
         "add metadata", "string=string" },
     { "program",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(program) },
         "add program with specified streams", "title=string:st=number..." },
+    { "stream_group",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(stream_groups) },
+        "add stream group with specified streams and group type-specific arguments", "id=number:st=number..." },
     { "dframes",        HAS_ARG | OPT_PERFILE | OPT_EXPERT |
                         OPT_OUTPUT,                                  { .func_arg = opt_data_frames },
         "set the number of data frames to output", "number" },
-- 
2.43.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 4/8] avcodec/packet: add IAMF Parameters side data types
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
                   ` (2 preceding siblings ...)
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 5/8] avcodec/get_bits: add get_leb() James Almer
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavcodec/avpacket.c |  3 +++
 libavcodec/packet.h   | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)
diff --git a/libavcodec/avpacket.c b/libavcodec/avpacket.c
index e29725c2d2..0f8c9b77ae 100644
--- a/libavcodec/avpacket.c
+++ b/libavcodec/avpacket.c
@@ -301,6 +301,9 @@ const char *av_packet_side_data_name(enum AVPacketSideDataType type)
     case AV_PKT_DATA_DOVI_CONF:                  return "DOVI configuration record";
     case AV_PKT_DATA_S12M_TIMECODE:              return "SMPTE ST 12-1:2014 timecode";
     case AV_PKT_DATA_DYNAMIC_HDR10_PLUS:         return "HDR10+ Dynamic Metadata (SMPTE 2094-40)";
+    case AV_PKT_DATA_IAMF_MIX_GAIN_PARAM:        return "IAMF Mix Gain Parameter Data";
+    case AV_PKT_DATA_IAMF_DEMIXING_INFO_PARAM:   return "IAMF Demixing Info Parameter Data";
+    case AV_PKT_DATA_IAMF_RECON_GAIN_INFO_PARAM: return "IAMF Recon Gain Info Parameter Data";
     }
     return NULL;
 }
diff --git a/libavcodec/packet.h b/libavcodec/packet.h
index b19409b719..2c57d262c6 100644
--- a/libavcodec/packet.h
+++ b/libavcodec/packet.h
@@ -299,6 +299,30 @@ enum AVPacketSideDataType {
      */
     AV_PKT_DATA_DYNAMIC_HDR10_PLUS,
 
+    /**
+     * IAMF Mix Gain Parameter Data associated with the audio frame. This metadata
+     * is in the form of the AVIAMFParamDefinition struct and contains information
+     * defined in sections 3.6.1 and 3.8.1 of the Immersive Audio Model and
+     * Formats standard.
+     */
+    AV_PKT_DATA_IAMF_MIX_GAIN_PARAM,
+
+    /**
+     * IAMF Demixing Info Parameter Data associated with the audio frame. This
+     * metadata is in the form of the AVIAMFParamDefinition struct and contains
+     * information defined in sections 3.6.1 and 3.8.2 of the Immersive Audio Model
+     * and Formats standard.
+     */
+    AV_PKT_DATA_IAMF_DEMIXING_INFO_PARAM,
+
+    /**
+     * IAMF Recon Gain Info Parameter Data associated with the audio frame. This
+     * metadata is in the form of the AVIAMFParamDefinition struct and contains
+     * information defined in sections 3.6.1 and 3.8.3 of the Immersive Audio Model
+     * and Formats standard.
+     */
+    AV_PKT_DATA_IAMF_RECON_GAIN_INFO_PARAM,
+
     /**
      * The number of side data types.
      * This is not part of the public API/ABI in the sense that it may
-- 
2.43.0
* [FFmpeg-devel] [PATCH 5/8] avcodec/get_bits: add get_leb()
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
                   ` (3 preceding siblings ...)
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 4/8] avcodec/packet: add IAMF Parameters side data types James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 6/8] avformat/aviobuf: add ffio_read_leb() and ffio_write_leb() James Almer
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavcodec/bitstream.h          |  2 ++
 libavcodec/bitstream_template.h | 23 +++++++++++++++++++++++
 libavcodec/get_bits.h           | 24 ++++++++++++++++++++++++
 3 files changed, 49 insertions(+)
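
Note, not part of the commit: a minimal stand-alone sketch of the decoding
rule that get_leb() and bits_read_leb() apply, written against a plain byte
buffer instead of a bitstream context. The helper name leb_decode() and its
consumed out-parameter are hypothetical and exist only for illustration; the
explicit bounds check on size is likewise an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one leb128 value from buf.
 * Up to eight bytes are consumed; only the first five contribute bits,
 * so values that do not fit in 32 bits are truncated. */
static unsigned leb_decode(const uint8_t *buf, size_t size, size_t *consumed)
{
    unsigned leb = 0;
    size_t i = 0;

    while (i < size) {
        uint8_t byte = buf[i];

        /* the low seven bits carry payload, least significant group first */
        if (i <= 4)
            leb |= (unsigned)(byte & 0x7f) << (i * 7);
        i++;

        /* a clear MSB ends the value; eight bytes is the hard limit */
        if (!(byte & 0x80) || i == 8)
            break;
    }

    *consumed = i;
    return leb;
}
```

With the three bytes 0xE5 0x8E 0x26 as input, this returns 624485 and reports
three bytes consumed.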
diff --git a/libavcodec/bitstream.h b/libavcodec/bitstream.h
index 35b7873b9c..17f8a5da83 100644
--- a/libavcodec/bitstream.h
+++ b/libavcodec/bitstream.h
@@ -103,6 +103,7 @@
 # define bits_apply_sign    bits_apply_sign_le
 # define bits_read_vlc      bits_read_vlc_le
 # define bits_read_vlc_multi bits_read_vlc_multi_le
+# define bits_read_leb      bits_read_leb_le
 
 #elif defined(BITS_DEFAULT_BE)
 
@@ -132,6 +133,7 @@
 # define bits_apply_sign    bits_apply_sign_be
 # define bits_read_vlc      bits_read_vlc_be
 # define bits_read_vlc_multi bits_read_vlc_multi_be
+# define bits_read_leb      bits_read_leb_be
 
 #endif
 
diff --git a/libavcodec/bitstream_template.h b/libavcodec/bitstream_template.h
index 4f3d07275f..4c7101632f 100644
--- a/libavcodec/bitstream_template.h
+++ b/libavcodec/bitstream_template.h
@@ -562,6 +562,29 @@ static inline int BS_FUNC(read_vlc_multi)(BSCTX *bc, uint8_t dst[8],
     return ret;
 }
 
+/**
+ * Read an unsigned integer coded as a variable number of up to eight
+ * little-endian bytes, where the MSB in a byte signals another byte
+ * must be read.
+ * Values > UINT_MAX are truncated, but all coded bits are read.
+ */
+static inline unsigned BS_FUNC(read_leb)(BSCTX *bc) {
+    int more, i = 0;
+    unsigned leb = 0;
+
+    do {
+        int byte = BS_FUNC(read)(bc, 8);
+        unsigned bits = byte & 0x7f;
+        more = byte & 0x80;
+        if (i <= 4)
+            leb |= bits << (i * 7);
+        if (++i == 8)
+            break;
+    } while (more);
+
+    return leb;
+}
+
 #undef BSCTX
 #undef BS_FUNC
 #undef BS_JOIN3
diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index cfcf97c021..9e19d2a439 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -94,6 +94,7 @@ typedef BitstreamContext GetBitContext;
 #define align_get_bits      bits_align
 #define get_vlc2            bits_read_vlc
 #define get_vlc_multi       bits_read_vlc_multi
+#define get_leb             bits_read_leb
 
 #define init_get_bits8_le(s, buffer, byte_size) bits_init8_le((BitstreamContextLE*)s, buffer, byte_size)
 #define get_bits_le(s, n)                       bits_read_le((BitstreamContextLE*)s, n)
@@ -710,6 +711,29 @@ static inline int skip_1stop_8data_bits(GetBitContext *gb)
     return 0;
 }
 
+/**
+ * Read an unsigned integer coded as a variable number of up to eight
+ * little-endian bytes, where the MSB in a byte signals another byte
+ * must be read.
+ * All coded bits are read, but values > UINT_MAX are truncated.
+ */
+static inline unsigned get_leb(GetBitContext *s) {
+    int more, i = 0;
+    unsigned leb = 0;
+
+    do {
+        int byte = get_bits(s, 8);
+        unsigned bits = byte & 0x7f;
+        more = byte & 0x80;
+        if (i <= 4)
+            leb |= bits << (i * 7);
+        if (++i == 8)
+            break;
+    } while (more);
+
+    return leb;
+}
+
 #endif // CACHED_BITSTREAM_READER
 
 #endif /* AVCODEC_GET_BITS_H */
-- 
2.43.0
* [FFmpeg-devel] [PATCH 6/8] avformat/aviobuf: add ffio_read_leb() and ffio_write_leb()
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
                   ` (4 preceding siblings ...)
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 5/8] avcodec/get_bits: add get_leb() James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 7/8] avformat: Immersive Audio Model and Formats demuxer James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 8/8] avformat: Immersive Audio Model and Formats muxer James Almer
  7 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/avio_internal.h | 10 ++++++++++
 libavformat/aviobuf.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)
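
Note, not part of the commit: a minimal stand-alone sketch of the length
computation and byte layout used by ffio_write_leb(), writing into a caller
supplied buffer instead of an AVIOContext. The names ilog2() and leb_encode()
are hypothetical; ilog2() stands in for av_log2() and, like it, is assumed to
return 0 for an input of 0, so zero is still written as a single byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for av_log2(): floor(log2(v)), with ilog2(0) == 0. */
static int ilog2(unsigned v)
{
    int n = 0;

    while (v >>= 1)
        n++;
    return n;
}

/* Hypothetical helper: encode val as leb128 into dst and return the number
 * of bytes written. dst must have room for five bytes when unsigned is
 * 32 bits wide. */
static size_t leb_encode(uint8_t *dst, unsigned val)
{
    /* one output byte per started group of seven significant bits */
    size_t len = (ilog2(val) + 7) / 7;

    for (size_t i = 0; i < len; i++) {
        uint8_t byte = (val >> (7 * i)) & 0x7f;

        /* every byte except the last carries the continuation bit */
        if (i < len - 1)
            byte |= 0x80;
        dst[i] = byte;
    }

    return len;
}
```

Encoding 624485 yields the three bytes 0xE5 0x8E 0x26, and encoding 128 yields
0x80 0x01, which is the smallest value that needs a second byte.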
diff --git a/libavformat/avio_internal.h b/libavformat/avio_internal.h
index bd58499b64..f2e4ff30cb 100644
--- a/libavformat/avio_internal.h
+++ b/libavformat/avio_internal.h
@@ -146,6 +146,16 @@ int ffio_rewind_with_probe_data(AVIOContext *s, unsigned char **buf, int buf_siz
 
 uint64_t ffio_read_varlen(AVIOContext *bc);
 
+/**
+ * Read an unsigned integer coded as a variable number of up to eight
+ * little-endian bytes, where the MSB in a byte signals another byte
+ * must be read.
+ * All coded bytes are read, but values > UINT_MAX are truncated.
+ */
+unsigned int ffio_read_leb(AVIOContext *s);
+
+void ffio_write_leb(AVIOContext *s, unsigned val);
+
 /**
  * Read size bytes from AVIOContext into buf.
  * Check that exactly size bytes have been read.
diff --git a/libavformat/aviobuf.c b/libavformat/aviobuf.c
index 2899c75521..5a329ce465 100644
--- a/libavformat/aviobuf.c
+++ b/libavformat/aviobuf.c
@@ -971,6 +971,39 @@ uint64_t ffio_read_varlen(AVIOContext *bc){
     return val;
 }
 
+unsigned int ffio_read_leb(AVIOContext *s) {
+    int more, i = 0;
+    unsigned leb = 0;
+
+    do {
+        int byte = avio_r8(s);
+        unsigned bits = byte & 0x7f;
+        more = byte & 0x80;
+        if (i <= 4)
+            leb |= bits << (i * 7);
+        if (++i == 8)
+            break;
+    } while (more);
+
+    return leb;
+}
+
+void ffio_write_leb(AVIOContext *s, unsigned val)
+{
+    int len;
+    uint8_t byte;
+
+    len = (av_log2(val) + 7) / 7;
+
+    for (int i = 0; i < len; i++) {
+        byte = val >> (7 * i) & 0x7f;
+        if (i < len - 1)
+            byte |= 0x80;
+
+        avio_w8(s, byte);
+    }
+}
+
 int ffio_fdopen(AVIOContext **s, URLContext *h)
 {
     uint8_t *buffer = NULL;
-- 
2.43.0
* [FFmpeg-devel] [PATCH 7/8] avformat: Immersive Audio Model and Formats demuxer
  2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
                   ` (5 preceding siblings ...)
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 6/8] avformat/aviobuf: add ffio_read_leb() and ffio_write_leb() James Almer
@ 2023-12-14 20:14 ` James Almer
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 8/8] avformat: Immersive Audio Model and Formats muxer James Almer
  7 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavformat/Makefile     |    1 +
 libavformat/allformats.c |    1 +
 libavformat/iamf.c       |  125 +++++
 libavformat/iamf.h       |  163 ++++++
 libavformat/iamf_parse.c | 1106 ++++++++++++++++++++++++++++++++++++++
 libavformat/iamf_parse.h |   38 ++
 libavformat/iamfdec.c    |  503 +++++++++++++++++
 7 files changed, 1937 insertions(+)
 create mode 100644 libavformat/iamf.c
 create mode 100644 libavformat/iamf.h
 create mode 100644 libavformat/iamf_parse.c
 create mode 100644 libavformat/iamf_parse.h
 create mode 100644 libavformat/iamfdec.c
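
Note, not part of the commit: a small worked example of the sample rate
extraction done in flac_decoder_config(). After the four byte
METADATA_BLOCK_HEADER is skipped, the extradata starts at the STREAMINFO
block, where the 20 bit sample rate begins at byte offset 10. The helper
names rb24() and flac_streaminfo_sample_rate() are hypothetical; rb24()
stands in for AV_RB24(), and STREAMINFO_SIZE is assumed to equal
FLAC_STREAMINFO_SIZE (34).

```c
#include <stdint.h>

#define STREAMINFO_SIZE 34 /* assumed equal to FLAC_STREAMINFO_SIZE */

/* Hypothetical stand-in for AV_RB24(): read a 24 bit big-endian value. */
static uint32_t rb24(const uint8_t *p)
{
    return (uint32_t)p[0] << 16 | (uint32_t)p[1] << 8 | p[2];
}

/* Bytes 0..9 hold the block and frame size limits. The next 24 bits are
 * the 20 bit sample rate followed by the top four bits of the channel
 * count and bit depth fields, hence the shift by four. */
static int flac_streaminfo_sample_rate(const uint8_t *streaminfo)
{
    return (int)(rb24(streaminfo + 10) >> 4);
}
```

For a 48000 Hz, stereo, 16 bit stream, bytes 10 to 12 are 0x0B 0xB8 0x02, so
the 24 bit value is 0x0BB802 and shifting right by four gives 0x0BB80, which
is 48000.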
diff --git a/libavformat/Makefile b/libavformat/Makefile
index 2db83aff81..f23c22792b 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -258,6 +258,7 @@ OBJS-$(CONFIG_EVC_MUXER)                 += rawenc.o
 OBJS-$(CONFIG_HLS_DEMUXER)               += hls.o hls_sample_encryption.o
 OBJS-$(CONFIG_HLS_MUXER)                 += hlsenc.o hlsplaylist.o avc.o
 OBJS-$(CONFIG_HNM_DEMUXER)               += hnm.o
+OBJS-$(CONFIG_IAMF_DEMUXER)              += iamfdec.o iamf_parse.o iamf.o
 OBJS-$(CONFIG_ICO_DEMUXER)               += icodec.o
 OBJS-$(CONFIG_ICO_MUXER)                 += icoenc.o
 OBJS-$(CONFIG_IDCIN_DEMUXER)             += idcin.o
diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index c8bb4e3866..6e520b78a6 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -212,6 +212,7 @@ extern const FFOutputFormat ff_hevc_muxer;
 extern const AVInputFormat  ff_hls_demuxer;
 extern const FFOutputFormat ff_hls_muxer;
 extern const AVInputFormat  ff_hnm_demuxer;
+extern const AVInputFormat  ff_iamf_demuxer;
 extern const AVInputFormat  ff_ico_demuxer;
 extern const FFOutputFormat ff_ico_muxer;
 extern const AVInputFormat  ff_idcin_demuxer;
diff --git a/libavformat/iamf.c b/libavformat/iamf.c
new file mode 100644
index 0000000000..5de70dc082
--- /dev/null
+++ b/libavformat/iamf.c
@@ -0,0 +1,125 @@
+/*
+ * Immersive Audio Model and Formats common helpers and structs
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/channel_layout.h"
+#include "libavutil/iamf.h"
+#include "libavutil/mem.h"
+#include "iamf.h"
+
+const AVChannelLayout ff_iamf_scalable_ch_layouts[10] = {
+    AV_CHANNEL_LAYOUT_MONO,
+    AV_CHANNEL_LAYOUT_STEREO,
+    // "Loudspeaker configuration for Sound System B"
+    AV_CHANNEL_LAYOUT_5POINT1_BACK,
+    // "Loudspeaker configuration for Sound System C"
+    AV_CHANNEL_LAYOUT_5POINT1POINT2_BACK,
+    // "Loudspeaker configuration for Sound System D"
+    AV_CHANNEL_LAYOUT_5POINT1POINT4_BACK,
+    // "Loudspeaker configuration for Sound System I"
+    AV_CHANNEL_LAYOUT_7POINT1,
+    // "Loudspeaker configuration for Sound System I" + Ltf + Rtf
+    AV_CHANNEL_LAYOUT_7POINT1POINT2,
+    // "Loudspeaker configuration for Sound System J"
+    AV_CHANNEL_LAYOUT_7POINT1POINT4_BACK,
+    // Front subset of "Loudspeaker configuration for Sound System J"
+    AV_CHANNEL_LAYOUT_3POINT1POINT2,
+    // Binaural
+    AV_CHANNEL_LAYOUT_STEREO,
+};
+
+const struct IAMFSoundSystemMap ff_iamf_sound_system_map[13] = {
+    { SOUND_SYSTEM_A_0_2_0, AV_CHANNEL_LAYOUT_STEREO },
+    { SOUND_SYSTEM_B_0_5_0, AV_CHANNEL_LAYOUT_5POINT1_BACK },
+    { SOUND_SYSTEM_C_2_5_0, AV_CHANNEL_LAYOUT_5POINT1POINT2_BACK },
+    { SOUND_SYSTEM_D_4_5_0, AV_CHANNEL_LAYOUT_5POINT1POINT4_BACK },
+    { SOUND_SYSTEM_E_4_5_1,
+        {
+            .nb_channels = 11,
+            .order       = AV_CHANNEL_ORDER_NATIVE,
+            .u.mask      = AV_CH_LAYOUT_5POINT1POINT4_BACK | AV_CH_BOTTOM_FRONT_CENTER,
+        },
+    },
+    { SOUND_SYSTEM_F_3_7_0,  AV_CHANNEL_LAYOUT_7POINT2POINT3 },
+    { SOUND_SYSTEM_G_4_9_0,  AV_CHANNEL_LAYOUT_9POINT1POINT4_BACK },
+    { SOUND_SYSTEM_H_9_10_3, AV_CHANNEL_LAYOUT_22POINT2 },
+    { SOUND_SYSTEM_I_0_7_0,  AV_CHANNEL_LAYOUT_7POINT1 },
+    { SOUND_SYSTEM_J_4_7_0,  AV_CHANNEL_LAYOUT_7POINT1POINT4_BACK },
+    { SOUND_SYSTEM_10_2_7_0, AV_CHANNEL_LAYOUT_7POINT1POINT2 },
+    { SOUND_SYSTEM_11_2_3_0, AV_CHANNEL_LAYOUT_3POINT1POINT2 },
+    { SOUND_SYSTEM_12_0_1_0, AV_CHANNEL_LAYOUT_MONO },
+};
+
+void ff_iamf_free_audio_element(IAMFAudioElement **paudio_element)
+{
+    IAMFAudioElement *audio_element = *paudio_element;
+
+    if (!audio_element)
+        return;
+
+    for (int i = 0; i < audio_element->nb_substreams; i++)
+        avcodec_parameters_free(&audio_element->substreams[i].codecpar);
+    av_free(audio_element->substreams);
+    av_free(audio_element->layers);
+    av_iamf_audio_element_free(&audio_element->element);
+    av_freep(paudio_element);
+}
+
+void ff_iamf_free_mix_presentation(IAMFMixPresentation **pmix_presentation)
+{
+    IAMFMixPresentation *mix_presentation = *pmix_presentation;
+
+    if (!mix_presentation)
+        return;
+
+    for (int i = 0; i < mix_presentation->count_label; i++)
+        av_free(mix_presentation->language_label[i]);
+    av_free(mix_presentation->language_label);
+    av_iamf_mix_presentation_free(&mix_presentation->mix);
+    av_freep(pmix_presentation);
+}
+
+void ff_iamf_uninit_context(IAMFContext *c)
+{
+    if (!c)
+        return;
+
+    for (int i = 0; i < c->nb_codec_configs; i++) {
+        av_free(c->codec_configs[i]->extradata);
+        av_free(c->codec_configs[i]);
+    }
+    av_freep(&c->codec_configs);
+    c->nb_codec_configs = 0;
+
+    for (int i = 0; i < c->nb_audio_elements; i++)
+        ff_iamf_free_audio_element(&c->audio_elements[i]);
+    av_freep(&c->audio_elements);
+    c->nb_audio_elements = 0;
+
+    for (int i = 0; i < c->nb_mix_presentations; i++)
+        ff_iamf_free_mix_presentation(&c->mix_presentations[i]);
+    av_freep(&c->mix_presentations);
+    c->nb_mix_presentations = 0;
+
+    for (int i = 0; i < c->nb_param_definitions; i++)
+        av_free(c->param_definitions[i]);
+    av_freep(&c->param_definitions);
+    c->nb_param_definitions = 0;
+}
diff --git a/libavformat/iamf.h b/libavformat/iamf.h
new file mode 100644
index 0000000000..ce94cb5bc4
--- /dev/null
+++ b/libavformat/iamf.h
@@ -0,0 +1,163 @@
+/*
+ * Immersive Audio Model and Formats common helpers and structs
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFORMAT_IAMF_H
+#define AVFORMAT_IAMF_H
+
+#include <stdint.h>
+
+#include "libavutil/channel_layout.h"
+#include "libavutil/iamf.h"
+#include "libavcodec/codec_id.h"
+#include "libavcodec/codec_par.h"
+#include "avformat.h"
+
+#define MAX_IAMF_OBU_HEADER_SIZE (1 + 8 * 3)
+
+// OBU types (section 3.2).
+enum IAMF_OBU_Type {
+    IAMF_OBU_IA_CODEC_CONFIG        = 0,
+    IAMF_OBU_IA_AUDIO_ELEMENT       = 1,
+    IAMF_OBU_IA_MIX_PRESENTATION    = 2,
+    IAMF_OBU_IA_PARAMETER_BLOCK     = 3,
+    IAMF_OBU_IA_TEMPORAL_DELIMITER  = 4,
+    IAMF_OBU_IA_AUDIO_FRAME         = 5,
+    IAMF_OBU_IA_AUDIO_FRAME_ID0     = 6,
+    IAMF_OBU_IA_AUDIO_FRAME_ID1     = 7,
+    IAMF_OBU_IA_AUDIO_FRAME_ID2     = 8,
+    IAMF_OBU_IA_AUDIO_FRAME_ID3     = 9,
+    IAMF_OBU_IA_AUDIO_FRAME_ID4     = 10,
+    IAMF_OBU_IA_AUDIO_FRAME_ID5     = 11,
+    IAMF_OBU_IA_AUDIO_FRAME_ID6     = 12,
+    IAMF_OBU_IA_AUDIO_FRAME_ID7     = 13,
+    IAMF_OBU_IA_AUDIO_FRAME_ID8     = 14,
+    IAMF_OBU_IA_AUDIO_FRAME_ID9     = 15,
+    IAMF_OBU_IA_AUDIO_FRAME_ID10    = 16,
+    IAMF_OBU_IA_AUDIO_FRAME_ID11    = 17,
+    IAMF_OBU_IA_AUDIO_FRAME_ID12    = 18,
+    IAMF_OBU_IA_AUDIO_FRAME_ID13    = 19,
+    IAMF_OBU_IA_AUDIO_FRAME_ID14    = 20,
+    IAMF_OBU_IA_AUDIO_FRAME_ID15    = 21,
+    IAMF_OBU_IA_AUDIO_FRAME_ID16    = 22,
+    IAMF_OBU_IA_AUDIO_FRAME_ID17    = 23,
+    // 24~30 reserved.
+    IAMF_OBU_IA_SEQUENCE_HEADER     = 31,
+};
+
+typedef struct IAMFCodecConfig {
+    unsigned codec_config_id;
+    enum AVCodecID codec_id;
+    uint32_t codec_tag;
+    unsigned nb_samples;
+    int seek_preroll;
+    int sample_rate;
+    int extradata_size;
+    uint8_t *extradata;
+} IAMFCodecConfig;
+
+typedef struct IAMFLayer {
+    unsigned int substream_count;
+    unsigned int coupled_substream_count;
+} IAMFLayer;
+
+typedef struct IAMFSubStream {
+    unsigned int audio_substream_id;
+
+    // demux
+    AVCodecParameters *codecpar;
+} IAMFSubStream;
+
+typedef struct IAMFAudioElement {
+    AVIAMFAudioElement *element;
+    unsigned int audio_element_id;
+
+    IAMFSubStream *substreams;
+    unsigned int nb_substreams;
+
+    unsigned int codec_config_id;
+
+    // mux
+    IAMFLayer *layers;
+    unsigned int nb_layers;
+} IAMFAudioElement;
+
+typedef struct IAMFMixPresentation {
+    AVIAMFMixPresentation *mix;
+    unsigned int mix_presentation_id;
+
+    // demux
+    unsigned int count_label;
+    char **language_label;
+} IAMFMixPresentation;
+
+typedef struct IAMFParamDefinition {
+    const IAMFAudioElement *audio_element;
+    AVIAMFParamDefinition *param;
+    int mode;
+    size_t param_size;
+} IAMFParamDefinition;
+
+typedef struct IAMFContext {
+    IAMFCodecConfig **codec_configs;
+    int nb_codec_configs;
+    IAMFAudioElement **audio_elements;
+    int nb_audio_elements;
+    IAMFMixPresentation **mix_presentations;
+    int nb_mix_presentations;
+    IAMFParamDefinition **param_definitions;
+    int nb_param_definitions;
+} IAMFContext;
+
+enum IAMF_Anchor_Element {
+    IAMF_ANCHOR_ELEMENT_UNKNWONW,
+    IAMF_ANCHOR_ELEMENT_DIALOGUE,
+    IAMF_ANCHOR_ELEMENT_ALBUM,
+};
+
+enum IAMF_Sound_System {
+    SOUND_SYSTEM_A_0_2_0  = 0,  // "Loudspeaker configuration for Sound System A"
+    SOUND_SYSTEM_B_0_5_0  = 1,  // "Loudspeaker configuration for Sound System B"
+    SOUND_SYSTEM_C_2_5_0  = 2,  // "Loudspeaker configuration for Sound System C"
+    SOUND_SYSTEM_D_4_5_0  = 3,  // "Loudspeaker configuration for Sound System D"
+    SOUND_SYSTEM_E_4_5_1  = 4,  // "Loudspeaker configuration for Sound System E"
+    SOUND_SYSTEM_F_3_7_0  = 5,  // "Loudspeaker configuration for Sound System F"
+    SOUND_SYSTEM_G_4_9_0  = 6,  // "Loudspeaker configuration for Sound System G"
+    SOUND_SYSTEM_H_9_10_3 = 7,  // "Loudspeaker configuration for Sound System H"
+    SOUND_SYSTEM_I_0_7_0  = 8,  // "Loudspeaker configuration for Sound System I"
+    SOUND_SYSTEM_J_4_7_0  = 9,  // "Loudspeaker configuration for Sound System J"
+    SOUND_SYSTEM_10_2_7_0 = 10, // "Loudspeaker configuration for Sound System I" + Ltf + Rtf
+    SOUND_SYSTEM_11_2_3_0 = 11, // Front subset of "Loudspeaker configuration for Sound System J"
+    SOUND_SYSTEM_12_0_1_0 = 12, // Mono
+};
+
+struct IAMFSoundSystemMap {
+    enum IAMF_Sound_System id;
+    AVChannelLayout layout;
+};
+
+extern const AVChannelLayout ff_iamf_scalable_ch_layouts[10];
+extern const struct IAMFSoundSystemMap ff_iamf_sound_system_map[13];
+
+void ff_iamf_free_audio_element(IAMFAudioElement **paudio_element);
+void ff_iamf_free_mix_presentation(IAMFMixPresentation **pmix_presentation);
+void ff_iamf_uninit_context(IAMFContext *c);
+
+#endif /* AVFORMAT_IAMF_H */
diff --git a/libavformat/iamf_parse.c b/libavformat/iamf_parse.c
new file mode 100644
index 0000000000..60305743f9
--- /dev/null
+++ b/libavformat/iamf_parse.c
@@ -0,0 +1,1106 @@
+/*
+ * Immersive Audio Model and Formats parsing
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/avassert.h"
+#include "libavutil/common.h"
+#include "libavutil/iamf.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/log.h"
+#include "libavcodec/get_bits.h"
+#include "libavcodec/flac.h"
+#include "libavcodec/mpeg4audio.h"
+#include "libavcodec/put_bits.h"
+#include "avio_internal.h"
+#include "iamf_parse.h"
+#include "isom.h"
+
+static int opus_decoder_config(IAMFCodecConfig *codec_config,
+                               AVIOContext *pb, int len)
+{
+    int left = len - avio_tell(pb);
+
+    if (left < 11)
+        return AVERROR_INVALIDDATA;
+
+    codec_config->extradata = av_malloc(left + 8);
+    if (!codec_config->extradata)
+        return AVERROR(ENOMEM);
+
+    AV_WB32(codec_config->extradata, MKBETAG('O','p','u','s'));
+    AV_WB32(codec_config->extradata + 4, MKBETAG('H','e','a','d'));
+    codec_config->extradata_size = avio_read(pb, codec_config->extradata + 8, left);
+    if (codec_config->extradata_size < left)
+        return AVERROR_INVALIDDATA;
+
+    codec_config->extradata_size += 8;
+    codec_config->sample_rate = 48000;
+
+    return 0;
+}
+
+static int aac_decoder_config(IAMFCodecConfig *codec_config,
+                              AVIOContext *pb, int len, void *logctx)
+{
+    MPEG4AudioConfig cfg = { 0 };
+    int object_type_id, codec_id, stream_type;
+    int ret, tag, left;
+
+    tag = avio_r8(pb);
+    if (tag != MP4DecConfigDescrTag)
+        return AVERROR_INVALIDDATA;
+
+    object_type_id = avio_r8(pb);
+    if (object_type_id != 0x40)
+        return AVERROR_INVALIDDATA;
+
+    stream_type = avio_r8(pb);
+    if (((stream_type >> 2) != 5) || ((stream_type >> 1) & 1))
+        return AVERROR_INVALIDDATA;
+
+    avio_skip(pb, 3); // buffer size db
+    avio_skip(pb, 4); // rc_max_rate
+    avio_skip(pb, 4); // avg bitrate
+
+    codec_id = ff_codec_get_id(ff_mp4_obj_type, object_type_id);
+    if (codec_id && codec_id != codec_config->codec_id)
+        return AVERROR_INVALIDDATA;
+
+    tag = avio_r8(pb);
+    if (tag != MP4DecSpecificDescrTag)
+        return AVERROR_INVALIDDATA;
+
+    left = len - avio_tell(pb);
+    if (left <= 0)
+        return AVERROR_INVALIDDATA;
+
+    codec_config->extradata = av_malloc(left);
+    if (!codec_config->extradata)
+        return AVERROR(ENOMEM);
+
+    codec_config->extradata_size = avio_read(pb, codec_config->extradata, left);
+    if (codec_config->extradata_size < left)
+        return AVERROR_INVALIDDATA;
+
+    ret = avpriv_mpeg4audio_get_config2(&cfg, codec_config->extradata,
+                                        codec_config->extradata_size, 1, logctx);
+    if (ret < 0)
+        return ret;
+
+    codec_config->sample_rate = cfg.sample_rate;
+
+    return 0;
+}
+
+static int flac_decoder_config(IAMFCodecConfig *codec_config,
+                               AVIOContext *pb, int len)
+{
+    int left;
+
+    avio_skip(pb, 4); // METADATA_BLOCK_HEADER
+
+    left = len - avio_tell(pb);
+    if (left < FLAC_STREAMINFO_SIZE)
+        return AVERROR_INVALIDDATA;
+
+    codec_config->extradata = av_malloc(left);
+    if (!codec_config->extradata)
+        return AVERROR(ENOMEM);
+
+    codec_config->extradata_size = avio_read(pb, codec_config->extradata, left);
+    if (codec_config->extradata_size < left)
+        return AVERROR_INVALIDDATA;
+
+    codec_config->sample_rate = AV_RB24(codec_config->extradata + 10) >> 4;
+
+    return 0;
+}
+
+static int ipcm_decoder_config(IAMFCodecConfig *codec_config,
+                               AVIOContext *pb, int len)
+{
+    static const enum AVCodecID sample_fmt[2][3] = {
+        { AV_CODEC_ID_PCM_S16BE, AV_CODEC_ID_PCM_S24BE, AV_CODEC_ID_PCM_S32BE },
+        { AV_CODEC_ID_PCM_S16LE, AV_CODEC_ID_PCM_S24LE, AV_CODEC_ID_PCM_S32LE },
+    };
+    int sample_format = avio_r8(pb); // 0 = BE, 1 = LE
+    int sample_size = (avio_r8(pb) / 8 - 2); // 16, 24, 32
+    if (sample_format > 1 || sample_size < 0 || sample_size > 2)
+        return AVERROR_INVALIDDATA;
+
+    codec_config->codec_id = sample_fmt[sample_format][sample_size];
+    codec_config->sample_rate = avio_rb32(pb);
+
+    if (len - avio_tell(pb))
+        return AVERROR_INVALIDDATA;
+
+    return 0;
+}
+
+static int codec_config_obu(void *s, IAMFContext *c, AVIOContext *pb, int len)
+{
+    IAMFCodecConfig **tmp, *codec_config = NULL;
+    FFIOContext b;
+    AVIOContext *pbc;
+    uint8_t *buf;
+    enum AVCodecID avcodec_id;
+    unsigned codec_config_id, nb_samples, codec_id;
+    int16_t seek_preroll;
+    int ret;
+
+    buf = av_malloc(len);
+    if (!buf)
+        return AVERROR(ENOMEM);
+
+    ret = avio_read(pb, buf, len);
+    if (ret != len) {
+        if (ret >= 0)
+            ret = AVERROR_INVALIDDATA;
+        goto fail;
+    }
+
+    ffio_init_context(&b, buf, len, 0, NULL, NULL, NULL, NULL);
+    pbc = &b.pub;
+
+    codec_config_id = ffio_read_leb(pbc);
+    codec_id = avio_rb32(pbc);
+    nb_samples = ffio_read_leb(pbc);
+    seek_preroll = avio_rb16(pbc);
+
+    switch (codec_id) {
+    case MKBETAG('O','p','u','s'):
+        avcodec_id = AV_CODEC_ID_OPUS;
+        break;
+    case MKBETAG('m','p','4','a'):
+        avcodec_id = AV_CODEC_ID_AAC;
+        break;
+    case MKBETAG('f','L','a','C'):
+        avcodec_id = AV_CODEC_ID_FLAC;
+        break;
+    default:
+        avcodec_id = AV_CODEC_ID_NONE;
+        break;
+    }
+
+    for (int i = 0; i < c->nb_codec_configs; i++)
+        if (c->codec_configs[i]->codec_config_id == codec_config_id) {
+            ret = AVERROR_INVALIDDATA;
+            goto fail;
+        }
+
+    tmp = av_realloc_array(c->codec_configs, c->nb_codec_configs + 1, sizeof(*c->codec_configs));
+    if (!tmp) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+    c->codec_configs = tmp;
+
+    codec_config = av_mallocz(sizeof(*codec_config));
+    if (!codec_config) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    codec_config->codec_config_id = codec_config_id;
+    codec_config->codec_id = avcodec_id;
+    codec_config->nb_samples = nb_samples;
+    codec_config->seek_preroll = seek_preroll;
+
+    switch (codec_id) {
+    case MKBETAG('O','p','u','s'):
+        ret = opus_decoder_config(codec_config, pbc, len);
+        break;
+    case MKBETAG('m','p','4','a'):
+        ret = aac_decoder_config(codec_config, pbc, len, s);
+        break;
+    case MKBETAG('f','L','a','C'):
+        ret = flac_decoder_config(codec_config, pbc, len);
+        break;
+    case MKBETAG('i','p','c','m'):
+        ret = ipcm_decoder_config(codec_config, pbc, len);
+        break;
+    default:
+        break;
+    }
+    if (ret < 0)
+        goto fail;
+
+    c->codec_configs[c->nb_codec_configs++] = codec_config;
+
+    len -= avio_tell(pbc);
+    if (len)
+        av_log(s, AV_LOG_WARNING, "Underread in codec_config_obu. %d bytes left at the end\n", len);
+
+    ret = 0;
+fail:
+    av_free(buf);
+    if (ret < 0) {
+        if (codec_config)
+            av_free(codec_config->extradata);
+        av_free(codec_config);
+    }
+    return ret;
+}
+
+static int update_extradata(AVCodecParameters *codecpar)
+{
+    GetBitContext gb;
+    PutBitContext pb;
+    int ret;
+
+    switch (codecpar->codec_id) {
+    case AV_CODEC_ID_OPUS:
+        AV_WB8(codecpar->extradata + 9, codecpar->ch_layout.nb_channels);
+        break;
+    case AV_CODEC_ID_AAC: {
+        uint8_t buf[5];
+
+        init_put_bits(&pb, buf, sizeof(buf));
+        ret = init_get_bits8(&gb, codecpar->extradata, codecpar->extradata_size);
+        if (ret < 0)
+            return ret;
+
+        ret = get_bits(&gb, 5);
+        put_bits(&pb, 5, ret);
+        if (ret == AOT_ESCAPE) // violates section 3.11.2, but better check for it
+            put_bits(&pb, 6, get_bits(&gb, 6));
+        ret = get_bits(&gb, 4);
+        put_bits(&pb, 4, ret);
+        if (ret == 0x0f)
+            put_bits(&pb, 24, get_bits(&gb, 24));
+
+        skip_bits(&gb, 4);
+        put_bits(&pb, 4, codecpar->ch_layout.nb_channels); // set channel config
+        ret = put_bits_left(&pb);
+        put_bits(&pb, ret, get_bits(&gb, ret));
+        flush_put_bits(&pb);
+
+        memcpy(codecpar->extradata, buf, sizeof(buf));
+        break;
+    }
+    case AV_CODEC_ID_FLAC: {
+        uint8_t buf[13];
+
+        init_put_bits(&pb, buf, sizeof(buf));
+        ret = init_get_bits8(&gb, codecpar->extradata, codecpar->extradata_size);
+        if (ret < 0)
+            return ret;
+
+        put_bits32(&pb, get_bits_long(&gb, 32)); // min/max blocksize
+        put_bits64(&pb, 48, get_bits64(&gb, 48)); // min/max framesize
+        put_bits(&pb, 20, get_bits(&gb, 20)); // samplerate
+        skip_bits(&gb, 3);
+        put_bits(&pb, 3, codecpar->ch_layout.nb_channels - 1);
+        ret = put_bits_left(&pb);
+        put_bits(&pb, ret, get_bits(&gb, ret));
+        flush_put_bits(&pb);
+
+        memcpy(codecpar->extradata, buf, sizeof(buf));
+        break;
+    }
+    }
+
+    return 0;
+}
+
+static int scalable_channel_layout_config(void *s, AVIOContext *pb,
+                                          IAMFAudioElement *audio_element,
+                                          const IAMFCodecConfig *codec_config)
+{
+    int nb_layers, k = 0;
+
+    nb_layers = avio_r8(pb) >> 5; // get_bits(&gb, 3);
+    // skip_bits(&gb, 5); //reserved
+
+    if (nb_layers > 6)
+        return AVERROR_INVALIDDATA;
+
+    for (int i = 0; i < nb_layers; i++) {
+        AVIAMFLayer *layer;
+        int loudspeaker_layout, output_gain_is_present_flag;
+        int substream_count, coupled_substream_count;
+        int ret, byte = avio_r8(pb);
+
+        layer = av_iamf_audio_element_add_layer(audio_element->element);
+        if (!layer)
+            return AVERROR(ENOMEM);
+
+        loudspeaker_layout = byte >> 4; // get_bits(&gb, 4);
+        output_gain_is_present_flag = (byte >> 3) & 1; //get_bits1(&gb);
+        if ((byte >> 2) & 1)
+            layer->flags |= AV_IAMF_LAYER_FLAG_RECON_GAIN;
+        substream_count = avio_r8(pb);
+        coupled_substream_count = avio_r8(pb);
+
+        if (output_gain_is_present_flag) {
+            layer->output_gain_flags = avio_r8(pb) >> 2;  // get_bits(&gb, 6);
+            layer->output_gain = av_make_q(sign_extend(avio_rb16(pb), 16), 1 << 8);
+        }
+
+        if (loudspeaker_layout < 10)
+            av_channel_layout_copy(&layer->ch_layout, &ff_iamf_scalable_ch_layouts[loudspeaker_layout]);
+        else
+            layer->ch_layout = (AVChannelLayout){ .order = AV_CHANNEL_ORDER_UNSPEC,
+                                                          .nb_channels = substream_count +
+                                                                         coupled_substream_count };
+
+        for (int j = 0; j < substream_count; j++) {
+            IAMFSubStream *substream = &audio_element->substreams[k++];
+
+            substream->codecpar->ch_layout = coupled_substream_count-- > 0 ? (AVChannelLayout)AV_CHANNEL_LAYOUT_STEREO :
+                                                                             (AVChannelLayout)AV_CHANNEL_LAYOUT_MONO;
+
+            ret = update_extradata(substream->codecpar);
+            if (ret < 0)
+                return ret;
+        }
+
+    }
+
+    return 0;
+}
+
+static int ambisonics_config(void *s, AVIOContext *pb,
+                             IAMFAudioElement *audio_element,
+                             const IAMFCodecConfig *codec_config)
+{
+    AVIAMFLayer *layer;
+    unsigned ambisonics_mode;
+    int output_channel_count, substream_count, order;
+    int ret;
+
+    ambisonics_mode = ffio_read_leb(pb);
+    if (ambisonics_mode > 1)
+        return 0;
+
+    output_channel_count = avio_r8(pb);  // C
+    substream_count = avio_r8(pb);  // N
+    if (audio_element->nb_substreams != substream_count)
+        return AVERROR_INVALIDDATA;
+
+    order = floor(sqrt(output_channel_count - 1));
+    /* incomplete order - some harmonics are missing */
+    if ((order + 1) * (order + 1) != output_channel_count)
+        return AVERROR_INVALIDDATA;
+
+    layer = av_iamf_audio_element_add_layer(audio_element->element);
+    if (!layer)
+        return AVERROR(ENOMEM);
+
+    layer->ambisonics_mode = ambisonics_mode;
+    if (ambisonics_mode == 0) {
+        for (int i = 0; i < substream_count; i++) {
+            IAMFSubStream *substream = &audio_element->substreams[i];
+
+            substream->codecpar->ch_layout = (AVChannelLayout)AV_CHANNEL_LAYOUT_MONO;
+
+            ret = update_extradata(substream->codecpar);
+            if (ret < 0)
+                return ret;
+        }
+
+        layer->ch_layout.order = AV_CHANNEL_ORDER_CUSTOM;
+        layer->ch_layout.nb_channels = output_channel_count;
+        layer->ch_layout.u.map = av_calloc(output_channel_count, sizeof(*layer->ch_layout.u.map));
+        if (!layer->ch_layout.u.map)
+            return AVERROR(ENOMEM);
+
+        for (int i = 0; i < output_channel_count; i++)
+            layer->ch_layout.u.map[i].id = avio_r8(pb) + AV_CHAN_AMBISONIC_BASE;
+    } else {
+        int coupled_substream_count = avio_r8(pb);  // M
+        int nb_demixing_matrix = substream_count + coupled_substream_count;
+        int demixing_matrix_size = nb_demixing_matrix * output_channel_count;
+
+        layer->ch_layout = (AVChannelLayout){ .order = AV_CHANNEL_ORDER_AMBISONIC, .nb_channels = output_channel_count };
+        layer->demixing_matrix = av_malloc_array(demixing_matrix_size, sizeof(*layer->demixing_matrix));
+        if (!layer->demixing_matrix)
+            return AVERROR(ENOMEM);
+
+        for (int i = 0; i < demixing_matrix_size; i++)
+            layer->demixing_matrix[i] = av_make_q(sign_extend(avio_rb16(pb), 16), 1 << 8);
+
+        for (int i = 0; i < substream_count; i++) {
+            IAMFSubStream *substream = &audio_element->substreams[i];
+
+            substream->codecpar->ch_layout = coupled_substream_count-- > 0 ? (AVChannelLayout)AV_CHANNEL_LAYOUT_STEREO :
+                                                                             (AVChannelLayout)AV_CHANNEL_LAYOUT_MONO;
+
+
+            ret = update_extradata(substream->codecpar);
+            if (ret < 0)
+                return ret;
+        }
+    }
+
+    return 0;
+}
+
+static int param_parse(void *s, IAMFContext *c, AVIOContext *pb,
+                       unsigned int type,
+                       const IAMFAudioElement *audio_element,
+                       AVIAMFParamDefinition **out_param_definition)
+{
+    IAMFParamDefinition *param_definition = NULL;
+    AVIAMFParamDefinition *param;
+    unsigned int parameter_id, parameter_rate, mode;
+    unsigned int duration = 0, constant_subblock_duration = 0, nb_subblocks = 0;
+    size_t param_size;
+
+    parameter_id = ffio_read_leb(pb);
+
+    for (int i = 0; i < c->nb_param_definitions; i++)
+        if (c->param_definitions[i]->param->parameter_id == parameter_id) {
+            param_definition = c->param_definitions[i];
+            break;
+        }
+
+    parameter_rate = ffio_read_leb(pb);
+    mode = avio_r8(pb) >> 7;
+
+    if (mode == 0) {
+        duration = ffio_read_leb(pb);
+        constant_subblock_duration = ffio_read_leb(pb);
+        if (constant_subblock_duration == 0)
+            nb_subblocks = ffio_read_leb(pb);
+        else
+            nb_subblocks = duration / constant_subblock_duration;
+    }
+
+    param = av_iamf_param_definition_alloc(type, nb_subblocks, &param_size);
+    if (!param)
+        return AVERROR(ENOMEM);
+
+    for (int i = 0; i < nb_subblocks; i++) {
+        void *subblock = av_iamf_param_definition_get_subblock(param, i);
+        unsigned int subblock_duration = constant_subblock_duration;
+
+        if (constant_subblock_duration == 0)
+            subblock_duration = ffio_read_leb(pb);
+
+        switch (type) {
+        case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN: {
+            AVIAMFMixGain *mix = subblock;
+            mix->subblock_duration = subblock_duration;
+            break;
+        }
+        case AV_IAMF_PARAMETER_DEFINITION_DEMIXING: {
+            AVIAMFDemixingInfo *demix = subblock;
+            demix->subblock_duration = subblock_duration;
+            // DemixingInfoParameterData
+            demix->dmixp_mode = avio_r8(pb) >> 5;
+            break;
+        }
+        case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN: {
+            AVIAMFReconGain *recon = subblock;
+            recon->subblock_duration = subblock_duration;
+            break;
+        }
+        default:
+            av_free(param);
+            return AVERROR_INVALIDDATA;
+        }
+    }
+
+    param->parameter_id = parameter_id;
+    param->parameter_rate = parameter_rate;
+    param->duration = duration;
+    param->constant_subblock_duration = constant_subblock_duration;
+    param->nb_subblocks = nb_subblocks;
+
+    if (param_definition) {
+        if (param_definition->param_size != param_size || memcmp(param_definition->param, param, param_size)) {
+            av_log(s, AV_LOG_ERROR, "Inconsistent parameters for parameter_id %u\n", parameter_id);
+            av_free(param);
+            return AVERROR_INVALIDDATA;
+        }
+    } else {
+        IAMFParamDefinition **tmp = av_realloc_array(c->param_definitions, c->nb_param_definitions + 1,
+                                                     sizeof(*c->param_definitions));
+        if (!tmp) {
+            av_free(param);
+            return AVERROR(ENOMEM);
+        }
+        c->param_definitions = tmp;
+
+        param_definition = av_mallocz(sizeof(*param_definition));
+        if (!param_definition) {
+            av_free(param);
+            return AVERROR(ENOMEM);
+        }
+        param_definition->param = param;
+        param_definition->mode = !mode;
+        param_definition->param_size = param_size;
+        param_definition->audio_element = audio_element;
+
+        c->param_definitions[c->nb_param_definitions++] = param_definition;
+    }
+
+    av_assert0(out_param_definition);
+    *out_param_definition = param;
+
+    return 0;
+}
+
+static IAMFCodecConfig *get_codec_config(IAMFContext *c, unsigned int codec_config_id)
+{
+    for (int i = 0; i < c->nb_codec_configs; i++) {
+        if (c->codec_configs[i]->codec_config_id == codec_config_id)
+            return c->codec_configs[i];
+    }
+
+    return NULL;
+}
+
+static int audio_element_obu(void *s, IAMFContext *c, AVIOContext *pb, int len)
+{
+    const IAMFCodecConfig *codec_config;
+    AVIAMFAudioElement *element;
+    IAMFAudioElement **tmp, *audio_element = NULL;
+    FFIOContext b;
+    AVIOContext *pbc;
+    uint8_t *buf;
+    unsigned audio_element_id, codec_config_id, num_parameters;
+    int audio_element_type, ret;
+
+    buf = av_malloc(len);
+    if (!buf)
+        return AVERROR(ENOMEM);
+
+    ret = avio_read(pb, buf, len);
+    if (ret != len) {
+        if (ret >= 0)
+            ret = AVERROR_INVALIDDATA;
+        goto fail;
+    }
+
+    ffio_init_context(&b, buf, len, 0, NULL, NULL, NULL, NULL);
+    pbc = &b.pub;
+
+    audio_element_id = ffio_read_leb(pbc);
+
+    for (int i = 0; i < c->nb_audio_elements; i++)
+        if (c->audio_elements[i]->audio_element_id == audio_element_id) {
+            av_log(s, AV_LOG_ERROR, "Duplicate audio_element_id %u\n", audio_element_id);
+            ret = AVERROR_INVALIDDATA;
+            goto fail;
+        }
+
+    audio_element_type = avio_r8(pbc) >> 5;
+    codec_config_id = ffio_read_leb(pbc);
+
+    codec_config = get_codec_config(c, codec_config_id);
+    if (!codec_config) {
+        av_log(s, AV_LOG_ERROR, "Non-existent codec config id %u referenced in an audio element\n", codec_config_id);
+        ret = AVERROR_INVALIDDATA;
+        goto fail;
+    }
+
+    if (codec_config->codec_id == AV_CODEC_ID_NONE) {
+        av_log(s, AV_LOG_DEBUG, "Unknown codec id referenced in an audio element. Ignoring\n");
+        ret = 0;
+        goto fail;
+    }
+
+    tmp = av_realloc_array(c->audio_elements, c->nb_audio_elements + 1, sizeof(*c->audio_elements));
+    if (!tmp) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+    c->audio_elements = tmp;
+
+    audio_element = av_mallocz(sizeof(*audio_element));
+    if (!audio_element) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    audio_element->nb_substreams = ffio_read_leb(pbc);
+    audio_element->codec_config_id = codec_config_id;
+    audio_element->audio_element_id = audio_element_id;
+    audio_element->substreams = av_calloc(audio_element->nb_substreams, sizeof(*audio_element->substreams));
+    if (!audio_element->substreams) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    element = audio_element->element = av_iamf_audio_element_alloc();
+    if (!element) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    element->audio_element_type = audio_element_type;
+
+    for (int i = 0; i < audio_element->nb_substreams; i++) {
+        IAMFSubStream *substream = &audio_element->substreams[i];
+
+        substream->codecpar = avcodec_parameters_alloc();
+        if (!substream->codecpar) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        substream->audio_substream_id = ffio_read_leb(pbc);
+
+        substream->codecpar->codec_type = AVMEDIA_TYPE_AUDIO;
+        substream->codecpar->codec_id   = codec_config->codec_id;
+        substream->codecpar->frame_size = codec_config->nb_samples;
+        substream->codecpar->sample_rate = codec_config->sample_rate;
+        substream->codecpar->seek_preroll = codec_config->seek_preroll;
+
+        switch (substream->codecpar->codec_id) {
+        case AV_CODEC_ID_AAC:
+        case AV_CODEC_ID_FLAC:
+        case AV_CODEC_ID_OPUS:
+            substream->codecpar->extradata = av_malloc(codec_config->extradata_size + AV_INPUT_BUFFER_PADDING_SIZE);
+            if (!substream->codecpar->extradata) {
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+            memcpy(substream->codecpar->extradata, codec_config->extradata, codec_config->extradata_size);
+            memset(substream->codecpar->extradata + codec_config->extradata_size, 0, AV_INPUT_BUFFER_PADDING_SIZE);
+            substream->codecpar->extradata_size = codec_config->extradata_size;
+            break;
+        }
+    }
+
+    num_parameters = ffio_read_leb(pbc);
+    if (num_parameters && audio_element_type != 0) {
+        av_log(s, AV_LOG_ERROR, "Audio Element parameter count %u is invalid"
+                                " for Scene representations\n", num_parameters);
+        ret = AVERROR_INVALIDDATA;
+        goto fail;
+    }
+
+    for (int i = 0; i < num_parameters; i++) {
+        unsigned type;
+
+        type = ffio_read_leb(pbc);
+        if (type == AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN) {
+            ret = AVERROR_INVALIDDATA;
+            goto fail;
+        } else if (type == AV_IAMF_PARAMETER_DEFINITION_DEMIXING) {
+            ret = param_parse(s, c, pbc, type, audio_element, &element->demixing_info);
+            if (ret < 0)
+                goto fail;
+
+            element->default_w = avio_r8(pbc) >> 4;
+        } else if (type == AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN) {
+            ret = param_parse(s, c, pbc, type, audio_element, &element->recon_gain_info);
+            if (ret < 0)
+                goto fail;
+        } else {
+            unsigned param_definition_size = ffio_read_leb(pbc);
+            avio_skip(pbc, param_definition_size);
+        }
+    }
+
+    if (audio_element_type == AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL) {
+        ret = scalable_channel_layout_config(s, pbc, audio_element, codec_config);
+        if (ret < 0)
+            goto fail;
+    } else if (audio_element_type == AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE) {
+        ret = ambisonics_config(s, pbc, audio_element, codec_config);
+        if (ret < 0)
+            goto fail;
+    } else {
+        unsigned audio_element_config_size = ffio_read_leb(pbc);
+        avio_skip(pbc, audio_element_config_size);
+    }
+
+    c->audio_elements[c->nb_audio_elements++] = audio_element;
+
+    len -= avio_tell(pbc);
+    if (len)
+        av_log(s, AV_LOG_WARNING, "Underread in audio_element_obu. %d bytes left at the end\n", len);
+
+    ret = 0;
+fail:
+    av_free(buf);
+    if (ret < 0)
+        ff_iamf_free_audio_element(&audio_element);
+    return ret;
+}
+
+static int label_string(AVIOContext *pb, char **label)
+{
+    char buf[128];
+
+    avio_get_str(pb, sizeof(buf), buf, sizeof(buf));
+
+    if (pb->error)
+        return pb->error;
+    if (pb->eof_reached)
+        return AVERROR_INVALIDDATA;
+    *label = av_strdup(buf);
+    if (!*label)
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+static int mix_presentation_obu(void *s, IAMFContext *c, AVIOContext *pb, int len)
+{
+    AVIAMFMixPresentation *mix;
+    IAMFMixPresentation **tmp, *mix_presentation = NULL;
+    FFIOContext b;
+    AVIOContext *pbc;
+    uint8_t *buf;
+    unsigned mix_presentation_id;
+    int ret;
+
+    buf = av_malloc(len);
+    if (!buf)
+        return AVERROR(ENOMEM);
+
+    ret = avio_read(pb, buf, len);
+    if (ret != len) {
+        if (ret >= 0)
+            ret = AVERROR_INVALIDDATA;
+        goto fail;
+    }
+
+    ffio_init_context(&b, buf, len, 0, NULL, NULL, NULL, NULL);
+    pbc = &b.pub;
+
+    mix_presentation_id = ffio_read_leb(pbc);
+
+    for (int i = 0; i < c->nb_mix_presentations; i++)
+        if (c->mix_presentations[i]->mix_presentation_id == mix_presentation_id) {
+            av_log(s, AV_LOG_ERROR, "Duplicate mix_presentation_id %u\n", mix_presentation_id);
+            ret = AVERROR_INVALIDDATA;
+            goto fail;
+        }
+
+    tmp = av_realloc_array(c->mix_presentations, c->nb_mix_presentations + 1, sizeof(*c->mix_presentations));
+    if (!tmp) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+    c->mix_presentations = tmp;
+
+    mix_presentation = av_mallocz(sizeof(*mix_presentation));
+    if (!mix_presentation) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    mix_presentation->mix_presentation_id = mix_presentation_id;
+    mix = mix_presentation->mix = av_iamf_mix_presentation_alloc();
+    if (!mix) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    mix_presentation->count_label = ffio_read_leb(pbc);
+    mix_presentation->language_label = av_calloc(mix_presentation->count_label,
+                                                 sizeof(*mix_presentation->language_label));
+    if (!mix_presentation->language_label) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (int i = 0; i < mix_presentation->count_label; i++) {
+        ret = label_string(pbc, &mix_presentation->language_label[i]);
+        if (ret < 0)
+            goto fail;
+    }
+
+    for (int i = 0; i < mix_presentation->count_label; i++) {
+        char *annotation = NULL;
+        ret = label_string(pbc, &annotation);
+        if (ret < 0)
+            goto fail;
+        ret = av_dict_set(&mix->annotations, mix_presentation->language_label[i], annotation,
+                          AV_DICT_DONT_STRDUP_VAL | AV_DICT_DONT_OVERWRITE);
+        if (ret < 0)
+            goto fail;
+    }
+
+    mix->nb_submixes = ffio_read_leb(pbc);
+    mix->submixes = av_calloc(mix->nb_submixes, sizeof(*mix->submixes));
+    if (!mix->submixes) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (int i = 0; i < mix->nb_submixes; i++) {
+        AVIAMFSubmix *sub_mix;
+
+        sub_mix = mix->submixes[i] = av_mallocz(sizeof(*sub_mix));
+        if (!sub_mix) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        sub_mix->nb_elements = ffio_read_leb(pbc);
+        sub_mix->elements = av_calloc(sub_mix->nb_elements, sizeof(*sub_mix->elements));
+        if (!sub_mix->elements) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        for (int j = 0; j < sub_mix->nb_elements; j++) {
+            AVIAMFSubmixElement *submix_element;
+            IAMFAudioElement *audio_element = NULL;
+            unsigned int rendering_config_extension_size;
+
+            submix_element = sub_mix->elements[j] = av_mallocz(sizeof(*submix_element));
+            if (!submix_element) {
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+
+            submix_element->audio_element_id = ffio_read_leb(pbc);
+
+            for (int k = 0; k < c->nb_audio_elements; k++)
+                if (c->audio_elements[k]->audio_element_id == submix_element->audio_element_id) {
+                    audio_element = c->audio_elements[k];
+                    break;
+                }
+
+            if (!audio_element) {
+                av_log(s, AV_LOG_ERROR, "Invalid Audio Element with id %u referenced by Mix Presentation %u\n",
+                       submix_element->audio_element_id, mix_presentation_id);
+                ret = AVERROR_INVALIDDATA;
+                goto fail;
+            }
+
+            for (int k = 0; k < mix_presentation->count_label; k++) {
+                char *annotation = NULL;
+                ret = label_string(pbc, &annotation);
+                if (ret < 0)
+                    goto fail;
+                ret = av_dict_set(&submix_element->annotations, mix_presentation->language_label[k], annotation,
+                                  AV_DICT_DONT_STRDUP_VAL | AV_DICT_DONT_OVERWRITE);
+                if (ret < 0)
+                    goto fail;
+            }
+
+            submix_element->headphones_rendering_mode = avio_r8(pbc) >> 6;
+
+            rendering_config_extension_size = ffio_read_leb(pbc);
+            avio_skip(pbc, rendering_config_extension_size);
+
+            ret = param_parse(s, c, pbc, AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN,
+                              NULL,
+                              &submix_element->element_mix_config);
+            if (ret < 0)
+                goto fail;
+            submix_element->default_mix_gain = av_make_q(sign_extend(avio_rb16(pbc), 16), 1 << 8);
+        }
+
+        ret = param_parse(s, c, pbc, AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, NULL, &sub_mix->output_mix_config);
+        if (ret < 0)
+            goto fail;
+        sub_mix->default_mix_gain = av_make_q(sign_extend(avio_rb16(pbc), 16), 1 << 8);
+
+        sub_mix->nb_layouts = ffio_read_leb(pbc);
+        sub_mix->layouts = av_calloc(sub_mix->nb_layouts, sizeof(*sub_mix->layouts));
+        if (!sub_mix->layouts) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        for (int j = 0; j < sub_mix->nb_layouts; j++) {
+            AVIAMFSubmixLayout *submix_layout;
+            int info_type;
+            int byte = avio_r8(pbc);
+
+            submix_layout = sub_mix->layouts[j] = av_mallocz(sizeof(*submix_layout));
+            if (!submix_layout) {
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+
+            submix_layout->layout_type = byte >> 6;
+            if (submix_layout->layout_type < AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS ||
+                submix_layout->layout_type > AV_IAMF_SUBMIX_LAYOUT_TYPE_BINAURAL) {
+                av_log(s, AV_LOG_ERROR, "Invalid Layout type %u in a submix from Mix Presentation %u\n",
+                       submix_layout->layout_type, mix_presentation_id);
+                ret = AVERROR_INVALIDDATA;
+                goto fail;
+            }
+            if (submix_layout->layout_type == AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS) {
+                int sound_system;
+                sound_system = (byte >> 2) & 0xF;
+                av_channel_layout_copy(&submix_layout->sound_system, &ff_iamf_sound_system_map[sound_system].layout);
+            }
+
+            info_type = avio_r8(pbc);
+            submix_layout->integrated_loudness = av_make_q(sign_extend(avio_rb16(pbc), 16), 1 << 8);
+            submix_layout->digital_peak = av_make_q(sign_extend(avio_rb16(pbc), 16), 1 << 8);
+
+            if (info_type & 1)
+                submix_layout->true_peak = av_make_q(sign_extend(avio_rb16(pbc), 16), 1 << 8);
+            if (info_type & 2) {
+                unsigned int num_anchored_loudness = avio_r8(pbc);
+
+                for (int k = 0; k < num_anchored_loudness; k++) {
+                    unsigned int anchor_element = avio_r8(pbc);
+                    AVRational anchored_loudness = av_make_q(sign_extend(avio_rb16(pbc), 16), 1 << 8);
+                    if (anchor_element == IAMF_ANCHOR_ELEMENT_DIALOGUE)
+                        submix_layout->dialogue_anchored_loudness = anchored_loudness;
+                    else if (anchor_element <= IAMF_ANCHOR_ELEMENT_ALBUM)
+                        submix_layout->album_anchored_loudness = anchored_loudness;
+                    else
+                        av_log(s, AV_LOG_DEBUG, "Unknown anchor_element. Ignoring\n");
+                }
+            }
+
+            if (info_type & 0xFC) {
+                unsigned int info_type_size = ffio_read_leb(pbc);
+                avio_skip(pbc, info_type_size);
+            }
+        }
+    }
+
+    c->mix_presentations[c->nb_mix_presentations++] = mix_presentation;
+
+    len -= avio_tell(pbc);
+    if (len)
+        av_log(s, AV_LOG_WARNING, "Underread in mix_presentation_obu. %d bytes left at the end\n", len);
+
+    ret = 0;
+fail:
+    av_free(buf);
+    if (ret < 0)
+        ff_iamf_free_mix_presentation(&mix_presentation);
+    return ret;
+}
+
+int ff_iamf_parse_obu_header(const uint8_t *buf, int buf_size,
+                             unsigned *obu_size, int *start_pos, enum IAMF_OBU_Type *type,
+                             unsigned *skip_samples, unsigned *discard_padding)
+{
+    GetBitContext gb;
+    int ret, extension_flag, trimming, start;
+    unsigned skip = 0, discard = 0;
+    unsigned size;
+
+    ret = init_get_bits8(&gb, buf, FFMIN(buf_size, MAX_IAMF_OBU_HEADER_SIZE));
+    if (ret < 0)
+        return ret;
+
+    *type          = get_bits(&gb, 5);
+    /*redundant      =*/ get_bits1(&gb);
+    trimming       = get_bits1(&gb);
+    extension_flag = get_bits1(&gb);
+
+    *obu_size = get_leb(&gb);
+    if (*obu_size > INT_MAX)
+        return AVERROR_INVALIDDATA;
+
+    start = get_bits_count(&gb) / 8;
+
+    if (trimming) {
+        discard = get_leb(&gb); // num_samples_to_trim_at_end
+        skip = get_leb(&gb); // num_samples_to_trim_at_start
+    }
+
+    if (skip_samples)
+        *skip_samples = skip;
+    if (discard_padding)
+        *discard_padding = discard;
+
+    if (extension_flag) {
+        unsigned int extension_bytes;
+        extension_bytes = get_leb(&gb);
+        if (extension_bytes > INT_MAX / 8)
+            return AVERROR_INVALIDDATA;
+        skip_bits_long(&gb, extension_bytes * 8);
+    }
+
+    if (get_bits_left(&gb) < 0)
+        return AVERROR_INVALIDDATA;
+
+    size = *obu_size + start;
+    if (size > INT_MAX)
+        return AVERROR_INVALIDDATA;
+
+    *obu_size -= get_bits_count(&gb) / 8 - start;
+    *start_pos = size - *obu_size;
+
+    return size;
+}
+
+int ff_iamfdec_read_descriptors(IAMFContext *c, AVIOContext *pb,
+                                int max_size, void *log_ctx)
+{
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE + AV_INPUT_BUFFER_PADDING_SIZE];
+    int ret;
+
+    while (1) {
+        unsigned obu_size;
+        enum IAMF_OBU_Type type;
+        int start_pos, len, size;
+
+        if ((ret = ffio_ensure_seekback(pb, FFMIN(MAX_IAMF_OBU_HEADER_SIZE, max_size))) < 0)
+            return ret;
+        size = avio_read(pb, header, FFMIN(MAX_IAMF_OBU_HEADER_SIZE, max_size));
+        if (size < 0)
+            return size;
+
+        len = ff_iamf_parse_obu_header(header, size, &obu_size, &start_pos, &type, NULL, NULL);
+        if (len < 0 || obu_size > max_size) {
+            av_log(log_ctx, AV_LOG_ERROR, "Failed to read obu header\n");
+            avio_seek(pb, -size, SEEK_CUR);
+            return len < 0 ? len : AVERROR_INVALIDDATA;
+        }
+
+        if (type >= IAMF_OBU_IA_PARAMETER_BLOCK && type < IAMF_OBU_IA_SEQUENCE_HEADER) {
+            avio_seek(pb, -size, SEEK_CUR);
+            break;
+        }
+
+        avio_seek(pb, -(size - start_pos), SEEK_CUR);
+        switch (type) {
+        case IAMF_OBU_IA_CODEC_CONFIG:
+            ret = codec_config_obu(log_ctx, c, pb, obu_size);
+            break;
+        case IAMF_OBU_IA_AUDIO_ELEMENT:
+            ret = audio_element_obu(log_ctx, c, pb, obu_size);
+            break;
+        case IAMF_OBU_IA_MIX_PRESENTATION:
+            ret = mix_presentation_obu(log_ctx, c, pb, obu_size);
+            break;
+        case IAMF_OBU_IA_TEMPORAL_DELIMITER:
+            break;
+        default: {
+            int64_t offset = avio_skip(pb, obu_size);
+            if (offset < 0)
+                ret = offset;
+            break;
+        }
+        }
+        if (ret < 0) {
+            av_log(log_ctx, AV_LOG_ERROR, "Failed to read obu type %d\n", type);
+            return ret;
+        }
+        max_size -= obu_size + start_pos;
+        if (max_size < 0)
+            return AVERROR_INVALIDDATA;
+        if (!max_size)
+            break;
+    }
+
+    return 0;
+}
diff --git a/libavformat/iamf_parse.h b/libavformat/iamf_parse.h
new file mode 100644
index 0000000000..f4f297ecd4
--- /dev/null
+++ b/libavformat/iamf_parse.h
@@ -0,0 +1,38 @@
+/*
+ * Immersive Audio Model and Formats parsing
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFORMAT_IAMF_PARSE_H
+#define AVFORMAT_IAMF_PARSE_H
+
+#include <stdint.h>
+
+#include "libavutil/iamf.h"
+#include "avio.h"
+#include "iamf.h"
+
+int ff_iamf_parse_obu_header(const uint8_t *buf, int buf_size,
+                             unsigned *obu_size, int *start_pos, enum IAMF_OBU_Type *type,
+                             unsigned *skip_samples, unsigned *discard_padding);
+
+int ff_iamfdec_read_descriptors(IAMFContext *c, AVIOContext *pb,
+                                int size, void *log_ctx);
+
+#endif /* AVFORMAT_IAMF_PARSE_H */
diff --git a/libavformat/iamfdec.c b/libavformat/iamfdec.c
new file mode 100644
index 0000000000..0374d0f241
--- /dev/null
+++ b/libavformat/iamfdec.c
@@ -0,0 +1,503 @@
+/*
+ * Immersive Audio Model and Formats demuxer
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config_components.h"
+
+#include "libavutil/avassert.h"
+#include "libavutil/iamf.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/log.h"
+#include "libavcodec/mathops.h"
+#include "avformat.h"
+#include "avio_internal.h"
+#include "demux.h"
+#include "iamf.h"
+#include "iamf_parse.h"
+#include "internal.h"
+
+typedef struct IAMFDemuxContext {
+    IAMFContext iamf;
+
+    // Packet side data
+    AVIAMFParamDefinition *mix;
+    size_t mix_size;
+    AVIAMFParamDefinition *demix;
+    size_t demix_size;
+    AVIAMFParamDefinition *recon;
+    size_t recon_size;
+} IAMFDemuxContext;
+
+static AVStream *find_stream_by_id(AVFormatContext *s, int id)
+{
+    for (int i = 0; i < s->nb_streams; i++)
+        if (s->streams[i]->id == id)
+            return s->streams[i];
+
+    av_log(s, AV_LOG_ERROR, "Invalid stream id %d\n", id);
+    return NULL;
+}
+
+static int audio_frame_obu(AVFormatContext *s, AVPacket *pkt, int len,
+                           enum IAMF_OBU_Type type,
+                           unsigned skip_samples, unsigned discard_padding,
+                           int id_in_bitstream)
+{
+    const IAMFDemuxContext *const c = s->priv_data;
+    AVStream *st;
+    int ret, audio_substream_id;
+
+    if (id_in_bitstream) {
+        unsigned explicit_audio_substream_id;
+        int64_t pos = avio_tell(s->pb);
+        explicit_audio_substream_id = ffio_read_leb(s->pb);
+        len -= avio_tell(s->pb) - pos;
+        audio_substream_id = explicit_audio_substream_id;
+    } else
+        audio_substream_id = type - IAMF_OBU_IA_AUDIO_FRAME_ID0;
+
+    st = find_stream_by_id(s, audio_substream_id);
+    if (!st)
+        return AVERROR_INVALIDDATA;
+
+    ret = av_get_packet(s->pb, pkt, len);
+    if (ret < 0)
+        return ret;
+    if (ret != len)
+        return AVERROR_INVALIDDATA;
+
+    if (skip_samples || discard_padding) {
+        uint8_t *side_data = av_packet_new_side_data(pkt, AV_PKT_DATA_SKIP_SAMPLES, 10);
+        if (!side_data)
+            return AVERROR(ENOMEM);
+        AV_WL32(side_data, skip_samples);
+        AV_WL32(side_data + 4, discard_padding);
+    }
+    if (c->mix) {
+        uint8_t *side_data = av_packet_new_side_data(pkt, AV_PKT_DATA_IAMF_MIX_GAIN_PARAM, c->mix_size);
+        if (!side_data)
+            return AVERROR(ENOMEM);
+        memcpy(side_data, c->mix, c->mix_size);
+    }
+    if (c->demix) {
+        uint8_t *side_data = av_packet_new_side_data(pkt, AV_PKT_DATA_IAMF_DEMIXING_INFO_PARAM, c->demix_size);
+        if (!side_data)
+            return AVERROR(ENOMEM);
+        memcpy(side_data, c->demix, c->demix_size);
+    }
+    if (c->recon) {
+        uint8_t *side_data = av_packet_new_side_data(pkt, AV_PKT_DATA_IAMF_RECON_GAIN_INFO_PARAM, c->recon_size);
+        if (!side_data)
+            return AVERROR(ENOMEM);
+        memcpy(side_data, c->recon, c->recon_size);
+    }
+
+    pkt->stream_index = st->index;
+    return 0;
+}
+
+static const IAMFParamDefinition *get_param_definition(AVFormatContext *s, unsigned int parameter_id)
+{
+    const IAMFDemuxContext *const c = s->priv_data;
+    const IAMFContext *const iamf = &c->iamf;
+    const IAMFParamDefinition *param_definition = NULL;
+
+    for (int i = 0; i < iamf->nb_param_definitions; i++)
+        if (iamf->param_definitions[i]->param->parameter_id == parameter_id) {
+            param_definition = iamf->param_definitions[i];
+            break;
+        }
+
+    return param_definition;
+}
+
+static int parameter_block_obu(AVFormatContext *s, int len)
+{
+    IAMFDemuxContext *const c = s->priv_data;
+    const IAMFParamDefinition *param_definition;
+    const AVIAMFParamDefinition *param;
+    AVIAMFParamDefinition *out_param = NULL;
+    FFIOContext b;
+    AVIOContext *pb;
+    uint8_t *buf;
+    unsigned int duration, constant_subblock_duration;
+    unsigned int nb_subblocks;
+    unsigned int parameter_id;
+    size_t out_param_size;
+    int ret;
+
+    buf = av_malloc(len);
+    if (!buf)
+        return AVERROR(ENOMEM);
+
+    ret = avio_read(s->pb, buf, len);
+    if (ret != len) {
+        if (ret >= 0)
+            ret = AVERROR_INVALIDDATA;
+        goto fail;
+    }
+
+    ffio_init_context(&b, buf, len, 0, NULL, NULL, NULL, NULL);
+    pb = &b.pub;
+
+    parameter_id = ffio_read_leb(pb);
+    param_definition = get_param_definition(s, parameter_id);
+    if (!param_definition) {
+        av_log(s, AV_LOG_VERBOSE, "Non-existent parameter_id %u referenced in a parameter block. Ignoring\n",
+               parameter_id);
+        ret = 0;
+        goto fail;
+    }
+
+    param = param_definition->param;
+    if (!param_definition->mode) {
+        duration = ffio_read_leb(pb);
+        constant_subblock_duration = ffio_read_leb(pb);
+        if (constant_subblock_duration == 0)
+            nb_subblocks = ffio_read_leb(pb);
+        else
+            nb_subblocks = duration / constant_subblock_duration;
+    } else {
+        duration = param->duration;
+        constant_subblock_duration = param->constant_subblock_duration;
+        nb_subblocks = param->nb_subblocks;
+        if (!nb_subblocks)
+            nb_subblocks = duration / constant_subblock_duration;
+    }
+
+    out_param = av_iamf_param_definition_alloc(param->type, nb_subblocks, &out_param_size);
+    if (!out_param) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    out_param->parameter_id = param->parameter_id;
+    out_param->type = param->type;
+    out_param->parameter_rate = param->parameter_rate;
+    out_param->duration = duration;
+    out_param->constant_subblock_duration = constant_subblock_duration;
+    out_param->nb_subblocks = nb_subblocks;
+
+    for (int i = 0; i < nb_subblocks; i++) {
+        void *subblock = av_iamf_param_definition_get_subblock(out_param, i);
+        unsigned int subblock_duration = constant_subblock_duration;
+
+        if (!param_definition->mode && !constant_subblock_duration)
+            subblock_duration = ffio_read_leb(pb);
+
+        switch (param->type) {
+        case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN: {
+            AVIAMFMixGain *mix = subblock;
+
+            mix->animation_type = ffio_read_leb(pb);
+            if (mix->animation_type > AV_IAMF_ANIMATION_TYPE_BEZIER) {
+                ret = 0;
+                av_free(out_param);
+                goto fail;
+            }
+
+            mix->start_point_value = av_make_q(sign_extend(avio_rb16(pb), 16), 1 << 8);
+            if (mix->animation_type >= AV_IAMF_ANIMATION_TYPE_LINEAR)
+                mix->end_point_value = av_make_q(sign_extend(avio_rb16(pb), 16), 1 << 8);
+            if (mix->animation_type == AV_IAMF_ANIMATION_TYPE_BEZIER) {
+                mix->control_point_value = av_make_q(sign_extend(avio_rb16(pb), 16), 1 << 8);
+                mix->control_point_relative_time = av_make_q(avio_r8(pb), 1 << 8);
+            }
+            mix->subblock_duration = subblock_duration;
+            break;
+        }
+        case AV_IAMF_PARAMETER_DEFINITION_DEMIXING: {
+            AVIAMFDemixingInfo *demix = subblock;
+
+            demix->dmixp_mode = avio_r8(pb) >> 5;
+            demix->subblock_duration = subblock_duration;
+            break;
+        }
+        case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN: {
+            AVIAMFReconGain *recon = subblock;
+            const IAMFAudioElement *audio_element = param_definition->audio_element;
+            const AVIAMFAudioElement *element = audio_element->element;
+
+            av_assert0(audio_element && element);
+            for (int i = 0; i < element->nb_layers; i++) {
+                const AVIAMFLayer *layer = element->layers[i];
+                if (layer->flags & AV_IAMF_LAYER_FLAG_RECON_GAIN) {
+                    unsigned int recon_gain_flags = ffio_read_leb(pb);
+                    unsigned int bitcount = 7 + 5 * !!(recon_gain_flags & 0x80);
+                    recon_gain_flags = (recon_gain_flags & 0x7F) | ((recon_gain_flags & 0xFF00) >> 1);
+                    for (int j = 0; j < bitcount; j++) {
+                        if (recon_gain_flags & (1 << j))
+                            recon->recon_gain[i][j] = avio_r8(pb);
+                    }
+                }
+            }
+            recon->subblock_duration = subblock_duration;
+            break;
+        }
+        default:
+            av_assert0(0);
+        }
+    }
+
+    len -= avio_tell(pb);
+    if (len) {
+        int level = (s->error_recognition & AV_EF_EXPLODE) ? AV_LOG_ERROR : AV_LOG_WARNING;
+        av_log(s, level, "Underread in parameter_block_obu. %d bytes left at the end\n", len);
+    }
+
+    switch (param->type) {
+    case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN:
+        av_free(c->mix);
+        c->mix = out_param;
+        c->mix_size = out_param_size;
+        break;
+    case AV_IAMF_PARAMETER_DEFINITION_DEMIXING:
+        av_free(c->demix);
+        c->demix = out_param;
+        c->demix_size = out_param_size;
+        break;
+    case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN:
+        av_free(c->recon);
+        c->recon = out_param;
+        c->recon_size = out_param_size;
+        break;
+    default:
+        av_assert0(0);
+    }
+
+    ret = 0;
+fail:
+    if (ret < 0)
+        av_free(out_param);
+    av_free(buf);
+
+    return ret;
+}
+
+static int iamf_read_packet(AVFormatContext *s, AVPacket *pkt)
+{
+    IAMFDemuxContext *const c = s->priv_data;
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE + AV_INPUT_BUFFER_PADDING_SIZE];
+    unsigned obu_size;
+    int ret;
+
+    while (1) {
+        enum IAMF_OBU_Type type;
+        unsigned skip_samples, discard_padding;
+        int len, size, start_pos;
+
+        if ((ret = ffio_ensure_seekback(s->pb, MAX_IAMF_OBU_HEADER_SIZE)) < 0)
+            return ret;
+        size = avio_read(s->pb, header, MAX_IAMF_OBU_HEADER_SIZE);
+        if (size < 0)
+            return size;
+
+        len = ff_iamf_parse_obu_header(header, size, &obu_size, &start_pos, &type,
+                                       &skip_samples, &discard_padding);
+        if (len < 0) {
+            av_log(s, AV_LOG_ERROR, "Failed to read obu\n");
+            return len;
+        }
+        avio_seek(s->pb, -(size - start_pos), SEEK_CUR);
+
+        if (type >= IAMF_OBU_IA_AUDIO_FRAME && type <= IAMF_OBU_IA_AUDIO_FRAME_ID17)
+            return audio_frame_obu(s, pkt, obu_size, type,
+                                   skip_samples, discard_padding,
+                                   type == IAMF_OBU_IA_AUDIO_FRAME);
+        else if (type == IAMF_OBU_IA_PARAMETER_BLOCK) {
+            ret = parameter_block_obu(s, obu_size);
+            if (ret < 0)
+                return ret;
+        } else if (type == IAMF_OBU_IA_TEMPORAL_DELIMITER) {
+            av_freep(&c->mix);
+            c->mix_size = 0;
+            av_freep(&c->demix);
+            c->demix_size = 0;
+            av_freep(&c->recon);
+            c->recon_size = 0;
+        } else {
+            int64_t offset = avio_skip(s->pb, obu_size);
+            if (offset < 0) {
+                ret = offset;
+                break;
+            }
+        }
+    }
+
+    return ret;
+}
+
+// Returns < 0 if more data is needed, otherwise the probe score.
+static int get_score(const uint8_t *buf, int buf_size, enum IAMF_OBU_Type type, int *seq)
+{
+    if (type == IAMF_OBU_IA_SEQUENCE_HEADER) {
+        if (buf_size < 4 || AV_RB32(buf) != MKBETAG('i','a','m','f'))
+            return 0;
+        *seq = 1;
+        return -1;
+    }
+    if (type >= IAMF_OBU_IA_CODEC_CONFIG && type <= IAMF_OBU_IA_TEMPORAL_DELIMITER)
+        return *seq ? -1 : 0;
+    if (type >= IAMF_OBU_IA_AUDIO_FRAME && type <= IAMF_OBU_IA_AUDIO_FRAME_ID17)
+        return *seq ? AVPROBE_SCORE_EXTENSION + 1 : 0;
+    return 0;
+}
+
+static int iamf_probe(const AVProbeData *p)
+{
+    unsigned obu_size;
+    enum IAMF_OBU_Type type;
+    int seq = 0, cnt = 0, start_pos;
+    int ret;
+
+    while (1) {
+        int size = ff_iamf_parse_obu_header(p->buf + cnt, p->buf_size - cnt,
+                                            &obu_size, &start_pos, &type,
+                                            NULL, NULL);
+        if (size < 0)
+            return 0;
+
+        ret = get_score(p->buf + cnt + start_pos,
+                        p->buf_size - cnt - start_pos,
+                        type, &seq);
+        if (ret >= 0)
+            return ret;
+
+        cnt += FFMIN(size, p->buf_size - cnt);
+    }
+    return 0;
+}
+
+static int iamf_read_header(AVFormatContext *s)
+{
+    IAMFDemuxContext *const c = s->priv_data;
+    IAMFContext *const iamf = &c->iamf;
+    int ret;
+
+    ret = ff_iamfdec_read_descriptors(iamf, s->pb, INT_MAX, s);
+    if (ret < 0)
+        return ret;
+
+    for (int i = 0; i < iamf->nb_audio_elements; i++) {
+        IAMFAudioElement *audio_element = iamf->audio_elements[i];
+        AVStreamGroup *stg = avformat_stream_group_create(s, AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT, NULL);
+
+        if (!stg)
+            return AVERROR(ENOMEM);
+
+        stg->id = audio_element->audio_element_id;
+        stg->params.iamf_audio_element = audio_element->element;
+
+        for (int j = 0; j < audio_element->nb_substreams; j++) {
+            IAMFSubStream *substream = &audio_element->substreams[j];
+            AVStream *st = avformat_new_stream(s, NULL);
+
+            if (!st)
+                return AVERROR(ENOMEM);
+
+            ret = avformat_stream_group_add_stream(stg, st);
+            if (ret < 0)
+                return ret;
+
+            ret = avcodec_parameters_copy(st->codecpar, substream->codecpar);
+            if (ret < 0)
+                return ret;
+
+            st->id = substream->audio_substream_id;
+            avpriv_set_pts_info(st, 64, 1, st->codecpar->sample_rate);
+        }
+    }
+
+    for (int i = 0; i < iamf->nb_mix_presentations; i++) {
+        IAMFMixPresentation *mix_presentation = iamf->mix_presentations[i];
+        AVStreamGroup *stg = avformat_stream_group_create(s, AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION, NULL);
+        const AVIAMFMixPresentation *mix = mix_presentation->mix;
+
+        if (!stg)
+            return AVERROR(ENOMEM);
+
+        stg->id = mix_presentation->mix_presentation_id;
+        stg->params.iamf_mix_presentation = mix_presentation->mix;
+
+        for (int j = 0; j < mix->nb_submixes; j++) {
+            AVIAMFSubmix *sub_mix = mix->submixes[j];
+
+            for (int k = 0; k < sub_mix->nb_elements; k++) {
+                AVIAMFSubmixElement *submix_element = sub_mix->elements[k];
+                AVStreamGroup *audio_element = NULL;
+
+                for (int l = 0; l < s->nb_stream_groups; l++)
+                    if (s->stream_groups[l]->type == AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT &&
+                        s->stream_groups[l]->id == submix_element->audio_element_id) {
+                        audio_element = s->stream_groups[l];
+                        break;
+                    }
+                av_assert0(audio_element);
+
+                for (int l = 0; l < audio_element->nb_streams; l++) {
+                    ret = avformat_stream_group_add_stream(stg, audio_element->streams[l]);
+                    if (ret < 0 && ret != AVERROR(EEXIST))
+                        return ret;
+                }
+            }
+        }
+    }
+
+    return 0;
+}
+
+static int iamf_read_close(AVFormatContext *s)
+{
+    IAMFDemuxContext *const c = s->priv_data;
+    IAMFContext *const iamf = &c->iamf;
+
+    for (int i = 0; i < iamf->nb_audio_elements; i++) {
+        IAMFAudioElement *audio_element = iamf->audio_elements[i];
+        audio_element->element = NULL;
+    }
+    for (int i = 0; i < iamf->nb_mix_presentations; i++) {
+        IAMFMixPresentation *mix_presentation = iamf->mix_presentations[i];
+        mix_presentation->mix = NULL;
+    }
+
+    ff_iamf_uninit_context(&c->iamf);
+
+    av_freep(&c->mix);
+    c->mix_size = 0;
+    av_freep(&c->demix);
+    c->demix_size = 0;
+    av_freep(&c->recon);
+    c->recon_size = 0;
+
+    return 0;
+}
+
+const AVInputFormat ff_iamf_demuxer = {
+    .name           = "iamf",
+    .long_name      = NULL_IF_CONFIG_SMALL("Raw Immersive Audio Model and Formats"),
+    .priv_data_size = sizeof(IAMFDemuxContext),
+    .flags_internal = FF_FMT_INIT_CLEANUP,
+    .read_probe     = iamf_probe,
+    .read_header    = iamf_read_header,
+    .read_packet    = iamf_read_packet,
+    .read_close     = iamf_read_close,
+    .extensions     = "iamf",
+    .flags          = AVFMT_GENERIC_INDEX | AVFMT_NO_BYTE_SEEK | AVFMT_NOTIMESTAMPS | AVFMT_SHOW_IDS,
+};
-- 
2.43.0
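
For readers following the header parsing in ff_iamf_parse_obu_header(),
here is a minimal standalone sketch of the first two fields it reads:
the flags byte (5-bit type, then the redundant, trimming and extension
bits) and the LEB128-coded obu_size. It does not use FFmpeg's
GetBitContext or get_leb(); read_leb128(), parse_obu_header() and the
ObuHeader struct are hypothetical names made up for illustration, and
the 8-byte cap on the LEB128 field is an assumption about the reader.
The trimming and extension fields that the real function goes on to
parse are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's get_leb(): decode an unsigned LEB128
 * value. Returns the number of bytes consumed, or 0 if the value is
 * truncated or longer than 8 bytes (the cap is an assumption). */
static size_t read_leb128(const uint8_t *buf, size_t size, uint64_t *out)
{
    uint64_t value = 0;

    assert(buf && out);
    for (size_t i = 0; i < size && i < 8; i++) {
        /* low 7 bits carry data, least significant group first */
        value |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if (!(buf[i] & 0x80)) { /* high bit clear: last byte */
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical container for the fields decoded below. */
typedef struct ObuHeader {
    unsigned type;       /* top 5 bits of the first byte */
    int      redundant;  /* bit 2 */
    int      trimming;   /* bit 1 */
    int      extension;  /* bit 0 */
    uint64_t obu_size;   /* bytes following the obu_size field */
    size_t   header_len; /* flags byte plus the obu_size field */
} ObuHeader;

/* Returns 0 on success, -1 if the buffer is too short or malformed. */
static int parse_obu_header(const uint8_t *buf, size_t size, ObuHeader *h)
{
    size_t n;

    assert(buf && h);
    if (size < 2)
        return -1;

    h->type      = buf[0] >> 3;
    h->redundant = (buf[0] >> 2) & 1;
    h->trimming  = (buf[0] >> 1) & 1;
    h->extension = buf[0] & 1;

    n = read_leb128(buf + 1, size - 1, &h->obu_size);
    if (!n)
        return -1;

    h->header_len = 1 + n;
    return 0;
}
```

Feeding it the three bytes 0xF8 0x96 0x01 gives type 31 with no flags
set, obu_size 150 and header_len 3. As in the patch, obu_size counts
everything after the size field, so a full parser still has to subtract
any trimming and extension bytes to get the payload length.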
* [FFmpeg-devel] [PATCH 8/8] avformat: Immersive Audio Model and Formats muxer
From: James Almer @ 2023-12-14 20:14 UTC (permalink / raw)
  To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
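
As an illustration of what the Opus branch of update_extradata() below
does to the codec extradata, here is a small standalone sketch. It
mirrors the three steps in the patch: reject anything shorter than 19
bytes, drop the leading 8 bytes (the "OpusHead" magic in the usual
identification header), and overwrite the channel count byte with 2.
strip_opus_head() is a hypothetical name, and plain byte writes stand
in for AV_WB8().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the AV_CODEC_ID_OPUS case of
 * update_extradata(). Edits the buffer in place and shrinks *size.
 * Returns 0 on success, -1 if the header is too short. */
static int strip_opus_head(uint8_t *extradata, size_t *size)
{
    assert(extradata && size);

    if (*size < 19)
        return -1;

    /* drop the 8 leading magic bytes */
    *size -= 8;
    memmove(extradata, extradata + 8, *size);

    /* byte 0 is now the version, byte 1 the channel count */
    extradata[1] = 2; /* set channels to stereo */

    return 0;
}
```

With a 19-byte header declaring 6 channels, the call leaves 11 bytes
with the version byte untouched and the channel count rewritten to 2.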
 libavformat/Makefile      |   1 +
 libavformat/allformats.c  |   1 +
 libavformat/iamf_writer.c | 860 ++++++++++++++++++++++++++++++++++++++
 libavformat/iamf_writer.h |  51 +++
 libavformat/iamfenc.c     | 387 +++++++++++++++++
 5 files changed, 1300 insertions(+)
 create mode 100644 libavformat/iamf_writer.c
 create mode 100644 libavformat/iamf_writer.h
 create mode 100644 libavformat/iamfenc.c
diff --git a/libavformat/Makefile b/libavformat/Makefile
index f23c22792b..581e378d95 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -259,6 +259,7 @@ OBJS-$(CONFIG_HLS_DEMUXER)               += hls.o hls_sample_encryption.o
 OBJS-$(CONFIG_HLS_MUXER)                 += hlsenc.o hlsplaylist.o avc.o
 OBJS-$(CONFIG_HNM_DEMUXER)               += hnm.o
 OBJS-$(CONFIG_IAMF_DEMUXER)              += iamfdec.o iamf_parse.o iamf.o
+OBJS-$(CONFIG_IAMF_MUXER)                += iamfenc.o iamf_writer.o iamf.o
 OBJS-$(CONFIG_ICO_DEMUXER)               += icodec.o
 OBJS-$(CONFIG_ICO_MUXER)                 += icoenc.o
 OBJS-$(CONFIG_IDCIN_DEMUXER)             += idcin.o
diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index 6e520b78a6..ce6be5f04d 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -213,6 +213,7 @@ extern const AVInputFormat  ff_hls_demuxer;
 extern const FFOutputFormat ff_hls_muxer;
 extern const AVInputFormat  ff_hnm_demuxer;
 extern const AVInputFormat  ff_iamf_demuxer;
+extern const FFOutputFormat ff_iamf_muxer;
 extern const AVInputFormat  ff_ico_demuxer;
 extern const FFOutputFormat ff_ico_muxer;
 extern const AVInputFormat  ff_idcin_demuxer;
diff --git a/libavformat/iamf_writer.c b/libavformat/iamf_writer.c
new file mode 100644
index 0000000000..9962845049
--- /dev/null
+++ b/libavformat/iamf_writer.c
@@ -0,0 +1,860 @@
+/*
+ * Immersive Audio Model and Formats muxing helpers and structs
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/channel_layout.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/iamf.h"
+#include "libavutil/mem.h"
+#include "libavcodec/get_bits.h"
+#include "libavcodec/flac.h"
+#include "libavcodec/mpeg4audio.h"
+#include "libavcodec/put_bits.h"
+#include "avformat.h"
+#include "avio_internal.h"
+#include "iamf.h"
+#include "iamf_writer.h"
+
+
+static int update_extradata(IAMFCodecConfig *codec_config)
+{
+    GetBitContext gb;
+    PutBitContext pb;
+    int ret;
+
+    switch(codec_config->codec_id) {
+    case AV_CODEC_ID_OPUS:
+        if (codec_config->extradata_size < 19)
+            return AVERROR_INVALIDDATA;
+        codec_config->extradata_size -= 8;
+        memmove(codec_config->extradata, codec_config->extradata + 8, codec_config->extradata_size);
+        AV_WB8(codec_config->extradata + 1, 2); // set channels to stereo
+        break;
+    case AV_CODEC_ID_FLAC: {
+        uint8_t buf[13];
+
+        init_put_bits(&pb, buf, sizeof(buf));
+        ret = init_get_bits8(&gb, codec_config->extradata, codec_config->extradata_size);
+        if (ret < 0)
+            return ret;
+
+        put_bits32(&pb, get_bits_long(&gb, 32)); // min/max blocksize
+        put_bits64(&pb, 48, get_bits64(&gb, 48)); // min/max framesize
+        put_bits(&pb, 20, get_bits(&gb, 20)); // samplerate
+        skip_bits(&gb, 3);
+        put_bits(&pb, 3, 1); // set channels to stereo
+        ret = put_bits_left(&pb);
+        put_bits(&pb, ret, get_bits(&gb, ret));
+        flush_put_bits(&pb);
+
+        memcpy(codec_config->extradata, buf, sizeof(buf));
+        break;
+    }
+    default:
+        break;
+    }
+
+    return 0;
+}
+
+static int fill_codec_config(IAMFContext *iamf, const AVStreamGroup *stg,
+                             IAMFCodecConfig *codec_config)
+{
+    const AVStream *st = stg->streams[0];
+    IAMFCodecConfig **tmp;
+    int j, ret = 0;
+
+    codec_config->codec_id = st->codecpar->codec_id;
+    codec_config->sample_rate = st->codecpar->sample_rate;
+    codec_config->codec_tag = st->codecpar->codec_tag;
+    codec_config->nb_samples = st->codecpar->frame_size;
+    codec_config->seek_preroll = st->codecpar->seek_preroll;
+    if (st->codecpar->extradata_size) {
+        codec_config->extradata = av_memdup(st->codecpar->extradata, st->codecpar->extradata_size);
+        if (!codec_config->extradata)
+            return AVERROR(ENOMEM);
+        codec_config->extradata_size = st->codecpar->extradata_size;
+        ret = update_extradata(codec_config);
+        if (ret < 0)
+            goto fail;
+    }
+
+    for (j = 0; j < iamf->nb_codec_configs; j++) {
+        if (!memcmp(iamf->codec_configs[j], codec_config, offsetof(IAMFCodecConfig, extradata)) &&
+            (!codec_config->extradata_size || !memcmp(iamf->codec_configs[j]->extradata,
+                                                      codec_config->extradata, codec_config->extradata_size)))
+            break;
+    }
+
+    if (j < iamf->nb_codec_configs) {
+        av_free(iamf->codec_configs[j]->extradata);
+        av_free(iamf->codec_configs[j]);
+        iamf->codec_configs[j] = codec_config;
+        return j;
+    }
+
+    tmp = av_realloc_array(iamf->codec_configs, iamf->nb_codec_configs + 1, sizeof(*iamf->codec_configs));
+    if (!tmp) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    iamf->codec_configs = tmp;
+    iamf->codec_configs[iamf->nb_codec_configs] = codec_config;
+    codec_config->codec_config_id = iamf->nb_codec_configs;
+
+    return iamf->nb_codec_configs++;
+
+fail:
+    av_freep(&codec_config->extradata);
+    return ret;
+}
+
+static IAMFParamDefinition *add_param_definition(IAMFContext *iamf, AVIAMFParamDefinition *param,
+                                                 const IAMFAudioElement *audio_element, void *log_ctx)
+{
+    IAMFParamDefinition **tmp, *param_definition;
+    IAMFCodecConfig *codec_config = NULL;
+
+    tmp = av_realloc_array(iamf->param_definitions, iamf->nb_param_definitions + 1,
+                           sizeof(*iamf->param_definitions));
+    if (!tmp)
+        return NULL;
+
+    iamf->param_definitions = tmp;
+
+    param_definition = av_mallocz(sizeof(*param_definition));
+    if (!param_definition)
+        return NULL;
+
+    if (audio_element)
+        codec_config = iamf->codec_configs[audio_element->codec_config_id];
+
+    if (!param->parameter_rate) {
+        if (!codec_config) {
+            av_log(log_ctx, AV_LOG_ERROR, "parameter_rate needed but not set for parameter_id %u\n",
+                   param->parameter_id);
+            return NULL;
+        }
+        param->parameter_rate = codec_config->sample_rate;
+    }
+    if (codec_config) {
+        if (!param->duration)
+            param->duration = codec_config->nb_samples;
+        if (!param->constant_subblock_duration)
+            param->constant_subblock_duration = codec_config->nb_samples;
+    }
+
+    param_definition->mode = !!param->duration;
+    param_definition->param = param;
+    param_definition->audio_element = audio_element;
+    iamf->param_definitions[iamf->nb_param_definitions++] = param_definition;
+
+    return param_definition;
+}
+
+int ff_iamf_add_audio_element(IAMFContext *iamf, const AVStreamGroup *stg, void *log_ctx)
+{
+    const AVIAMFAudioElement *iamf_audio_element;
+    IAMFAudioElement **tmp, *audio_element;
+    IAMFCodecConfig *codec_config;
+    int ret;
+
+    if (stg->type != AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT)
+        return AVERROR(EINVAL);
+
+    iamf_audio_element = stg->params.iamf_audio_element;
+    if (iamf_audio_element->audio_element_type == AV_IAMF_AUDIO_ELEMENT_TYPE_SCENE) {
+        const AVIAMFLayer *layer;
+        if (iamf_audio_element->nb_layers != 1) {
+            av_log(log_ctx, AV_LOG_ERROR, "Invalid amount of layers for SCENE_BASED audio element. Must be 1\n");
+            return AVERROR(EINVAL);
+        }
+        layer = iamf_audio_element->layers[0];
+        if (layer->ch_layout.order != AV_CHANNEL_ORDER_CUSTOM &&
+            layer->ch_layout.order != AV_CHANNEL_ORDER_AMBISONIC) {
+            av_log(log_ctx, AV_LOG_ERROR, "Invalid channel layout for SCENE_BASED audio element\n");
+            return AVERROR(EINVAL);
+        }
+        if (layer->ambisonics_mode >= AV_IAMF_AMBISONICS_MODE_PROJECTION) {
+            av_log(log_ctx, AV_LOG_ERROR, "Unsupported ambisonics mode %d\n", layer->ambisonics_mode);
+            return AVERROR_PATCHWELCOME;
+        }
+        for (int i = 0; i < stg->nb_streams; i++) {
+            if (stg->streams[i]->codecpar->ch_layout.nb_channels > 1) {
+                av_log(log_ctx, AV_LOG_ERROR, "Invalid amount of channels in a stream for MONO mode ambisonics\n");
+                return AVERROR(EINVAL);
+            }
+        }
+    } else
+        for (int j, i = 0; i < iamf_audio_element->nb_layers; i++) {
+            const AVIAMFLayer *layer = iamf_audio_element->layers[i];
+            for (j = 0; j < FF_ARRAY_ELEMS(ff_iamf_scalable_ch_layouts); j++)
+                if (!av_channel_layout_compare(&layer->ch_layout, &ff_iamf_scalable_ch_layouts[j]))
+                    break;
+
+            if (j >= FF_ARRAY_ELEMS(ff_iamf_scalable_ch_layouts)) {
+                av_log(log_ctx, AV_LOG_ERROR, "Unsupported channel layout in layer %d of stream group #%u\n", i, stg->index);
+                return AVERROR(EINVAL);
+            }
+        }
+
+    for (int i = 0; i < iamf->nb_audio_elements; i++) {
+        if (stg->id == iamf->audio_elements[i]->audio_element_id) {
+            av_log(log_ctx, AV_LOG_ERROR, "Duplicated Audio Element id %"PRId64"\n", stg->id);
+            return AVERROR(EINVAL);
+        }
+    }
+
+    codec_config = av_mallocz(sizeof(*codec_config));
+    if (!codec_config)
+        return AVERROR(ENOMEM);
+
+    ret = fill_codec_config(iamf, stg, codec_config);
+    if (ret < 0) {
+        av_free(codec_config);
+        return ret;
+    }
+
+    audio_element = av_mallocz(sizeof(*audio_element));
+    if (!audio_element)
+        return AVERROR(ENOMEM);
+
+    audio_element->element = stg->params.iamf_audio_element;
+    audio_element->audio_element_id = stg->id;
+    audio_element->codec_config_id = ret;
+
+    audio_element->substreams = av_calloc(stg->nb_streams, sizeof(*audio_element->substreams));
+    if (!audio_element->substreams)
+        return AVERROR(ENOMEM);
+    audio_element->nb_substreams = stg->nb_streams;
+
+    audio_element->layers = av_calloc(iamf_audio_element->nb_layers, sizeof(*audio_element->layers));
+    if (!audio_element->layers)
+        return AVERROR(ENOMEM);
+
+    for (int i = 0, j = 0; i < iamf_audio_element->nb_layers; i++) {
+        int nb_channels = iamf_audio_element->layers[i]->ch_layout.nb_channels;
+
+        IAMFLayer *layer = &audio_element->layers[i];
+        if (!layer)
+            return AVERROR(ENOMEM);
+        memset(layer, 0, sizeof(*layer));
+
+        if (i)
+            nb_channels -= iamf_audio_element->layers[i - 1]->ch_layout.nb_channels;
+        for (; nb_channels > 0 && j < stg->nb_streams; j++) {
+            const AVStream *st = stg->streams[j];
+            IAMFSubStream *substream = &audio_element->substreams[j];
+
+            substream->audio_substream_id = st->id;
+            layer->substream_count++;
+            layer->coupled_substream_count += st->codecpar->ch_layout.nb_channels == 2;
+            nb_channels -= st->codecpar->ch_layout.nb_channels;
+        }
+        if (nb_channels) {
+            av_log(log_ctx, AV_LOG_ERROR, "Invalid channel count across substreams in layer %u from stream group %u\n",
+                   i, stg->index);
+            return AVERROR(EINVAL);
+        }
+    }
+
+    if (iamf_audio_element->demixing_info) {
+        AVIAMFParamDefinition *param = iamf_audio_element->demixing_info;
+        IAMFParamDefinition *param_definition = ff_iamf_get_param_definition(iamf, param->parameter_id);
+
+        if (param->nb_subblocks != 1) {
+            av_log(log_ctx, AV_LOG_ERROR, "nb_subblocks in demixing_info for stream group %u is not 1\n", stg->index);
+            return AVERROR(EINVAL);
+        }
+
+        if (!param_definition) {
+            param_definition = add_param_definition(iamf, param, audio_element, log_ctx);
+            if (!param_definition)
+                return AVERROR(ENOMEM);
+        }
+    }
+    if (iamf_audio_element->recon_gain_info) {
+        AVIAMFParamDefinition *param = iamf_audio_element->recon_gain_info;
+        IAMFParamDefinition *param_definition = ff_iamf_get_param_definition(iamf, param->parameter_id);
+
+        if (param->nb_subblocks != 1) {
+            av_log(log_ctx, AV_LOG_ERROR, "nb_subblocks in recon_gain_info for stream group %u is not 1\n", stg->index);
+            return AVERROR(EINVAL);
+        }
+
+        if (!param_definition) {
+            param_definition = add_param_definition(iamf, param, audio_element, log_ctx);
+            if (!param_definition)
+                return AVERROR(ENOMEM);
+        }
+    }
+
+    tmp = av_realloc_array(iamf->audio_elements, iamf->nb_audio_elements + 1, sizeof(*iamf->audio_elements));
+    if (!tmp)
+        return AVERROR(ENOMEM);
+
+    iamf->audio_elements = tmp;
+    iamf->audio_elements[iamf->nb_audio_elements++] = audio_element;
+
+    return 0;
+}
+
+int ff_iamf_add_mix_presentation(IAMFContext *iamf, const AVStreamGroup *stg, void *log_ctx)
+{
+    IAMFMixPresentation **tmp, *mix_presentation;
+
+    if (stg->type != AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION)
+        return AVERROR(EINVAL);
+
+    for (int i = 0; i < iamf->nb_mix_presentations; i++) {
+        if (stg->id == iamf->mix_presentations[i]->mix_presentation_id) {
+            av_log(log_ctx, AV_LOG_ERROR, "Duplicate Mix Presentation id %"PRId64"\n", stg->id);
+            return AVERROR(EINVAL);
+        }
+    }
+
+    mix_presentation = av_mallocz(sizeof(*mix_presentation));
+    if (!mix_presentation)
+        return AVERROR(ENOMEM);
+
+    mix_presentation->mix = stg->params.iamf_mix_presentation;
+    mix_presentation->mix_presentation_id = stg->id;
+
+    for (int i = 0; i < mix_presentation->mix->nb_submixes; i++) {
+        const AVIAMFSubmix *submix = mix_presentation->mix->submixes[i];
+        AVIAMFParamDefinition *param = submix->output_mix_config;
+        IAMFParamDefinition *param_definition;
+
+        if (!param) {
+            av_log(log_ctx, AV_LOG_ERROR, "output_mix_config is not present in submix %u from "
+                                          "Mix Presentation ID %"PRId64"\n", i, stg->id);
+            return AVERROR(EINVAL);
+        }
+
+        param_definition = ff_iamf_get_param_definition(iamf, param->parameter_id);
+        if (!param_definition) {
+            param_definition = add_param_definition(iamf, param, NULL, log_ctx);
+            if (!param_definition)
+                return AVERROR(ENOMEM);
+        }
+
+        for (int j = 0; j < submix->nb_elements; j++) {
+            const AVIAMFSubmixElement *element = submix->elements[j];
+            param = element->element_mix_config;
+
+            if (!param) {
+                av_log(log_ctx, AV_LOG_ERROR, "element_mix_config is not present for element %u in submix %u from "
+                                              "Mix Presentation ID %"PRId64"\n", j, i, stg->id);
+                return AVERROR(EINVAL);
+            }
+            param_definition = ff_iamf_get_param_definition(iamf, param->parameter_id);
+            if (!param_definition) {
+                param_definition = add_param_definition(iamf, param, NULL, log_ctx);
+                if (!param_definition)
+                    return AVERROR(ENOMEM);
+            }
+        }
+    }
+
+    tmp = av_realloc_array(iamf->mix_presentations, iamf->nb_mix_presentations + 1, sizeof(*iamf->mix_presentations));
+    if (!tmp)
+        return AVERROR(ENOMEM);
+
+    iamf->mix_presentations = tmp;
+    iamf->mix_presentations[iamf->nb_mix_presentations++] = mix_presentation;
+
+    return 0;
+}
+
+static int iamf_write_codec_config(const IAMFContext *iamf,
+                                   const IAMFCodecConfig *codec_config,
+                                   AVIOContext *pb)
+{
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    AVIOContext *dyn_bc;
+    uint8_t *dyn_buf = NULL;
+    PutBitContext pbc;
+    int dyn_size;
+
+    int ret = avio_open_dyn_buf(&dyn_bc);
+    if (ret < 0)
+        return ret;
+
+    ffio_write_leb(dyn_bc, codec_config->codec_config_id);
+    avio_wl32(dyn_bc, codec_config->codec_tag);
+
+    ffio_write_leb(dyn_bc, codec_config->nb_samples);
+    avio_wb16(dyn_bc, codec_config->seek_preroll);
+
+    switch(codec_config->codec_id) {
+    case AV_CODEC_ID_OPUS:
+        avio_write(dyn_bc, codec_config->extradata, codec_config->extradata_size);
+        break;
+    case AV_CODEC_ID_AAC:
+        ffio_free_dyn_buf(&dyn_bc);
+        return AVERROR_PATCHWELCOME;
+    case AV_CODEC_ID_FLAC:
+        avio_w8(dyn_bc, 0x80); // last-metadata-block flag set, block type 0 (STREAMINFO)
+        avio_wb24(dyn_bc, codec_config->extradata_size); // metadata block length
+        avio_write(dyn_bc, codec_config->extradata, codec_config->extradata_size);
+        break;
+    case AV_CODEC_ID_PCM_S16LE:
+        avio_w8(dyn_bc, 1); // sample_format_flags: little-endian
+        avio_w8(dyn_bc, 16); // sample_size
+        avio_wb32(dyn_bc, codec_config->sample_rate);
+        break;
+    case AV_CODEC_ID_PCM_S24LE:
+        avio_w8(dyn_bc, 1); // sample_format_flags: little-endian
+        avio_w8(dyn_bc, 24); // sample_size
+        avio_wb32(dyn_bc, codec_config->sample_rate);
+        break;
+    case AV_CODEC_ID_PCM_S32LE:
+        avio_w8(dyn_bc, 1); // sample_format_flags: little-endian
+        avio_w8(dyn_bc, 32); // sample_size
+        avio_wb32(dyn_bc, codec_config->sample_rate);
+        break;
+    case AV_CODEC_ID_PCM_S16BE:
+        avio_w8(dyn_bc, 0); // sample_format_flags: big-endian
+        avio_w8(dyn_bc, 16); // sample_size
+        avio_wb32(dyn_bc, codec_config->sample_rate);
+        break;
+    case AV_CODEC_ID_PCM_S24BE:
+        avio_w8(dyn_bc, 0); // sample_format_flags: big-endian
+        avio_w8(dyn_bc, 24); // sample_size
+        avio_wb32(dyn_bc, codec_config->sample_rate);
+        break;
+    case AV_CODEC_ID_PCM_S32BE:
+        avio_w8(dyn_bc, 0); // sample_format_flags: big-endian
+        avio_w8(dyn_bc, 32); // sample_size
+        avio_wb32(dyn_bc, codec_config->sample_rate);
+        break;
+    default:
+        break;
+    }
+
+    init_put_bits(&pbc, header, sizeof(header));
+    put_bits(&pbc, 5, IAMF_OBU_IA_CODEC_CONFIG); // obu_type
+    put_bits(&pbc, 3, 0); // obu_redundant_copy, obu_trimming_status_flag, obu_extension_flag
+    flush_put_bits(&pbc);
+
+    dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf);
+    avio_write(pb, header, put_bytes_count(&pbc, 1));
+    ffio_write_leb(pb, dyn_size);
+    avio_write(pb, dyn_buf, dyn_size);
+    av_free(dyn_buf);
+
+    return 0;
+}
+
+static inline int rescale_rational(AVRational q, int b)
+{
+    return av_clip_int16(av_rescale(q.num, b, q.den));
+}
+
+static int scalable_channel_layout_config(const IAMFAudioElement *audio_element,
+                                          AVIOContext *dyn_bc)
+{
+    const AVIAMFAudioElement *element = audio_element->element;
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    PutBitContext pb;
+
+    init_put_bits(&pb, header, sizeof(header));
+    put_bits(&pb, 3, element->nb_layers); // num_layers
+    put_bits(&pb, 5, 0); // reserved
+    flush_put_bits(&pb);
+    avio_write(dyn_bc, header, put_bytes_count(&pb, 1));
+    for (int i = 0; i < element->nb_layers; i++) {
+        AVIAMFLayer *layer = element->layers[i];
+        int layout;
+        for (layout = 0; layout < FF_ARRAY_ELEMS(ff_iamf_scalable_ch_layouts); layout++) {
+            if (!av_channel_layout_compare(&layer->ch_layout, &ff_iamf_scalable_ch_layouts[layout]))
+                break;
+        }
+        init_put_bits(&pb, header, sizeof(header));
+        put_bits(&pb, 4, layout);
+        put_bits(&pb, 1, !!layer->output_gain_flags);
+        put_bits(&pb, 1, !!(layer->flags & AV_IAMF_LAYER_FLAG_RECON_GAIN));
+        put_bits(&pb, 2, 0); // reserved
+        put_bits(&pb, 8, audio_element->layers[i].substream_count);
+        put_bits(&pb, 8, audio_element->layers[i].coupled_substream_count);
+        if (layer->output_gain_flags) {
+            put_bits(&pb, 6, layer->output_gain_flags);
+            put_bits(&pb, 2, 0);
+            put_bits(&pb, 16, rescale_rational(layer->output_gain, 1 << 8));
+        }
+        flush_put_bits(&pb);
+        avio_write(dyn_bc, header, put_bytes_count(&pb, 1));
+    }
+
+    return 0;
+}
+
+static int ambisonics_config(const IAMFAudioElement *audio_element,
+                             AVIOContext *dyn_bc)
+{
+    const AVIAMFAudioElement *element = audio_element->element;
+    AVIAMFLayer *layer = element->layers[0];
+
+    ffio_write_leb(dyn_bc, 0); // ambisonics_mode
+    ffio_write_leb(dyn_bc, layer->ch_layout.nb_channels); // output_channel_count
+    ffio_write_leb(dyn_bc, audio_element->nb_substreams); // substream_count
+
+    if (layer->ch_layout.order == AV_CHANNEL_ORDER_AMBISONIC)
+        for (int i = 0; i < layer->ch_layout.nb_channels; i++)
+            avio_w8(dyn_bc, i);
+    else
+        for (int i = 0; i < layer->ch_layout.nb_channels; i++)
+            avio_w8(dyn_bc, layer->ch_layout.u.map[i].id);
+
+    return 0;
+}
+
+static int param_definition(const IAMFContext *iamf,
+                            const IAMFParamDefinition *param_def,
+                            AVIOContext *dyn_bc, void *log_ctx)
+{
+    const AVIAMFParamDefinition *param = param_def->param;
+
+    ffio_write_leb(dyn_bc, param->parameter_id);
+    ffio_write_leb(dyn_bc, param->parameter_rate);
+    avio_w8(dyn_bc, param->duration ? 0 : 1 << 7);
+    if (param->duration) {
+        ffio_write_leb(dyn_bc, param->duration);
+        ffio_write_leb(dyn_bc, param->constant_subblock_duration);
+        if (param->constant_subblock_duration == 0) {
+            ffio_write_leb(dyn_bc, param->nb_subblocks);
+            for (int i = 0; i < param->nb_subblocks; i++) {
+                const void *subblock = av_iamf_param_definition_get_subblock(param, i);
+
+                switch (param->type) {
+                case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN: {
+                    const AVIAMFMixGain *mix = subblock;
+                    ffio_write_leb(dyn_bc, mix->subblock_duration);
+                    break;
+                }
+                case AV_IAMF_PARAMETER_DEFINITION_DEMIXING: {
+                    const AVIAMFDemixingInfo *demix = subblock;
+                    ffio_write_leb(dyn_bc, demix->subblock_duration);
+                    break;
+                }
+                case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN: {
+                    const AVIAMFReconGain *recon = subblock;
+                    ffio_write_leb(dyn_bc, recon->subblock_duration);
+                    break;
+                }
+                }
+            }
+        }
+    }
+
+    return 0;
+}
+
+static int iamf_write_audio_element(const IAMFContext *iamf,
+                                    const IAMFAudioElement *audio_element,
+                                    AVIOContext *pb, void *log_ctx)
+{
+    const AVIAMFAudioElement *element = audio_element->element;
+    const IAMFCodecConfig *codec_config = iamf->codec_configs[audio_element->codec_config_id];
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    AVIOContext *dyn_bc;
+    uint8_t *dyn_buf = NULL;
+    PutBitContext pbc;
+    int param_definition_types = AV_IAMF_PARAMETER_DEFINITION_DEMIXING, dyn_size;
+
+    int ret = avio_open_dyn_buf(&dyn_bc);
+    if (ret < 0)
+        return ret;
+
+    ffio_write_leb(dyn_bc, audio_element->audio_element_id);
+
+    init_put_bits(&pbc, header, sizeof(header));
+    put_bits(&pbc, 3, element->audio_element_type);
+    put_bits(&pbc, 5, 0);
+    flush_put_bits(&pbc);
+    avio_write(dyn_bc, header, put_bytes_count(&pbc, 1));
+
+    ffio_write_leb(dyn_bc, audio_element->codec_config_id);
+    ffio_write_leb(dyn_bc, audio_element->nb_substreams);
+
+    for (int i = 0; i < audio_element->nb_substreams; i++)
+        ffio_write_leb(dyn_bc, audio_element->substreams[i].audio_substream_id);
+
+    if (element->nb_layers == 1)
+        param_definition_types &= ~AV_IAMF_PARAMETER_DEFINITION_DEMIXING;
+    if (element->nb_layers > 1)
+        param_definition_types |= AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN;
+    if (codec_config->codec_tag == MKTAG('f','L','a','C') ||
+        codec_config->codec_tag == MKTAG('i','p','c','m'))
+        param_definition_types &= ~AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN;
+
+    ffio_write_leb(dyn_bc, av_popcount(param_definition_types)); // num_parameters
+
+    if (param_definition_types & 1) {
+        const AVIAMFParamDefinition *param = element->demixing_info;
+        const IAMFParamDefinition *param_def;
+        const AVIAMFDemixingInfo *demix;
+
+        if (!param) {
+            av_log(log_ctx, AV_LOG_ERROR, "demixing_info needed but not set in Stream Group #%u\n",
+                   audio_element->audio_element_id);
+            return AVERROR(EINVAL);
+        }
+
+        demix = av_iamf_param_definition_get_subblock(param, 0);
+        ffio_write_leb(dyn_bc, AV_IAMF_PARAMETER_DEFINITION_DEMIXING); // type
+
+        param_def = ff_iamf_get_param_definition(iamf, param->parameter_id);
+        ret = param_definition(iamf, param_def, dyn_bc, log_ctx);
+        if (ret < 0)
+            return ret;
+
+        avio_w8(dyn_bc, demix->dmixp_mode << 5); // dmixp_mode
+        avio_w8(dyn_bc, element->default_w << 4); // default_w
+    }
+    if (param_definition_types & 2) {
+        const AVIAMFParamDefinition *param = element->recon_gain_info;
+        const IAMFParamDefinition *param_def;
+
+        if (!param) {
+            av_log(log_ctx, AV_LOG_ERROR, "recon_gain_info needed but not set in Stream Group #%u\n",
+                   audio_element->audio_element_id);
+            return AVERROR(EINVAL);
+        }
+        ffio_write_leb(dyn_bc, AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN); // type
+
+        param_def = ff_iamf_get_param_definition(iamf, param->parameter_id);
+        ret = param_definition(iamf, param_def, dyn_bc, log_ctx);
+        if (ret < 0)
+            return ret;
+    }
+
+    if (element->audio_element_type == AV_IAMF_AUDIO_ELEMENT_TYPE_CHANNEL) {
+        ret = scalable_channel_layout_config(audio_element, dyn_bc);
+        if (ret < 0)
+            return ret;
+    } else {
+        ret = ambisonics_config(audio_element, dyn_bc);
+        if (ret < 0)
+            return ret;
+    }
+
+    init_put_bits(&pbc, header, sizeof(header));
+    put_bits(&pbc, 5, IAMF_OBU_IA_AUDIO_ELEMENT); // obu_type
+    put_bits(&pbc, 3, 0); // obu_redundant_copy, obu_trimming_status_flag, obu_extension_flag
+    flush_put_bits(&pbc);
+
+    dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf);
+    avio_write(pb, header, put_bytes_count(&pbc, 1));
+    ffio_write_leb(pb, dyn_size);
+    avio_write(pb, dyn_buf, dyn_size);
+    av_free(dyn_buf);
+
+    return 0;
+}
+
+static int iamf_write_mixing_presentation(const IAMFContext *iamf,
+                                          const IAMFMixPresentation *mix_presentation,
+                                          AVIOContext *pb, void *log_ctx)
+{
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    const AVIAMFMixPresentation *mix = mix_presentation->mix;
+    const AVDictionaryEntry *tag = NULL;
+    PutBitContext pbc;
+    AVIOContext *dyn_bc;
+    uint8_t *dyn_buf = NULL;
+    int dyn_size;
+
+    int ret = avio_open_dyn_buf(&dyn_bc);
+    if (ret < 0)
+        return ret;
+
+    ffio_write_leb(dyn_bc, mix_presentation->mix_presentation_id); // mix_presentation_id
+    ffio_write_leb(dyn_bc, av_dict_count(mix->annotations)); // count_label
+
+    while ((tag = av_dict_iterate(mix->annotations, tag)))
+        avio_put_str(dyn_bc, tag->key);
+    while ((tag = av_dict_iterate(mix->annotations, tag)))
+        avio_put_str(dyn_bc, tag->value);
+
+    ffio_write_leb(dyn_bc, mix->nb_submixes);
+    for (int i = 0; i < mix->nb_submixes; i++) {
+        const AVIAMFSubmix *sub_mix = mix->submixes[i];
+        const IAMFParamDefinition *param_def;
+
+        ffio_write_leb(dyn_bc, sub_mix->nb_elements);
+        for (int j = 0; j < sub_mix->nb_elements; j++) {
+            const IAMFAudioElement *audio_element = NULL;
+            const AVIAMFSubmixElement *submix_element = sub_mix->elements[j];
+
+            for (int k = 0; k < iamf->nb_audio_elements; k++)
+                if (iamf->audio_elements[k]->audio_element_id == submix_element->audio_element_id) {
+                    audio_element = iamf->audio_elements[k];
+                    break;
+                }
+
+            av_assert0(audio_element);
+            ffio_write_leb(dyn_bc, submix_element->audio_element_id);
+
+            if (av_dict_count(submix_element->annotations) != av_dict_count(mix->annotations)) {
+                av_log(log_ctx, AV_LOG_ERROR, "Inconsistent amount of labels for element %d in submix %d from "
+                                              "Mix Presentation id #%u\n",
+                       j, i, mix_presentation->mix_presentation_id);
+                return AVERROR(EINVAL);
+            }
+            while ((tag = av_dict_iterate(submix_element->annotations, tag)))
+                avio_put_str(dyn_bc, tag->value);
+
+            init_put_bits(&pbc, header, sizeof(header));
+            put_bits(&pbc, 2, submix_element->headphones_rendering_mode);
+            put_bits(&pbc, 6, 0); // reserved
+            flush_put_bits(&pbc);
+            avio_write(dyn_bc, header, put_bytes_count(&pbc, 1));
+            ffio_write_leb(dyn_bc, 0); // rendering_config_extension_size
+
+            param_def = ff_iamf_get_param_definition(iamf, submix_element->element_mix_config->parameter_id);
+            ret = param_definition(iamf, param_def, dyn_bc, log_ctx);
+            if (ret < 0)
+                return ret;
+
+            avio_wb16(dyn_bc, rescale_rational(submix_element->default_mix_gain, 1 << 8));
+        }
+
+        param_def = ff_iamf_get_param_definition(iamf, sub_mix->output_mix_config->parameter_id);
+        ret = param_definition(iamf, param_def, dyn_bc, log_ctx);
+        if (ret < 0)
+            return ret;
+        avio_wb16(dyn_bc, rescale_rational(sub_mix->default_mix_gain, 1 << 8));
+
+        ffio_write_leb(dyn_bc, sub_mix->nb_layouts); // nb_layouts
+        for (int j = 0; j < sub_mix->nb_layouts; j++) {
+            const AVIAMFSubmixLayout *submix_layout = sub_mix->layouts[j];
+            int layout, info_type;
+            int dialogue = submix_layout->dialogue_anchored_loudness.num &&
+                           submix_layout->dialogue_anchored_loudness.den;
+            int album = submix_layout->album_anchored_loudness.num &&
+                        submix_layout->album_anchored_loudness.den;
+
+            if (submix_layout->layout_type == AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS) {
+                for (layout = 0; layout < FF_ARRAY_ELEMS(ff_iamf_sound_system_map); layout++) {
+                    if (!av_channel_layout_compare(&submix_layout->sound_system, &ff_iamf_sound_system_map[layout].layout))
+                        break;
+                }
+                if (layout == FF_ARRAY_ELEMS(ff_iamf_sound_system_map)) {
+                    av_log(log_ctx, AV_LOG_ERROR, "Invalid Sound System value in a submix\n");
+                    return AVERROR(EINVAL);
+                }
+            }
+            init_put_bits(&pbc, header, sizeof(header));
+            put_bits(&pbc, 2, submix_layout->layout_type); // layout_type
+            if (submix_layout->layout_type == AV_IAMF_SUBMIX_LAYOUT_TYPE_LOUDSPEAKERS) {
+                put_bits(&pbc, 4, ff_iamf_sound_system_map[layout].id); // sound_system
+                put_bits(&pbc, 2, 0); // reserved
+            } else
+                put_bits(&pbc, 6, 0); // reserved
+            flush_put_bits(&pbc);
+            avio_write(dyn_bc, header, put_bytes_count(&pbc, 1));
+
+            info_type  = (submix_layout->true_peak.num && submix_layout->true_peak.den);
+            info_type |= (dialogue || album) << 1;
+            avio_w8(dyn_bc, info_type);
+            avio_wb16(dyn_bc, rescale_rational(submix_layout->integrated_loudness, 1 << 8));
+            avio_wb16(dyn_bc, rescale_rational(submix_layout->digital_peak, 1 << 8));
+            if (info_type & 1)
+                avio_wb16(dyn_bc, rescale_rational(submix_layout->true_peak, 1 << 8));
+            if (info_type & 2) {
+                avio_w8(dyn_bc, dialogue + album); // num_anchored_loudness
+                if (dialogue) {
+                    avio_w8(dyn_bc, IAMF_ANCHOR_ELEMENT_DIALOGUE);
+                    avio_wb16(dyn_bc, rescale_rational(submix_layout->dialogue_anchored_loudness, 1 << 8));
+                }
+                if (album) {
+                    avio_w8(dyn_bc, IAMF_ANCHOR_ELEMENT_ALBUM);
+                    avio_wb16(dyn_bc, rescale_rational(submix_layout->album_anchored_loudness, 1 << 8));
+                }
+            }
+        }
+    }
+
+    init_put_bits(&pbc, header, sizeof(header));
+    put_bits(&pbc, 5, IAMF_OBU_IA_MIX_PRESENTATION); // obu_type
+    put_bits(&pbc, 3, 0); // obu_redundant_copy, obu_trimming_status_flag, obu_extension_flag
+    flush_put_bits(&pbc);
+
+    dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf);
+    avio_write(pb, header, put_bytes_count(&pbc, 1));
+    ffio_write_leb(pb, dyn_size);
+    avio_write(pb, dyn_buf, dyn_size);
+    av_free(dyn_buf);
+
+    return 0;
+}
+
+int ff_iamf_write_descriptors(const IAMFContext *iamf, AVIOContext *pb, void *log_ctx)
+{
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    PutBitContext pbc;
+    AVIOContext *dyn_bc;
+    uint8_t *dyn_buf = NULL;
+    int dyn_size;
+
+    int ret = avio_open_dyn_buf(&dyn_bc);
+    if (ret < 0)
+        return ret;
+
+    // Sequence Header
+    init_put_bits(&pbc, header, sizeof(header));
+    put_bits(&pbc, 5, IAMF_OBU_IA_SEQUENCE_HEADER); // obu_type
+    put_bits(&pbc, 3, 0); // obu_redundant_copy, obu_trimming_status_flag, obu_extension_flag
+    flush_put_bits(&pbc);
+
+    avio_write(dyn_bc, header, put_bytes_count(&pbc, 1));
+    ffio_write_leb(dyn_bc, 6); // obu_size: 4-byte ia_code plus two 1-byte profile fields
+    avio_wb32(dyn_bc, MKBETAG('i','a','m','f'));
+    avio_w8(dyn_bc, iamf->nb_audio_elements > 1); // primary_profile
+    avio_w8(dyn_bc, iamf->nb_audio_elements > 1); // additional_profile
+
+    dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf);
+    avio_write(pb, dyn_buf, dyn_size);
+    av_free(dyn_buf);
+
+    for (int i = 0; i < iamf->nb_codec_configs; i++) {
+        ret = iamf_write_codec_config(iamf, iamf->codec_configs[i], pb);
+        if (ret < 0)
+            return ret;
+    }
+
+    for (int i = 0; i < iamf->nb_audio_elements; i++) {
+        ret = iamf_write_audio_element(iamf, iamf->audio_elements[i], pb, log_ctx);
+        if (ret < 0)
+            return ret;
+    }
+
+    for (int i = 0; i < iamf->nb_mix_presentations; i++) {
+        ret = iamf_write_mixing_presentation(iamf, iamf->mix_presentations[i], pb, log_ctx);
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
diff --git a/libavformat/iamf_writer.h b/libavformat/iamf_writer.h
new file mode 100644
index 0000000000..93354670b8
--- /dev/null
+++ b/libavformat/iamf_writer.h
@@ -0,0 +1,51 @@
+/*
+ * Immersive Audio Model and Formats muxing helpers and structs
+ * Copyright (c) 2023 James Almer <jamrial@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFORMAT_IAMF_WRITER_H
+#define AVFORMAT_IAMF_WRITER_H
+
+#include <stdint.h>
+
+#include "libavutil/common.h"
+#include "avformat.h"
+#include "avio.h"
+#include "iamf.h"
+
+static inline IAMFParamDefinition *ff_iamf_get_param_definition(const IAMFContext *iamf,
+                                                                unsigned int parameter_id)
+{
+    IAMFParamDefinition *param_definition = NULL;
+
+    for (int i = 0; i < iamf->nb_param_definitions; i++)
+        if (iamf->param_definitions[i]->param->parameter_id == parameter_id) {
+            param_definition = iamf->param_definitions[i];
+            break;
+        }
+
+    return param_definition;
+}
+
+int ff_iamf_add_audio_element(IAMFContext *iamf, const AVStreamGroup *stg, void *log_ctx);
+int ff_iamf_add_mix_presentation(IAMFContext *iamf, const AVStreamGroup *stg, void *log_ctx);
+
+int ff_iamf_write_descriptors(const IAMFContext *iamf, AVIOContext *pb, void *log_ctx);
+
+#endif /* AVFORMAT_IAMF_WRITER_H */
diff --git a/libavformat/iamfenc.c b/libavformat/iamfenc.c
new file mode 100644
index 0000000000..0a043ce3a0
--- /dev/null
+++ b/libavformat/iamfenc.c
@@ -0,0 +1,387 @@
+/*
+ * IAMF muxer
+ * Copyright (c) 2023 James Almer
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "libavutil/avassert.h"
+#include "libavutil/common.h"
+#include "libavutil/iamf.h"
+#include "libavcodec/get_bits.h"
+#include "libavcodec/put_bits.h"
+#include "avformat.h"
+#include "avio_internal.h"
+#include "iamf.h"
+#include "iamf_writer.h"
+#include "internal.h"
+#include "mux.h"
+
+typedef struct IAMFMuxContext {
+    IAMFContext iamf;
+
+    int first_stream_id;
+} IAMFMuxContext;
+
+static int iamf_init(AVFormatContext *s)
+{
+    IAMFMuxContext *const c = s->priv_data;
+    IAMFContext *const iamf = &c->iamf;
+    int nb_audio_elements = 0, nb_mix_presentations = 0;
+    int ret;
+
+    if (!s->nb_streams) {
+        av_log(s, AV_LOG_ERROR, "There must be at least one stream\n");
+        return AVERROR(EINVAL);
+    }
+
+    for (int i = 0; i < s->nb_streams; i++) {
+        if (s->streams[i]->codecpar->codec_type != AVMEDIA_TYPE_AUDIO ||
+            (s->streams[i]->codecpar->codec_tag != MKTAG('m','p','4','a') &&
+             s->streams[i]->codecpar->codec_tag != MKTAG('O','p','u','s') &&
+             s->streams[i]->codecpar->codec_tag != MKTAG('f','L','a','C') &&
+             s->streams[i]->codecpar->codec_tag != MKTAG('i','p','c','m'))) {
+            av_log(s, AV_LOG_ERROR, "Unsupported codec id %s\n",
+                   avcodec_get_name(s->streams[i]->codecpar->codec_id));
+            return AVERROR(EINVAL);
+        }
+
+        if (s->streams[i]->codecpar->ch_layout.nb_channels > 2) {
+            av_log(s, AV_LOG_ERROR, "Unsupported channel layout on stream #%d\n", i);
+            return AVERROR(EINVAL);
+        }
+
+        for (int j = 0; j < i; j++) {
+            if (s->streams[i]->id == s->streams[j]->id) {
+                av_log(s, AV_LOG_ERROR, "Duplicated stream id %d\n", s->streams[j]->id);
+                return AVERROR(EINVAL);
+            }
+        }
+    }
+
+    if (s->nb_stream_groups < 2) {
+        av_log(s, AV_LOG_ERROR, "There must be at least two stream groups\n");
+        return AVERROR(EINVAL);
+    }
+
+    for (int i = 0; i < s->nb_stream_groups; i++) {
+        const AVStreamGroup *stg = s->stream_groups[i];
+
+        if (stg->type == AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT)
+            nb_audio_elements++;
+        if (stg->type == AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION)
+            nb_mix_presentations++;
+    }
+    if (nb_audio_elements < 1 || nb_audio_elements > 2 || nb_mix_presentations < 1) {
+        av_log(s, AV_LOG_ERROR, "There must be >= 1 and <= 2 IAMF_AUDIO_ELEMENT and at least "
+                                "one IAMF_MIX_PRESENTATION stream groups\n");
+        return AVERROR(EINVAL);
+    }
+
+    for (int i = 0; i < s->nb_stream_groups; i++) {
+        const AVStreamGroup *stg = s->stream_groups[i];
+        if (stg->type != AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT)
+            continue;
+
+        ret = ff_iamf_add_audio_element(iamf, stg, s);
+        if (ret < 0)
+            return ret;
+    }
+
+    for (int i = 0; i < s->nb_stream_groups; i++) {
+        const AVStreamGroup *stg = s->stream_groups[i];
+        if (stg->type != AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION)
+            continue;
+
+        ret = ff_iamf_add_mix_presentation(iamf, stg, s);
+        if (ret < 0)
+            return ret;
+    }
+
+    c->first_stream_id = s->streams[0]->id;
+
+    return 0;
+}
+
+static int iamf_write_header(AVFormatContext *s)
+{
+    IAMFMuxContext *const c = s->priv_data;
+    IAMFContext *const iamf = &c->iamf;
+    int ret;
+
+    ret = ff_iamf_write_descriptors(iamf, s->pb, s);
+    if (ret < 0)
+        return ret;
+
+    c->first_stream_id = s->streams[0]->id;
+
+    return 0;
+}
+
+static inline int rescale_rational(AVRational q, int b)
+{
+    return av_clip_int16(av_rescale(q.num, b, q.den));
+}
+
+static int write_parameter_block(AVFormatContext *s, const AVIAMFParamDefinition *param)
+{
+    const IAMFMuxContext *const c = s->priv_data;
+    const IAMFContext *const iamf = &c->iamf;
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    IAMFParamDefinition *param_definition = ff_iamf_get_param_definition(iamf, param->parameter_id);
+    PutBitContext pb;
+    AVIOContext *dyn_bc;
+    uint8_t *dyn_buf = NULL;
+    int dyn_size, ret;
+
+    if (param->type > AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN) {
+        av_log(s, AV_LOG_DEBUG, "Ignoring side data with unknown type %u\n",
+               param->type);
+        return 0;
+    }
+
+    if (!param_definition) {
+        av_log(s, AV_LOG_ERROR, "Non-existent Parameter Definition with ID %u referenced by a packet\n",
+               param->parameter_id);
+        return AVERROR(EINVAL);
+    }
+
+    if (param->type != param_definition->param->type) {
+        av_log(s, AV_LOG_ERROR, "Inconsistent values for Parameter Definition "
+                                "with ID %u in a packet\n",
+               param->parameter_id);
+        return AVERROR(EINVAL);
+    }
+
+    ret = avio_open_dyn_buf(&dyn_bc);
+    if (ret < 0)
+        return ret;
+
+    // Parameter Block OBU header
+    init_put_bits(&pb, header, sizeof(header));
+    put_bits(&pb, 5, IAMF_OBU_IA_PARAMETER_BLOCK);
+    put_bits(&pb, 3, 0);
+    flush_put_bits(&pb);
+    avio_write(s->pb, header, put_bytes_count(&pb, 1));
+
+    ffio_write_leb(dyn_bc, param->parameter_id);
+    if (!param_definition->mode) {
+        ffio_write_leb(dyn_bc, param->duration);
+        ffio_write_leb(dyn_bc, param->constant_subblock_duration);
+        if (param->constant_subblock_duration == 0)
+            ffio_write_leb(dyn_bc, param->nb_subblocks);
+    }
+
+    for (int i = 0; i < param->nb_subblocks; i++) {
+        const void *subblock = av_iamf_param_definition_get_subblock(param, i);
+
+        switch (param->type) {
+        case AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN: {
+            const AVIAMFMixGain *mix = subblock;
+            if (!param_definition->mode && param->constant_subblock_duration == 0)
+                ffio_write_leb(dyn_bc, mix->subblock_duration);
+
+            ffio_write_leb(dyn_bc, mix->animation_type);
+
+            avio_wb16(dyn_bc, rescale_rational(mix->start_point_value, 1 << 8));
+            if (mix->animation_type >= AV_IAMF_ANIMATION_TYPE_LINEAR)
+                avio_wb16(dyn_bc, rescale_rational(mix->end_point_value, 1 << 8));
+            if (mix->animation_type == AV_IAMF_ANIMATION_TYPE_BEZIER) {
+                avio_wb16(dyn_bc, rescale_rational(mix->control_point_value, 1 << 8));
+                avio_w8(dyn_bc, av_clip_uint8(av_rescale(mix->control_point_relative_time.num, 1 << 8,
+                                                         mix->control_point_relative_time.den)));
+            }
+            break;
+        }
+        case AV_IAMF_PARAMETER_DEFINITION_DEMIXING: {
+            const AVIAMFDemixingInfo *demix = subblock;
+            if (!param_definition->mode && param->constant_subblock_duration == 0)
+                ffio_write_leb(dyn_bc, demix->subblock_duration);
+
+            avio_w8(dyn_bc, demix->dmixp_mode << 5);
+            break;
+        }
+        case AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN: {
+            const AVIAMFReconGain *recon = subblock;
+            const AVIAMFAudioElement *audio_element = param_definition->audio_element->element;
+
+            if (!param_definition->mode && param->constant_subblock_duration == 0)
+                ffio_write_leb(dyn_bc, recon->subblock_duration);
+
+            if (!audio_element) {
+                av_log(s, AV_LOG_ERROR, "Invalid Parameter Definition with ID %u referenced by a packet\n", param->parameter_id);
+                return AVERROR(EINVAL);
+            }
+
+            for (int j = 0; j < audio_element->nb_layers; j++) {
+                const AVIAMFLayer *layer = audio_element->layers[j];
+
+                if (layer->flags & AV_IAMF_LAYER_FLAG_RECON_GAIN) {
+                    unsigned int recon_gain_flags = 0;
+                    int k = 0;
+
+                    for (; k < 7; k++)
+                        recon_gain_flags |= (1 << k) * !!recon->recon_gain[j][k];
+                    for (; k < 12; k++)
+                        recon_gain_flags |= (2 << k) * !!recon->recon_gain[j][k];
+                    if (recon_gain_flags >> 8)
+                        recon_gain_flags |= (1 << k);
+
+                    ffio_write_leb(dyn_bc, recon_gain_flags);
+                    for (k = 0; k < 12; k++) {
+                        if (recon->recon_gain[j][k])
+                            avio_w8(dyn_bc, recon->recon_gain[j][k]);
+                    }
+                }
+            }
+            break;
+        }
+        default:
+            av_assert0(0);
+        }
+    }
+
+    dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf);
+    ffio_write_leb(s->pb, dyn_size);
+    avio_write(s->pb, dyn_buf, dyn_size);
+    av_free(dyn_buf);
+
+    return 0;
+}
+
+static int iamf_write_packet(AVFormatContext *s, AVPacket *pkt)
+{
+    const IAMFMuxContext *const c = s->priv_data;
+    AVStream *st = s->streams[pkt->stream_index];
+    uint8_t header[MAX_IAMF_OBU_HEADER_SIZE];
+    PutBitContext pb;
+    AVIOContext *dyn_bc;
+    uint8_t *side_data, *dyn_buf = NULL;
+    unsigned int skip_samples = 0, discard_padding = 0;
+    size_t side_data_size;
+    int dyn_size, type = st->id <= 17 ? st->id + IAMF_OBU_IA_AUDIO_FRAME_ID0 : IAMF_OBU_IA_AUDIO_FRAME;
+    int ret;
+
+    if (s->nb_stream_groups && st->id == c->first_stream_id) {
+        AVIAMFParamDefinition *mix =
+            (AVIAMFParamDefinition *)av_packet_get_side_data(pkt, AV_PKT_DATA_IAMF_MIX_GAIN_PARAM, NULL);
+        AVIAMFParamDefinition *demix =
+            (AVIAMFParamDefinition *)av_packet_get_side_data(pkt, AV_PKT_DATA_IAMF_DEMIXING_INFO_PARAM, NULL);
+        AVIAMFParamDefinition *recon =
+            (AVIAMFParamDefinition *)av_packet_get_side_data(pkt, AV_PKT_DATA_IAMF_RECON_GAIN_INFO_PARAM, NULL);
+
+        if (mix) {
+            ret = write_parameter_block(s, mix);
+            if (ret < 0)
+                return ret;
+        }
+        if (demix) {
+            ret = write_parameter_block(s, demix);
+            if (ret < 0)
+                return ret;
+        }
+        if (recon) {
+            ret = write_parameter_block(s, recon);
+            if (ret < 0)
+                return ret;
+        }
+    }
+    side_data = av_packet_get_side_data(pkt, AV_PKT_DATA_SKIP_SAMPLES,
+                                        &side_data_size);
+
+    if (side_data && side_data_size >= 10) {
+        skip_samples = AV_RL32(side_data);
+        discard_padding = AV_RL32(side_data + 4);
+    }
+
+    ret = avio_open_dyn_buf(&dyn_bc);
+    if (ret < 0)
+        return ret;
+
+    init_put_bits(&pb, header, sizeof(header));
+    put_bits(&pb, 5, type);
+    put_bits(&pb, 1, 0); // obu_redundant_copy
+    put_bits(&pb, 1, skip_samples || discard_padding);
+    put_bits(&pb, 1, 0); // obu_extension_flag
+    flush_put_bits(&pb);
+    avio_write(s->pb, header, put_bytes_count(&pb, 1));
+
+    if (skip_samples || discard_padding) {
+        ffio_write_leb(dyn_bc, discard_padding);
+        ffio_write_leb(dyn_bc, skip_samples);
+    }
+
+    if (st->id > 17)
+        ffio_write_leb(dyn_bc, st->id);
+
+    dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf);
+    ffio_write_leb(s->pb, dyn_size + pkt->size);
+    avio_write(s->pb, dyn_buf, dyn_size);
+    av_free(dyn_buf);
+    avio_write(s->pb, pkt->data, pkt->size);
+
+    return 0;
+}
+
+static void iamf_deinit(AVFormatContext *s)
+{
+    IAMFMuxContext *const c = s->priv_data;
+    IAMFContext *const iamf = &c->iamf;
+
+    for (int i = 0; i < iamf->nb_audio_elements; i++) {
+        IAMFAudioElement *audio_element = iamf->audio_elements[i];
+        audio_element->element = NULL;
+    }
+
+    for (int i = 0; i < iamf->nb_mix_presentations; i++) {
+        IAMFMixPresentation *mix_presentation = iamf->mix_presentations[i];
+        mix_presentation->mix = NULL;
+    }
+
+    ff_iamf_uninit_context(iamf);
+
+    return;
+}
+
+static const AVCodecTag iamf_codec_tags[] = {
+    { AV_CODEC_ID_AAC,       MKTAG('m','p','4','a') },
+    { AV_CODEC_ID_FLAC,      MKTAG('f','L','a','C') },
+    { AV_CODEC_ID_OPUS,      MKTAG('O','p','u','s') },
+    { AV_CODEC_ID_PCM_S16LE, MKTAG('i','p','c','m') },
+    { AV_CODEC_ID_PCM_S16BE, MKTAG('i','p','c','m') },
+    { AV_CODEC_ID_PCM_S24LE, MKTAG('i','p','c','m') },
+    { AV_CODEC_ID_PCM_S24BE, MKTAG('i','p','c','m') },
+    { AV_CODEC_ID_PCM_S32LE, MKTAG('i','p','c','m') },
+    { AV_CODEC_ID_PCM_S32BE, MKTAG('i','p','c','m') },
+    { AV_CODEC_ID_NONE,      MKTAG('i','p','c','m') }
+};
+
+const FFOutputFormat ff_iamf_muxer = {
+    .p.name            = "iamf",
+    .p.long_name       = NULL_IF_CONFIG_SMALL("Raw Immersive Audio Model and Formats"),
+    .p.extensions      = "iamf",
+    .priv_data_size    = sizeof(IAMFMuxContext),
+    .p.audio_codec     = AV_CODEC_ID_OPUS,
+    .init              = iamf_init,
+    .deinit            = iamf_deinit,
+    .write_header      = iamf_write_header,
+    .write_packet      = iamf_write_packet,
+    .p.codec_tag       = (const AVCodecTag* const []){ iamf_codec_tags, NULL },
+    .p.flags           = AVFMT_GLOBALHEADER | AVFMT_NOTIMESTAMPS,
+};
-- 
2.43.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups James Almer
@ 2023-12-15 21:28   ` James Almer
  0 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-15 21:28 UTC (permalink / raw)
  To: ffmpeg-devel
Starting with IAMF support.
Signed-off-by: James Almer <jamrial@gmail.com>
---
 doc/ffmpeg.texi           | 200 ++++++++++++++++++++++
 fftools/ffmpeg.h          |   2 +
 fftools/ffmpeg_mux_init.c | 342 ++++++++++++++++++++++++++++++++++++++
 fftools/ffmpeg_opt.c      |   2 +
 4 files changed, 546 insertions(+)
diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index c503963941..1fadb20686 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -623,6 +623,206 @@ Not all muxers support embedded thumbnails, and those who do, only support a few
 Creates a program with the specified @var{title}, @var{program_num} and adds the specified
 @var{stream}(s) to it.
 
+@item -stream_group type=@var{type}:st=@var{stream}[:st=@var{stream}][:stg=@var{stream_group}][:id=@var{stream_group_id}...] (@emph{output})
+
+Creates a stream group of the specified @var{type}, @var{stream_group_id} and adds the specified
+@var{stream}(s) and/or previously defined @var{stream_group}(s) to it.
+
+@var{type} can be one of the following:
+@table @option
+
+@item iamf_audio_element
+Groups @var{stream}s that belong to the same IAMF Audio Element
+
+For this group @var{type}, the following options are available
+@table @option
+@item audio_element_type
+The Audio Element type. The following values are supported:
+
+@table @option
+@item channel
+Scalable channel audio representation
+@item scene
+Ambisonics representation
+@end table
+
+@item demixing
+Demixing information used to reconstruct a scalable channel audio representation.
+This option must be separated from the rest with a ',', and takes the following
+key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to
+@item dmixp_mode
+A pre-defined combination of demixing parameters
+@end table
+
+@item recon_gain
+Recon gain information used to reconstruct a scalable channel audio representation.
+This option must be separated from the rest with a ',', and takes the following
+key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to
+@end table
+
+@item layer
+A layer defining a Channel Layout in the Audio Element.
+This option must be separated from the rest with a ','. Several ',' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options
+
+@table @option
+@item ch_layout
+The layer's channel layout
+@item flags
+The following flags are available:
+
+@table @option
+@item recon_gain
+Whether to signal if recon_gain is present as metadata in parameter blocks within frames
+@end table
+
+@item output_gain
+@item output_gain_flags
+Which channels output_gain applies to. The following flags are available:
+
+@table @option
+@item FL
+@item FR
+@item BL
+@item BR
+@item TFL
+@item TFR
+@end table
+
+@item ambisonics_mode
+The ambisonics mode. This has no effect if audio_element_type is set to channel.
+
+The following values are supported:
+
+@table @option
+@item mono
+Each ambisonics channel is coded as an individual mono stream in the group
+@end table
+
+@end table
+
+@item default_w
+Default weight value
+
+@end table
+
+@item iamf_mix_presentation
+Groups @var{stream}s that belong to all IAMF Audio Elements the same
+IAMF Mix Presentation references
+
+For this group @var{type}, the following options are available
+
+@table @option
+@item submix
+A sub-mix within the Mix Presentation.
+This option must be separated from the rest with a ','. Several ',' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to, for post-processing the mixed
+audio signal to generate the audio signal for playback
+@item parameter_rate
+The sample rate in which duration fields in parameter blocks in frames that refer to this
+@var{parameter_id} are expressed
+@item default_mix_gain
+Default mix gain value to apply when there are no parameter blocks sharing the same
+@var{parameter_id} for a given frame
+
+@item element
+References an Audio Element used in this Mix Presentation to generate the final output
+audio signal for playback.
+This option must be separated from the rest with a '|'. Several '|' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options:
+
+@table @option
+@item stg
+The @var{stream_group_id} for an Audio Element which this sub-mix refers to
+@item parameter_id
+An identifier that parameter blocks in frames may refer to, for applying any processing to
+the referenced and rendered Audio Element before being summed with other processed Audio
+Elements
+@item parameter_rate
+The sample rate in which duration fields in parameter blocks in frames that refer to this
+@var{parameter_id} are expressed
+@item default_mix_gain
+Default mix gain value to apply when there are no parameter blocks sharing the same
+@var{parameter_id} for a given frame
+@item annotations
+A key=value string describing the sub-mix element where "key" is a string conforming to
+BCP-47 that specifies the language for the "value" string. "key" must be the same as the
+one in the mix's @var{annotations}
+@item headphones_rendering_mode
+Indicates whether the input channel-based Audio Element is rendered to stereo loudspeakers
+or spatialized with a binaural renderer when played back on headphones.
+This has no effect if the referenced Audio Element's @var{audio_element_type} is set to
+scene.
+
+The following values are supported:
+
+@table @option
+@item stereo
+@item binaural
+@end table
+
+@end table
+
+@item layout
+Specifies the layouts for this sub-mix on which the loudness information was measured.
+This option must be separated from the rest with a '|'. Several '|' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options:
+
+@table @option
+@item layout_type
+
+@table @option
+@item loudspeakers
+The layout follows the loudspeaker sound system convention of ITU-2051-3.
+@item binaural
+The layout is binaural.
+@end table
+
+@item sound_system
+Channel layout matching one of Sound Systems A to J of ITU-2051-3, plus 7.1.2 and 3.1.2.
+This has no effect if @var{layout_type} is set to binaural.
+@item integrated_loudness
+The program integrated loudness information, as defined in ITU-1770-4.
+@item digital_peak
+The digital (sampled) peak value of the audio signal, as defined in ITU-1770-4.
+@item true_peak
+The true peak of the audio signal, as defined in ITU-1770-4.
+@item dialog_anchored_loudness
+The Dialogue loudness information, as defined in ITU-1770-4.
+@item album_anchored_loudness
+The Album loudness information, as defined in ITU-1770-4.
+@end table
+
+@end table
+
+@item annotations
+A key=value string describing the mix where "key" is a string conforming to BCP-47
+that specifies the language for the "value" string. "key" must be the same as the ones in
+all sub-mix elements' @var{annotations}
+@end table
+
+@end table
+
 @item -target @var{type} (@emph{output})
 Specify target file type (@code{vcd}, @code{svcd}, @code{dvd}, @code{dv},
 @code{dv50}). @var{type} may be prefixed with @code{pal-}, @code{ntsc-} or
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index affa80856a..1169f723d1 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -281,6 +281,8 @@ typedef struct OptionsContext {
     int        nb_disposition;
     SpecifierOpt *program;
     int        nb_program;
+    SpecifierOpt *stream_groups;
+    int        nb_stream_groups;
     SpecifierOpt *time_bases;
     int        nb_time_bases;
     SpecifierOpt *enc_time_bases;
diff --git a/fftools/ffmpeg_mux_init.c b/fftools/ffmpeg_mux_init.c
index f527a083db..2134b28512 100644
--- a/fftools/ffmpeg_mux_init.c
+++ b/fftools/ffmpeg_mux_init.c
@@ -40,6 +40,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/display.h"
 #include "libavutil/getenv_utf8.h"
+#include "libavutil/iamf.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/log.h"
 #include "libavutil/mem.h"
@@ -2008,6 +2009,343 @@ static int setup_sync_queues(Muxer *mux, AVFormatContext *oc, int64_t buf_size_u
     return 0;
 }
 
+static int of_parse_iamf_audio_element_layers(Muxer *mux, AVStreamGroup *stg, char *ptr)
+{
+    AVIAMFAudioElement *audio_element = stg->params.iamf_audio_element;
+    AVDictionary *dict = NULL;
+    const char *token;
+    int ret = 0;
+
+    audio_element->demixing_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_DEMIXING, 1, NULL);
+    audio_element->recon_gain_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN, 1, NULL);
+
+    if (!audio_element->demixing_info ||
+        !audio_element->recon_gain_info)
+        return AVERROR(ENOMEM);
+
+    /* process manually set layers and parameters */
+    token = av_strtok(NULL, ",", &ptr);
+    while (token) {
+        const AVDictionaryEntry *e;
+        int demixing = 0, recon_gain = 0;
+        int layer = 0;
+
+        if (av_strstart(token, "layer=", &token))
+            layer = 1;
+        else if (av_strstart(token, "demixing=", &token))
+            demixing = 1;
+        else if (av_strstart(token, "recon_gain=", &token))
+            recon_gain = 1;
+
+        av_dict_free(&dict);
+        ret = av_dict_parse_string(&dict, token, "=", ":", 0);
+        if (ret < 0) {
+            av_log(mux, AV_LOG_ERROR, "Error parsing audio element specification %s\n", token);
+            goto fail;
+        }
+
+        if (layer) {
+            AVIAMFLayer *audio_layer = av_iamf_audio_element_add_layer(audio_element);
+            if (!audio_layer) {
+                av_log(mux, AV_LOG_ERROR, "Error adding layer to stream group %d\n", stg->index);
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+            av_opt_set_dict(audio_layer, &dict);
+        } else if (demixing || recon_gain) {
+            AVIAMFParamDefinition *param = demixing ? audio_element->demixing_info
+                                                    : audio_element->recon_gain_info;
+            void *subblock = av_iamf_param_definition_get_subblock(param, 0);
+
+            av_opt_set_dict(param, &dict);
+            av_opt_set_dict(subblock, &dict);
+        }
+
+        // make sure that no entries are left in the dict
+        e = NULL;
+        if (e = av_dict_iterate(dict, e)) {
+            av_log(mux, AV_LOG_FATAL, "Unknown layer key %s.\n", e->key);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        token = av_strtok(NULL, ",", &ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    if (!ret && !audio_element->nb_layers) {
+        av_log(mux, AV_LOG_ERROR, "No layer in audio element specification\n");
+        ret = AVERROR(EINVAL);
+    }
+
+    return ret;
+}
+
+static int of_parse_iamf_submixes(Muxer *mux, AVStreamGroup *stg, char *ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVIAMFMixPresentation *mix = stg->params.iamf_mix_presentation;
+    AVDictionary *dict = NULL;
+    const char *token;
+    char *submix_str = NULL;
+    int ret = 0;
+
+    /* process manually set submixes */
+    token = av_strtok(NULL, ",", &ptr);
+    while (token) {
+        AVIAMFSubmix *submix = NULL;
+        const char *subtoken;
+        char *subptr = NULL;
+
+        if (!av_strstart(token, "submix=", &token)) {
+            av_log(mux, AV_LOG_ERROR, "No submix in mix presentation specification \"%s\"\n", token);
+            goto fail;
+        }
+
+        submix_str = av_strdup(token);
+        if (!submix_str)
+            goto fail;
+
+        submix = av_iamf_mix_presentation_add_submix(mix);
+        if (!submix) {
+            av_log(mux, AV_LOG_ERROR, "Error adding submix to stream group %d\n", stg->index);
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+        submix->output_mix_config =
+            av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+        if (!submix->output_mix_config) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        subptr = NULL;
+        subtoken = av_strtok(submix_str, "|", &subptr);
+        while (subtoken) {
+            const AVDictionaryEntry *e;
+            int element = 0, layout = 0;
+
+            if (av_strstart(subtoken, "element=", &subtoken))
+                element = 1;
+            else if (av_strstart(subtoken, "layout=", &subtoken))
+                layout = 1;
+
+            av_dict_free(&dict);
+            ret = av_dict_parse_string(&dict, subtoken, "=", ":", 0);
+            if (ret < 0) {
+                av_log(mux, AV_LOG_ERROR, "Error parsing submix specification \"%s\"\n", subtoken);
+                goto fail;
+            }
+
+            if (element) {
+                AVIAMFSubmixElement *submix_element;
+                int64_t idx = -1;
+
+                if (e = av_dict_get(dict, "stg", NULL, 0))
+                    idx = strtol(e->value, NULL, 0);
+                av_dict_set(&dict, "stg", NULL, 0);
+                if (idx < 0 || idx >= oc->nb_stream_groups - 1 ||
+                    oc->stream_groups[idx]->type != AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT) {
+                    av_log(mux, AV_LOG_ERROR, "Invalid or missing stream group index in "
+                                              "submix element specification \"%s\"\n", subtoken);
+                    ret = AVERROR(EINVAL);
+                    goto fail;
+                }
+                submix_element = av_iamf_submix_add_element(submix);
+                if (!submix_element) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding element to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+
+                submix_element->audio_element_id = oc->stream_groups[idx]->id;
+
+                submix_element->element_mix_config =
+                    av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+                if (!submix_element->element_mix_config)
+                    ret = AVERROR(ENOMEM);
+                av_opt_set_dict2(submix_element, &dict, AV_OPT_SEARCH_CHILDREN);
+            } else if (layout) {
+                AVIAMFSubmixLayout *submix_layout = av_iamf_submix_add_layout(submix);
+                if (!submix_layout) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding layout to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+                av_opt_set_dict(submix_layout, &dict);
+            } else
+                av_opt_set_dict2(submix, &dict, AV_OPT_SEARCH_CHILDREN);
+
+            if (ret < 0) {
+                goto fail;
+            }
+
+            // make sure that no entries are left in the dict
+            e = NULL;
+            while (e = av_dict_iterate(dict, e)) {
+                av_log(mux, AV_LOG_FATAL, "Unknown submix key %s.\n", e->key);
+                ret = AVERROR(EINVAL);
+                goto fail;
+            }
+            subtoken = av_strtok(NULL, "|", &subptr);
+        }
+        av_freep(&submix_str);
+
+        if (!submix->nb_elements) {
+            av_log(mux, AV_LOG_ERROR, "No audio elements in submix specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+        }
+        token = av_strtok(NULL, ",", &ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    av_free(submix_str);
+
+    return ret;
+}
+
+static int of_parse_group_token(Muxer *mux, const char *token, char *ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVStreamGroup *stg;
+    AVDictionary *dict = NULL, *tmp = NULL;
+    const AVDictionaryEntry *e;
+    const AVOption opts[] = {
+        { "type", "Set group type", offsetof(AVStreamGroup, type), AV_OPT_TYPE_INT,
+                { .i64 = 0 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "type" },
+            { "iamf_audio_element",    NULL, 0, AV_OPT_TYPE_CONST,
+                { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT },    .unit = "type" },
+            { "iamf_mix_presentation", NULL, 0, AV_OPT_TYPE_CONST,
+                { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION }, .unit = "type" },
+        { NULL },
+    };
+    const AVClass class = {
+        .class_name = "StreamGroupType",
+        .item_name  = av_default_item_name,
+        .option     = opts,
+        .version    = LIBAVUTIL_VERSION_INT,
+    };
+    const AVClass *pclass = &class;
+    int type, ret;
+
+    ret = av_dict_parse_string(&dict, token, "=", ":", AV_DICT_MULTIKEY);
+    if (ret < 0) {
+        av_log(mux, AV_LOG_ERROR, "Error parsing group specification %s\n", token);
+        return ret;
+    }
+
+    // "type" is not a user settable AVOption in AVStreamGroup, so handle it here
+    e = av_dict_get(dict, "type", NULL, 0);
+    if (!e) {
+        av_log(mux, AV_LOG_ERROR, "No type specified for Stream Group in \"%s\"\n", token);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ret = av_opt_eval_int(&pclass, opts, e->value, &type);
+    if (!ret && type == AV_STREAM_GROUP_PARAMS_NONE)
+        ret = AVERROR(EINVAL);
+    if (ret < 0) {
+        av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
+        goto end;
+    }
+
+    av_dict_copy(&tmp, dict, 0);
+    stg = avformat_stream_group_create(oc, type, &tmp);
+    if (!stg) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    e = NULL;
+    while (e = av_dict_get(dict, "st", e, 0)) {
+        int64_t idx = strtol(e->value, NULL, 0);
+        if (idx < 0 || idx >= oc->nb_streams) {
+            av_log(mux, AV_LOG_ERROR, "Invalid stream index %"PRId64"\n", idx);
+            ret = AVERROR(EINVAL);
+            goto end;
+        }
+        ret = avformat_stream_group_add_stream(stg, oc->streams[idx]);
+        if (ret < 0)
+            goto end;
+    }
+    while (e = av_dict_get(dict, "stg", e, 0)) {
+        int64_t idx = strtol(e->value, NULL, 0);
+        if (idx < 0 || idx >= oc->nb_stream_groups - 1) {
+            av_log(mux, AV_LOG_ERROR, "Invalid stream group index %"PRId64"\n", idx);
+            ret = AVERROR(EINVAL);
+            goto end;
+        }
+        for (unsigned i = 0; i < oc->stream_groups[idx]->nb_streams; i++) {
+            ret = avformat_stream_group_add_stream(stg, oc->stream_groups[idx]->streams[i]);
+            if (ret < 0)
+                goto end;
+        }
+    }
+
+    switch(type) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+        ret = of_parse_iamf_audio_element_layers(mux, stg, ptr);
+        break;
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+        ret = of_parse_iamf_submixes(mux, stg, ptr);
+        break;
+    default:
+        av_log(mux, AV_LOG_FATAL, "Unknown group type %d.\n", type);
+        ret = AVERROR(EINVAL);
+        break;
+    }
+
+    if (ret < 0)
+        goto end;
+
+    // make sure that nothing but "st" and "stg" entries are left in the dict
+    e = NULL;
+    av_dict_set(&tmp, "type", NULL, 0);
+    while (e = av_dict_iterate(tmp, e)) {
+        if (!strcmp(e->key, "st") || !strcmp(e->key, "stg"))
+            continue;
+
+        av_log(mux, AV_LOG_FATAL, "Unknown group key %s.\n", e->key);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ret = 0;
+end:
+    av_dict_free(&dict);
+    av_dict_free(&tmp);
+
+    return ret;
+}
+
+static int of_add_groups(Muxer *mux, const OptionsContext *o)
+{
+    /* process manually set groups */
+    for (int i = 0; i < o->nb_stream_groups; i++) {
+        const char *token;
+        char *str, *ptr = NULL;
+        int ret = 0;
+
+        str = av_strdup(o->stream_groups[i].u.str);
+        if (!str)
+            return AVERROR(ENOMEM);
+
+        token = av_strtok(str, ",", &ptr);
+        if (token)
+            ret = of_parse_group_token(mux, token, ptr);
+
+        av_free(str);
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
 static int of_add_programs(Muxer *mux, const OptionsContext *o)
 {
     AVFormatContext *oc = mux->fc;
@@ -2793,6 +3131,10 @@ int of_open(const OptionsContext *o, const char *filename, Scheduler *sch)
     if (err < 0)
         return err;
 
+    err = of_add_groups(mux, o);
+    if (err < 0)
+        return err;
+
     err = of_add_programs(mux, o);
     if (err < 0)
         return err;
diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
index 6177a96a4e..915f8e3ea0 100644
--- a/fftools/ffmpeg_opt.c
+++ b/fftools/ffmpeg_opt.c
@@ -1493,6 +1493,8 @@ const OptionDef options[] = {
         "add metadata", "string=string" },
     { "program",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(program) },
         "add program with specified streams", "title=string:st=number..." },
+    { "stream_group",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(stream_groups) },
+        "add stream group with specified streams and group type-specific arguments", "id=number:st=number..." },
     { "dframes",        HAS_ARG | OPT_PERFILE | OPT_EXPERT |
                         OPT_OUTPUT,                                  { .func_arg = opt_data_frames },
         "set the number of data frames to output", "number" },
-- 
2.43.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API
  2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API James Almer
@ 2023-12-18 11:04   ` Anton Khirnov
  2023-12-18 18:10     ` James Almer
  0 siblings, 1 reply; 16+ messages in thread
From: Anton Khirnov @ 2023-12-18 11:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches
Quoting James Almer (2023-12-14 21:14:26)
> +/**
> + * Mix Gain Parameter Data as defined in section 3.8.1 of IAMF.
> + */
> +typedef struct AVIAMFMixGain {
> +    const AVClass *av_class;
> +
> +    /**
> +     * Duration for the given subblock. It must not be 0.
In what units? Same for all durations in this patch.
> +typedef struct AVIAMFParamDefinition {
> +    const AVClass *av_class;
> +
> +    /**
> +     * Offset in bytes from the start of this struct, at which the subblocks
> +     * array is located.
> +     */
> +    size_t subblocks_offset;
> +    /**
> +     * Size in bytes of each element in the subblocks array.
> +     */
> +    size_t subblock_size;
> +    /**
> +     * Number of subblocks in the array.
> +     *
> +     * Must be 0 if @ref constant_subblock_duration is not 0.
> +     */
> +    unsigned int nb_subblocks;
> +
> +    /**
> +     * Parameters type. Determines the type of the subblock elements.
> +     */
> +    enum AVIAMFParamDefinitionType type;
> +
> +    /**
> +     * Identifier for the parameter substream.
> +     */
> +    unsigned int parameter_id;
> +    /**
> +     * Sample rate for the parameter substream. It must not be 0.
> +     */
> +    unsigned int parameter_rate;
> +
> +    /**
> +     * The duration of all subblocks in this parameter definition.
> +     *
> +     * May be 0, in which case all duration values should be specified in
> +     * another parameter definition referencing the same parameter_id.
> +     */
> +    unsigned int duration;
> +    /**
> +     * The duration of every subblock in the case where all subblocks, with
> +     * the optional exception of the last subblock, have equal durations.
> +     *
> +     * Must be 0 if subblocks have different durations.
> +     */
> +    unsigned int constant_subblock_duration;
This also seems like it should be a flags field.
Otherwise looks good.
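
On the subblocks_offset / subblock_size pair quoted above, a minimal
standalone sketch of how such a layout is typically walked may help readers.
ParamDef, MixGainSubblock, param_def_alloc() and param_def_get_subblock() are
stand-ins made up for illustration, not the libavutil types or functions, and
the alignment rounding is an assumption of the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in types for illustration only; not the real libavutil structs. */
typedef struct ParamDef {
    size_t       subblocks_offset; /* bytes from struct start to the array */
    size_t       subblock_size;    /* bytes per array element */
    unsigned int nb_subblocks;     /* number of elements in the array */
} ParamDef;

typedef struct MixGainSubblock {
    unsigned int subblock_duration;
} MixGainSubblock;

/* Allocate the header and the subblock array in one zeroed buffer. */
static ParamDef *param_def_alloc(unsigned int nb_subblocks, size_t subblock_size)
{
    /* round the header size up so the array that follows is aligned */
    size_t align  = _Alignof(max_align_t);
    size_t offset = (sizeof(ParamDef) + align - 1) / align * align;
    ParamDef *par = calloc(1, offset + nb_subblocks * subblock_size);

    if (!par)
        return NULL;
    par->subblocks_offset = offset;
    par->subblock_size    = subblock_size;
    par->nb_subblocks     = nb_subblocks;
    return par;
}

/* Locate element idx purely from the two size fields stored in the header. */
static void *param_def_get_subblock(ParamDef *par, unsigned int idx)
{
    assert(idx < par->nb_subblocks);
    return (uint8_t *)par + par->subblocks_offset + idx * par->subblock_size;
}
```

Storing the offset and element size in the header lets callers index the
array without knowing the concrete subblock type at compile time.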
-- 
Anton Khirnov
* Re: [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API
  2023-12-18 11:04   ` Anton Khirnov
@ 2023-12-18 18:10     ` James Almer
  0 siblings, 0 replies; 16+ messages in thread
From: James Almer @ 2023-12-18 18:10 UTC (permalink / raw)
  To: ffmpeg-devel
On 12/18/2023 8:04 AM, Anton Khirnov wrote:
> Quoting James Almer (2023-12-14 21:14:26)
>> +/**
>> + * Mix Gain Parameter Data as defined in section 3.8.1 of IAMF.
>> + */
>> +typedef struct AVIAMFMixGain {
>> +    const AVClass *av_class;
>> +
>> +    /**
>> +     * Duration for the given subblock. It must not be 0.
> 
> In what units? Same for all durations in this patch.
parameter_rate. Amended.
> 
>> +typedef struct AVIAMFParamDefinition {
>> +    const AVClass *av_class;
>> +
>> +    /**
>> +     * Offset in bytes from the start of this struct, at which the subblocks
>> +     * array is located.
>> +     */
>> +    size_t subblocks_offset;
>> +    /**
>> +     * Size in bytes of each element in the subblocks array.
>> +     */
>> +    size_t subblock_size;
>> +    /**
>> +     * Number of subblocks in the array.
>> +     *
>> +     * Must be 0 if @ref constant_subblock_duration is not 0.
Removed this line as it's bogus.
>> +     */
>> +    unsigned int nb_subblocks;
>> +
>> +    /**
>> +     * Parameters type. Determines the type of the subblock elements.
>> +     */
>> +    enum AVIAMFParamDefinitionType type;
>> +
>> +    /**
>> +     * Identifier for the parameter substream.
>> +     */
>> +    unsigned int parameter_id;
>> +    /**
>> +     * Sample rate for the parameter substream. It must not be 0.
>> +     */
>> +    unsigned int parameter_rate;
>> +
>> +    /**
>> +     * The duration of all subblocks in this parameter definition.
>> +     *
>> +     * May be 0, in which case all duration values should be specified in
>> +     * another parameter definition referencing the same parameter_id.
>> +     */
>> +    unsigned int duration;
>> +    /**
>> +     * The duration of every subblock in the case where all subblocks, with
>> +     * the optional exception of the last subblock, have equal durations.
>> +     *
>> +     * Must be 0 if subblocks have different durations.
>> +     */
>> +    unsigned int constant_subblock_duration;
> 
> This also seems like it should be a flags field.
No, duration and subblock duration are not the same thing. The former is 
the accumulated duration of all subblocks in a given parameter 
definition. Subblock durations can be smaller, and only if they are 
constant will constant_subblock_duration be set to a value other than 0.
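
To make the relationship concrete, here is a small standalone sketch. It is
my reading of the doxy above rather than code from the patch, and both helper
names are hypothetical. When constant_subblock_duration (csd below) is
non-zero, the number of subblocks and the length of the last one follow from
duration alone, all in parameter_rate units:

```c
#include <assert.h>

/* Hypothetical helper: how many subblocks a definition implies when every
 * subblock except possibly the last one lasts csd ticks. */
static unsigned int implied_nb_subblocks(unsigned int duration, unsigned int csd)
{
    assert(csd != 0); /* csd == 0 means explicit per-subblock durations */
    return duration / csd + (duration % csd != 0);
}

/* Hypothetical helper: duration of subblock i under the same assumption. */
static unsigned int subblock_duration_at(unsigned int duration, unsigned int csd,
                                         unsigned int i)
{
    unsigned int n = implied_nb_subblocks(duration, csd);

    assert(i < n);
    if (i + 1 < n)
        return csd;                  /* every subblock but the last one */
    return duration - csd * (n - 1); /* the last one takes the remainder */
}
```

With duration = 100 and csd = 40 this gives three subblocks of 40, 40 and 20
ticks. With duration equal to csd there is a single subblock covering the
whole definition.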
> 
> Otherwise looks good.
> 
Pushed. Thanks a lot for looking at it.
* Re: [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups
  2023-12-11 12:46     ` James Almer
@ 2023-12-12 11:26       ` Anton Khirnov
  0 siblings, 0 replies; 16+ messages in thread
From: Anton Khirnov @ 2023-12-12 11:26 UTC (permalink / raw)
  To: FFmpeg development discussions and patches
Quoting James Almer (2023-12-11 13:46:36)
> AVStreamGroup.type is not settable through AVOptions, but it of course 
> needs to be supported by the CLI. So I catch it and remove it from the 
> dict that will be used for avformat_stream_group_create().
> 
> I can change the comment to "'type' is not a user settable AVOption".
That seems better, thanks.
> > 3) Print the string, not the index.
> > 
> >> +                ret = AVERROR(EINVAL);
> >> +                goto end;
> >> +            }
> >> +
> >> +            ret = av_opt_eval_int(&pclass, opts, e->value, &type);
> >> +            if (ret < 0 || type == AV_STREAM_GROUP_PARAMS_NONE) {
> >> +                av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
> >> +                goto end;
> >> +            }
> >> +
> >> +            av_dict_copy(&tmp, dict, 0);
> >> +            stg = avformat_stream_group_create(oc, type, &tmp);
> >> +            if (!stg) {
> >> +                ret = AVERROR(ENOMEM);
> >> +                goto end;
> >> +            }
> >> +            av_dict_set(&tmp, "type", NULL, 0);
> >> +
> >> +            e = NULL;
> >> +            while (e = av_dict_get(dict, "st", e, 0)) {
> >> +                unsigned int idx = strtol(e->value, NULL, 0);
> >> +                if (idx >= oc->nb_streams) {
> >> +                    av_log(mux, AV_LOG_ERROR, "Invalid stream index %d\n", idx);
> >> +                    ret = AVERROR(EINVAL);
> >> +                    goto end;
> >> +                }
> > 
> > This block seems confused about signedness of e->value.
> 
> You mean change %d to %u?
I mean strtol will parse the string into a signed number, then you
assign the result into unsigned, and print it as signed. It's probably
more user-friendly to keep parsing it as signed, and add a check for
idx >= 0.
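
Roughly like the following standalone sketch. parse_stream_index() is a
hypothetical helper, not something in the patch, and rejecting trailing
garbage is an extra assumption on top of what I asked for:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a stream index as a signed value and validate
 * it against nb_streams. Returns the index, or -1 on any invalid input. */
static long parse_stream_index(const char *value, unsigned int nb_streams)
{
    char *end = NULL;
    long idx;

    assert(value);
    errno = 0;
    idx = strtol(value, &end, 0);

    /* reject empty strings, trailing garbage and out-of-range input */
    if (end == value || *end != '\0' || errno == ERANGE)
        return -1;
    /* the explicit sign check keeps "-1" from wrapping to a huge unsigned */
    if (idx < 0 || (unsigned long)idx >= nb_streams)
        return -1;

    return idx;
}
```

Keeping the value signed until after the range check means the error message
can print exactly what the user typed.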
-- 
Anton Khirnov
* Re: [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups
  2023-12-11 11:48   ` Anton Khirnov
@ 2023-12-11 12:46     ` James Almer
  2023-12-12 11:26       ` Anton Khirnov
  0 siblings, 1 reply; 16+ messages in thread
From: James Almer @ 2023-12-11 12:46 UTC (permalink / raw)
  To: ffmpeg-devel
On 12/11/2023 8:48 AM, Anton Khirnov wrote:
> Quoting James Almer (2023-12-05 23:43:57)
>> Starting with IAMF support.
>>
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>>   fftools/ffmpeg.h          |   2 +
>>   fftools/ffmpeg_mux_init.c | 335 ++++++++++++++++++++++++++++++++++++++
>>   fftools/ffmpeg_opt.c      |   2 +
>>   3 files changed, 339 insertions(+)
> 
> Missing documentation.
Will do.
> 
>> +static int of_add_groups(Muxer *mux, const OptionsContext *o)
>> +{
>> +    AVFormatContext *oc = mux->fc;
>> +    int ret;
>> +
>> +    /* process manually set groups */
>> +    for (int i = 0; i < o->nb_stream_groups; i++) {
>> +        AVDictionary *dict = NULL, *tmp = NULL;
>> +        const AVDictionaryEntry *e;
>> +        AVStreamGroup *stg = NULL;
>> +        int type;
>> +        const char *token;
>> +        char *str, *ptr = NULL;
>> +        const AVOption opts[] = {
>> +            { "type", "Set group type", offsetof(AVStreamGroup, type), AV_OPT_TYPE_INT,
>> +                    { .i64 = 0 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "type" },
>> +                { "iamf_audio_element",    NULL, 0, AV_OPT_TYPE_CONST,
>> +                    { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT },    .unit = "type" },
>> +                { "iamf_mix_presentation", NULL, 0, AV_OPT_TYPE_CONST,
>> +                    { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION }, .unit = "type" },
>> +            { NULL },
>> +        };
>> +         const AVClass class = {
>> +            .class_name = "StreamGroupType",
>> +            .item_name  = av_default_item_name,
>> +            .option     = opts,
>> +            .version    = LIBAVUTIL_VERSION_INT,
>> +        };
>> +        const AVClass *pclass = &class;
>> +
>> +        str = av_strdup(o->stream_groups[i].u.str);
>> +        if (!str)
>> +            goto end;
>> +
>> +        token = av_strtok(str, ",", &ptr);
>> +        if (token) {
> 
> Too many indentation levels, move this whole block into a separate
> function.
> 
>> +            ret = av_dict_parse_string(&dict, token, "=", ":", AV_DICT_MULTIKEY);
>> +            if (ret < 0) {
>> +                av_log(mux, AV_LOG_ERROR, "Error parsing group specification %s\n", token);
>> +                goto end;
>> +            }
>> +
>> +            // "type" is not a user settable option in AVStreamGroup
> 
> This comment confuses me.
AVStreamGroup.type is not settable through AVOptions, but it of course 
needs to be supported by the CLI. So I catch it and remove it from the 
dict that will be used for avformat_stream_group_create().
I can change the comment to "'type' is not a user settable AVOption".
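
As a rough standalone illustration of the "catch it" half (the patch itself
goes through av_dict_get() and av_opt_eval_int(), so this is only a sketch),
the lookup amounts to something like the code below. The enum and
find_group_type() are hypothetical stand-ins, not FFmpeg API:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the group type constants; values are illustrative only. */
enum GroupType {
    GROUP_NONE = 0,
    GROUP_IAMF_AUDIO_ELEMENT,
    GROUP_IAMF_MIX_PRESENTATION,
};

/* Look up "type=<name>" among ':'-separated key=value pairs. */
static enum GroupType find_group_type(const char *spec)
{
    static const struct { const char *name; enum GroupType type; } names[] = {
        { "iamf_audio_element",    GROUP_IAMF_AUDIO_ELEMENT },
        { "iamf_mix_presentation", GROUP_IAMF_MIX_PRESENTATION },
    };

    assert(spec);
    while (*spec) {
        size_t len = strcspn(spec, ":"); /* length of this key=value pair */

        if (len > 5 && !strncmp(spec, "type=", 5)) {
            for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
                if (strlen(names[i].name) == len - 5 &&
                    !strncmp(spec + 5, names[i].name, len - 5))
                    return names[i].type;
            return GROUP_NONE;           /* unknown type name */
        }
        spec += len;
        if (*spec == ':')
            spec++;
    }
    return GROUP_NONE;                   /* no "type" key at all */
}
```

A missing "type" and an unknown type name both map to GROUP_NONE here, which
the caller would treat as an error, much like the
AV_STREAM_GROUP_PARAMS_NONE check in the patch.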
> 
>> +            e = av_dict_get(dict, "type", NULL, 0);
>> +            if (!e) {
>> +                av_log(mux, AV_LOG_ERROR, "No type define for Steam Group %d\n", i);
> 
> 1) Steam
> 2) defined? Or maybe specified.
Will change to specified.
> 3) Print the string, not the index.
> 
>> +                ret = AVERROR(EINVAL);
>> +                goto end;
>> +            }
>> +
>> +            ret = av_opt_eval_int(&pclass, opts, e->value, &type);
>> +            if (ret < 0 || type == AV_STREAM_GROUP_PARAMS_NONE) {
>> +                av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
>> +                goto end;
>> +            }
>> +
>> +            av_dict_copy(&tmp, dict, 0);
>> +            stg = avformat_stream_group_create(oc, type, &tmp);
>> +            if (!stg) {
>> +                ret = AVERROR(ENOMEM);
>> +                goto end;
>> +            }
>> +            av_dict_set(&tmp, "type", NULL, 0);
>> +
>> +            e = NULL;
>> +            while (e = av_dict_get(dict, "st", e, 0)) {
>> +                unsigned int idx = strtol(e->value, NULL, 0);
>> +                if (idx >= oc->nb_streams) {
>> +                    av_log(mux, AV_LOG_ERROR, "Invalid stream index %d\n", idx);
>> +                    ret = AVERROR(EINVAL);
>> +                    goto end;
>> +                }
> 
> This block seems confused about signedness of e->value.
You mean change %d to %u?
> 
>> +                avformat_stream_group_add_stream(stg, oc->streams[idx]);
> 
> Unchecked return value.
> 
> 
* Re: [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups
  2023-12-05 22:43 ` [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups James Almer
@ 2023-12-11 11:48   ` Anton Khirnov
  2023-12-11 12:46     ` James Almer
  0 siblings, 1 reply; 16+ messages in thread
From: Anton Khirnov @ 2023-12-11 11:48 UTC (permalink / raw)
  To: FFmpeg development discussions and patches
Quoting James Almer (2023-12-05 23:43:57)
> Starting with IAMF support.
> 
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
>  fftools/ffmpeg.h          |   2 +
>  fftools/ffmpeg_mux_init.c | 335 ++++++++++++++++++++++++++++++++++++++
>  fftools/ffmpeg_opt.c      |   2 +
>  3 files changed, 339 insertions(+)
Missing documentation.
> +static int of_add_groups(Muxer *mux, const OptionsContext *o)
> +{
> +    AVFormatContext *oc = mux->fc;
> +    int ret;
> +
> +    /* process manually set groups */
> +    for (int i = 0; i < o->nb_stream_groups; i++) {
> +        AVDictionary *dict = NULL, *tmp = NULL;
> +        const AVDictionaryEntry *e;
> +        AVStreamGroup *stg = NULL;
> +        int type;
> +        const char *token;
> +        char *str, *ptr = NULL;
> +        const AVOption opts[] = {
> +            { "type", "Set group type", offsetof(AVStreamGroup, type), AV_OPT_TYPE_INT,
> +                    { .i64 = 0 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "type" },
> +                { "iamf_audio_element",    NULL, 0, AV_OPT_TYPE_CONST,
> +                    { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT },    .unit = "type" },
> +                { "iamf_mix_presentation", NULL, 0, AV_OPT_TYPE_CONST,
> +                    { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION }, .unit = "type" },
> +            { NULL },
> +        };
> +         const AVClass class = {
> +            .class_name = "StreamGroupType",
> +            .item_name  = av_default_item_name,
> +            .option     = opts,
> +            .version    = LIBAVUTIL_VERSION_INT,
> +        };
> +        const AVClass *pclass = &class;
> +
> +        str = av_strdup(o->stream_groups[i].u.str);
> +        if (!str)
> +            goto end;
> +
> +        token = av_strtok(str, ",", &ptr);
> +        if (token) {
Too many indentation levels, move this whole block into a separate
function.
> +            ret = av_dict_parse_string(&dict, token, "=", ":", AV_DICT_MULTIKEY);
> +            if (ret < 0) {
> +                av_log(mux, AV_LOG_ERROR, "Error parsing group specification %s\n", token);
> +                goto end;
> +            }
> +
> +            // "type" is not a user settable option in AVStreamGroup
This comment confuses me.
> +            e = av_dict_get(dict, "type", NULL, 0);
> +            if (!e) {
> +                av_log(mux, AV_LOG_ERROR, "No type define for Steam Group %d\n", i);
1) Steam
2) defined? Or maybe specified.
3) Print the string, not the index.
> +                ret = AVERROR(EINVAL);
> +                goto end;
> +            }
> +
> +            ret = av_opt_eval_int(&pclass, opts, e->value, &type);
> +            if (ret < 0 || type == AV_STREAM_GROUP_PARAMS_NONE) {
> +                av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
> +                goto end;
> +            }
> +
> +            av_dict_copy(&tmp, dict, 0);
> +            stg = avformat_stream_group_create(oc, type, &tmp);
> +            if (!stg) {
> +                ret = AVERROR(ENOMEM);
> +                goto end;
> +            }
> +            av_dict_set(&tmp, "type", NULL, 0);
> +
> +            e = NULL;
> +            while (e = av_dict_get(dict, "st", e, 0)) {
> +                unsigned int idx = strtol(e->value, NULL, 0);
> +                if (idx >= oc->nb_streams) {
> +                    av_log(mux, AV_LOG_ERROR, "Invalid stream index %d\n", idx);
> +                    ret = AVERROR(EINVAL);
> +                    goto end;
> +                }
This block seems confused about signedness of e->value.
> +                avformat_stream_group_add_stream(stg, oc->streams[idx]);
Unchecked return value.
-- 
Anton Khirnov
* [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups
  2023-12-05 22:43 [FFmpeg-devel] [PATCH v6 0/8] avformat: introduce AVStreamGroup James Almer
@ 2023-12-05 22:43 ` James Almer
  2023-12-11 11:48   ` Anton Khirnov
  0 siblings, 1 reply; 16+ messages in thread
From: James Almer @ 2023-12-05 22:43 UTC (permalink / raw)
  To: ffmpeg-devel
Starting with IAMF support.
Signed-off-by: James Almer <jamrial@gmail.com>
---
 fftools/ffmpeg.h          |   2 +
 fftools/ffmpeg_mux_init.c | 335 ++++++++++++++++++++++++++++++++++++++
 fftools/ffmpeg_opt.c      |   2 +
 3 files changed, 339 insertions(+)
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index 41935d39d5..057535adbb 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -262,6 +262,8 @@ typedef struct OptionsContext {
     int        nb_disposition;
     SpecifierOpt *program;
     int        nb_program;
+    SpecifierOpt *stream_groups;
+    int        nb_stream_groups;
     SpecifierOpt *time_bases;
     int        nb_time_bases;
     SpecifierOpt *enc_time_bases;
diff --git a/fftools/ffmpeg_mux_init.c b/fftools/ffmpeg_mux_init.c
index 63a25a350f..7648f2a2f1 100644
--- a/fftools/ffmpeg_mux_init.c
+++ b/fftools/ffmpeg_mux_init.c
@@ -39,6 +39,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/display.h"
 #include "libavutil/getenv_utf8.h"
+#include "libavutil/iamf.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/log.h"
 #include "libavutil/mem.h"
@@ -1943,6 +1944,336 @@ static int setup_sync_queues(Muxer *mux, AVFormatContext *oc, int64_t buf_size_u
     return 0;
 }
 
+static int of_parse_iamf_audio_element_layers(Muxer *mux, AVStreamGroup *stg, char **ptr)
+{
+    AVIAMFAudioElement *audio_element = stg->params.iamf_audio_element;
+    AVDictionary *dict = NULL;
+    const char *token;
+    int ret = 0;
+
+    audio_element->demixing_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_DEMIXING, 1, NULL);
+    audio_element->recon_gain_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN, 1, NULL);
+
+    if (!audio_element->demixing_info ||
+        !audio_element->recon_gain_info)
+        return AVERROR(ENOMEM);
+
+    /* process manually set layers and parameters */
+    token = av_strtok(NULL, ",", ptr);
+    while (token) {
+        const AVDictionaryEntry *e;
+        int demixing = 0, recon_gain = 0;
+        int layer = 0;
+
+        if (av_strstart(token, "layer=", &token))
+            layer = 1;
+        else if (av_strstart(token, "demixing=", &token))
+            demixing = 1;
+        else if (av_strstart(token, "recon_gain=", &token))
+            recon_gain = 1;
+
+        av_dict_free(&dict);
+        ret = av_dict_parse_string(&dict, token, "=", ":", 0);
+        if (ret < 0) {
+            av_log(mux, AV_LOG_ERROR, "Error parsing audio element specification %s\n", token);
+            goto fail;
+        }
+
+        if (layer) {
+            AVIAMFLayer *audio_layer = av_iamf_audio_element_add_layer(audio_element);
+            if (!audio_layer) {
+                av_log(mux, AV_LOG_ERROR, "Error adding layer to stream group %d\n", stg->index);
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+            av_opt_set_dict(audio_layer, &dict);
+        } else if (demixing || recon_gain) {
+            AVIAMFParamDefinition *param = demixing ? audio_element->demixing_info
+                                                    : audio_element->recon_gain_info;
+            void *subblock = av_iamf_param_definition_get_subblock(param, 0);
+
+            av_opt_set_dict(param, &dict);
+            av_opt_set_dict(subblock, &dict);
+
+            /* Hardcode spec parameters */
+            param->param_definition_mode = 0;
+            param->parameter_rate = stg->streams[0]->codecpar->sample_rate;
+            param->duration =
+            param->constant_subblock_duration = stg->streams[0]->codecpar->frame_size;
+        }
+
+        // make sure that no entries are left in the dict
+        e = NULL;
+        if (e = av_dict_iterate(dict, e)) {
+            av_log(mux, AV_LOG_FATAL, "Unknown layer key %s.\n", e->key);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        token = av_strtok(NULL, ",", ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    if (!ret && !audio_element->nb_layers) {
+        av_log(mux, AV_LOG_ERROR, "No layer in audio element specification\n");
+        ret = AVERROR(EINVAL);
+    }
+
+    return ret;
+}
+
+static int of_parse_iamf_submixes(Muxer *mux, AVStreamGroup *stg, char **ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVIAMFMixPresentation *mix = stg->params.iamf_mix_presentation;
+    AVDictionary *dict = NULL;
+    const char *token;
+    char *submix_str = NULL;
+    int ret = 0;
+
+    /* process manually set submixes */
+    token = av_strtok(NULL, ",", ptr);
+    while (token) {
+        AVIAMFSubmix *submix = NULL;
+        const char *subtoken;
+        char *subptr = NULL;
+
+        if (!av_strstart(token, "submix=", &token)) {
+            av_log(mux, AV_LOG_ERROR, "No submix in mix presentation specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+
+        submix_str = av_strdup(token);
+        if (!submix_str)
+            goto fail;
+
+        submix = av_iamf_mix_presentation_add_submix(mix);
+        if (!submix) {
+            av_log(mux, AV_LOG_ERROR, "Error adding submix to stream group %d\n", stg->index);
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+        submix->output_mix_config =
+            av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+        if (!submix->output_mix_config) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        submix->output_mix_config->parameter_rate = stg->streams[0]->codecpar->sample_rate;
+
+        subptr = NULL;
+        subtoken = av_strtok(submix_str, "|", &subptr);
+        while (subtoken) {
+            const AVDictionaryEntry *e;
+            int element = 0, layout = 0;
+
+            if (av_strstart(subtoken, "element=", &subtoken))
+                element = 1;
+            else if (av_strstart(subtoken, "layout=", &subtoken))
+                layout = 1;
+
+            av_dict_free(&dict);
+            ret = av_dict_parse_string(&dict, subtoken, "=", ":", 0);
+            if (ret < 0) {
+                av_log(mux, AV_LOG_ERROR, "Error parsing submix specification \"%s\"\n", subtoken);
+                goto fail;
+            }
+
+            if (element) {
+                AVIAMFSubmixElement *submix_element;
+                int idx = -1;
+
+                if (e = av_dict_get(dict, "stg", NULL, 0))
+                    idx = strtol(e->value, NULL, 0);
+                av_dict_set(&dict, "stg", NULL, 0);
+                if (idx < 0 || idx >= oc->nb_stream_groups) {
+                    av_log(mux, AV_LOG_ERROR, "Invalid or missing stream group index in "
+                                              "submix element specification \"%s\"\n", subtoken);
+                    ret = AVERROR(EINVAL);
+                    goto fail;
+                }
+                submix_element = av_iamf_submix_add_element(submix);
+                if (!submix_element) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding element to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+
+                submix_element->audio_element_id = oc->stream_groups[idx]->id;
+
+                submix_element->element_mix_config =
+                    av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+                if (!submix_element->element_mix_config) {
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+                av_opt_set_dict2(submix_element, &dict, AV_OPT_SEARCH_CHILDREN);
+                submix_element->element_mix_config->parameter_rate = stg->streams[0]->codecpar->sample_rate;
+            } else if (layout) {
+                AVIAMFSubmixLayout *submix_layout = av_iamf_submix_add_layout(submix);
+                if (!submix_layout) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding layout to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+                av_opt_set_dict(submix_layout, &dict);
+            } else
+                av_opt_set_dict2(submix, &dict, AV_OPT_SEARCH_CHILDREN);
+
+            if (ret < 0) {
+                goto fail;
+            }
+
+            // make sure that no entries are left in the dict
+            e = NULL;
+            while (e = av_dict_iterate(dict, e)) {
+                av_log(mux, AV_LOG_FATAL, "Unknown submix key %s.\n", e->key);
+                ret = AVERROR(EINVAL);
+                goto fail;
+            }
+            subtoken = av_strtok(NULL, "|", &subptr);
+        }
+        av_freep(&submix_str);
+
+        if (!submix->nb_elements) {
+            av_log(mux, AV_LOG_ERROR, "No audio elements in submix specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+        }
+        token = av_strtok(NULL, ",", ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    av_free(submix_str);
+
+    return ret;
+}
+
+static int of_add_groups(Muxer *mux, const OptionsContext *o)
+{
+    AVFormatContext *oc = mux->fc;
+    int ret;
+
+    /* process manually set groups */
+    for (int i = 0; i < o->nb_stream_groups; i++) {
+        AVDictionary *dict = NULL, *tmp = NULL;
+        const AVDictionaryEntry *e;
+        AVStreamGroup *stg = NULL;
+        int type;
+        const char *token;
+        char *str, *ptr = NULL;
+        const AVOption opts[] = {
+            { "type", "Set group type", offsetof(AVStreamGroup, type), AV_OPT_TYPE_INT,
+                    { .i64 = 0 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "type" },
+                { "iamf_audio_element",    NULL, 0, AV_OPT_TYPE_CONST,
+                    { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT },    .unit = "type" },
+                { "iamf_mix_presentation", NULL, 0, AV_OPT_TYPE_CONST,
+                    { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION }, .unit = "type" },
+            { NULL },
+        };
+         const AVClass class = {
+            .class_name = "StreamGroupType",
+            .item_name  = av_default_item_name,
+            .option     = opts,
+            .version    = LIBAVUTIL_VERSION_INT,
+        };
+        const AVClass *pclass = &class;
+
+        str = av_strdup(o->stream_groups[i].u.str);
+        if (!str)
+            goto end;
+
+        token = av_strtok(str, ",", &ptr);
+        if (token) {
+            ret = av_dict_parse_string(&dict, token, "=", ":", AV_DICT_MULTIKEY);
+            if (ret < 0) {
+                av_log(mux, AV_LOG_ERROR, "Error parsing group specification %s\n", token);
+                goto end;
+            }
+
+            // "type" is not a user settable option in AVStreamGroup
+            e = av_dict_get(dict, "type", NULL, 0);
+            if (!e) {
+                av_log(mux, AV_LOG_ERROR, "No type define for Steam Group %d\n", i);
+                ret = AVERROR(EINVAL);
+                goto end;
+            }
+
+            ret = av_opt_eval_int(&pclass, opts, e->value, &type);
+            if (ret < 0 || type == AV_STREAM_GROUP_PARAMS_NONE) {
+                av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
+                goto end;
+            }
+
+            av_dict_copy(&tmp, dict, 0);
+            stg = avformat_stream_group_create(oc, type, &tmp);
+            if (!stg) {
+                ret = AVERROR(ENOMEM);
+                goto end;
+            }
+            av_dict_set(&tmp, "type", NULL, 0);
+
+            e = NULL;
+            while (e = av_dict_get(dict, "st", e, 0)) {
+                unsigned int idx = strtol(e->value, NULL, 0);
+                if (idx >= oc->nb_streams) {
+                    av_log(mux, AV_LOG_ERROR, "Invalid stream index %d\n", idx);
+                    ret = AVERROR(EINVAL);
+                    goto end;
+                }
+                avformat_stream_group_add_stream(stg, oc->streams[idx]);
+            }
+            while (e = av_dict_get(dict, "stg", e, 0)) {
+                unsigned int idx = strtol(e->value, NULL, 0);
+                if (idx >= oc->nb_stream_groups || idx == stg->index) {
+                    av_log(mux, AV_LOG_ERROR, "Invalid stream group index %d\n", idx);
+                    ret = AVERROR(EINVAL);
+                    goto end;
+                }
+                for (int j = 0; j < oc->stream_groups[idx]->nb_streams; j++)
+                    avformat_stream_group_add_stream(stg, oc->stream_groups[idx]->streams[j]);
+            }
+
+            switch(type) {
+            case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+                ret = of_parse_iamf_audio_element_layers(mux, stg, &ptr);
+                break;
+            case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+                ret = of_parse_iamf_submixes(mux, stg, &ptr);
+                break;
+            default:
+                av_log(mux, AV_LOG_FATAL, "Unknown group type %d.\n", type);
+                ret = AVERROR(EINVAL);
+                break;
+            }
+
+            if (ret < 0)
+                goto end;
+
+            // make sure that nothing but "st" and "stg" entries are left in the dict
+            e = NULL;
+            while (e = av_dict_iterate(tmp, e)) {
+                if (!strcmp(e->key, "st") || !strcmp(e->key, "stg"))
+                    continue;
+
+                av_log(mux, AV_LOG_FATAL, "Unknown group key %s.\n", e->key);
+                ret = AVERROR(EINVAL);
+                goto end;
+            }
+        }
+
+end:
+        av_dict_free(&dict);
+        av_dict_free(&tmp);
+        av_free(str);
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
 static int of_add_programs(Muxer *mux, const OptionsContext *o)
 {
     AVFormatContext *oc = mux->fc;
@@ -2740,6 +3071,10 @@ int of_open(const OptionsContext *o, const char *filename)
     if (err < 0)
         return err;
 
+    err = of_add_groups(mux, o);
+    if (err < 0)
+        return err;
+
     err = of_add_programs(mux, o);
     if (err < 0)
         return err;
diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
index 304471dd03..1144f64f89 100644
--- a/fftools/ffmpeg_opt.c
+++ b/fftools/ffmpeg_opt.c
@@ -1491,6 +1491,8 @@ const OptionDef options[] = {
         "add metadata", "string=string" },
     { "program",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(program) },
         "add program with specified streams", "title=string:st=number..." },
+    { "stream_group",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(stream_groups) },
+        "add stream group with specified streams and group type-specific arguments", "id=number:st=number..." },
     { "dframes",        HAS_ARG | OPT_PERFILE | OPT_EXPERT |
                         OPT_OUTPUT,                                  { .func_arg = opt_data_frames },
         "set the number of data frames to output", "number" },
-- 
2.43.0
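
Editorial note on the "st"/"stg" index parsing in of_add_groups() above: the
patch converts each value with strtol(e->value, NULL, 0) and stores the result
in an unsigned int, so a negative value wraps around and is rejected by the
range check, while a non-numeric value such as "abc" silently becomes index 0.
The sketch below shows one stricter way to do the same validation. The helper
name parse_index() and its return convention are assumptions made for
illustration only. It is not part of this patch or of FFmpeg.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a decimal, octal or
 * hex index from 's' and accept it only if the whole string is a number
 * in the range [0, limit). Returns 0 on success, -EINVAL otherwise.
 */
static int parse_index(const char *s, unsigned int limit, unsigned int *out)
{
    char *end;
    long v;

    if (!s || !*s)
        return -EINVAL;

    errno = 0;
    v = strtol(s, &end, 0);

    /* reject overflow, empty conversions and trailing garbage */
    if (errno || end == s || *end != '\0')
        return -EINVAL;

    /* reject negative values explicitly instead of relying on wraparound */
    if (v < 0 || (unsigned long)v >= limit)
        return -EINVAL;

    *out = (unsigned int)v;
    return 0;
}
```

In the loop over "st" entries, such a helper would be called with
oc->nb_streams as the limit. Whether the extra strictness is wanted for
ffmpeg's command line is a design choice left to the patch author.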
Thread overview: 16+ messages
2023-12-14 20:14 [FFmpeg-devel] [PATCH v7 0/8] avformat: introduce AVStreamGroup James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 1/8] avutil: introduce an Immersive Audio Model and Formats API James Almer
2023-12-18 11:04   ` Anton Khirnov
2023-12-18 18:10     ` James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 2/8] avformat: introduce AVStreamGroup James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups James Almer
2023-12-15 21:28   ` James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 4/8] avcodec/packet: add IAMF Parameters side data types James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 5/8] avcodec/get_bits: add get_leb() James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 6/8] avformat/aviobuf: add ffio_read_leb() and ffio_write_leb() James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 7/8] avformat: Immersive Audio Model and Formats demuxer James Almer
2023-12-14 20:14 ` [FFmpeg-devel] [PATCH 8/8] avformat: Immersive Audio Model and Formats muxer James Almer
  -- strict thread matches above, loose matches on Subject: below --
2023-12-05 22:43 [FFmpeg-devel] [PATCH v6 0/8] avformat: introduce AVStreamGroup James Almer
2023-12-05 22:43 ` [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing AVStreamGroups James Almer
2023-12-11 11:48   ` Anton Khirnov
2023-12-11 12:46     ` James Almer
2023-12-12 11:26       ` Anton Khirnov