[FFmpeg-devel] [PATCH] WIP: Add encoder and decoder for MEBX/Metadata Boxed (PR #20775)

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

@ 2025-10-28 13:47 Lukas via ffmpeg-devel
From: Lukas via ffmpeg-devel @ 2025-10-28 13:47 UTC
  To: ffmpeg-devel; +Cc: Lukas

PR #20775 opened by Lukas (lholliger)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20775
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20775.patch

ISO 14496-12 defines timed metadata that can be carried per frame or
per sub-frame. This patch implements a decoder that validates the key
definitions stored in the moov atom as well as the per-frame sample
data, plus a basic encoder that writes this data back. It also
implements some newer requirements for the dtyp and keyd atoms
described in Apple's documentation, since the format currently tends
to be used only in certain Apple videos (such as subtitles on Vision
Pro or Cinematic Mode on iPhone), although it is by no means limited
to Apple's usage.
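For readers new to the format, the following is a minimal standalone sketch (plain C, no libavcodec) of how one MetadataKeyBox entry from the `keys` box can be read. It assumes the layout documented in the comments of `mebxdec.c` in this patch: an 8-byte header whose type field carries the local key ID, followed by a `keyd` box holding a 4-byte key namespace and an unterminated key value. The helper names `rb32` and `parse_key_box` and the example key name are hypothetical, and the optional `dtyp` box is ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 32-bit big-endian value (same idea as FFmpeg's AV_RB32). */
static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: parse one MetadataKeyBox laid out as
 *   [4 size][4 local_key_id][keyd: 4 size, "keyd", 4 namespace, value...]
 * and write "namespace:value" into out (e.g. "mdta:com.example.depth").
 * Returns the local key ID, 0 for a disabled entry, or -1 if malformed.
 */
static int64_t parse_key_box(const uint8_t *p, size_t len,
                             char *out, size_t out_len)
{
    assert(p != NULL && out != NULL && out_len > 0);
    out[0] = '\0';
    if (len < 8)
        return -1;

    uint32_t box_size = rb32(p);
    uint32_t key_id   = rb32(p + 4);
    if (box_size < 8 || box_size > len)
        return -1;
    if (key_id == 0)
        return 0;                        /* zero marks a disabled entry */

    const uint8_t *keyd = p + 8;
    size_t avail = box_size - 8;
    if (avail < 12 || memcmp(keyd + 4, "keyd", 4) != 0)
        return -1;

    uint32_t keyd_size = rb32(keyd);
    if (keyd_size < 12 || keyd_size > avail)
        return -1;

    /* The key value is not NUL-terminated inside the box, so bound it. */
    snprintf(out, out_len, "%.4s:%.*s", (const char *)(keyd + 8),
             (int)(keyd_size - 12), (const char *)(keyd + 12));
    return key_id;
}
```

Bounding the unterminated value with `%.*s` is one possible alternative to copying it into a fixed scratch buffer first, as `mebx_parse_keyd_box` does with its 256-byte `key_value` array.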

A lot was changed here, and part of me feels it may be too much: I had
to modify decode.c and a few other files to make DATA streams output
frames. Since this format is timed per frame, that felt necessary, but
something like this doesn't seem to have been done much before, so any
help and guidance is appreciated!
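To illustrate what "timed per frame" means for this codec: each mebx sample, as described in the comments on `mebx_decode_frame`, is a sequence of items, each made of a 4-byte big-endian size, a 4-byte item ID that refers back to a local key ID from the `keys` box, and a payload. The standalone sketch below uses the hypothetical names `MebxItem`, `split_mebx_sample` and `hex_encode`; it splits one sample into items and hex-encodes a payload roughly the way the decoder in this patch fills `frame->metadata`, without depending on libavcodec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read a 32-bit big-endian value (same idea as FFmpeg's AV_RB32). */
static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical view of one item; payload points into the sample buffer. */
typedef struct {
    uint32_t id;             /* matches a local key ID from the keys box */
    const uint8_t *payload;
    size_t size;
} MebxItem;

/*
 * Split one sample into [4 size][4 item_id][payload] items.
 * Like mebx_decode_frame, this stops at the first malformed item.
 * Returns the number of items stored (at most max_items).
 */
static size_t split_mebx_sample(const uint8_t *data, size_t size,
                                MebxItem *items, size_t max_items)
{
    size_t pos = 0, n = 0;

    assert(data != NULL || size == 0);
    while (n < max_items && size - pos >= 8) {
        uint32_t item_size = rb32(data + pos);
        if (item_size < 8 || item_size > size - pos)
            break;
        items[n].id      = rb32(data + pos + 4);
        items[n].payload = data + pos + 8;
        items[n].size    = item_size - 8;
        n++;
        pos += item_size;
    }
    return n;
}

/* Hex-encode a payload into out, truncating if out is too small. */
static void hex_encode(const uint8_t *in, size_t n, char *out, size_t out_len)
{
    size_t pos = 0;

    if (out_len == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < n && pos + 2 < out_len; i++)
        pos += (size_t)snprintf(out + pos, out_len - pos, "%02x", in[i]);
}
```

Returning views into the sample instead of copying keeps the split cheap; a caller could then look each `id` up in the parsed key table and choose a representation based on its `dtyp`.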

This is based on PR #20738, but since it changes a lot and departs from
the original idea, I thought a new PR would suit this best.


>From 42d0970e24b7fe02299fde4c96fe265dbd31a136 Mon Sep 17 00:00:00 2001
From: lholliger <14064434+lholliger@users.noreply.github.com>
Date: Tue, 28 Oct 2025 09:37:13 -0400
Subject: [PATCH] Add encoder and decoder for MEBX/Metadata Boxed

ISO 14496-12 defines timed metadata that can be carried per frame or
per sub-frame. This patch implements a decoder that validates the key
definitions stored in the moov atom as well as the per-frame sample
data, plus a basic encoder that writes this data back. It also
implements some newer requirements for the dtyp and keyd atoms
described in Apple's documentation, since the format currently tends
to be used only in certain Apple videos (such as subtitles on Vision
Pro or Cinematic Mode on iPhone), although it is by no means limited
to Apple's usage.
---
 fftools/ffmpeg_dec.c       |   2 +
 fftools/ffmpeg_enc.c       |  10 +
 fftools/ffmpeg_mux_init.c  |   3 +-
 fftools/ffprobe.c          |   1 +
 libavcodec/Makefile        |   2 +
 libavcodec/allcodecs.c     |   4 +
 libavcodec/codec_desc.c    |   6 +
 libavcodec/codec_id.h      |   1 +
 libavcodec/decode.c        |  18 +-
 libavcodec/mebxdec.c       | 449 +++++++++++++++++++++++++++++++++++++
 libavcodec/mebxenc.c       | 100 +++++++++
 libavcodec/tests/avcodec.c |   3 +-
 libavformat/avformat.h     |   1 +
 libavformat/format.c       |   2 +
 libavformat/isom.c         |   1 +
 libavformat/mov.c          |   7 +
 libavformat/movenc.c       |  44 +++-
 libavutil/frame.h          |   6 +
 18 files changed, 654 insertions(+), 6 deletions(-)
 create mode 100644 libavcodec/mebxdec.c
 create mode 100644 libavcodec/mebxenc.c

diff --git a/fftools/ffmpeg_dec.c b/fftools/ffmpeg_dec.c
index 2f25265997..343718660e 100644
--- a/fftools/ffmpeg_dec.c
+++ b/fftools/ffmpeg_dec.c
@@ -791,6 +791,8 @@ static int packet_decode(DecoderPriv *dp, AVPacket *pkt, AVFrame *frame)
             dp->dec.samples_decoded += frame->nb_samples;
 
             audio_ts_process(dp, frame);
+        } else if (dec->codec_type == AVMEDIA_TYPE_DATA) {
+            // data frames only carry metadata, so no audio/video post-processing is needed
         } else {
             ret = video_frame_process(dp, frame, &outputs_mask);
             if (ret < 0) {
diff --git a/fftools/ffmpeg_enc.c b/fftools/ffmpeg_enc.c
index 84e7e0ca0e..28d45a21f0 100644
--- a/fftools/ffmpeg_enc.c
+++ b/fftools/ffmpeg_enc.c
@@ -227,6 +227,12 @@ int enc_open(void *opaque, const AVFrame *frame)
     if (ost->type == AVMEDIA_TYPE_AUDIO || ost->type == AVMEDIA_TYPE_VIDEO) {
         enc_ctx->time_base      = frame->time_base;
         enc_ctx->framerate      = fd->frame_rate_filter;
+    } else if (ost->type == AVMEDIA_TYPE_DATA) {
+        // use frame timebase if available, otherwise use a default
+        if (frame && frame->time_base.num > 0)
+            enc_ctx->time_base = frame->time_base;
+        else
+            enc_ctx->time_base = AV_TIME_BASE_Q;
     }
 
     switch (enc_ctx->codec_type) {
@@ -329,6 +335,10 @@ int enc_open(void *opaque, const AVFrame *frame)
             enc_ctx->subtitle_header_size = dec->subtitle_header_size;
         }
 
+        break;
+    case AVMEDIA_TYPE_DATA:
+        // Data streams have minimal encoding requirements
+        // No special frame properties to set
         break;
     default:
         av_assert0(0);
diff --git a/fftools/ffmpeg_mux_init.c b/fftools/ffmpeg_mux_init.c
index c24b69f2d1..ca6e0435d2 100644
--- a/fftools/ffmpeg_mux_init.c
+++ b/fftools/ffmpeg_mux_init.c
@@ -79,7 +79,8 @@ static int choose_encoder(const OptionsContext *o, AVFormatContext *s,
 
     if (type != AVMEDIA_TYPE_VIDEO      &&
         type != AVMEDIA_TYPE_AUDIO      &&
-        type != AVMEDIA_TYPE_SUBTITLE) {
+        type != AVMEDIA_TYPE_SUBTITLE   &&
+        type != AVMEDIA_TYPE_DATA) {
         if (codec_name && strcmp(codec_name, "copy")) {
             const char *type_str = av_get_media_type_string(type);
             av_log(ost, AV_LOG_FATAL,
diff --git a/fftools/ffprobe.c b/fftools/ffprobe.c
index 89b8dd3802..333fd8495d 100644
--- a/fftools/ffprobe.c
+++ b/fftools/ffprobe.c
@@ -1465,6 +1465,7 @@ static av_always_inline int process_frame(AVTextFormatContext *tfc,
         switch (par->codec_type) {
         case AVMEDIA_TYPE_VIDEO:
         case AVMEDIA_TYPE_AUDIO:
+        case AVMEDIA_TYPE_DATA:
             if (*packet_new) {
                 ret = avcodec_send_packet(dec_ctx, pkt);
                 if (ret == AVERROR(EAGAIN)) {
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 972a17f060..d4c3b786aa 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -539,6 +539,8 @@ OBJS-$(CONFIG_MJPEG_QSV_ENCODER)       += qsvenc_jpeg.o
 OBJS-$(CONFIG_MJPEG_VAAPI_ENCODER)     += vaapi_encode_mjpeg.o
 OBJS-$(CONFIG_MLP_DECODER)             += mlpdec.o mlpdsp.o
 OBJS-$(CONFIG_MLP_ENCODER)             += mlpenc.o mlp.o
+OBJS-$(CONFIG_MEBX_DECODER)            += mebxdec.o
+OBJS-$(CONFIG_MEBX_ENCODER)            += mebxenc.o
 OBJS-$(CONFIG_MMVIDEO_DECODER)         += mmvideo.o
 OBJS-$(CONFIG_MOBICLIP_DECODER)        += mobiclip.o
 OBJS-$(CONFIG_MOTIONPIXELS_DECODER)    += motionpixels.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index 251ffae390..70818b161b 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -754,6 +754,10 @@ extern const FFCodec ff_webvtt_decoder;
 extern const FFCodec ff_xsub_encoder;
 extern const FFCodec ff_xsub_decoder;
 
+/* metadata */
+extern const FFCodec ff_mebx_decoder;
+extern const FFCodec ff_mebx_encoder;
+
 /* external libraries */
 extern const FFCodec ff_aac_at_encoder;
 extern const FFCodec ff_aac_at_decoder;
diff --git a/libavcodec/codec_desc.c b/libavcodec/codec_desc.c
index c72271bfad..40c8235f60 100644
--- a/libavcodec/codec_desc.c
+++ b/libavcodec/codec_desc.c
@@ -3839,6 +3839,12 @@ static const AVCodecDescriptor codec_descriptors[] = {
         .name      = "smpte_436m_anc",
         .long_name = NULL_IF_CONFIG_SMALL("MXF SMPTE-436M ANC"),
     },
+    {
+        .id        = AV_CODEC_ID_MEBX,
+        .type      = AVMEDIA_TYPE_DATA,
+        .name      = "mebx",
+        .long_name = NULL_IF_CONFIG_SMALL("Metadata Boxed"),
+    },
     {
         .id        = AV_CODEC_ID_MPEG2TS,
         .type      = AVMEDIA_TYPE_DATA,
diff --git a/libavcodec/codec_id.h b/libavcodec/codec_id.h
index 8c98ac6335..c623efa0c3 100644
--- a/libavcodec/codec_id.h
+++ b/libavcodec/codec_id.h
@@ -613,6 +613,7 @@ enum AVCodecID {
     AV_CODEC_ID_SMPTE_2038,
     AV_CODEC_ID_LCEVC,
     AV_CODEC_ID_SMPTE_436M_ANC,
+    AV_CODEC_ID_MEBX,
 
 
     AV_CODEC_ID_PROBE = 0x19000, ///< codec_id is not known (like AV_CODEC_ID_NONE) but lavf should attempt to identify it
diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index d8e523f327..a797ec405d 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -450,6 +450,8 @@ static inline int decode_simple_internal(AVCodecContext *avctx, AVFrame *frame,
     } else if (avctx->codec->type == AVMEDIA_TYPE_AUDIO) {
         ret =  !got_frame ? AVERROR(EAGAIN)
                           : discard_samples(avctx, frame, discarded_samples);
+    } else if (avctx->codec->type == AVMEDIA_TYPE_DATA) {
+        ret = !got_frame ? AVERROR(EAGAIN) : 0;
     } else
         av_assert0(0);
 
@@ -464,7 +466,7 @@ static inline int decode_simple_internal(AVCodecContext *avctx, AVFrame *frame,
     if (consumed >= 0 && avctx->codec->type == AVMEDIA_TYPE_VIDEO)
         consumed = pkt->size;
 
-    if (!ret)
+    if (!ret && avctx->codec->type != AVMEDIA_TYPE_DATA)
         av_assert0(frame->buf[0]);
     if (ret == AVERROR(EAGAIN))
         ret = 0;
@@ -764,7 +766,15 @@ static int apply_cropping(AVCodecContext *avctx, AVFrame *frame)
 // make sure frames returned to the caller are valid
 static int frame_validate(AVCodecContext *avctx, AVFrame *frame)
 {
-    if (!frame->buf[0] || frame->format < 0)
+    // Data codec frames can have metadata/side-data without a buffer
+    if (avctx->codec_type != AVMEDIA_TYPE_DATA && !frame->buf[0])
+        goto fail;
+    if (avctx->codec_type == AVMEDIA_TYPE_DATA && !frame->buf[0] &&
+        !frame->metadata && frame->nb_side_data == 0)
+        goto fail;
+
+    // Data codec frames don't have format requirements
+    if (avctx->codec_type != AVMEDIA_TYPE_DATA && frame->format < 0)
         goto fail;
 
     switch (avctx->codec_type) {
@@ -778,6 +788,10 @@ static int frame_validate(AVCodecContext *avctx, AVFrame *frame)
             goto fail;
 
         break;
+    case AVMEDIA_TYPE_DATA:
+        // Data codec frames don't need pixel/sample format or dimensions;
+        // buffer/metadata/side-data presence is checked above
+        break;
     default: av_assert0(0);
     }
 
diff --git a/libavcodec/mebxdec.c b/libavcodec/mebxdec.c
new file mode 100644
index 0000000000..b5957f64f5
--- /dev/null
+++ b/libavcodec/mebxdec.c
@@ -0,0 +1,449 @@
+/*
+ * Metadata Boxed (mebx) decoder
+ * Copyright (c) 2025 Lukas Holliger
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "avcodec.h"
+#include "codec_internal.h"
+#include "libavutil/dict.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem.h"
+#include "libavutil/macros.h"
+
+/**
+ * Metadata key definition with type information
+ */
+typedef struct {
+    uint32_t key_id;             // local key ID from the MetadataKeyBox type; 0 marks a disabled entry
+    char *key_name;              // Full key name (e.g., "mdta:com.apple.quicktime.scene-illuminance")
+    uint32_t datatype_namespace; // 0 for well-known types, 1 for custom/reverse-DNS
+    uint32_t datatype_value;     // Well-known type code (namespace==0); unused for namespace==1
+    char *datatype_string;       // Type string for namespace==1
+} MebxKeyDef;
+
+typedef struct {
+    AVDictionary *metadata;
+    MebxKeyDef *keys;          // Array of key definitions
+    int nb_keys;               // Number of keys
+} MebxContext;
+
+/**
+ * Parse a keyd (key definition) box.
+ * Returns the key definition and advances the pointer.
+ *
+ * Format:
+ *   [4 bytes] box size
+ *   [4 bytes] 'keyd' fourcc
+ *   [4 bytes] key namespace (4 ASCII chars)
+ *   [variable] key value (null-terminated string)
+ */
+static int mebx_parse_keyd_box(AVCodecContext *avctx, const uint8_t *ptr, const uint8_t *box_end,
+                               uint32_t local_key_id, MebxKeyDef *key_def, const uint8_t **next_ptr)
+{
+    uint32_t box_size;
+    uint32_t box_type;
+    char key_namespace[5] = { 0 };
+    char key_value[256] = { 0 };
+    char key_name[512] = { 0 };
+
+    if (ptr + 8 > box_end)
+        return AVERROR_INVALIDDATA;
+
+    box_size = AV_RB32(ptr);
+    box_type = AV_RB32(ptr + 4);
+
+    if (box_type != MKBETAG('k', 'e', 'y', 'd')) {
+        av_log(avctx, AV_LOG_WARNING, "mebx_parse_keyd_box: Expected 'keyd', got 0x%08x\n", box_type);
+        return AVERROR_INVALIDDATA;
+    }
+
+    if (box_size < 12 || ptr + box_size > box_end) {
+        av_log(avctx, AV_LOG_WARNING, "mebx_parse_keyd_box: Invalid box size %u\n", box_size);
+        return AVERROR_INVALIDDATA;
+    }
+
+    // Parse keyd content: namespace (4 bytes) + key value (rest)
+    memcpy(key_namespace, ptr + 8, 4);
+    int keyd_data_size = box_size - 12;
+    if (keyd_data_size > 0) {
+        memcpy(key_value, ptr + 12, FFMIN(keyd_data_size, (int)sizeof(key_value) - 1));
+        key_value[FFMIN(keyd_data_size, (int)sizeof(key_value) - 1)] = '\0';
+    }
+
+    // Create full key name
+    snprintf(key_name, sizeof(key_name), "%s:%s", key_namespace, key_value);
+
+    // Initialize key definition with local_key_id from the MetadataKeyBox box type
+    key_def->key_id = local_key_id;
+    key_def->key_name = av_strdup(key_name);
+    key_def->datatype_namespace = 0;  // Default to well-known types
+    key_def->datatype_value = 0;      // Unknown type
+    key_def->datatype_string = NULL;
+
+    *next_ptr = ptr + box_size;
+    return 0;
+}
+
+/**
+ * Parse a dtyp (datatype definition) box.
+ * Returns the type information and advances the pointer.
+ *
+ * Format:
+ *   [4 bytes] box size
+ *   [4 bytes] 'dtyp' fourcc
+ *   [4 bytes] datatype namespace (0 for well-known, 1 for custom)
+ *   [variable] datatype value (4-byte uint32 for namespace 0, UTF-8 string for namespace 1)
+ */
+static int mebx_parse_dtyp_box(AVCodecContext *avctx, const uint8_t *ptr, const uint8_t *box_end,
+                               MebxKeyDef *key_def, const uint8_t **next_ptr)
+{
+    uint32_t box_size;
+    uint32_t box_type;
+
+    if (ptr + 8 > box_end)
+        return AVERROR_INVALIDDATA;
+
+    box_size = AV_RB32(ptr);
+    box_type = AV_RB32(ptr + 4);
+
+    // dtyp is not documented in the 2022 spec, only (as far as I know) in Apple's docs, so treat it as optional
+    if (box_type != MKBETAG('d', 't', 'y', 'p')) {
+        *next_ptr = ptr;
+        return 0;
+    }
+
+    if (box_size < 12 || ptr + box_size > box_end) {
+        av_log(avctx, AV_LOG_WARNING, "mebx_parse_dtyp_box: Invalid box size %u\n", box_size);
+        return AVERROR_INVALIDDATA;
+    }
+
+    // Parse dtyp content: datatype namespace (4 bytes) + datatype value/string (rest)
+    key_def->datatype_namespace = AV_RB32(ptr + 8);
+
+    int dtyp_data_size = box_size - 12;
+    if (key_def->datatype_namespace == 0) {
+        // Well-known type
+        if (dtyp_data_size >= 4) {
+            key_def->datatype_value = AV_RB32(ptr + 12);
+        } else {
+            av_log(avctx, AV_LOG_WARNING, "mebx_parse_dtyp_box: Invalid dtyp datatype size %d\n", dtyp_data_size);
+
+        }
+    } else if (key_def->datatype_namespace == 1) {
+        // Custom type: UTF-8 string (no null terminator in box)
+        if (dtyp_data_size > 0) {
+            key_def->datatype_string = av_malloc(dtyp_data_size + 1);
+            if (key_def->datatype_string) {
+                memcpy(key_def->datatype_string, ptr + 12, dtyp_data_size);
+                key_def->datatype_string[dtyp_data_size] = '\0';
+            }
+        }
+    }
+
+    *next_ptr = ptr + box_size;
+    return 0;
+}
+
+/**
+ * Parse the keys box and extract metadata entries with type information.
+ * The keys box contains a mapping of keys to numeric identifiers, and usually dtyp boxes for type info.
+ *
+ * Format:
+ *   [4 bytes] box size
+ *   [4 bytes] 'keys' fourcc
+ *   For each MetadataKeyBox entry:
+ *     [8 bytes] header: entry size + local key ID (stored as the box type)
+ *     [keyd box] - key definition
+ *     [dtyp box] - type information (optional)
+ */
+static int mebx_parse_keys_box(AVCodecContext *avctx, const uint8_t *data, int size, MebxContext *ctx)
+{
+    const uint8_t *ptr = data;
+    const uint8_t *end = data + size;
+
+    if (size < 8)
+        return AVERROR_INVALIDDATA;
+
+    // Loop through all top-level boxes in the extradata
+    while (ptr + 8 <= end) {
+        uint32_t box_size = AV_RB32(ptr);
+        uint32_t box_type = AV_RB32(ptr + 4);
+
+        // Validate box size
+        if (box_size < 8 || ptr + box_size > end) {
+            av_log(avctx, AV_LOG_WARNING, "mebx_parse_keys_box: Invalid box size %u\n", box_size);
+            break;
+        }
+
+        // Only process 'keys' boxes
+        if (box_type == MKBETAG('k', 'e', 'y', 's')) {
+            av_log(avctx, AV_LOG_TRACE, "mebx_parse_keys_box: Found 'keys' box, processing...\n");
+            const uint8_t *box_ptr = ptr + 8;
+            const uint8_t *box_end = ptr + box_size;
+
+            // Each MetadataKeyBox: [size(4)][local_key_id(4)][keyd_box][dtyp_box]
+            // First pass: count valid MetadataKeyBox entries (skip entries with local_key_id of 0)
+            uint32_t count = 0;
+            const uint8_t *scan_ptr = box_ptr;
+            while (scan_ptr < box_end) {
+                if (scan_ptr + 8 > box_end)
+                    break;
+                uint32_t key_box_size = AV_RB32(scan_ptr);
+                if (key_box_size < 8 || scan_ptr + key_box_size > box_end)
+                    break;
+                uint32_t local_key_id = AV_RB32(scan_ptr + 4);
+                // Only count non-zero key IDs (zero means disabled)
+                if (local_key_id != 0)
+                    count++;
+                scan_ptr += key_box_size;
+            }
+
+            if (count == 0) {
+                av_log(avctx, AV_LOG_DEBUG, "mebx_parse_keys_box: No MetadataKeyBox entries found\n");
+            } else {
+                av_log(avctx, AV_LOG_DEBUG, "mebx_parse_keys_box: found %u key entries\n", count);
+                ctx->keys = av_malloc_array(count, sizeof(MebxKeyDef));
+                if (!ctx->keys)
+                    return AVERROR(ENOMEM);
+                memset(ctx->keys, 0, count * sizeof(MebxKeyDef));
+                ctx->nb_keys = count;
+
+                // Process each MetadataKeyBox
+                int key_idx = 0;
+                while (key_idx < count && box_ptr < box_end) {
+                    const uint8_t *next_ptr;
+                    int ret;
+                    uint32_t local_key_id;
+
+                    // Extract local_key_id from the MetadataKeyBox box type
+                    if (box_ptr + 8 > box_end) {
+                        av_log(avctx, AV_LOG_ERROR, "mebx_parse_keys_box: Not enough data for MetadataKeyBox\n");
+                        break;
+                    }
+                    uint32_t key_box_size = AV_RB32(box_ptr);
+                    local_key_id = AV_RB32(box_ptr + 4);
+                    box_ptr += 8;
+
+                    // local_key_id of 0 is considered disabled
+                    if (local_key_id == 0) {
+                        av_log(avctx, AV_LOG_DEBUG, "mebx_parse_keys_box: Skipping MetadataKeyBox with local_key_id=0\n");
+                        box_ptr += key_box_size - 8;
+                        continue;
+                    }
+
+                    // Parse keyd box
+                    ret = mebx_parse_keyd_box(avctx, box_ptr, box_ptr + key_box_size - 8, local_key_id, &ctx->keys[key_idx], &next_ptr);
+                    if (ret < 0) {
+                        av_log(avctx, AV_LOG_ERROR, "mebx_parse_keys_box: Failed to parse keyd box\n");
+                        break;
+                    }
+
+                    box_ptr = next_ptr;
+
+                    // Parse dtyp box following keyd
+                    ret = mebx_parse_dtyp_box(avctx, box_ptr, box_end, &ctx->keys[key_idx], &next_ptr);
+                    if (ret < 0) {
+                        av_log(avctx, AV_LOG_ERROR, "mebx_parse_keys_box: Failed to parse dtyp box\n");
+                        break;
+                    }
+                    box_ptr = next_ptr;
+
+                    // Log the parsed key definition
+                    if (ctx->keys[key_idx].datatype_namespace == 0) {
+                        av_log(avctx, AV_LOG_DEBUG, "mebx: key[%d,id=%u] %s (type=%u)\n",
+                               key_idx, ctx->keys[key_idx].key_id, ctx->keys[key_idx].key_name, ctx->keys[key_idx].datatype_value);
+                    } else if (ctx->keys[key_idx].datatype_namespace == 1) {
+                        av_log(avctx, AV_LOG_DEBUG, "mebx: key[%d,id=%u] %s (custom type: %s)\n",
+                               key_idx, ctx->keys[key_idx].key_id, ctx->keys[key_idx].key_name, ctx->keys[key_idx].datatype_string ? ctx->keys[key_idx].datatype_string : "");
+                    } else {
+                        av_log(avctx, AV_LOG_DEBUG, "mebx: key[%d,id=%u] %s (unknown namespace %u)\n",
+                               key_idx, ctx->keys[key_idx].key_id, ctx->keys[key_idx].key_name, ctx->keys[key_idx].datatype_namespace);
+                    }
+
+                    // Store in metadata dictionary as well
+                    char index_str[16];
+                    snprintf(index_str, sizeof(index_str), "%u", key_idx + 1);
+                    av_dict_set(&ctx->metadata, ctx->keys[key_idx].key_name, index_str, 0);
+
+                    key_idx++;
+                }
+            }
+        } else {
+            av_log(avctx, AV_LOG_DEBUG, "mebx_parse_keys_box: Skipping unknown box type=0x%08x (box_size=%u)\n",
+                   box_type, box_size);
+        }
+
+        // Move to next top-level box
+        ptr += box_size;
+    }
+
+    return 0;
+}
+
+/**
+ * Main mebx decoder function.
+ * Parses the frame packet data which contains item entries.
+ *
+ * Packet format:
+ *   [4 bytes] item size
+ *   [4 bytes] item ID
+ *   [variable] item data (binary, or a well-known type depending on header dtyp)
+ */
+static int mebx_decode_frame(AVCodecContext *avctx, AVFrame *frame,
+                             int *got_frame, AVPacket *avpkt)
+{
+    MebxContext *ctx = avctx->priv_data;
+    const uint8_t *data = avpkt->data;
+    int size = avpkt->size;
+    const uint8_t *ptr = data;
+    const uint8_t *end = data + size;
+
+    if (size == 0) {
+        // We shouldn't have empty packets as they should either be duplicated per-frame or missing
+        av_log(avctx, AV_LOG_WARNING, "mebx_decode_frame: received empty packet (size=0)\n");
+        *got_frame = 0;
+        return 0;
+    }
+
+    // Parse item entries in the packet data
+    // Each item has: [4 bytes size][4 bytes item_id][variable data]
+    while (ptr + 8 <= end) {
+        uint32_t item_size;
+        uint32_t item_id;
+        int data_size;
+        char value_str[256] = { 0 }; // hex string; payloads longer than 127 bytes are truncated (per-frame items tend to be small)
+
+        item_size = AV_RB32(ptr);
+        item_id = AV_RB32(ptr + 4);
+
+        if (item_size < 8 || ptr + item_size > end) {
+            av_log(avctx, AV_LOG_WARNING, "mebx_decode_frame: invalid item size %u\n", item_size);
+            break;
+        }
+
+        data_size = item_size - 8;  // Skip size and item_id fields
+
+        av_log(avctx, AV_LOG_DEBUG, "mebx_decode_frame: item_id=%u, size=%u, data_size=%d\n",
+               item_id, item_size, data_size);
+
+        // Try to look up the key name from the keys array
+        const char *key_name_ptr = NULL;
+
+        for (int i = 0; i < ctx->nb_keys; i++) {
+            if (ctx->keys[i].key_id == item_id) {
+                key_name_ptr = ctx->keys[i].key_name;
+                break;
+            }
+        }
+
+        // Create the metadata entry
+        if (key_name_ptr) {
+            // Store binary data
+            if (data_size > 0) {
+                int str_pos = 0;
+                for (int j = 0; j < data_size && str_pos < (int)sizeof(value_str) - 3; j++) {
+                    str_pos += snprintf(value_str + str_pos, sizeof(value_str) - str_pos, "%02x", ptr[8 + j]);
+                }
+            }
+
+            av_dict_set(&frame->metadata, key_name_ptr, value_str, 0);
+            av_log(avctx, AV_LOG_DEBUG, "mebx_decode_frame: %s = %s\n", key_name_ptr, value_str);
+        } else {
+            // Unknown item ID - log it for now
+            av_log(avctx, AV_LOG_DEBUG, "mebx_decode_frame: unknown item_id %u, skipping\n", item_id);
+        }
+
+        ptr += item_size;
+    }
+
+    // Set basic frame properties
+    frame->pts = avpkt->pts;
+    frame->pkt_dts = avpkt->dts;
+    frame->time_base = avctx->pkt_timebase;
+    if (avpkt->duration > 0)
+        frame->duration = avpkt->duration;
+
+    frame->format = 0;  // No specific format for data frames, set for validation
+
+    // Store the original packet data as side-data for encoder to preserve it
+    // Validation allows DATA frames with metadata/side-data but no buf[0]
+    AVBufferRef *pkt_buf = av_buffer_alloc(avpkt->size);
+    if (!pkt_buf) {
+        av_log(avctx, AV_LOG_ERROR, "mebx_decode_frame: Failed to allocate packet buffer\n");
+        return AVERROR(ENOMEM);
+    }
+    memcpy(pkt_buf->data, avpkt->data, avpkt->size);
+
+    AVFrameSideData *sd = av_frame_new_side_data_from_buf(frame, AV_FRAME_DATA_MEBX_PACKET, pkt_buf);
+    if (!sd) {
+        av_log(avctx, AV_LOG_ERROR, "mebx_decode_frame: Failed to attach packet data as side-data\n");
+        av_buffer_unref(&pkt_buf);
+        return AVERROR(ENOMEM);
+    }
+
+    *got_frame = 1;
+    return avpkt->size;
+}
+
+static int mebx_decode_init(AVCodecContext *avctx)
+{
+    MebxContext *ctx = avctx->priv_data;
+    // Propagate parse errors such as ENOMEM instead of silently ignoring them
+    if (avctx->extradata_size > 0)
+        return mebx_parse_keys_box(avctx, avctx->extradata, avctx->extradata_size, ctx);
+
+    return 0;
+}
+
+static int mebx_decode_close(AVCodecContext *avctx)
+{
+    MebxContext *ctx = avctx->priv_data;
+
+    if (ctx->keys) {
+        for (int i = 0; i < ctx->nb_keys; i++) {
+            if (ctx->keys[i].key_name) {
+                av_free(ctx->keys[i].key_name);
+            }
+            if (ctx->keys[i].datatype_string) {
+                av_free(ctx->keys[i].datatype_string);
+            }
+        }
+        av_free(ctx->keys);
+        ctx->keys = NULL;
+        ctx->nb_keys = 0;
+    }
+
+    if (ctx->metadata) {
+        av_dict_free(&ctx->metadata);
+        ctx->metadata = NULL;
+    }
+
+    return 0;
+}
+
+const FFCodec ff_mebx_decoder = {
+    .p.name         = "mebx",
+    CODEC_LONG_NAME("Metadata Boxed"),
+    .p.type         = AVMEDIA_TYPE_DATA,
+    .p.id           = AV_CODEC_ID_MEBX,
+    .priv_data_size = sizeof(MebxContext),
+    .init           = mebx_decode_init,
+    .close          = mebx_decode_close,
+    FF_CODEC_DECODE_CB(mebx_decode_frame),
+};
diff --git a/libavcodec/mebxenc.c b/libavcodec/mebxenc.c
new file mode 100644
index 0000000000..7e62b6dbb0
--- /dev/null
+++ b/libavcodec/mebxenc.c
@@ -0,0 +1,100 @@
+/*
+ * Metadata Boxed (mebx) encoder
+ * Copyright (c) 2025 Lukas Holliger
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "avcodec.h"
+#include "codec_internal.h"
+#include "libavutil/dict.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem.h"
+#include "libavutil/bprint.h"
+
+typedef struct {
+    AVDictionary *metadata;
+} MebxContext;
+
+/**
+ * Main mebx encoder function.
+ * For transparent round-trip transcoding, preserves original packet data
+ * that was stored during decoding via frame side-data.
+ */
+static int mebx_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
+                            const AVFrame *frame, int *got_packet)
+{
+    if (!frame || !frame->metadata || av_dict_count(frame->metadata) == 0) {
+        // no frame or no decoded metadata: nothing to emit
+        *got_packet = 0;
+        return 0;
+    } else {
+        // Check if we have the original packet data stored in frame side-data
+        AVFrameSideData *sd = av_frame_get_side_data(frame, AV_FRAME_DATA_MEBX_PACKET);
+        if (sd && sd->buf && sd->size > 0) {
+            // Use the original packet data
+            pkt->buf = av_buffer_ref(sd->buf);
+            if (!pkt->buf)
+                return AVERROR(ENOMEM);
+
+            pkt->data = sd->data;
+            pkt->size = sd->size;
+
+            av_log(avctx, AV_LOG_DEBUG, "mebx_encode_frame: using original packet data from side-data, size=%d\n",
+                   pkt->size);
+        } else {
+            // The data wasn't given to us
+            av_log(avctx, AV_LOG_WARNING, "mebx_encode_frame: no original packet data, discarding frame\n");
+            *got_packet = 0;
+            return 0;
+        }
+    }
+
+    *got_packet = 1;
+    return 0;
+}
+
+static int mebx_encode_init(AVCodecContext *avctx)
+{
+    av_log(avctx, AV_LOG_DEBUG, "mebx_encode_init: encoder initialized\n");
+    avctx->pkt_timebase = avctx->time_base;
+
+    return 0;
+}
+
+static int mebx_encode_close(AVCodecContext *avctx)
+{
+    MebxContext *ctx = avctx->priv_data;
+
+    if (ctx->metadata) {
+        av_dict_free(&ctx->metadata);
+        ctx->metadata = NULL;
+    }
+
+    return 0;
+}
+
+const FFCodec ff_mebx_encoder = {
+    .p.name         = "mebx",
+    CODEC_LONG_NAME("Metadata Boxed"),
+    .p.type         = AVMEDIA_TYPE_DATA,
+    .p.id           = AV_CODEC_ID_MEBX,
+    .priv_data_size = sizeof(MebxContext),
+    .init           = mebx_encode_init,
+    .close          = mebx_encode_close,
+    FF_CODEC_ENCODE_CB(mebx_encode_frame),
+};
diff --git a/libavcodec/tests/avcodec.c b/libavcodec/tests/avcodec.c
index dde8226384..c77205b4fb 100644
--- a/libavcodec/tests/avcodec.c
+++ b/libavcodec/tests/avcodec.c
@@ -73,7 +73,8 @@ int main(void){
         }
         if (codec->type != AVMEDIA_TYPE_VIDEO &&
             codec->type != AVMEDIA_TYPE_AUDIO &&
-            codec->type != AVMEDIA_TYPE_SUBTITLE)
+            codec->type != AVMEDIA_TYPE_SUBTITLE &&
+            codec->type != AVMEDIA_TYPE_DATA)
             ERR_EXT("Codec %s has unsupported type %s\n",
                     get_type_string(codec->type));
         if (codec->type != AVMEDIA_TYPE_AUDIO) {
diff --git a/libavformat/avformat.h b/libavformat/avformat.h
index a7446546e5..7fe3b45539 100644
--- a/libavformat/avformat.h
+++ b/libavformat/avformat.h
@@ -516,6 +516,7 @@ typedef struct AVOutputFormat {
     enum AVCodecID audio_codec;    /**< default audio codec */
     enum AVCodecID video_codec;    /**< default video codec */
     enum AVCodecID subtitle_codec; /**< default subtitle codec */
+    enum AVCodecID data_codec;     /**< default data codec */
     /**
      * can use flags: AVFMT_NOFILE, AVFMT_NEEDNUMBER,
      * AVFMT_GLOBALHEADER, AVFMT_NOTIMESTAMPS, AVFMT_VARIABLE_FPS,
diff --git a/libavformat/format.c b/libavformat/format.c
index 516925e7e4..1580312df2 100644
--- a/libavformat/format.c
+++ b/libavformat/format.c
@@ -139,6 +139,8 @@ enum AVCodecID av_guess_codec(const AVOutputFormat *fmt, const char *short_name,
         return fmt->audio_codec;
     else if (type == AVMEDIA_TYPE_SUBTITLE)
         return fmt->subtitle_codec;
+    else if (type == AVMEDIA_TYPE_DATA)
+        return fmt->data_codec;
     else
         return AV_CODEC_ID_NONE;
 }
diff --git a/libavformat/isom.c b/libavformat/isom.c
index 29171fea40..7dbc934330 100644
--- a/libavformat/isom.c
+++ b/libavformat/isom.c
@@ -81,6 +81,7 @@ const AVCodecTag ff_codec_movsubtitle_tags[] = {
 
 const AVCodecTag ff_codec_movdata_tags[] = {
     { AV_CODEC_ID_BIN_DATA, MKTAG('g', 'p', 'm', 'd') },
+    { AV_CODEC_ID_MEBX, MKTAG('m', 'e', 'b', 'x') },
     { AV_CODEC_ID_NONE, 0 },
 };
 
diff --git a/libavformat/mov.c b/libavformat/mov.c
index 45c562cdc6..978121bc1d 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2895,6 +2895,13 @@ static int mov_parse_stsd_data(MOVContext *c, AVIOContext *pb,
                 }
             }
         }
+    } else if (st->codecpar->codec_tag == MKTAG('m','e','b','x')) {
+        if ((int)size != size)
+            return AVERROR(ENOMEM);
+
+        ret = ff_get_extradata(c->fc, st->codecpar, pb, size);
+        if (ret < 0)
+            return ret;
     } else {
         /* other codec type, just skip (rtp, mp4s ...) */
         avio_skip(pb, size);
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index e949c54f04..7987f38c3f 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -2100,6 +2100,8 @@ static unsigned int mov_get_codec_tag(AVFormatContext *s, MOVTrack *track)
             }
         } else if (track->par->codec_type == AVMEDIA_TYPE_SUBTITLE)
             tag = ff_codec_get_tag(ff_codec_movsubtitle_tags, track->par->codec_id);
+        else if (track->par->codec_type == AVMEDIA_TYPE_DATA)
+            tag = ff_codec_get_tag(ff_codec_movdata_tags, track->par->codec_id);
     }
 
     return tag;
@@ -3090,6 +3092,36 @@ static int mov_write_gpmd_tag(AVIOContext *pb, const MOVTrack *track)
     return update_size(pb, pos);
 }
 
+static int mov_write_mebx_keys_tag(AVIOContext *pb, const MOVTrack *track)
+{
+    int64_t pos = avio_tell(pb);
+    avio_wb32(pb, 0); /* size */
+    ffio_wfourcc(pb, "keys");
+    avio_wb32(pb, 0); /* version and flags */
+
+    return update_size(pb, pos);
+}
+
+static int mov_write_mebx_tag(AVIOContext *pb, const MOVTrack *track)
+{
+    int64_t pos = avio_tell(pb);
+    avio_wb32(pb, 0); /* size */
+    ffio_wfourcc(pb, "mebx");
+    avio_wb32(pb, 0); /* Reserved */
+    avio_wb16(pb, 0); /* Reserved */
+    avio_wb16(pb, 1); /* Data-reference index */
+
+    // Write the keys box (and any other boxes) from extradata
+    if (track->par->extradata_size > 0) {
+        avio_write(pb, track->par->extradata, track->par->extradata_size);
+    } else {
+        // No extradata, write minimal empty keys box
+        mov_write_mebx_keys_tag(pb, track);
+    }
+
+    return update_size(pb, pos);
+}
+
 static int mov_write_stsd_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext *mov, MOVTrack *track)
 {
     int64_t pos = avio_tell(pb);
@@ -3115,7 +3147,8 @@ static int mov_write_stsd_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext
         ret = mov_write_tmcd_tag(pb, track);
     else if (track->par->codec_tag == MKTAG('g','p','m','d'))
         ret = mov_write_gpmd_tag(pb, track);
-
+    else if (track->par->codec_tag == MKTAG('m','e','b','x'))
+        ret = mov_write_mebx_tag(pb, track);
     if (ret < 0)
         return ret;
     }
@@ -3537,6 +3570,9 @@ static int mov_write_hdlr_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *tra
         } else if (track->par->codec_tag == MKTAG('g','p','m','d')) {
             hdlr_type = "meta";
             descr = "GoPro MET"; // GoPro Metadata
+        } else if (track->par->codec_tag == MKTAG('m','e','b','x')) {
+            hdlr_type = "meta";
+            descr = "Metadata Boxed";
         } else {
             av_log(s, AV_LOG_WARNING,
                    "Unknown hdlr_type for %s, writing dummy values\n",
@@ -3772,6 +3808,8 @@ static int mov_write_minf_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext
             mov_write_gmhd_tag(pb, track);
     } else if (track->tag == MKTAG('g','p','m','d')) {
         mov_write_gmhd_tag(pb, track);
+    } else if (track->tag == MKTAG('m','e','b','x')) {
+        mov_write_nmhd_tag(pb);
     }
     if (track->mode == MODE_MOV) /* ISO 14496-12 8.4.3.1 specifies hdlr only within mdia or meta boxes */
         mov_write_hdlr_tag(s, pb, NULL);
@@ -8860,6 +8898,7 @@ static const AVCodecTag codec_mp4_tags[] = {
     { AV_CODEC_ID_DVD_SUBTITLE,    MKTAG('m', 'p', '4', 's') },
     { AV_CODEC_ID_MOV_TEXT,        MKTAG('t', 'x', '3', 'g') },
     { AV_CODEC_ID_BIN_DATA,        MKTAG('g', 'p', 'm', 'd') },
+    { AV_CODEC_ID_MEBX,            MKTAG('m', 'e', 'b', 'x') },
     { AV_CODEC_ID_MPEGH_3D_AUDIO,  MKTAG('m', 'h', 'm', '1') },
     { AV_CODEC_ID_TTML,            MOV_MP4_TTML_TAG          },
     { AV_CODEC_ID_TTML,            MOV_ISMV_TTML_TAG         },
@@ -8942,6 +8981,7 @@ const FFOutputFormat ff_mov_muxer = {
     .p.audio_codec     = AV_CODEC_ID_AAC,
     .p.video_codec     = CONFIG_LIBX264_ENCODER ?
                          AV_CODEC_ID_H264 : AV_CODEC_ID_MPEG4,
+    .p.data_codec      = AV_CODEC_ID_MEBX,
     .init              = mov_init,
     .write_header      = mov_write_header,
     .write_packet      = mov_write_packet,
@@ -8949,7 +8989,7 @@ const FFOutputFormat ff_mov_muxer = {
     .deinit            = mov_free,
     .p.flags           = AVFMT_GLOBALHEADER | AVFMT_TS_NEGATIVE | AVFMT_VARIABLE_FPS,
     .p.codec_tag       = (const AVCodecTag* const []){
-        ff_codec_movvideo_tags, ff_codec_movaudio_tags, ff_codec_movsubtitle_tags, 0
+        ff_codec_movvideo_tags, ff_codec_movaudio_tags, ff_codec_movsubtitle_tags, ff_codec_movdata_tags, 0
     },
     .check_bitstream   = mov_check_bitstream,
     .p.priv_class      = &mov_isobmff_muxer_class,
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 088b24b717..3a8ed1c2b5 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -260,6 +260,12 @@ enum AVFrameSideDataType {
      * EXIF metadata, starting with either 49 49 2a 00, or 4d 4d 00 2a.
      */
      AV_FRAME_DATA_EXIF,
+
+    /**
+     * Metadata Boxed (MEBX) packet data for preserving original packet bytes during transcoding.
+     * Used by the MEBX codec to store the original packet data so the output can remain unchanged.
+     */
+    AV_FRAME_DATA_MEBX_PACKET,
 };
 
 enum AVActiveFormatDescription {
-- 
2.49.1

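The following standalone sketch is not part of the patch. It illustrates the byte layout that mov_write_mebx_tag() and mov_write_mebx_keys_tag() produce when a track has no extradata. The helpers put_be32(), put_be16(), put_fourcc(), get_be32(), write_min_mebx_entry() and mebx_entry_payload() are hypothetical stand-ins for avio_wb32(), avio_wb16(), ffio_wfourcc() and update_size(). The sketch also assumes, based on the mov.c hunk above, that the bytes following the 16-byte sample-entry header are what mov_parse_stsd_data() stores as extradata and what the muxer later writes back verbatim.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for avio_wb32()/avio_wb16()/ffio_wfourcc():
 * each writes big-endian data at buf[pos] and returns the new position. */
static size_t put_be32(uint8_t *buf, size_t pos, uint32_t v)
{
    buf[pos]     = (uint8_t)(v >> 24);
    buf[pos + 1] = (uint8_t)(v >> 16);
    buf[pos + 2] = (uint8_t)(v >> 8);
    buf[pos + 3] = (uint8_t)v;
    return pos + 4;
}

static size_t put_be16(uint8_t *buf, size_t pos, uint16_t v)
{
    buf[pos]     = (uint8_t)(v >> 8);
    buf[pos + 1] = (uint8_t)v;
    return pos + 2;
}

static size_t put_fourcc(uint8_t *buf, size_t pos, const char *cc)
{
    memcpy(buf + pos, cc, 4);
    return pos + 4;
}

static uint32_t get_be32(const uint8_t *buf, size_t pos)
{
    return ((uint32_t)buf[pos] << 24) | ((uint32_t)buf[pos + 1] << 16) |
           ((uint32_t)buf[pos + 2] << 8) | (uint32_t)buf[pos + 3];
}

/* Write a minimal 'mebx' sample entry with an empty 'keys' box, using the
 * same field order as the patch. Box sizes are back-patched the way
 * update_size() does. buf must hold at least 28 bytes. */
static size_t write_min_mebx_entry(uint8_t *buf)
{
    size_t pos = 0, keys_start;

    pos = put_be32(buf, pos, 0);          /* size, patched below */
    pos = put_fourcc(buf, pos, "mebx");
    pos = put_be32(buf, pos, 0);          /* reserved */
    pos = put_be16(buf, pos, 0);          /* reserved */
    pos = put_be16(buf, pos, 1);          /* data-reference index */

    keys_start = pos;
    pos = put_be32(buf, pos, 0);          /* keys size, patched below */
    pos = put_fourcc(buf, pos, "keys");
    pos = put_be32(buf, pos, 0);          /* version and flags */
    put_be32(buf, keys_start, (uint32_t)(pos - keys_start));

    put_be32(buf, 0, (uint32_t)pos);      /* back-patch the outer size */
    return pos;
}

/* Return the bytes after the 16-byte sample-entry header (size, fourcc,
 * 6 reserved bytes, data-reference index), i.e. the part assumed to be
 * kept as extradata on demux and re-emitted on mux. */
static const uint8_t *mebx_entry_payload(const uint8_t *entry, size_t *payload_size)
{
    uint32_t size = get_be32(entry, 0);
    *payload_size = size >= 16 ? size - 16 : 0;
    return entry + 16;
}
```

With an empty keys box, the entry is 28 bytes: a 16-byte sample-entry header followed by a 12-byte keys box.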
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
