From: Paul B Mahol <onemda@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avfilter: merge loudnorm filter functionality into f_ebur128.c
Date: Tue, 28 Nov 2023 17:51:14 +0100
Message-ID: <CAPYw7P46huiN92C_q3zSuQu0wn6Dz+Vq36+_H6DtS-859FBO7Q@mail.gmail.com>
In-Reply-To: <CAPYw7P5MGP2Hg9Ca3qbKv3MnGYJH2y3T9Y=Dy251EWPLTpDtYg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 261 bytes --]

Major change: loudnorm no longer returns oversampled audio at 192000 Hz when doing dynamic processing. Oversampled audio is now used only for true peak finding. This was a trivial improvement to make with the ebur128 code.

Minor changes: numerous stability fixes.

[-- Attachment #2: 0001-avfilter-merge-loudnorm-filter-functionality-into-f_.patch --] [-- Type: text/x-patch, Size: 118973 bytes --] From 008773730e4f95186e1a01f2039bbd8a2a79656a Mon Sep 17 00:00:00 2001 From: Paul B Mahol <onemda@gmail.com> Date: Fri, 29 Sep 2023 20:53:51 +0200 Subject: [PATCH] avfilter: merge loudnorm filter functionality into f_ebur128.c Signed-off-by: Paul B Mahol <onemda@gmail.com> --- libavfilter/Makefile | 1 - libavfilter/af_loudnorm.c | 941 -------------------------------------- libavfilter/ebur128.c | 725 ----------------------------- libavfilter/ebur128.h | 229 ---------- libavfilter/f_ebur128.c | 914 +++++++++++++++++++++++++++++------- 5 files changed, 744 insertions(+), 2066 deletions(-) delete mode 100644 libavfilter/af_loudnorm.c delete mode 100644 libavfilter/ebur128.c delete mode 100644 libavfilter/ebur128.h diff --git a/libavfilter/Makefile b/libavfilter/Makefile index 63725f91b4..747bc24e58 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -150,7 +150,6 @@ OBJS-$(CONFIG_HIGHPASS_FILTER) += af_biquads.o OBJS-$(CONFIG_HIGHSHELF_FILTER) += af_biquads.o OBJS-$(CONFIG_JOIN_FILTER) += af_join.o OBJS-$(CONFIG_LADSPA_FILTER) += af_ladspa.o -OBJS-$(CONFIG_LOUDNORM_FILTER) += af_loudnorm.o ebur128.o 
OBJS-$(CONFIG_LOWPASS_FILTER) += af_biquads.o OBJS-$(CONFIG_LOWSHELF_FILTER) += af_biquads.o OBJS-$(CONFIG_LV2_FILTER) += af_lv2.o diff --git a/libavfilter/af_loudnorm.c b/libavfilter/af_loudnorm.c deleted file mode 100644 index d83398ae2a..0000000000 --- a/libavfilter/af_loudnorm.c +++ /dev/null @@ -1,941 +0,0 @@ -/* - * Copyright (c) 2016 Kyle Swanson <k@ylo.ph>. - * - * This file is part of FFmpeg. - * - * FFmpeg is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; either - * version 2.1 of the License, or (at your option) any later version. - * - * FFmpeg is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with FFmpeg; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA - */ - -/* http://k.ylo.ph/2016/04/04/loudnorm.html */ - -#include "libavutil/opt.h" -#include "avfilter.h" -#include "filters.h" -#include "formats.h" -#include "internal.h" -#include "audio.h" -#include "ebur128.h" - -enum FrameType { - FIRST_FRAME, - INNER_FRAME, - FINAL_FRAME, - LINEAR_MODE, - FRAME_NB -}; - -enum LimiterState { - OUT, - ATTACK, - SUSTAIN, - RELEASE, - STATE_NB -}; - -enum PrintFormat { - NONE, - JSON, - SUMMARY, - PF_NB -}; - -typedef struct LoudNormContext { - const AVClass *class; - double target_i; - double target_lra; - double target_tp; - double measured_i; - double measured_lra; - double measured_tp; - double measured_thresh; - double offset; - int linear; - int dual_mono; - enum PrintFormat print_format; - - double *buf; - int buf_size; - int buf_index; - int prev_buf_index; - - double delta[30]; - double 
weights[21]; - double prev_delta; - int index; - - double gain_reduction[2]; - double *limiter_buf; - double *prev_smp; - int limiter_buf_index; - int limiter_buf_size; - enum LimiterState limiter_state; - int peak_index; - int env_index; - int env_cnt; - int attack_length; - int release_length; - - int64_t pts[30]; - enum FrameType frame_type; - int above_threshold; - int prev_nb_samples; - int channels; - - FFEBUR128State *r128_in; - FFEBUR128State *r128_out; -} LoudNormContext; - -#define OFFSET(x) offsetof(LoudNormContext, x) -#define FLAGS AV_OPT_FLAG_AUDIO_PARAM|AV_OPT_FLAG_FILTERING_PARAM - -static const AVOption loudnorm_options[] = { - { "I", "set integrated loudness target", OFFSET(target_i), AV_OPT_TYPE_DOUBLE, {.dbl = -24.}, -70., -5., FLAGS }, - { "i", "set integrated loudness target", OFFSET(target_i), AV_OPT_TYPE_DOUBLE, {.dbl = -24.}, -70., -5., FLAGS }, - { "LRA", "set loudness range target", OFFSET(target_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 7.}, 1., 50., FLAGS }, - { "lra", "set loudness range target", OFFSET(target_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 7.}, 1., 50., FLAGS }, - { "TP", "set maximum true peak", OFFSET(target_tp), AV_OPT_TYPE_DOUBLE, {.dbl = -2.}, -9., 0., FLAGS }, - { "tp", "set maximum true peak", OFFSET(target_tp), AV_OPT_TYPE_DOUBLE, {.dbl = -2.}, -9., 0., FLAGS }, - { "measured_I", "measured IL of input file", OFFSET(measured_i), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, -99., 0., FLAGS }, - { "measured_i", "measured IL of input file", OFFSET(measured_i), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, -99., 0., FLAGS }, - { "measured_LRA", "measured LRA of input file", OFFSET(measured_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, 0., 99., FLAGS }, - { "measured_lra", "measured LRA of input file", OFFSET(measured_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, 0., 99., FLAGS }, - { "measured_TP", "measured true peak of input file", OFFSET(measured_tp), AV_OPT_TYPE_DOUBLE, {.dbl = 99.}, -99., 99., FLAGS }, - { "measured_tp", "measured true peak of input file", 
OFFSET(measured_tp), AV_OPT_TYPE_DOUBLE, {.dbl = 99.}, -99., 99., FLAGS }, - { "measured_thresh", "measured threshold of input file", OFFSET(measured_thresh), AV_OPT_TYPE_DOUBLE, {.dbl = -70.}, -99., 0., FLAGS }, - { "offset", "set offset gain", OFFSET(offset), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, -99., 99., FLAGS }, - { "linear", "normalize linearly if possible", OFFSET(linear), AV_OPT_TYPE_BOOL, {.i64 = 1}, 0, 1, FLAGS }, - { "dual_mono", "treat mono input as dual-mono", OFFSET(dual_mono), AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, FLAGS }, - { "print_format", "set print format for stats", OFFSET(print_format), AV_OPT_TYPE_INT, {.i64 = NONE}, NONE, PF_NB -1, FLAGS, "print_format" }, - { "none", 0, 0, AV_OPT_TYPE_CONST, {.i64 = NONE}, 0, 0, FLAGS, "print_format" }, - { "json", 0, 0, AV_OPT_TYPE_CONST, {.i64 = JSON}, 0, 0, FLAGS, "print_format" }, - { "summary", 0, 0, AV_OPT_TYPE_CONST, {.i64 = SUMMARY}, 0, 0, FLAGS, "print_format" }, - { NULL } -}; - -AVFILTER_DEFINE_CLASS(loudnorm); - -static inline int frame_size(int sample_rate, int frame_len_msec) -{ - const int frame_size = round((double)sample_rate * (frame_len_msec / 1000.0)); - return frame_size + (frame_size % 2); -} - -static void init_gaussian_filter(LoudNormContext *s) -{ - double total_weight = 0.0; - const double sigma = 3.5; - double adjust; - int i; - - const int offset = 21 / 2; - const double c1 = 1.0 / (sigma * sqrt(2.0 * M_PI)); - const double c2 = 2.0 * pow(sigma, 2.0); - - for (i = 0; i < 21; i++) { - const int x = i - offset; - s->weights[i] = c1 * exp(-(pow(x, 2.0) / c2)); - total_weight += s->weights[i]; - } - - adjust = 1.0 / total_weight; - for (i = 0; i < 21; i++) - s->weights[i] *= adjust; -} - -static double gaussian_filter(LoudNormContext *s, int index) -{ - double result = 0.; - int i; - - index = index - 10 > 0 ? index - 10 : index + 20; - for (i = 0; i < 21; i++) - result += s->delta[((index + i) < 30) ? 
(index + i) : (index + i - 30)] * s->weights[i]; - - return result; -} - -static void detect_peak(LoudNormContext *s, int offset, int nb_samples, int channels, int *peak_delta, double *peak_value) -{ - int n, c, i, index; - double ceiling; - double *buf; - - *peak_delta = -1; - buf = s->limiter_buf; - ceiling = s->target_tp; - - index = s->limiter_buf_index + (offset * channels) + (1920 * channels); - if (index >= s->limiter_buf_size) - index -= s->limiter_buf_size; - - if (s->frame_type == FIRST_FRAME) { - for (c = 0; c < channels; c++) - s->prev_smp[c] = fabs(buf[index + c - channels]); - } - - for (n = 0; n < nb_samples; n++) { - for (c = 0; c < channels; c++) { - double this, next, max_peak; - - this = fabs(buf[(index + c) < s->limiter_buf_size ? (index + c) : (index + c - s->limiter_buf_size)]); - next = fabs(buf[(index + c + channels) < s->limiter_buf_size ? (index + c + channels) : (index + c + channels - s->limiter_buf_size)]); - - if ((s->prev_smp[c] <= this) && (next <= this) && (this > ceiling) && (n > 0)) { - int detected; - - detected = 1; - for (i = 2; i < 12; i++) { - next = fabs(buf[(index + c + (i * channels)) < s->limiter_buf_size ? (index + c + (i * channels)) : (index + c + (i * channels) - s->limiter_buf_size)]); - if (next > this) { - detected = 0; - break; - } - } - - if (!detected) - continue; - - for (c = 0; c < channels; c++) { - if (c == 0 || fabs(buf[index + c]) > max_peak) - max_peak = fabs(buf[index + c]); - - s->prev_smp[c] = fabs(buf[(index + c) < s->limiter_buf_size ? 
(index + c) : (index + c - s->limiter_buf_size)]); - } - - *peak_delta = n; - s->peak_index = index; - *peak_value = max_peak; - return; - } - - s->prev_smp[c] = this; - } - - index += channels; - if (index >= s->limiter_buf_size) - index -= s->limiter_buf_size; - } -} - -static void true_peak_limiter(LoudNormContext *s, double *out, int nb_samples, int channels) -{ - int n, c, index, peak_delta, smp_cnt; - double ceiling, peak_value; - double *buf; - - buf = s->limiter_buf; - ceiling = s->target_tp; - index = s->limiter_buf_index; - smp_cnt = 0; - - if (s->frame_type == FIRST_FRAME) { - double max; - - max = 0.; - for (n = 0; n < 1920; n++) { - for (c = 0; c < channels; c++) { - max = fabs(buf[c]) > max ? fabs(buf[c]) : max; - } - buf += channels; - } - - if (max > ceiling) { - s->gain_reduction[1] = ceiling / max; - s->limiter_state = SUSTAIN; - buf = s->limiter_buf; - - for (n = 0; n < 1920; n++) { - for (c = 0; c < channels; c++) { - double env; - env = s->gain_reduction[1]; - buf[c] *= env; - } - buf += channels; - } - } - - buf = s->limiter_buf; - } - - do { - - switch(s->limiter_state) { - case OUT: - detect_peak(s, smp_cnt, nb_samples - smp_cnt, channels, &peak_delta, &peak_value); - if (peak_delta != -1) { - s->env_cnt = 0; - smp_cnt += (peak_delta - s->attack_length); - s->gain_reduction[0] = 1.; - s->gain_reduction[1] = ceiling / peak_value; - s->limiter_state = ATTACK; - - s->env_index = s->peak_index - (s->attack_length * channels); - if (s->env_index < 0) - s->env_index += s->limiter_buf_size; - - s->env_index += (s->env_cnt * channels); - if (s->env_index > s->limiter_buf_size) - s->env_index -= s->limiter_buf_size; - - } else { - smp_cnt = nb_samples; - } - break; - - case ATTACK: - for (; s->env_cnt < s->attack_length; s->env_cnt++) { - for (c = 0; c < channels; c++) { - double env; - env = s->gain_reduction[0] - ((double) s->env_cnt / (s->attack_length - 1) * (s->gain_reduction[0] - s->gain_reduction[1])); - buf[s->env_index + c] *= env; - } - - 
s->env_index += channels; - if (s->env_index >= s->limiter_buf_size) - s->env_index -= s->limiter_buf_size; - - smp_cnt++; - if (smp_cnt >= nb_samples) { - s->env_cnt++; - break; - } - } - - if (smp_cnt < nb_samples) { - s->env_cnt = 0; - s->attack_length = 1920; - s->limiter_state = SUSTAIN; - } - break; - - case SUSTAIN: - detect_peak(s, smp_cnt, nb_samples, channels, &peak_delta, &peak_value); - if (peak_delta == -1) { - s->limiter_state = RELEASE; - s->gain_reduction[0] = s->gain_reduction[1]; - s->gain_reduction[1] = 1.; - s->env_cnt = 0; - break; - } else { - double gain_reduction; - gain_reduction = ceiling / peak_value; - - if (gain_reduction < s->gain_reduction[1]) { - s->limiter_state = ATTACK; - - s->attack_length = peak_delta; - if (s->attack_length <= 1) - s->attack_length = 2; - - s->gain_reduction[0] = s->gain_reduction[1]; - s->gain_reduction[1] = gain_reduction; - s->env_cnt = 0; - break; - } - - for (s->env_cnt = 0; s->env_cnt < peak_delta; s->env_cnt++) { - for (c = 0; c < channels; c++) { - double env; - env = s->gain_reduction[1]; - buf[s->env_index + c] *= env; - } - - s->env_index += channels; - if (s->env_index >= s->limiter_buf_size) - s->env_index -= s->limiter_buf_size; - - smp_cnt++; - if (smp_cnt >= nb_samples) { - s->env_cnt++; - break; - } - } - } - break; - - case RELEASE: - for (; s->env_cnt < s->release_length; s->env_cnt++) { - for (c = 0; c < channels; c++) { - double env; - env = s->gain_reduction[0] + (((double) s->env_cnt / (s->release_length - 1)) * (s->gain_reduction[1] - s->gain_reduction[0])); - buf[s->env_index + c] *= env; - } - - s->env_index += channels; - if (s->env_index >= s->limiter_buf_size) - s->env_index -= s->limiter_buf_size; - - smp_cnt++; - if (smp_cnt >= nb_samples) { - s->env_cnt++; - break; - } - } - - if (smp_cnt < nb_samples) { - s->env_cnt = 0; - s->limiter_state = OUT; - } - - break; - } - - } while (smp_cnt < nb_samples); - - for (n = 0; n < nb_samples; n++) { - for (c = 0; c < channels; c++) { - 
out[c] = buf[index + c]; - if (fabs(out[c]) > ceiling) { - out[c] = ceiling * (out[c] < 0 ? -1 : 1); - } - } - out += channels; - index += channels; - if (index >= s->limiter_buf_size) - index -= s->limiter_buf_size; - } -} - -static int filter_frame(AVFilterLink *inlink, AVFrame *in) -{ - AVFilterContext *ctx = inlink->dst; - LoudNormContext *s = ctx->priv; - AVFilterLink *outlink = ctx->outputs[0]; - AVFrame *out; - const double *src; - double *dst; - double *buf; - double *limiter_buf; - int i, n, c, subframe_length, src_index; - double gain, gain_next, env_global, env_shortterm, - global, shortterm, lra, relative_threshold; - - if (av_frame_is_writable(in)) { - out = in; - } else { - out = ff_get_audio_buffer(outlink, in->nb_samples); - if (!out) { - av_frame_free(&in); - return AVERROR(ENOMEM); - } - av_frame_copy_props(out, in); - } - - out->pts = s->pts[0]; - memmove(s->pts, &s->pts[1], (FF_ARRAY_ELEMS(s->pts) - 1) * sizeof(s->pts[0])); - - src = (const double *)in->data[0]; - dst = (double *)out->data[0]; - buf = s->buf; - limiter_buf = s->limiter_buf; - - ff_ebur128_add_frames_double(s->r128_in, src, in->nb_samples); - - if (s->frame_type == FIRST_FRAME && in->nb_samples < frame_size(inlink->sample_rate, 3000)) { - double offset, offset_tp, true_peak; - - ff_ebur128_loudness_global(s->r128_in, &global); - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - double tmp; - ff_ebur128_sample_peak(s->r128_in, c, &tmp); - if (c == 0 || tmp > true_peak) - true_peak = tmp; - } - - offset = pow(10., (s->target_i - global) / 20.); - offset_tp = true_peak * offset; - s->offset = offset_tp < s->target_tp ? 
offset : s->target_tp / true_peak; - s->frame_type = LINEAR_MODE; - } - - switch (s->frame_type) { - case FIRST_FRAME: - for (n = 0; n < in->nb_samples; n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - buf[s->buf_index + c] = src[c]; - } - src += inlink->ch_layout.nb_channels; - s->buf_index += inlink->ch_layout.nb_channels; - } - - ff_ebur128_loudness_shortterm(s->r128_in, &shortterm); - - if (shortterm < s->measured_thresh) { - s->above_threshold = 0; - env_shortterm = shortterm <= -70. ? 0. : s->target_i - s->measured_i; - } else { - s->above_threshold = 1; - env_shortterm = shortterm <= -70. ? 0. : s->target_i - shortterm; - } - - for (n = 0; n < 30; n++) - s->delta[n] = pow(10., env_shortterm / 20.); - s->prev_delta = s->delta[s->index]; - - s->buf_index = - s->limiter_buf_index = 0; - - for (n = 0; n < (s->limiter_buf_size / inlink->ch_layout.nb_channels); n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - limiter_buf[s->limiter_buf_index + c] = buf[s->buf_index + c] * s->delta[s->index] * s->offset; - } - s->limiter_buf_index += inlink->ch_layout.nb_channels; - if (s->limiter_buf_index >= s->limiter_buf_size) - s->limiter_buf_index -= s->limiter_buf_size; - - s->buf_index += inlink->ch_layout.nb_channels; - } - - subframe_length = frame_size(inlink->sample_rate, 100); - true_peak_limiter(s, dst, subframe_length, inlink->ch_layout.nb_channels); - ff_ebur128_add_frames_double(s->r128_out, dst, subframe_length); - - out->nb_samples = subframe_length; - - s->frame_type = INNER_FRAME; - break; - - case INNER_FRAME: - gain = gaussian_filter(s, s->index + 10 < 30 ? s->index + 10 : s->index + 10 - 30); - gain_next = gaussian_filter(s, s->index + 11 < 30 ? 
s->index + 11 : s->index + 11 - 30); - - for (n = 0; n < in->nb_samples; n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - buf[s->prev_buf_index + c] = src[c]; - limiter_buf[s->limiter_buf_index + c] = buf[s->buf_index + c] * (gain + (((double) n / in->nb_samples) * (gain_next - gain))) * s->offset; - } - src += inlink->ch_layout.nb_channels; - - s->limiter_buf_index += inlink->ch_layout.nb_channels; - if (s->limiter_buf_index >= s->limiter_buf_size) - s->limiter_buf_index -= s->limiter_buf_size; - - s->prev_buf_index += inlink->ch_layout.nb_channels; - if (s->prev_buf_index >= s->buf_size) - s->prev_buf_index -= s->buf_size; - - s->buf_index += inlink->ch_layout.nb_channels; - if (s->buf_index >= s->buf_size) - s->buf_index -= s->buf_size; - } - - subframe_length = (frame_size(inlink->sample_rate, 100) - in->nb_samples) * inlink->ch_layout.nb_channels; - s->limiter_buf_index = s->limiter_buf_index + subframe_length < s->limiter_buf_size ? s->limiter_buf_index + subframe_length : s->limiter_buf_index + subframe_length - s->limiter_buf_size; - - true_peak_limiter(s, dst, in->nb_samples, inlink->ch_layout.nb_channels); - ff_ebur128_add_frames_double(s->r128_out, dst, in->nb_samples); - - ff_ebur128_loudness_range(s->r128_in, &lra); - ff_ebur128_loudness_global(s->r128_in, &global); - ff_ebur128_loudness_shortterm(s->r128_in, &shortterm); - ff_ebur128_relative_threshold(s->r128_in, &relative_threshold); - - if (s->above_threshold == 0) { - double shortterm_out; - - if (shortterm > s->measured_thresh) - s->prev_delta *= 1.0058; - - ff_ebur128_loudness_shortterm(s->r128_out, &shortterm_out); - if (shortterm_out >= s->target_i) - s->above_threshold = 1; - } - - if (shortterm < relative_threshold || shortterm <= -70. || s->above_threshold == 0) { - s->delta[s->index] = s->prev_delta; - } else { - env_global = fabs(shortterm - global) < (s->target_lra / 2.) ? shortterm - global : (s->target_lra / 2.) * ((shortterm - global) < 0 ? 
-1 : 1); - env_shortterm = s->target_i - shortterm; - s->delta[s->index] = pow(10., (env_global + env_shortterm) / 20.); - } - - s->prev_delta = s->delta[s->index]; - s->index++; - if (s->index >= 30) - s->index -= 30; - s->prev_nb_samples = in->nb_samples; - break; - - case FINAL_FRAME: - gain = gaussian_filter(s, s->index + 10 < 30 ? s->index + 10 : s->index + 10 - 30); - s->limiter_buf_index = 0; - src_index = 0; - - for (n = 0; n < s->limiter_buf_size / inlink->ch_layout.nb_channels; n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - s->limiter_buf[s->limiter_buf_index + c] = src[src_index + c] * gain * s->offset; - } - src_index += inlink->ch_layout.nb_channels; - - s->limiter_buf_index += inlink->ch_layout.nb_channels; - if (s->limiter_buf_index >= s->limiter_buf_size) - s->limiter_buf_index -= s->limiter_buf_size; - } - - subframe_length = frame_size(inlink->sample_rate, 100); - for (i = 0; i < in->nb_samples / subframe_length; i++) { - true_peak_limiter(s, dst, subframe_length, inlink->ch_layout.nb_channels); - - for (n = 0; n < subframe_length; n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - if (src_index < (in->nb_samples * inlink->ch_layout.nb_channels)) { - limiter_buf[s->limiter_buf_index + c] = src[src_index + c] * gain * s->offset; - } else { - limiter_buf[s->limiter_buf_index + c] = 0.; - } - } - - if (src_index < (in->nb_samples * inlink->ch_layout.nb_channels)) - src_index += inlink->ch_layout.nb_channels; - - s->limiter_buf_index += inlink->ch_layout.nb_channels; - if (s->limiter_buf_index >= s->limiter_buf_size) - s->limiter_buf_index -= s->limiter_buf_size; - } - - dst += (subframe_length * inlink->ch_layout.nb_channels); - } - - dst = (double *)out->data[0]; - ff_ebur128_add_frames_double(s->r128_out, dst, in->nb_samples); - break; - - case LINEAR_MODE: - for (n = 0; n < in->nb_samples; n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - dst[c] = src[c] * s->offset; - } - src += 
inlink->ch_layout.nb_channels; - dst += inlink->ch_layout.nb_channels; - } - - dst = (double *)out->data[0]; - ff_ebur128_add_frames_double(s->r128_out, dst, in->nb_samples); - break; - } - - if (in != out) - av_frame_free(&in); - return ff_filter_frame(outlink, out); -} - -static int flush_frame(AVFilterLink *outlink) -{ - AVFilterContext *ctx = outlink->src; - AVFilterLink *inlink = ctx->inputs[0]; - LoudNormContext *s = ctx->priv; - int ret = 0; - - if (s->frame_type == INNER_FRAME) { - double *src; - double *buf; - int nb_samples, n, c, offset; - AVFrame *frame; - - nb_samples = (s->buf_size / inlink->ch_layout.nb_channels) - s->prev_nb_samples; - nb_samples -= (frame_size(inlink->sample_rate, 100) - s->prev_nb_samples); - - frame = ff_get_audio_buffer(outlink, nb_samples); - if (!frame) - return AVERROR(ENOMEM); - frame->nb_samples = nb_samples; - - buf = s->buf; - src = (double *)frame->data[0]; - - offset = ((s->limiter_buf_size / inlink->ch_layout.nb_channels) - s->prev_nb_samples) * inlink->ch_layout.nb_channels; - offset -= (frame_size(inlink->sample_rate, 100) - s->prev_nb_samples) * inlink->ch_layout.nb_channels; - s->buf_index = s->buf_index - offset < 0 ? 
s->buf_index - offset + s->buf_size : s->buf_index - offset; - - for (n = 0; n < nb_samples; n++) { - for (c = 0; c < inlink->ch_layout.nb_channels; c++) { - src[c] = buf[s->buf_index + c]; - } - src += inlink->ch_layout.nb_channels; - s->buf_index += inlink->ch_layout.nb_channels; - if (s->buf_index >= s->buf_size) - s->buf_index -= s->buf_size; - } - - s->frame_type = FINAL_FRAME; - ret = filter_frame(inlink, frame); - } - return ret; -} - -static int activate(AVFilterContext *ctx) -{ - AVFilterLink *inlink = ctx->inputs[0]; - AVFilterLink *outlink = ctx->outputs[0]; - LoudNormContext *s = ctx->priv; - AVFrame *in = NULL; - int ret = 0, status; - int64_t pts; - - FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink); - - if (s->frame_type != LINEAR_MODE) { - int nb_samples; - - if (s->frame_type == FIRST_FRAME) { - nb_samples = frame_size(inlink->sample_rate, 3000); - } else { - nb_samples = frame_size(inlink->sample_rate, 100); - } - - ret = ff_inlink_consume_samples(inlink, nb_samples, nb_samples, &in); - } else { - ret = ff_inlink_consume_frame(inlink, &in); - } - - if (ret < 0) - return ret; - if (ret > 0) { - if (s->frame_type == FIRST_FRAME) { - const int nb_samples = frame_size(inlink->sample_rate, 100); - - for (int i = 0; i < FF_ARRAY_ELEMS(s->pts); i++) - s->pts[i] = in->pts + i * nb_samples; - } else if (s->frame_type == LINEAR_MODE) { - s->pts[0] = in->pts; - } else { - s->pts[FF_ARRAY_ELEMS(s->pts) - 1] = in->pts; - } - ret = filter_frame(inlink, in); - } - if (ret < 0) - return ret; - - if (ff_inlink_acknowledge_status(inlink, &status, &pts)) { - ff_outlink_set_status(outlink, status, pts); - return flush_frame(outlink); - } - - FF_FILTER_FORWARD_WANTED(outlink, inlink); - - return FFERROR_NOT_READY; -} - -static int query_formats(AVFilterContext *ctx) -{ - LoudNormContext *s = ctx->priv; - static const int input_srate[] = {192000, -1}; - static const enum AVSampleFormat sample_fmts[] = { - AV_SAMPLE_FMT_DBL, - AV_SAMPLE_FMT_NONE - }; - int ret = 
ff_set_common_all_channel_counts(ctx); - if (ret < 0) - return ret; - - ret = ff_set_common_formats_from_list(ctx, sample_fmts); - if (ret < 0) - return ret; - - if (s->frame_type == LINEAR_MODE) { - return ff_set_common_all_samplerates(ctx); - } else { - return ff_set_common_samplerates_from_list(ctx, input_srate); - } -} - -static int config_input(AVFilterLink *inlink) -{ - AVFilterContext *ctx = inlink->dst; - LoudNormContext *s = ctx->priv; - - s->r128_in = ff_ebur128_init(inlink->ch_layout.nb_channels, inlink->sample_rate, 0, FF_EBUR128_MODE_I | FF_EBUR128_MODE_S | FF_EBUR128_MODE_LRA | FF_EBUR128_MODE_SAMPLE_PEAK); - if (!s->r128_in) - return AVERROR(ENOMEM); - - s->r128_out = ff_ebur128_init(inlink->ch_layout.nb_channels, inlink->sample_rate, 0, FF_EBUR128_MODE_I | FF_EBUR128_MODE_S | FF_EBUR128_MODE_LRA | FF_EBUR128_MODE_SAMPLE_PEAK); - if (!s->r128_out) - return AVERROR(ENOMEM); - - if (inlink->ch_layout.nb_channels == 1 && s->dual_mono) { - ff_ebur128_set_channel(s->r128_in, 0, FF_EBUR128_DUAL_MONO); - ff_ebur128_set_channel(s->r128_out, 0, FF_EBUR128_DUAL_MONO); - } - - s->buf_size = frame_size(inlink->sample_rate, 3000) * inlink->ch_layout.nb_channels; - s->buf = av_malloc_array(s->buf_size, sizeof(*s->buf)); - if (!s->buf) - return AVERROR(ENOMEM); - - s->limiter_buf_size = frame_size(inlink->sample_rate, 210) * inlink->ch_layout.nb_channels; - s->limiter_buf = av_malloc_array(s->buf_size, sizeof(*s->limiter_buf)); - if (!s->limiter_buf) - return AVERROR(ENOMEM); - - s->prev_smp = av_malloc_array(inlink->ch_layout.nb_channels, sizeof(*s->prev_smp)); - if (!s->prev_smp) - return AVERROR(ENOMEM); - - init_gaussian_filter(s); - - s->buf_index = - s->prev_buf_index = - s->limiter_buf_index = 0; - s->channels = inlink->ch_layout.nb_channels; - s->index = 1; - s->limiter_state = OUT; - s->offset = pow(10., s->offset / 20.); - s->target_tp = pow(10., s->target_tp / 20.); - s->attack_length = frame_size(inlink->sample_rate, 10); - s->release_length = 
frame_size(inlink->sample_rate, 100); - - return 0; -} - -static av_cold int init(AVFilterContext *ctx) -{ - LoudNormContext *s = ctx->priv; - s->frame_type = FIRST_FRAME; - - if (s->linear) { - double offset, offset_tp; - offset = s->target_i - s->measured_i; - offset_tp = s->measured_tp + offset; - - if (s->measured_tp != 99 && s->measured_thresh != -70 && s->measured_lra != 0 && s->measured_i != 0) { - if ((offset_tp <= s->target_tp) && (s->measured_lra <= s->target_lra)) { - s->frame_type = LINEAR_MODE; - s->offset = offset; - } - } - } - - return 0; -} - -static av_cold void uninit(AVFilterContext *ctx) -{ - LoudNormContext *s = ctx->priv; - double i_in, i_out, lra_in, lra_out, thresh_in, thresh_out, tp_in, tp_out; - int c; - - if (!s->r128_in || !s->r128_out) - goto end; - - ff_ebur128_loudness_range(s->r128_in, &lra_in); - ff_ebur128_loudness_global(s->r128_in, &i_in); - ff_ebur128_relative_threshold(s->r128_in, &thresh_in); - for (c = 0; c < s->channels; c++) { - double tmp; - ff_ebur128_sample_peak(s->r128_in, c, &tmp); - if ((c == 0) || (tmp > tp_in)) - tp_in = tmp; - } - - ff_ebur128_loudness_range(s->r128_out, &lra_out); - ff_ebur128_loudness_global(s->r128_out, &i_out); - ff_ebur128_relative_threshold(s->r128_out, &thresh_out); - for (c = 0; c < s->channels; c++) { - double tmp; - ff_ebur128_sample_peak(s->r128_out, c, &tmp); - if ((c == 0) || (tmp > tp_out)) - tp_out = tmp; - } - - switch(s->print_format) { - case NONE: - break; - - case JSON: - av_log(ctx, AV_LOG_INFO, - "\n{\n" - "\t\"input_i\" : \"%.2f\",\n" - "\t\"input_tp\" : \"%.2f\",\n" - "\t\"input_lra\" : \"%.2f\",\n" - "\t\"input_thresh\" : \"%.2f\",\n" - "\t\"output_i\" : \"%.2f\",\n" - "\t\"output_tp\" : \"%+.2f\",\n" - "\t\"output_lra\" : \"%.2f\",\n" - "\t\"output_thresh\" : \"%.2f\",\n" - "\t\"normalization_type\" : \"%s\",\n" - "\t\"target_offset\" : \"%.2f\"\n" - "}\n", - i_in, - 20. * log10(tp_in), - lra_in, - thresh_in, - i_out, - 20. 
* log10(tp_out), - lra_out, - thresh_out, - s->frame_type == LINEAR_MODE ? "linear" : "dynamic", - s->target_i - i_out - ); - break; - - case SUMMARY: - av_log(ctx, AV_LOG_INFO, - "\n" - "Input Integrated: %+6.1f LUFS\n" - "Input True Peak: %+6.1f dBTP\n" - "Input LRA: %6.1f LU\n" - "Input Threshold: %+6.1f LUFS\n" - "\n" - "Output Integrated: %+6.1f LUFS\n" - "Output True Peak: %+6.1f dBTP\n" - "Output LRA: %6.1f LU\n" - "Output Threshold: %+6.1f LUFS\n" - "\n" - "Normalization Type: %s\n" - "Target Offset: %+6.1f LU\n", - i_in, - 20. * log10(tp_in), - lra_in, - thresh_in, - i_out, - 20. * log10(tp_out), - lra_out, - thresh_out, - s->frame_type == LINEAR_MODE ? "Linear" : "Dynamic", - s->target_i - i_out - ); - break; - } - -end: - if (s->r128_in) - ff_ebur128_destroy(&s->r128_in); - if (s->r128_out) - ff_ebur128_destroy(&s->r128_out); - av_freep(&s->limiter_buf); - av_freep(&s->prev_smp); - av_freep(&s->buf); -} - -static const AVFilterPad avfilter_af_loudnorm_inputs[] = { - { - .name = "default", - .type = AVMEDIA_TYPE_AUDIO, - .config_props = config_input, - }, -}; - -const AVFilter ff_af_loudnorm = { - .name = "loudnorm", - .description = NULL_IF_CONFIG_SMALL("EBU R128 loudness normalization"), - .priv_size = sizeof(LoudNormContext), - .priv_class = &loudnorm_class, - .init = init, - .activate = activate, - .uninit = uninit, - FILTER_INPUTS(avfilter_af_loudnorm_inputs), - FILTER_OUTPUTS(ff_audio_default_filterpad), - FILTER_QUERY_FUNC(query_formats), -}; diff --git a/libavfilter/ebur128.c b/libavfilter/ebur128.c deleted file mode 100644 index 062099e206..0000000000 --- a/libavfilter/ebur128.c +++ /dev/null @@ -1,725 +0,0 @@ -/* - * Copyright (c) 2011 Jan Kokemüller - * - * This file is part of FFmpeg. - * - * FFmpeg is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; either - * version 2.1 of the License, or (at your option) any later version. 
- * - * FFmpeg is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with FFmpeg; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA - * - * This file is based on libebur128 which is available at - * https://github.com/jiixyj/libebur128/ - * - * Libebur128 has the following copyright: - * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to deal - * in the Software without restriction, including without limitation the rights - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell - * copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN - * THE SOFTWARE. 
-*/ - -#include "ebur128.h" - -#include <float.h> -#include <limits.h> -#include <math.h> /* You may have to define _USE_MATH_DEFINES if you use MSVC */ - -#include "libavutil/error.h" -#include "libavutil/macros.h" -#include "libavutil/mem.h" -#include "libavutil/mem_internal.h" -#include "libavutil/thread.h" - -#define CHECK_ERROR(condition, errorcode, goto_point) \ - if ((condition)) { \ - errcode = (errorcode); \ - goto goto_point; \ - } - -#define ALMOST_ZERO 0.000001 - -#define RELATIVE_GATE (-10.0) -#define RELATIVE_GATE_FACTOR pow(10.0, RELATIVE_GATE / 10.0) -#define MINUS_20DB pow(10.0, -20.0 / 10.0) - -struct FFEBUR128StateInternal { - /** Filtered audio data (used as ring buffer). */ - double *audio_data; - /** Size of audio_data array. */ - size_t audio_data_frames; - /** Current index for audio_data. */ - size_t audio_data_index; - /** How many frames are needed for a gating block. Will correspond to 400ms - * of audio at initialization, and 100ms after the first block (75% overlap - * as specified in the 2011 revision of BS1770). */ - unsigned long needed_frames; - /** The channel map. Has as many elements as there are channels. */ - int *channel_map; - /** How many samples fit in 100ms (rounded). */ - unsigned long samples_in_100ms; - /** BS.1770 filter coefficients (nominator). */ - double b[5]; - /** BS.1770 filter coefficients (denominator). */ - double a[5]; - /** BS.1770 filter state. */ - double v[5][5]; - /** Histograms, used to calculate LRA. */ - unsigned long *block_energy_histogram; - unsigned long *short_term_block_energy_histogram; - /** Keeps track of when a new short term block is needed. */ - size_t short_term_frame_counter; - /** Maximum sample peak, one per channel */ - double *sample_peak; - /** The maximum window duration in ms. 
*/ - unsigned long window; - /** Data pointer array for interleaved data */ - void **data_ptrs; -}; - -static AVOnce histogram_init = AV_ONCE_INIT; -static DECLARE_ALIGNED(32, double, histogram_energies)[1000]; -static DECLARE_ALIGNED(32, double, histogram_energy_boundaries)[1001]; - -static void ebur128_init_filter(FFEBUR128State * st) -{ - int i, j; - - double f0 = 1681.974450955533; - double G = 3.999843853973347; - double Q = 0.7071752369554196; - - double K = tan(M_PI * f0 / (double) st->samplerate); - double Vh = pow(10.0, G / 20.0); - double Vb = pow(Vh, 0.4996667741545416); - - double pb[3] = { 0.0, 0.0, 0.0 }; - double pa[3] = { 1.0, 0.0, 0.0 }; - double rb[3] = { 1.0, -2.0, 1.0 }; - double ra[3] = { 1.0, 0.0, 0.0 }; - - double a0 = 1.0 + K / Q + K * K; - pb[0] = (Vh + Vb * K / Q + K * K) / a0; - pb[1] = 2.0 * (K * K - Vh) / a0; - pb[2] = (Vh - Vb * K / Q + K * K) / a0; - pa[1] = 2.0 * (K * K - 1.0) / a0; - pa[2] = (1.0 - K / Q + K * K) / a0; - - f0 = 38.13547087602444; - Q = 0.5003270373238773; - K = tan(M_PI * f0 / (double) st->samplerate); - - ra[1] = 2.0 * (K * K - 1.0) / (1.0 + K / Q + K * K); - ra[2] = (1.0 - K / Q + K * K) / (1.0 + K / Q + K * K); - - st->d->b[0] = pb[0] * rb[0]; - st->d->b[1] = pb[0] * rb[1] + pb[1] * rb[0]; - st->d->b[2] = pb[0] * rb[2] + pb[1] * rb[1] + pb[2] * rb[0]; - st->d->b[3] = pb[1] * rb[2] + pb[2] * rb[1]; - st->d->b[4] = pb[2] * rb[2]; - - st->d->a[0] = pa[0] * ra[0]; - st->d->a[1] = pa[0] * ra[1] + pa[1] * ra[0]; - st->d->a[2] = pa[0] * ra[2] + pa[1] * ra[1] + pa[2] * ra[0]; - st->d->a[3] = pa[1] * ra[2] + pa[2] * ra[1]; - st->d->a[4] = pa[2] * ra[2]; - - for (i = 0; i < 5; ++i) { - for (j = 0; j < 5; ++j) { - st->d->v[i][j] = 0.0; - } - } -} - -static int ebur128_init_channel_map(FFEBUR128State * st) -{ - size_t i; - st->d->channel_map = - (int *) av_malloc_array(st->channels, sizeof(*st->d->channel_map)); - if (!st->d->channel_map) - return AVERROR(ENOMEM); - if (st->channels == 4) { - st->d->channel_map[0] = 
FF_EBUR128_LEFT; - st->d->channel_map[1] = FF_EBUR128_RIGHT; - st->d->channel_map[2] = FF_EBUR128_LEFT_SURROUND; - st->d->channel_map[3] = FF_EBUR128_RIGHT_SURROUND; - } else if (st->channels == 5) { - st->d->channel_map[0] = FF_EBUR128_LEFT; - st->d->channel_map[1] = FF_EBUR128_RIGHT; - st->d->channel_map[2] = FF_EBUR128_CENTER; - st->d->channel_map[3] = FF_EBUR128_LEFT_SURROUND; - st->d->channel_map[4] = FF_EBUR128_RIGHT_SURROUND; - } else { - for (i = 0; i < st->channels; ++i) { - switch (i) { - case 0: - st->d->channel_map[i] = FF_EBUR128_LEFT; - break; - case 1: - st->d->channel_map[i] = FF_EBUR128_RIGHT; - break; - case 2: - st->d->channel_map[i] = FF_EBUR128_CENTER; - break; - case 3: - st->d->channel_map[i] = FF_EBUR128_UNUSED; - break; - case 4: - st->d->channel_map[i] = FF_EBUR128_LEFT_SURROUND; - break; - case 5: - st->d->channel_map[i] = FF_EBUR128_RIGHT_SURROUND; - break; - default: - st->d->channel_map[i] = FF_EBUR128_UNUSED; - break; - } - } - } - return 0; -} - -static inline void init_histogram(void) -{ - int i; - /* initialize static constants */ - histogram_energy_boundaries[0] = pow(10.0, (-70.0 + 0.691) / 10.0); - for (i = 0; i < 1000; ++i) { - histogram_energies[i] = - pow(10.0, ((double) i / 10.0 - 69.95 + 0.691) / 10.0); - } - for (i = 1; i < 1001; ++i) { - histogram_energy_boundaries[i] = - pow(10.0, ((double) i / 10.0 - 70.0 + 0.691) / 10.0); - } -} - -FFEBUR128State *ff_ebur128_init(unsigned int channels, - unsigned long samplerate, - unsigned long window, int mode) -{ - int errcode; - FFEBUR128State *st; - - st = (FFEBUR128State *) av_malloc(sizeof(*st)); - CHECK_ERROR(!st, 0, exit) - st->d = (struct FFEBUR128StateInternal *) - av_malloc(sizeof(*st->d)); - CHECK_ERROR(!st->d, 0, free_state) - st->channels = channels; - errcode = ebur128_init_channel_map(st); - CHECK_ERROR(errcode, 0, free_internal) - - st->d->sample_peak = - (double *) av_calloc(channels, sizeof(*st->d->sample_peak)); - CHECK_ERROR(!st->d->sample_peak, 0, 
free_channel_map) - - st->samplerate = samplerate; - st->d->samples_in_100ms = (st->samplerate + 5) / 10; - st->mode = mode; - if ((mode & FF_EBUR128_MODE_S) == FF_EBUR128_MODE_S) { - st->d->window = FFMAX(window, 3000); - } else if ((mode & FF_EBUR128_MODE_M) == FF_EBUR128_MODE_M) { - st->d->window = FFMAX(window, 400); - } else { - goto free_sample_peak; - } - st->d->audio_data_frames = st->samplerate * st->d->window / 1000; - if (st->d->audio_data_frames % st->d->samples_in_100ms) { - /* round up to multiple of samples_in_100ms */ - st->d->audio_data_frames = st->d->audio_data_frames - + st->d->samples_in_100ms - - (st->d->audio_data_frames % st->d->samples_in_100ms); - } - st->d->audio_data = - (double *) av_calloc(st->d->audio_data_frames, - st->channels * sizeof(*st->d->audio_data)); - CHECK_ERROR(!st->d->audio_data, 0, free_sample_peak) - - ebur128_init_filter(st); - - st->d->block_energy_histogram = - av_mallocz(1000 * sizeof(*st->d->block_energy_histogram)); - CHECK_ERROR(!st->d->block_energy_histogram, 0, free_audio_data) - st->d->short_term_block_energy_histogram = - av_mallocz(1000 * sizeof(*st->d->short_term_block_energy_histogram)); - CHECK_ERROR(!st->d->short_term_block_energy_histogram, 0, - free_block_energy_histogram) - st->d->short_term_frame_counter = 0; - - /* the first block needs 400ms of audio data */ - st->d->needed_frames = st->d->samples_in_100ms * 4; - /* start at the beginning of the buffer */ - st->d->audio_data_index = 0; - - if (ff_thread_once(&histogram_init, &init_histogram) != 0) - goto free_short_term_block_energy_histogram; - - st->d->data_ptrs = av_malloc_array(channels, sizeof(*st->d->data_ptrs)); - CHECK_ERROR(!st->d->data_ptrs, 0, - free_short_term_block_energy_histogram); - - return st; - -free_short_term_block_energy_histogram: - av_free(st->d->short_term_block_energy_histogram); -free_block_energy_histogram: - av_free(st->d->block_energy_histogram); -free_audio_data: - av_free(st->d->audio_data); -free_sample_peak: - 
av_free(st->d->sample_peak); -free_channel_map: - av_free(st->d->channel_map); -free_internal: - av_free(st->d); -free_state: - av_free(st); -exit: - return NULL; -} - -void ff_ebur128_destroy(FFEBUR128State ** st) -{ - av_free((*st)->d->block_energy_histogram); - av_free((*st)->d->short_term_block_energy_histogram); - av_free((*st)->d->audio_data); - av_free((*st)->d->channel_map); - av_free((*st)->d->sample_peak); - av_free((*st)->d->data_ptrs); - av_free((*st)->d); - av_free(*st); - *st = NULL; -} - -#define EBUR128_FILTER(type, scaling_factor) \ -static void ebur128_filter_##type(FFEBUR128State* st, const type** srcs, \ - size_t src_index, size_t frames, \ - int stride) { \ - double* audio_data = st->d->audio_data + st->d->audio_data_index; \ - size_t i, c; \ - \ - if ((st->mode & FF_EBUR128_MODE_SAMPLE_PEAK) == FF_EBUR128_MODE_SAMPLE_PEAK) { \ - for (c = 0; c < st->channels; ++c) { \ - double max = 0.0; \ - for (i = 0; i < frames; ++i) { \ - type v = srcs[c][src_index + i * stride]; \ - if (v > max) { \ - max = v; \ - } else if (-v > max) { \ - max = -1.0 * v; \ - } \ - } \ - max /= scaling_factor; \ - if (max > st->d->sample_peak[c]) st->d->sample_peak[c] = max; \ - } \ - } \ - for (c = 0; c < st->channels; ++c) { \ - int ci = st->d->channel_map[c] - 1; \ - if (ci < 0) continue; \ - else if (ci == FF_EBUR128_DUAL_MONO - 1) ci = 0; /*dual mono */ \ - for (i = 0; i < frames; ++i) { \ - st->d->v[ci][0] = (double) (srcs[c][src_index + i * stride] / scaling_factor) \ - - st->d->a[1] * st->d->v[ci][1] \ - - st->d->a[2] * st->d->v[ci][2] \ - - st->d->a[3] * st->d->v[ci][3] \ - - st->d->a[4] * st->d->v[ci][4]; \ - audio_data[i * st->channels + c] = \ - st->d->b[0] * st->d->v[ci][0] \ - + st->d->b[1] * st->d->v[ci][1] \ - + st->d->b[2] * st->d->v[ci][2] \ - + st->d->b[3] * st->d->v[ci][3] \ - + st->d->b[4] * st->d->v[ci][4]; \ - st->d->v[ci][4] = st->d->v[ci][3]; \ - st->d->v[ci][3] = st->d->v[ci][2]; \ - st->d->v[ci][2] = st->d->v[ci][1]; \ - st->d->v[ci][1] = 
st->d->v[ci][0]; \ - } \ - st->d->v[ci][4] = fabs(st->d->v[ci][4]) < DBL_MIN ? 0.0 : st->d->v[ci][4]; \ - st->d->v[ci][3] = fabs(st->d->v[ci][3]) < DBL_MIN ? 0.0 : st->d->v[ci][3]; \ - st->d->v[ci][2] = fabs(st->d->v[ci][2]) < DBL_MIN ? 0.0 : st->d->v[ci][2]; \ - st->d->v[ci][1] = fabs(st->d->v[ci][1]) < DBL_MIN ? 0.0 : st->d->v[ci][1]; \ - } \ -} -EBUR128_FILTER(double, 1.0) - -static double ebur128_energy_to_loudness(double energy) -{ - return 10 * log10(energy) - 0.691; -} - -static size_t find_histogram_index(double energy) -{ - size_t index_min = 0; - size_t index_max = 1000; - size_t index_mid; - - do { - index_mid = (index_min + index_max) / 2; - if (energy >= histogram_energy_boundaries[index_mid]) { - index_min = index_mid; - } else { - index_max = index_mid; - } - } while (index_max - index_min != 1); - - return index_min; -} - -static void ebur128_calc_gating_block(FFEBUR128State * st, - size_t frames_per_block, - double *optional_output) -{ - size_t i, c; - double sum = 0.0; - double channel_sum; - for (c = 0; c < st->channels; ++c) { - if (st->d->channel_map[c] == FF_EBUR128_UNUSED) - continue; - channel_sum = 0.0; - if (st->d->audio_data_index < frames_per_block * st->channels) { - for (i = 0; i < st->d->audio_data_index / st->channels; ++i) { - channel_sum += st->d->audio_data[i * st->channels + c] * - st->d->audio_data[i * st->channels + c]; - } - for (i = st->d->audio_data_frames - - (frames_per_block - - st->d->audio_data_index / st->channels); - i < st->d->audio_data_frames; ++i) { - channel_sum += st->d->audio_data[i * st->channels + c] * - st->d->audio_data[i * st->channels + c]; - } - } else { - for (i = - st->d->audio_data_index / st->channels - frames_per_block; - i < st->d->audio_data_index / st->channels; ++i) { - channel_sum += - st->d->audio_data[i * st->channels + - c] * st->d->audio_data[i * - st->channels + - c]; - } - } - if (st->d->channel_map[c] == FF_EBUR128_Mp110 || - st->d->channel_map[c] == FF_EBUR128_Mm110 || - 
st->d->channel_map[c] == FF_EBUR128_Mp060 || - st->d->channel_map[c] == FF_EBUR128_Mm060 || - st->d->channel_map[c] == FF_EBUR128_Mp090 || - st->d->channel_map[c] == FF_EBUR128_Mm090) { - channel_sum *= 1.41; - } else if (st->d->channel_map[c] == FF_EBUR128_DUAL_MONO) { - channel_sum *= 2.0; - } - sum += channel_sum; - } - sum /= (double) frames_per_block; - if (optional_output) { - *optional_output = sum; - } else if (sum >= histogram_energy_boundaries[0]) { - ++st->d->block_energy_histogram[find_histogram_index(sum)]; - } -} - -int ff_ebur128_set_channel(FFEBUR128State * st, - unsigned int channel_number, int value) -{ - if (channel_number >= st->channels) { - return 1; - } - if (value == FF_EBUR128_DUAL_MONO && - (st->channels != 1 || channel_number != 0)) { - return 1; - } - st->d->channel_map[channel_number] = value; - return 0; -} - -static int ebur128_energy_shortterm(FFEBUR128State * st, double *out); -#define EBUR128_ADD_FRAMES_PLANAR(type) \ -static void ebur128_add_frames_planar_##type(FFEBUR128State* st, const type** srcs, \ - size_t frames, int stride) { \ - size_t src_index = 0; \ - while (frames > 0) { \ - if (frames >= st->d->needed_frames) { \ - ebur128_filter_##type(st, srcs, src_index, st->d->needed_frames, stride); \ - src_index += st->d->needed_frames * stride; \ - frames -= st->d->needed_frames; \ - st->d->audio_data_index += st->d->needed_frames * st->channels; \ - /* calculate the new gating block */ \ - if ((st->mode & FF_EBUR128_MODE_I) == FF_EBUR128_MODE_I) { \ - ebur128_calc_gating_block(st, st->d->samples_in_100ms * 4, NULL); \ - } \ - if ((st->mode & FF_EBUR128_MODE_LRA) == FF_EBUR128_MODE_LRA) { \ - st->d->short_term_frame_counter += st->d->needed_frames; \ - if (st->d->short_term_frame_counter == st->d->samples_in_100ms * 30) { \ - double st_energy; \ - ebur128_energy_shortterm(st, &st_energy); \ - if (st_energy >= histogram_energy_boundaries[0]) { \ - ++st->d->short_term_block_energy_histogram[ \ - find_histogram_index(st_energy)]; 
\ - } \ - st->d->short_term_frame_counter = st->d->samples_in_100ms * 20; \ - } \ - } \ - /* 100ms are needed for all blocks besides the first one */ \ - st->d->needed_frames = st->d->samples_in_100ms; \ - /* reset audio_data_index when buffer full */ \ - if (st->d->audio_data_index == st->d->audio_data_frames * st->channels) { \ - st->d->audio_data_index = 0; \ - } \ - } else { \ - ebur128_filter_##type(st, srcs, src_index, frames, stride); \ - st->d->audio_data_index += frames * st->channels; \ - if ((st->mode & FF_EBUR128_MODE_LRA) == FF_EBUR128_MODE_LRA) { \ - st->d->short_term_frame_counter += frames; \ - } \ - st->d->needed_frames -= frames; \ - frames = 0; \ - } \ - } \ -} -EBUR128_ADD_FRAMES_PLANAR(double) -#define FF_EBUR128_ADD_FRAMES(type) \ -void ff_ebur128_add_frames_##type(FFEBUR128State* st, const type* src, \ - size_t frames) { \ - int i; \ - const type **buf = (const type**)st->d->data_ptrs; \ - for (i = 0; i < st->channels; i++) \ - buf[i] = src + i; \ - ebur128_add_frames_planar_##type(st, buf, frames, st->channels); \ -} -FF_EBUR128_ADD_FRAMES(double) - -static int ebur128_calc_relative_threshold(FFEBUR128State **sts, size_t size, - double *relative_threshold) -{ - size_t i, j; - int above_thresh_counter = 0; - *relative_threshold = 0.0; - - for (i = 0; i < size; i++) { - unsigned long *block_energy_histogram = sts[i]->d->block_energy_histogram; - for (j = 0; j < 1000; ++j) { - *relative_threshold += block_energy_histogram[j] * histogram_energies[j]; - above_thresh_counter += block_energy_histogram[j]; - } - } - - if (above_thresh_counter != 0) { - *relative_threshold /= (double)above_thresh_counter; - *relative_threshold *= RELATIVE_GATE_FACTOR; - } - - return above_thresh_counter; -} - -static int ebur128_gated_loudness(FFEBUR128State ** sts, size_t size, - double *out) -{ - double gated_loudness = 0.0; - double relative_threshold; - size_t above_thresh_counter; - size_t i, j, start_index; - - for (i = 0; i < size; i++) - if ((sts[i]->mode & 
FF_EBUR128_MODE_I) != FF_EBUR128_MODE_I) - return AVERROR(EINVAL); - - if (!ebur128_calc_relative_threshold(sts, size, &relative_threshold)) { - *out = -HUGE_VAL; - return 0; - } - - above_thresh_counter = 0; - if (relative_threshold < histogram_energy_boundaries[0]) { - start_index = 0; - } else { - start_index = find_histogram_index(relative_threshold); - if (relative_threshold > histogram_energies[start_index]) { - ++start_index; - } - } - for (i = 0; i < size; i++) { - for (j = start_index; j < 1000; ++j) { - gated_loudness += sts[i]->d->block_energy_histogram[j] * - histogram_energies[j]; - above_thresh_counter += sts[i]->d->block_energy_histogram[j]; - } - } - if (!above_thresh_counter) { - *out = -HUGE_VAL; - return 0; - } - gated_loudness /= (double) above_thresh_counter; - *out = ebur128_energy_to_loudness(gated_loudness); - return 0; -} - -int ff_ebur128_relative_threshold(FFEBUR128State * st, double *out) -{ - double relative_threshold; - - if ((st->mode & FF_EBUR128_MODE_I) != FF_EBUR128_MODE_I) - return AVERROR(EINVAL); - - if (!ebur128_calc_relative_threshold(&st, 1, &relative_threshold)) { - *out = -70.0; - return 0; - } - - *out = ebur128_energy_to_loudness(relative_threshold); - return 0; -} - -int ff_ebur128_loudness_global(FFEBUR128State * st, double *out) -{ - return ebur128_gated_loudness(&st, 1, out); -} - -static int ebur128_energy_in_interval(FFEBUR128State * st, - size_t interval_frames, double *out) -{ - if (interval_frames > st->d->audio_data_frames) { - return AVERROR(EINVAL); - } - ebur128_calc_gating_block(st, interval_frames, out); - return 0; -} - -static int ebur128_energy_shortterm(FFEBUR128State * st, double *out) -{ - return ebur128_energy_in_interval(st, st->d->samples_in_100ms * 30, - out); -} - -int ff_ebur128_loudness_shortterm(FFEBUR128State * st, double *out) -{ - double energy; - int error = ebur128_energy_shortterm(st, &energy); - if (error) { - return error; - } else if (energy <= 0.0) { - *out = -HUGE_VAL; - return 0; - 
} - *out = ebur128_energy_to_loudness(energy); - return 0; -} - -/* EBU - TECH 3342 */ -int ff_ebur128_loudness_range_multiple(FFEBUR128State ** sts, size_t size, - double *out) -{ - size_t i, j; - size_t stl_size; - double stl_power, stl_integrated; - /* High and low percentile energy */ - double h_en, l_en; - unsigned long hist[1000] = { 0 }; - size_t percentile_low, percentile_high; - size_t index; - - for (i = 0; i < size; ++i) { - if (sts[i]) { - if ((sts[i]->mode & FF_EBUR128_MODE_LRA) != - FF_EBUR128_MODE_LRA) { - return AVERROR(EINVAL); - } - } - } - - stl_size = 0; - stl_power = 0.0; - for (i = 0; i < size; ++i) { - if (!sts[i]) - continue; - for (j = 0; j < 1000; ++j) { - hist[j] += sts[i]->d->short_term_block_energy_histogram[j]; - stl_size += sts[i]->d->short_term_block_energy_histogram[j]; - stl_power += sts[i]->d->short_term_block_energy_histogram[j] - * histogram_energies[j]; - } - } - if (!stl_size) { - *out = 0.0; - return 0; - } - - stl_power /= stl_size; - stl_integrated = MINUS_20DB * stl_power; - - if (stl_integrated < histogram_energy_boundaries[0]) { - index = 0; - } else { - index = find_histogram_index(stl_integrated); - if (stl_integrated > histogram_energies[index]) { - ++index; - } - } - stl_size = 0; - for (j = index; j < 1000; ++j) { - stl_size += hist[j]; - } - if (!stl_size) { - *out = 0.0; - return 0; - } - - percentile_low = (size_t) ((stl_size - 1) * 0.1 + 0.5); - percentile_high = (size_t) ((stl_size - 1) * 0.95 + 0.5); - - stl_size = 0; - j = index; - while (stl_size <= percentile_low) { - stl_size += hist[j++]; - } - l_en = histogram_energies[j - 1]; - while (stl_size <= percentile_high) { - stl_size += hist[j++]; - } - h_en = histogram_energies[j - 1]; - *out = - ebur128_energy_to_loudness(h_en) - - ebur128_energy_to_loudness(l_en); - return 0; -} - -int ff_ebur128_loudness_range(FFEBUR128State * st, double *out) -{ - return ff_ebur128_loudness_range_multiple(&st, 1, out); -} - -int ff_ebur128_sample_peak(FFEBUR128State * st, 
- unsigned int channel_number, double *out) -{ - if ((st->mode & FF_EBUR128_MODE_SAMPLE_PEAK) != - FF_EBUR128_MODE_SAMPLE_PEAK) { - return AVERROR(EINVAL); - } else if (channel_number >= st->channels) { - return AVERROR(EINVAL); - } - *out = st->d->sample_peak[channel_number]; - return 0; -} diff --git a/libavfilter/ebur128.h b/libavfilter/ebur128.h deleted file mode 100644 index 8e7385e044..0000000000 --- a/libavfilter/ebur128.h +++ /dev/null @@ -1,229 +0,0 @@ -/* - * Copyright (c) 2011 Jan Kokemüller - * - * This file is part of FFmpeg. - * - * FFmpeg is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; either - * version 2.1 of the License, or (at your option) any later version. - * - * FFmpeg is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with FFmpeg; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA - * - * This file is based on libebur128 which is available at - * https://github.com/jiixyj/libebur128/ - * -*/ - -#ifndef AVFILTER_EBUR128_H -#define AVFILTER_EBUR128_H - -/** \file ebur128.h - * \brief libebur128 - a library for loudness measurement according to - * the EBU R128 standard. - */ - -#include <stddef.h> /* for size_t */ - -/** \enum channel - * Use these values when setting the channel map with ebur128_set_channel(). 
- * See definitions in ITU R-REC-BS 1770-4 - */ -enum channel { - FF_EBUR128_UNUSED = 0, /**< unused channel (for example LFE channel) */ - FF_EBUR128_LEFT, - FF_EBUR128_Mp030 = 1, /**< itu M+030 */ - FF_EBUR128_RIGHT, - FF_EBUR128_Mm030 = 2, /**< itu M-030 */ - FF_EBUR128_CENTER, - FF_EBUR128_Mp000 = 3, /**< itu M+000 */ - FF_EBUR128_LEFT_SURROUND, - FF_EBUR128_Mp110 = 4, /**< itu M+110 */ - FF_EBUR128_RIGHT_SURROUND, - FF_EBUR128_Mm110 = 5, /**< itu M-110 */ - FF_EBUR128_DUAL_MONO, /**< a channel that is counted twice */ - FF_EBUR128_MpSC, /**< itu M+SC */ - FF_EBUR128_MmSC, /**< itu M-SC */ - FF_EBUR128_Mp060, /**< itu M+060 */ - FF_EBUR128_Mm060, /**< itu M-060 */ - FF_EBUR128_Mp090, /**< itu M+090 */ - FF_EBUR128_Mm090, /**< itu M-090 */ - FF_EBUR128_Mp135, /**< itu M+135 */ - FF_EBUR128_Mm135, /**< itu M-135 */ - FF_EBUR128_Mp180, /**< itu M+180 */ - FF_EBUR128_Up000, /**< itu U+000 */ - FF_EBUR128_Up030, /**< itu U+030 */ - FF_EBUR128_Um030, /**< itu U-030 */ - FF_EBUR128_Up045, /**< itu U+045 */ - FF_EBUR128_Um045, /**< itu U-030 */ - FF_EBUR128_Up090, /**< itu U+090 */ - FF_EBUR128_Um090, /**< itu U-090 */ - FF_EBUR128_Up110, /**< itu U+110 */ - FF_EBUR128_Um110, /**< itu U-110 */ - FF_EBUR128_Up135, /**< itu U+135 */ - FF_EBUR128_Um135, /**< itu U-135 */ - FF_EBUR128_Up180, /**< itu U+180 */ - FF_EBUR128_Tp000, /**< itu T+000 */ - FF_EBUR128_Bp000, /**< itu B+000 */ - FF_EBUR128_Bp045, /**< itu B+045 */ - FF_EBUR128_Bm045 /**< itu B-045 */ -}; - -/** \enum mode - * Use these values in ebur128_init (or'ed). Try to use the lowest possible - * modes that suit your needs, as performance will be better. 
- */ -enum mode { - /** can resurrrect and call ff_ebur128_loudness_momentary */ - FF_EBUR128_MODE_M = (1 << 0), - /** can call ff_ebur128_loudness_shortterm */ - FF_EBUR128_MODE_S = (1 << 1) | FF_EBUR128_MODE_M, - /** can call ff_ebur128_loudness_global_* and ff_ebur128_relative_threshold */ - FF_EBUR128_MODE_I = (1 << 2) | FF_EBUR128_MODE_M, - /** can call ff_ebur128_loudness_range */ - FF_EBUR128_MODE_LRA = (1 << 3) | FF_EBUR128_MODE_S, - /** can call ff_ebur128_sample_peak */ - FF_EBUR128_MODE_SAMPLE_PEAK = (1 << 4) | FF_EBUR128_MODE_M, -}; - -/** forward declaration of FFEBUR128StateInternal */ -struct FFEBUR128StateInternal; - -/** \brief Contains information about the state of a loudness measurement. - * - * You should not need to modify this struct directly. - */ -typedef struct FFEBUR128State { - int mode; /**< The current mode. */ - unsigned int channels; /**< The number of channels. */ - unsigned long samplerate; /**< The sample rate. */ - struct FFEBUR128StateInternal *d; /**< Internal state. */ -} FFEBUR128State; - -/** \brief Initialize library state. - * - * @param channels the number of channels. - * @param samplerate the sample rate. - * @param window set the maximum window size in ms, set to 0 for auto. - * @param mode see the mode enum for possible values. - * @return an initialized library state. - */ -FFEBUR128State *ff_ebur128_init(unsigned int channels, - unsigned long samplerate, - unsigned long window, int mode); - -/** \brief Destroy library state. - * - * @param st pointer to a library state. - */ -void ff_ebur128_destroy(FFEBUR128State ** st); - -/** \brief Set channel type. - * - * The default is: - * - 0 -> FF_EBUR128_LEFT - * - 1 -> FF_EBUR128_RIGHT - * - 2 -> FF_EBUR128_CENTER - * - 3 -> FF_EBUR128_UNUSED - * - 4 -> FF_EBUR128_LEFT_SURROUND - * - 5 -> FF_EBUR128_RIGHT_SURROUND - * - * @param st library state. - * @param channel_number zero based channel index. - * @param value channel type from the "channel" enum. 
- * @return - * - 0 on success. - * - AVERROR(EINVAL) if invalid channel index. - */ -int ff_ebur128_set_channel(FFEBUR128State * st, - unsigned int channel_number, int value); - -/** \brief Add frames to be processed. - * - * @param st library state. - * @param src array of source frames. Channels must be interleaved. - * @param frames number of frames. Not number of samples! - */ -void ff_ebur128_add_frames_double(FFEBUR128State * st, - const double *src, size_t frames); - -/** \brief Get global integrated loudness in LUFS. - * - * @param st library state. - * @param out integrated loudness in LUFS. -HUGE_VAL if result is negative - * infinity. - * @return - * - 0 on success. - * - AVERROR(EINVAL) if mode "FF_EBUR128_MODE_I" has not been set. - */ -int ff_ebur128_loudness_global(FFEBUR128State * st, double *out); - -/** \brief Get short-term loudness (last 3s) in LUFS. - * - * @param st library state. - * @param out short-term loudness in LUFS. -HUGE_VAL if result is negative - * infinity. - * @return - * - 0 on success. - * - AVERROR(EINVAL) if mode "FF_EBUR128_MODE_S" has not been set. - */ -int ff_ebur128_loudness_shortterm(FFEBUR128State * st, double *out); - -/** \brief Get loudness range (LRA) of programme in LU. - * - * Calculates loudness range according to EBU 3342. - * - * @param st library state. - * @param out loudness range (LRA) in LU. Will not be changed in case of - * error. AVERROR(EINVAL) will be returned in this case. - * @return - * - 0 on success. - * - AVERROR(EINVAL) if mode "FF_EBUR128_MODE_LRA" has not been set. - */ -int ff_ebur128_loudness_range(FFEBUR128State * st, double *out); -/** \brief Get loudness range (LRA) in LU across multiple instances. - * - * Calculates loudness range according to EBU 3342. - * - * @param sts array of library states. - * @param size length of sts - * @param out loudness range (LRA) in LU. Will not be changed in case of - * error. AVERROR(EINVAL) will be returned in this case. 
- * @return - * - 0 on success. - * - AVERROR(EINVAL) if mode "FF_EBUR128_MODE_LRA" has not been set. - */ -int ff_ebur128_loudness_range_multiple(FFEBUR128State ** sts, - size_t size, double *out); - -/** \brief Get maximum sample peak of selected channel in float format. - * - * @param st library state - * @param channel_number channel to analyse - * @param out maximum sample peak in float format (1.0 is 0 dBFS) - * @return - * - 0 on success. - * - AVERROR(EINVAL) if mode "FF_EBUR128_MODE_SAMPLE_PEAK" has not been set. - * - AVERROR(EINVAL) if invalid channel index. - */ -int ff_ebur128_sample_peak(FFEBUR128State * st, - unsigned int channel_number, double *out); - -/** \brief Get relative threshold in LUFS. - * - * @param st library state - * @param out relative threshold in LUFS. - * @return - * - 0 on success. - * - AVERROR(EINVAL) if mode "FF_EBUR128_MODE_I" has not been set. - */ -int ff_ebur128_relative_threshold(FFEBUR128State * st, double *out); - -#endif /* AVFILTER_EBUR128_H */ diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c index a921602b44..d85df34149 100644 --- a/libavfilter/f_ebur128.c +++ b/libavfilter/f_ebur128.c @@ -75,6 +75,13 @@ struct integrator { struct hist_entry *histogram; ///< histogram of the powers, used to compute LRA and I }; +enum PrintFormat { + NONE, + JSON, + SUMMARY, + PF_NB +}; + struct rect { int x, y, w, h; }; typedef struct EBUR128Context { @@ -85,8 +92,10 @@ typedef struct EBUR128Context { double true_peak; ///< global true peak double *true_peaks; ///< true peaks per channel double sample_peak; ///< global sample peak + double frame_sample_peak; ///< frame sample peak double *sample_peaks; ///< sample peaks per channel double *true_peaks_per_frame; ///< true peaks in a frame per channel + double *sample_peaks_per_frame; ///< sample peaks in a frame per channel #if CONFIG_SWRESAMPLE SwrContext *swr_ctx; ///< over-sampling context for true peak metering double *swr_buf; ///< resampled audio data for true peak 
metering @@ -389,11 +398,8 @@ static int config_video_output(AVFilterLink *outlink) return 0; } -static int config_audio_input(AVFilterLink *inlink) +static int config_audio_in(AVFilterLink *inlink, EBUR128Context *ebur128) { - AVFilterContext *ctx = inlink->dst; - EBUR128Context *ebur128 = ctx->priv; - /* Unofficial reversed parametrization of PRE * and RLB from 48kHz */ @@ -434,11 +440,16 @@ static int config_audio_input(AVFilterLink *inlink) return 0; } -static int config_audio_output(AVFilterLink *outlink) +static int config_audio_input(AVFilterLink *inlink) { - int i; - AVFilterContext *ctx = outlink->src; + AVFilterContext *ctx = inlink->dst; EBUR128Context *ebur128 = ctx->priv; + return config_audio_in(inlink, ebur128); +} + +static int config_audio_out(AVFilterLink *outlink, EBUR128Context *ebur128) +{ + int i; const int nb_channels = outlink->ch_layout.nb_channels; #define BACK_MASK (AV_CH_BACK_LEFT |AV_CH_BACK_CENTER |AV_CH_BACK_RIGHT| \ @@ -515,14 +526,22 @@ static int config_audio_output(AVFilterLink *outlink) #endif if (ebur128->peak_mode & PEAK_MODE_SAMPLES_PEAKS) { + ebur128->sample_peaks_per_frame = av_calloc(nb_channels, sizeof(*ebur128->sample_peaks_per_frame)); ebur128->sample_peaks = av_calloc(nb_channels, sizeof(*ebur128->sample_peaks)); - if (!ebur128->sample_peaks) + if (!ebur128->sample_peaks || !ebur128->sample_peaks_per_frame) return AVERROR(ENOMEM); } return 0; } +static int config_audio_output(AVFilterLink *outlink) +{ + AVFilterContext *ctx = outlink->src; + EBUR128Context *ebur128 = ctx->priv; + return config_audio_out(outlink, ebur128); +} + #define ENERGY(loudness) (ff_exp10(((loudness) + 0.691) / 10.)) #define LOUDNESS(energy) (-0.691 + 10 * log10(energy)) #define DBFS(energy) (20 * log10(energy)) @@ -541,9 +560,8 @@ static struct hist_entry *get_histogram(void) return h; } -static av_cold int init(AVFilterContext *ctx) +static av_cold int init_ebur128(AVFilterContext *ctx, EBUR128Context *ebur128) { - EBUR128Context *ebur128 = 
ctx->priv; AVFilterPad pad; int ret; @@ -574,6 +592,9 @@ static av_cold int init(AVFilterContext *ctx) ebur128->integrated_loudness = ABS_THRES; ebur128->loudness_range = 0; + if (strcmp(ctx->filter->name, "ebur128")) + return 0; + /* insert output pads */ if (ebur128->do_video) { pad = (AVFilterPad){ @@ -600,6 +621,12 @@ static av_cold int init(AVFilterContext *ctx) return 0; } +static av_cold int init(AVFilterContext *ctx) +{ + EBUR128Context *ebur128 = ctx->priv; + return init_ebur128(ctx, ebur128); +} + #define HIST_POS(power) (int)(((power) - ABS_THRES) * HIST_GRAIN) /* loudness and power should be set such as loudness = -0.691 + @@ -627,39 +654,44 @@ static int gate_update(struct integrator *integ, double power, return gate_hist_pos; } -static int filter_frame(AVFilterLink *inlink, AVFrame *insamples) +static int true_peaks_ebur128(EBUR128Context *ebur128, const uint8_t **samples, + int nb_samples) { - int i, ch, idx_insample, ret; - AVFilterContext *ctx = inlink->dst; - EBUR128Context *ebur128 = ctx->priv; - const int nb_channels = ebur128->nb_channels; - const int nb_samples = insamples->nb_samples; - const double *samples = (double *)insamples->data[0]; - AVFrame *pic; - #if CONFIG_SWRESAMPLE if (ebur128->peak_mode & PEAK_MODE_TRUE_PEAKS && ebur128->idx_insample == 0) { const double *swr_samples = ebur128->swr_buf; - int ret = swr_convert(ebur128->swr_ctx, (uint8_t**)&ebur128->swr_buf, 19200, - (const uint8_t **)insamples->data, nb_samples); - if (ret < 0) - return ret; - for (ch = 0; ch < nb_channels; ch++) - ebur128->true_peaks_per_frame[ch] = 0.0; - for (idx_insample = 0; idx_insample < ret; idx_insample++) { - for (ch = 0; ch < nb_channels; ch++) { - ebur128->true_peaks[ch] = FFMAX(ebur128->true_peaks[ch], fabs(*swr_samples)); - ebur128->true_peaks_per_frame[ch] = FFMAX(ebur128->true_peaks_per_frame[ch], - fabs(*swr_samples)); - swr_samples++; + const int nb_channels = ebur128->nb_channels; + int nb_out_samples = swr_get_out_samples(ebur128->swr_ctx, 
nb_samples); + + while (nb_out_samples > 0) { + int ret = swr_convert(ebur128->swr_ctx, (uint8_t**)&ebur128->swr_buf, 19200, + samples, nb_samples); + if (ret <= 0) + return ret; + for (int ch = 0; ch < nb_channels; ch++) + ebur128->true_peaks_per_frame[ch] = 0.0; + for (int idx_insample = 0; idx_insample < ret; idx_insample++) { + for (int ch = 0; ch < nb_channels; ch++) { + ebur128->true_peaks[ch] = FFMAX(ebur128->true_peaks[ch], fabs(*swr_samples)); + ebur128->true_peaks_per_frame[ch] = FFMAX(ebur128->true_peaks_per_frame[ch], + fabs(*swr_samples)); + swr_samples++; + } } + + nb_out_samples -= ret; + nb_samples = 0; } } #endif + return 0; +} - for (idx_insample = ebur128->idx_insample; idx_insample < nb_samples; idx_insample++) { - const int bin_id_400 = ebur128->i400.cache_pos; - const int bin_id_3000 = ebur128->i3000.cache_pos; +static void process_ebur128(EBUR128Context *ebur128, const double *samples) +{ + const int nb_channels = ebur128->nb_channels; + const int bin_id_400 = ebur128->i400.cache_pos; + const int bin_id_3000 = ebur128->i3000.cache_pos; #define MOVE_TO_NEXT_CACHED_ENTRY(time) do { \ ebur128->i##time.cache_pos++; \ @@ -670,47 +702,49 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples) } \ } while (0) - MOVE_TO_NEXT_CACHED_ENTRY(400); - MOVE_TO_NEXT_CACHED_ENTRY(3000); + MOVE_TO_NEXT_CACHED_ENTRY(400); + MOVE_TO_NEXT_CACHED_ENTRY(3000); - for (ch = 0; ch < nb_channels; ch++) { - double bin; + for (int ch = 0; ch < nb_channels; ch++) { + double bin; - if (ebur128->peak_mode & PEAK_MODE_SAMPLES_PEAKS) - ebur128->sample_peaks[ch] = FFMAX(ebur128->sample_peaks[ch], fabs(samples[idx_insample * nb_channels + ch])); + if (ebur128->peak_mode & PEAK_MODE_SAMPLES_PEAKS) { + ebur128->sample_peaks[ch] = FFMAX(ebur128->sample_peaks[ch], fabs(samples[ch])); + ebur128->sample_peaks_per_frame[ch] = FFMAX(ebur128->sample_peaks_per_frame[ch], fabs(samples[ch])); + } - ebur128->x[ch * 3] = samples[idx_insample * nb_channels + ch]; // set X[i] + 
ebur128->x[ch * 3] = samples[ch]; // set X[i] - if (!ebur128->ch_weighting[ch]) - continue; + if (!ebur128->ch_weighting[ch]) + continue; - /* Y[i] = X[i]*b0 + X[i-1]*b1 + X[i-2]*b2 - Y[i-1]*a1 - Y[i-2]*a2 */ -#define FILTER(Y, X, NUM, DEN) do { \ - double *dst = ebur128->Y + ch*3; \ - double *src = ebur128->X + ch*3; \ - dst[2] = dst[1]; \ - dst[1] = dst[0]; \ - dst[0] = src[0]*NUM[0] + src[1]*NUM[1] + src[2]*NUM[2] \ - - dst[1]*DEN[1] - dst[2]*DEN[2]; \ + /* Y[i] = X[i]*b0 + X[i-1]*b1 + X[i-2]*b2 - Y[i-1]*a1 - Y[i-2]*a2 */ +#define FILTER(Y, X, NUM, DEN) do { \ + double *dst = ebur128->Y + ch*3; \ + double *src = ebur128->X + ch*3; \ + dst[2] = dst[1]; \ + dst[1] = dst[0]; \ + dst[0] = src[0]*NUM[0] + src[1]*NUM[1] + src[2]*NUM[2] \ + - dst[1]*DEN[1] - dst[2]*DEN[2]; \ } while (0) - // TODO: merge both filters in one? - FILTER(y, x, ebur128->pre_b, ebur128->pre_a); // apply pre-filter - ebur128->x[ch * 3 + 2] = ebur128->x[ch * 3 + 1]; - ebur128->x[ch * 3 + 1] = ebur128->x[ch * 3 ]; - FILTER(z, y, ebur128->rlb_b, ebur128->rlb_a); // apply RLB-filter + // TODO: merge both filters in one? 
+ FILTER(y, x, ebur128->pre_b, ebur128->pre_a); // apply pre-filter + ebur128->x[ch * 3 + 2] = ebur128->x[ch * 3 + 1]; + ebur128->x[ch * 3 + 1] = ebur128->x[ch * 3 ]; + FILTER(z, y, ebur128->rlb_b, ebur128->rlb_a); // apply RLB-filter - bin = ebur128->z[ch * 3] * ebur128->z[ch * 3]; + bin = ebur128->z[ch * 3] * ebur128->z[ch * 3]; - /* add the new value, and limit the sum to the cache size (400ms or 3s) - * by removing the oldest one */ - ebur128->i400.sum [ch] = ebur128->i400.sum [ch] + bin - ebur128->i400.cache [ch][bin_id_400]; - ebur128->i3000.sum[ch] = ebur128->i3000.sum[ch] + bin - ebur128->i3000.cache[ch][bin_id_3000]; + /* add the new value, and limit the sum to the cache size (400ms or 3s) + * by removing the oldest one */ + ebur128->i400.sum [ch] = ebur128->i400.sum [ch] + bin - ebur128->i400.cache [ch][bin_id_400]; + ebur128->i3000.sum[ch] = ebur128->i3000.sum[ch] + bin - ebur128->i3000.cache[ch][bin_id_3000]; - /* override old cache entry with the new value */ - ebur128->i400.cache [ch][bin_id_400 ] = bin; - ebur128->i3000.cache[ch][bin_id_3000] = bin; - } + /* override old cache entry with the new value */ + ebur128->i400.cache [ch][bin_id_400 ] = bin; + ebur128->i3000.cache[ch][bin_id_3000] = bin; + } #define FIND_PEAK(global, sp, ptype) do { \ int ch; \ @@ -723,110 +757,149 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples) } \ } while (0) - FIND_PEAK(ebur128->sample_peak, ebur128->sample_peaks, SAMPLES); - FIND_PEAK(ebur128->true_peak, ebur128->true_peaks, TRUE); + FIND_PEAK(ebur128->frame_sample_peak, ebur128->sample_peaks_per_frame, SAMPLES); + FIND_PEAK(ebur128->sample_peak, ebur128->sample_peaks, SAMPLES); + FIND_PEAK(ebur128->true_peak, ebur128->true_peaks, TRUE); +} - /* For integrated loudness, gating blocks are 400ms long with 75% - * overlap (see BS.1770-2 p5), so a re-computation is needed each 100ms - * (4800 samples at 48kHz). 
*/ - if (++ebur128->sample_count == inlink->sample_rate / 10) { - double loudness_400, loudness_3000; - double power_400 = 1e-12, power_3000 = 1e-12; - AVFilterLink *outlink = ctx->outputs[0]; - const int64_t pts = insamples->pts + - av_rescale_q(idx_insample, (AVRational){ 1, inlink->sample_rate }, - ctx->outputs[ebur128->do_video]->time_base); +static void ebur128_loudness(AVFilterLink *inlink, + EBUR128Context *ebur128, + double *l400, double *l3000, double *integrated, double *peak) +{ + const int nb_channels = ebur128->nb_channels; + double power_400 = 1e-12, power_3000 = 1e-12; + double loudness_400, loudness_3000; - ebur128->sample_count = 0; + ebur128->sample_count = 0; #define COMPUTE_LOUDNESS(m, time) do { \ if (ebur128->i##time.filled) { \ /* weighting sum of the last <time> ms */ \ - for (ch = 0; ch < nb_channels; ch++) \ + for (int ch = 0; ch < nb_channels; ch++) \ power_##time += ebur128->ch_weighting[ch] * ebur128->i##time.sum[ch]; \ power_##time /= I##time##_BINS(inlink->sample_rate); \ } \ loudness_##time = LOUDNESS(power_##time); \ } while (0) - COMPUTE_LOUDNESS(M, 400); - COMPUTE_LOUDNESS(S, 3000); + COMPUTE_LOUDNESS(M, 400); + COMPUTE_LOUDNESS(S, 3000); - /* Integrated loudness */ + /* Integrated loudness */ #define I_GATE_THRES -10 // initially defined to -8 LU in the first EBU standard - if (loudness_400 >= ABS_THRES) { - double integrated_sum = 0.0; - uint64_t nb_integrated = 0; - int gate_hist_pos = gate_update(&ebur128->i400, power_400, - loudness_400, I_GATE_THRES); - - /* compute integrated loudness by summing the histogram values - * above the relative threshold */ - for (i = gate_hist_pos; i < HIST_SIZE; i++) { - const unsigned nb_v = ebur128->i400.histogram[i].count; - nb_integrated += nb_v; - integrated_sum += nb_v * ebur128->i400.histogram[i].energy; - } - if (nb_integrated) { - ebur128->integrated_loudness = LOUDNESS(integrated_sum / nb_integrated); - /* dual-mono correction */ - if (nb_channels == 1 && ebur128->dual_mono) { - 
ebur128->integrated_loudness -= ebur128->pan_law; - } - } + if (loudness_400 >= ABS_THRES) { + double integrated_sum = 0.0; + uint64_t nb_integrated = 0; + int gate_hist_pos = gate_update(&ebur128->i400, power_400, + loudness_400, I_GATE_THRES); + + /* compute integrated loudness by summing the histogram values + * above the relative threshold */ + for (int i = gate_hist_pos; i < HIST_SIZE; i++) { + const unsigned nb_v = ebur128->i400.histogram[i].count; + nb_integrated += nb_v; + integrated_sum += nb_v * ebur128->i400.histogram[i].energy; + } + if (nb_integrated) { + ebur128->integrated_loudness = LOUDNESS(integrated_sum / nb_integrated); + /* dual-mono correction */ + if (nb_channels == 1 && ebur128->dual_mono) { + ebur128->integrated_loudness -= ebur128->pan_law; } + } + } - /* LRA */ + /* LRA */ #define LRA_GATE_THRES -20 #define LRA_LOWER_PRC 10 #define LRA_HIGHER_PRC 95 - /* XXX: example code in EBU 3342 is ">=" but formula in BS.1770 - * specs is ">" */ - if (loudness_3000 >= ABS_THRES) { - uint64_t nb_powers = 0; - int gate_hist_pos = gate_update(&ebur128->i3000, power_3000, - loudness_3000, LRA_GATE_THRES); - - for (i = gate_hist_pos; i < HIST_SIZE; i++) - nb_powers += ebur128->i3000.histogram[i].count; - if (nb_powers) { - uint64_t n, nb_pow; - - /* get lower loudness to consider */ - n = 0; - nb_pow = LRA_LOWER_PRC * nb_powers * 0.01 + 0.5; - for (i = gate_hist_pos; i < HIST_SIZE; i++) { - n += ebur128->i3000.histogram[i].count; - if (n >= nb_pow) { - ebur128->lra_low = ebur128->i3000.histogram[i].loudness; - break; - } - } - - /* get higher loudness to consider */ - n = nb_powers; - nb_pow = LRA_HIGHER_PRC * nb_powers * 0.01 + 0.5; - for (i = HIST_SIZE - 1; i >= 0; i--) { - n -= FFMIN(n, ebur128->i3000.histogram[i].count); - if (n < nb_pow) { - ebur128->lra_high = ebur128->i3000.histogram[i].loudness; - break; - } - } - - // XXX: show low & high on the graph? 
- ebur128->loudness_range = ebur128->lra_high - ebur128->lra_low; + /* XXX: example code in EBU 3342 is ">=" but formula in BS.1770 + * specs is ">" */ + if (loudness_3000 >= ABS_THRES) { + uint64_t nb_powers = 0; + int gate_hist_pos = gate_update(&ebur128->i3000, power_3000, + loudness_3000, LRA_GATE_THRES); + + for (int i = gate_hist_pos; i < HIST_SIZE; i++) + nb_powers += ebur128->i3000.histogram[i].count; + if (nb_powers) { + uint64_t n, nb_pow; + + /* get lower loudness to consider */ + n = 0; + nb_pow = LRA_LOWER_PRC * nb_powers * 0.01 + 0.5; + for (int i = gate_hist_pos; i < HIST_SIZE; i++) { + n += ebur128->i3000.histogram[i].count; + if (n >= nb_pow) { + ebur128->lra_low = ebur128->i3000.histogram[i].loudness; + break; } } - /* dual-mono correction */ - if (nb_channels == 1 && ebur128->dual_mono) { - loudness_400 -= ebur128->pan_law; - loudness_3000 -= ebur128->pan_law; + /* get higher loudness to consider */ + n = nb_powers; + nb_pow = LRA_HIGHER_PRC * nb_powers * 0.01 + 0.5; + for (int i = HIST_SIZE - 1; i >= 0; i--) { + n -= FFMIN(n, ebur128->i3000.histogram[i].count); + if (n < nb_pow) { + ebur128->lra_high = ebur128->i3000.histogram[i].loudness; + break; + } } + // XXX: show low & high on the graph? 
+ ebur128->loudness_range = ebur128->lra_high - ebur128->lra_low; + } + } + + /* dual-mono correction */ + if (nb_channels == 1 && ebur128->dual_mono) { + loudness_400 -= ebur128->pan_law; + loudness_3000 -= ebur128->pan_law; + } + + if (ebur128->peak_mode & PEAK_MODE_SAMPLES_PEAKS) { + for (int ch = 0; ch < nb_channels; ch++) + ebur128->sample_peaks_per_frame[ch] = 0.0; + } + + *l400 = loudness_400; + *l3000 = loudness_3000; + *integrated = ebur128->integrated_loudness; + *peak = ebur128->frame_sample_peak; +} + +static int filter_frame(AVFilterLink *inlink, AVFrame *insamples) +{ + int idx_insample, ret; + AVFilterContext *ctx = inlink->dst; + EBUR128Context *ebur128 = ctx->priv; + const int nb_channels = ebur128->nb_channels; + const int nb_samples = insamples->nb_samples; + const double *samples = (const double *)insamples->data[0]; + AVFrame *pic; + + ret = true_peaks_ebur128(ebur128, (const uint8_t **)insamples->data, nb_samples); + if (ret < 0) + return ret; + + for (idx_insample = ebur128->idx_insample; idx_insample < nb_samples; idx_insample++) { + process_ebur128(ebur128, samples + idx_insample * nb_channels); + + /* For integrated loudness, gating blocks are 400ms long with 75% + * overlap (see BS.1770-2 p5), so a re-computation is needed each 100ms + * (4800 samples at 48kHz). 
*/ + if (++ebur128->sample_count == inlink->sample_rate / 10) { + double loudness_400, loudness_3000, loudness_integrated, peak; + AVFilterLink *outlink = ctx->outputs[0]; + const int64_t pts = insamples->pts + + av_rescale_q(idx_insample, (AVRational){ 1, inlink->sample_rate }, + ctx->outputs[ebur128->do_video]->time_base); + + ebur128_loudness(inlink, ebur128, &loudness_400, &loudness_3000, &loudness_integrated, &peak); + #define LOG_FMT "TARGET:%d LUFS M:%6.1f S:%6.1f I:%6.1f %s LRA:%6.1f LU" /* push one video frame */ @@ -910,7 +983,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples) if (ebur128->peak_mode & PEAK_MODE_ ## ptype ## _PEAKS) { \ double max_peak = 0.0; \ char key[64]; \ - for (ch = 0; ch < nb_channels; ch++) { \ + for (int ch = 0; ch < nb_channels; ch++) { \ snprintf(key, sizeof(key), \ META_PREFIX AV_STRINGIFY(name) "_peaks_ch%d", ch); \ max_peak = fmax(max_peak, ebur128->name##_peaks[ch]); \ @@ -949,7 +1022,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples) #define PRINT_PEAKS(str, sp, ptype) do { \ if (ebur128->peak_mode & PEAK_MODE_ ## ptype ## _PEAKS) { \ av_log(ctx, ebur128->loglevel, " " str ":"); \ - for (ch = 0; ch < nb_channels; ch++) \ + for (int ch = 0; ch < nb_channels; ch++) \ av_log(ctx, ebur128->loglevel, " %5.1f", DBFS(sp[ch])); \ av_log(ctx, ebur128->loglevel, " dBFS"); \ } \ @@ -1047,6 +1120,36 @@ static int query_formats(AVFilterContext *ctx) return 0; } +static av_cold void uninit_ebur128(AVFilterContext *ctx, EBUR128Context *ebur128) +{ + av_freep(&ebur128->y_line_ref); + av_freep(&ebur128->x); + av_freep(&ebur128->y); + av_freep(&ebur128->z); + av_freep(&ebur128->ch_weighting); + av_freep(&ebur128->true_peaks); + av_freep(&ebur128->sample_peaks); + av_freep(&ebur128->true_peaks_per_frame); + av_freep(&ebur128->sample_peaks_per_frame); + av_freep(&ebur128->i400.sum); + av_freep(&ebur128->i3000.sum); + av_freep(&ebur128->i400.histogram); + av_freep(&ebur128->i3000.histogram); + for (int i 
= 0; i < ebur128->nb_channels; i++) { + if (ebur128->i400.cache) + av_freep(&ebur128->i400.cache[i]); + if (ebur128->i3000.cache) + av_freep(&ebur128->i3000.cache[i]); + } + av_freep(&ebur128->i400.cache); + av_freep(&ebur128->i3000.cache); + av_frame_free(&ebur128->outpicref); +#if CONFIG_SWRESAMPLE + av_freep(&ebur128->swr_buf); + swr_free(&ebur128->swr_ctx); +#endif +} + static av_cold void uninit(AVFilterContext *ctx) { EBUR128Context *ebur128 = ctx->priv; @@ -1085,31 +1188,7 @@ static av_cold void uninit(AVFilterContext *ctx) av_log(ctx, AV_LOG_INFO, "\n"); } - av_freep(&ebur128->y_line_ref); - av_freep(&ebur128->x); - av_freep(&ebur128->y); - av_freep(&ebur128->z); - av_freep(&ebur128->ch_weighting); - av_freep(&ebur128->true_peaks); - av_freep(&ebur128->sample_peaks); - av_freep(&ebur128->true_peaks_per_frame); - av_freep(&ebur128->i400.sum); - av_freep(&ebur128->i3000.sum); - av_freep(&ebur128->i400.histogram); - av_freep(&ebur128->i3000.histogram); - for (int i = 0; i < ebur128->nb_channels; i++) { - if (ebur128->i400.cache) - av_freep(&ebur128->i400.cache[i]); - if (ebur128->i3000.cache) - av_freep(&ebur128->i3000.cache[i]); - } - av_freep(&ebur128->i400.cache); - av_freep(&ebur128->i3000.cache); - av_frame_free(&ebur128->outpicref); -#if CONFIG_SWRESAMPLE - av_freep(&ebur128->swr_buf); - swr_free(&ebur128->swr_ctx); -#endif + uninit_ebur128(ctx, ebur128); } static const AVFilterPad ebur128_inputs[] = { @@ -1133,3 +1212,498 @@ const AVFilter ff_af_ebur128 = { .priv_class = &ebur128_class, .flags = AVFILTER_FLAG_DYNAMIC_OUTPUTS, }; + +#define FIFO_SIZE 30 + +enum DynamicMode { + DM_MOMENTARY = 1 << 0, + DM_SHORTTERM = 1 << 1, + DM_INTEGRATED = 1 << 2, +}; + +enum MeanMode { + MM_ARITHMETIC, + MM_HARMONIC, + MM_GEOMETRIC, + MM_MAXIMUM, + MM_MODES +}; + +typedef struct LoudNormContext { + const AVClass *class; + double target_i; + double target_lra; + double target_tp; + double measured_i; + double measured_lra; + double measured_tp; + double 
measured_thresh; + double offset; + int linear_mode; + int dual_mono; + enum PrintFormat print_format; + int dynamic_mode; + int mean_mode; + double rangeup; + double rangedown; + double attack; + double release; + double attack_coeff; + double release_coeff; + + int eof; + int64_t eof_pts; + int nb_channels; + int nb_samples; + + AVFrame *insamples; + AVFrame *frames[FIFO_SIZE]; + double i400; + double i3000; + double integrated; + double peaks[FIFO_SIZE]; + double prev_offset; + + EBUR128Context r128_in; + EBUR128Context r128_out; +} LoudNormContext; + +static av_cold int loudnorm_init(AVFilterContext *ctx) +{ + LoudNormContext *s = ctx->priv; + int ret; + + ret = init_ebur128(ctx, &s->r128_in); + if (ret < 0) + return ret; + ret = init_ebur128(ctx, &s->r128_out); + if (ret < 0) + return ret; + + s->r128_in.dual_mono = s->dual_mono; + s->r128_out.dual_mono = s->dual_mono; + + if (s->linear_mode) { + double offset, offset_tp; + offset = s->target_i - s->measured_i; + offset_tp = s->measured_tp + offset; + + if (s->measured_tp != 99 && s->measured_thresh != -70 && s->measured_lra != 0 && s->measured_i != 0 && + offset_tp <= s->target_tp && s->measured_lra <= s->target_lra) { + s->offset = pow(10., offset / 20.); + } else { + s->linear_mode = 0; + } + } + + return 0; +} + +static av_cold void loudnorm_uninit(AVFilterContext *ctx) +{ + LoudNormContext *s = ctx->priv; + EBUR128Context *r128_out = &s->r128_out; + EBUR128Context *r128_in = &s->r128_in; + + if (s->nb_channels > 0) { + switch (s->print_format) { + case NONE: + break; + + case JSON: + av_log(ctx, AV_LOG_INFO, + "\n{\n" + "\t\"input_i\" : \"%.2f\",\n" + "\t\"input_tp\" : \"%.2f\",\n" + "\t\"input_lra\" : \"%.2f\",\n" + "\t\"input_thresh\" : \"%.2f\",\n" + "\t\"output_i\" : \"%.2f\",\n" + "\t\"output_tp\" : \"%+.2f\",\n" + "\t\"output_lra\" : \"%.2f\",\n" + "\t\"output_thresh\" : \"%.2f\",\n" + "\t\"normalization_type\" : \"%s\",\n" + "\t\"target_offset\" : \"%.2f\"\n" + "}\n", + 
r128_in->integrated_loudness, + r128_in->true_peak, + r128_in->loudness_range, + r128_in->i3000.rel_threshold, + r128_out->integrated_loudness, + r128_out->true_peak, + r128_out->loudness_range, + r128_out->i3000.rel_threshold, + s->linear_mode ? "linear" : "dynamic", + s->target_i - r128_out->integrated_loudness + ); + break; + + case SUMMARY: + av_log(ctx, AV_LOG_INFO, + "\n" + "Input Integrated: %+6.1f LUFS\n" + "Input True Peak: %+6.1f dBTP\n" + "Input LRA: %6.1f LU\n" + "Input Threshold: %+6.1f LUFS\n" + "\n" + "Output Integrated: %+6.1f LUFS\n" + "Output True Peak: %+6.1f dBTP\n" + "Output LRA: %6.1f LU\n" + "Output Threshold: %+6.1f LUFS\n" + "\n" + "Normalization Type: %s\n" + "Target Offset: %+6.1f LU\n", + r128_in->integrated_loudness, + r128_in->true_peak, + r128_in->loudness_range, + r128_in->i3000.rel_threshold, + r128_out->integrated_loudness, + r128_out->true_peak, + r128_out->loudness_range, + r128_out->i3000.rel_threshold, + s->linear_mode ? "Linear" : "Dynamic", + s->target_i - r128_out->integrated_loudness + ); + break; + } + } + + for (int i = 0; i < FIFO_SIZE; i++) + av_frame_free(&s->frames[i]); + + uninit_ebur128(ctx, &s->r128_in); + uninit_ebur128(ctx, &s->r128_out); +} + +static int loudnorm_config_input(AVFilterLink *inlink) +{ + AVFilterContext *ctx = inlink->dst; + LoudNormContext *s = ctx->priv; + int ret; + + s->nb_samples = FFMAX(inlink->sample_rate / 10, 1); + + ret = config_audio_in(inlink, &s->r128_in); + if (ret < 0) + return ret; + + ret = config_audio_in(inlink, &s->r128_out); + if (ret < 0) + return ret; + + return 0; +} + +static double get_coeff(double x, double sr) +{ + return 1.0 - exp(-1.0 / (0.001 * x * sr)); +} + +static int loudnorm_config_output(AVFilterLink *outlink) +{ + AVFilterContext *ctx = outlink->src; + LoudNormContext *s = ctx->priv; + int ret; + + s->attack_coeff = get_coeff(s->attack, outlink->sample_rate); + s->release_coeff = get_coeff(s->release, outlink->sample_rate); + s->prev_offset = 1.0; + + 
s->nb_channels = outlink->ch_layout.nb_channels; + s->r128_out.peak_mode = PEAK_MODE_TRUE_PEAKS|PEAK_MODE_SAMPLES_PEAKS; + s->r128_in.peak_mode = PEAK_MODE_TRUE_PEAKS|PEAK_MODE_SAMPLES_PEAKS; + + ret = config_audio_out(outlink, &s->r128_in); + if (ret < 0) + return ret; + + ret = config_audio_out(outlink, &s->r128_out); + if (ret < 0) + return ret; + + return 0; +} + +static double add_item(int mode, double offset, double item) +{ + switch (mode) { + case MM_MAXIMUM: + offset = fmax(offset, item); + break; + case MM_GEOMETRIC: + offset *= item; + break; + case MM_HARMONIC: + offset += 1.0 / item; + break; + case MM_ARITHMETIC: + offset += item; + break; + } + + return offset; +} + +static double get_loudness(LoudNormContext *s, + int dynamic_mode, + int mean_mode) +{ + double offset, p = 0.0; + + if (mean_mode == MM_GEOMETRIC) + offset = 1.0; + else if (mean_mode == MM_MAXIMUM) + offset = -70.0; + else + offset = 0.0; + + if (dynamic_mode & DM_INTEGRATED) { + offset = add_item(mean_mode, offset, s->integrated); + p += 1.0; + } + + if (dynamic_mode & DM_SHORTTERM) { + offset = add_item(mean_mode, offset, s->i3000); + p += 1.0; + } + + if (s->dynamic_mode & DM_MOMENTARY) { + offset = add_item(mean_mode, offset, s->i400); + p += 1.0; + } + + switch (s->mean_mode) { + case MM_MAXIMUM: + break; + case MM_GEOMETRIC: + offset = -pow(fabs(offset), 1.0 / p); + break; + case MM_HARMONIC: + offset = p / offset; + break; + case MM_ARITHMETIC: + offset /= p; + break; + } + + return offset; +} + +static int loudnorm_filter_frame(AVFilterLink *inlink, AVFrame *in) +{ + AVFilterContext *ctx = inlink->dst; + AVFilterLink *outlink = ctx->outputs[0]; + LoudNormContext *s = ctx->priv; + EBUR128Context *r128_out = &s->r128_out; + EBUR128Context *r128_in = &s->r128_in; + const int nb_channels = s->nb_channels; + int nb_samples = in ? in->nb_samples : 0; + const double *samples = in ? 
(const double *)in->data[0] : NULL; + AVFrame *out; + int ret; + + if (in) { + ret = true_peaks_ebur128(r128_in, (const uint8_t **)in->data, nb_samples); + if (ret < 0) + return ret; + } + + for (int idx_insample = r128_in->idx_insample; idx_insample < nb_samples; idx_insample++) { + process_ebur128(r128_in, samples + idx_insample * nb_channels); + if (++r128_in->sample_count == inlink->sample_rate / 10) { + double peak; + + ebur128_loudness(inlink, r128_in, &s->i400, &s->i3000, &s->integrated, &peak); + memmove(&s->peaks[0], &s->peaks[1], sizeof(s->peaks) - sizeof(s->peaks[0])); + s->peaks[FIFO_SIZE-1] = peak; + } + } + + r128_in->idx_insample = 0; + s->insamples = NULL; + + if (s->linear_mode) { + out = ff_get_audio_buffer(outlink, nb_samples); + if (!out) { + av_frame_free(&in); + return AVERROR(ENOMEM); + } else { + const double *src = (const double *)in->data[0]; + double *dst = (double *)out->data[0]; + const double offset = s->offset; + + for (int n = 0; n < nb_samples * nb_channels; n++) + dst[n] = src[n] * offset; + } + } else { + av_frame_free(&s->frames[0]); + memmove(&s->frames[0], &s->frames[1], sizeof(s->frames) - sizeof(s->frames[0])); + s->frames[FIFO_SIZE-1] = in; + in = s->frames[0]; + if (in) { + nb_samples = in->nb_samples; + out = ff_get_audio_buffer(outlink, nb_samples); + if (!out) { + return AVERROR(ENOMEM); + } else { + const double release = s->release_coeff; + const double attack = s->attack_coeff; + const double *src = (const double *)in->data[0]; + double *dst = (double *)out->data[0]; + const double measured = get_loudness(s, s->dynamic_mode, s->mean_mode); + const double limit = s->target_tp - fmax(s->peaks[0], s->peaks[1]); + const double rangemin = -s->rangedown; + const double rangemax = fmin(s->rangeup, limit); + const double target = av_clipd(s->target_i - measured, rangemin, rangemax); + const double new_offset = pow(10., target / 20.); + double prev_offset = s->prev_offset; + + for (int n = 0; n < nb_samples * nb_channels; n += 
nb_channels) { + const double f = (new_offset > prev_offset) * attack + (new_offset <= prev_offset) * release; + const double offset = f * new_offset + (1.0 - f) * prev_offset; + for (int c = n; c < n + nb_channels; c++) + dst[c] = src[c] * offset; + prev_offset = offset; + } + + s->prev_offset = prev_offset; + } + av_frame_copy_props(out, in); + } else { + ff_filter_set_ready(ctx, 100); + return 0; + } + } + + ret = true_peaks_ebur128(r128_out, (const uint8_t **)out->data, out->nb_samples); + if (ret < 0) + return ret; + + samples = (const double *)out->data[0]; + for (int idx_insample = r128_out->idx_insample; idx_insample < out->nb_samples; idx_insample++) { + process_ebur128(r128_out, samples + idx_insample * nb_channels); + if (++r128_out->sample_count == inlink->sample_rate / 10) { + double loudness_400, loudness_3000, loudness_integrated, peak; + ebur128_loudness(inlink, r128_out, &loudness_400, &loudness_3000, &loudness_integrated, &peak); + } + } + + r128_out->idx_insample = 0; + if (s->linear_mode) { + av_frame_copy_props(out, in); + av_frame_free(&in); + } + + return ff_filter_frame(outlink, out); +} + +static int loudnorm_activate(AVFilterContext *ctx) +{ + AVFilterLink *outlink = ctx->outputs[0]; + AVFilterLink *inlink = ctx->inputs[0]; + LoudNormContext *s = ctx->priv; + int ret, status; + + FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink); + + if (!s->insamples && !s->eof) { + AVFrame *in; + + ret = ff_inlink_consume_samples(inlink, s->nb_samples, s->nb_samples, &in); + if (ret < 0) + return ret; + if (ret > 0) + s->insamples = in; + } + + if (s->insamples) + return loudnorm_filter_frame(inlink, s->insamples); + + if (!s->eof && ff_inlink_acknowledge_status(inlink, &status, &s->eof_pts)) { + if (status == AVERROR_EOF) + s->eof = 1; + } + + if (s->eof && !s->frames[0]) + ff_outlink_set_status(outlink, AVERROR_EOF, s->eof_pts); + + if (s->eof && !s->linear_mode) + return loudnorm_filter_frame(inlink, NULL); + + FF_FILTER_FORWARD_WANTED(outlink, inlink); 
+ + return ret; +} + +#undef OFFSET +#undef FLAGS + +#define OFFSET(x) offsetof(LoudNormContext, x) +#define FLAGS AV_OPT_FLAG_AUDIO_PARAM|AV_OPT_FLAG_FILTERING_PARAM + +static const AVOption loudnorm_options[] = { + { "I", "set integrated loudness target", OFFSET(target_i), AV_OPT_TYPE_DOUBLE, {.dbl = -24.}, -70., -5., FLAGS }, + { "i", "set integrated loudness target", OFFSET(target_i), AV_OPT_TYPE_DOUBLE, {.dbl = -24.}, -70., -5., FLAGS }, + { "LRA", "set loudness range target", OFFSET(target_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 7.}, 1., 50., FLAGS }, + { "lra", "set loudness range target", OFFSET(target_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 7.}, 1., 50., FLAGS }, + { "TP", "set maximum true peak", OFFSET(target_tp), AV_OPT_TYPE_DOUBLE, {.dbl = -2.}, -9., 0., FLAGS }, + { "tp", "set maximum true peak", OFFSET(target_tp), AV_OPT_TYPE_DOUBLE, {.dbl = -2.}, -9., 0., FLAGS }, + { "measured_I", "measured IL of input file", OFFSET(measured_i), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, -99., 0., FLAGS }, + { "measured_i", "measured IL of input file", OFFSET(measured_i), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, -99., 0., FLAGS }, + { "measured_LRA", "measured LRA of input file", OFFSET(measured_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, 0., 99., FLAGS }, + { "measured_lra", "measured LRA of input file", OFFSET(measured_lra), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, 0., 99., FLAGS }, + { "measured_TP", "measured true peak of input file", OFFSET(measured_tp), AV_OPT_TYPE_DOUBLE, {.dbl = 99.}, -99., 99., FLAGS }, + { "measured_tp", "measured true peak of input file", OFFSET(measured_tp), AV_OPT_TYPE_DOUBLE, {.dbl = 99.}, -99., 99., FLAGS }, + { "measured_thresh", "measured threshold of input file", OFFSET(measured_thresh), AV_OPT_TYPE_DOUBLE, {.dbl = -70.}, -99., 0., FLAGS }, + { "offset", "set offset gain", OFFSET(offset), AV_OPT_TYPE_DOUBLE, {.dbl = 0.}, -99., 99., FLAGS }, + { "linear", "normalize linearly if possible", OFFSET(linear_mode), AV_OPT_TYPE_BOOL, {.i64 = 1}, 0, 1, FLAGS }, + { "dual_mono", 
"treat mono input as dual-mono", OFFSET(dual_mono), AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, FLAGS }, + { "print_format", "set print format for stats", OFFSET(print_format), AV_OPT_TYPE_INT, {.i64 = NONE}, NONE, PF_NB -1, FLAGS, "print_format" }, + { "none", 0, 0, AV_OPT_TYPE_CONST, {.i64 = NONE}, 0, 0, FLAGS, "print_format" }, + { "json", 0, 0, AV_OPT_TYPE_CONST, {.i64 = JSON}, 0, 0, FLAGS, "print_format" }, + { "summary", 0, 0, AV_OPT_TYPE_CONST, {.i64 = SUMMARY}, 0, 0, FLAGS, "print_format" }, + { "dynamic_mode", "set dynamic mode", OFFSET(dynamic_mode), AV_OPT_TYPE_FLAGS, {.i64 = DM_INTEGRATED|DM_SHORTTERM},0,INT32_MAX,FLAGS,"dynamic_mode" }, + { "i", "integrated", 0, AV_OPT_TYPE_CONST, {.i64 = DM_INTEGRATED}, 0, 0, FLAGS, "dynamic_mode" }, + { "m", "momentary", 0, AV_OPT_TYPE_CONST, {.i64 = DM_MOMENTARY}, 0, 0, FLAGS, "dynamic_mode" }, + { "s", "shortterm", 0, AV_OPT_TYPE_CONST, {.i64 = DM_SHORTTERM}, 0, 0, FLAGS, "dynamic_mode" }, + { "mean_mode", "set mean mode", OFFSET(mean_mode), AV_OPT_TYPE_INT, {.i64 = MM_GEOMETRIC}, 0, MM_MODES-1, FLAGS,"mean_mode" }, + { "a", "arithmetic", 0, AV_OPT_TYPE_CONST, {.i64 = MM_ARITHMETIC}, 0, 0, FLAGS, "mean_mode" }, + { "h", "harmonic", 0, AV_OPT_TYPE_CONST, {.i64 = MM_HARMONIC}, 0, 0, FLAGS, "mean_mode" }, + { "g", "geometric", 0, AV_OPT_TYPE_CONST, {.i64 = MM_GEOMETRIC}, 0, 0, FLAGS, "mean_mode" }, + { "m", "maximum", 0, AV_OPT_TYPE_CONST, {.i64 = MM_MAXIMUM}, 0, 0, FLAGS, "mean_mode" }, + { "rangeup", "set max expansion", OFFSET(rangeup), AV_OPT_TYPE_DOUBLE, {.dbl = 0}, 0, 70, FLAGS }, + { "rangedown", "set max compression", OFFSET(rangedown), AV_OPT_TYPE_DOUBLE, {.dbl = 70}, 0, 70, FLAGS }, + { "attack", "set attack", OFFSET(attack), AV_OPT_TYPE_DOUBLE, {.dbl = 1}, 1, 2000, FLAGS }, + { "release", "set release", OFFSET(release), AV_OPT_TYPE_DOUBLE, {.dbl = 1}, 1, 2000, FLAGS }, + { NULL } +}; + +AVFILTER_DEFINE_CLASS(loudnorm); + +static const AVFilterPad loudnorm_inputs[] = { + { + .name = "default", + .type = 
AVMEDIA_TYPE_AUDIO,
+        .config_props = loudnorm_config_input,
+    },
+};
+
+static const AVFilterPad loudnorm_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_AUDIO,
+        .config_props = loudnorm_config_output,
+    },
+};
+
+const AVFilter ff_af_loudnorm = {
+    .name = "loudnorm",
+    .description = NULL_IF_CONFIG_SMALL("EBU R128 loudness normalization"),
+    .priv_size = sizeof(LoudNormContext),
+    .priv_class = &loudnorm_class,
+    .init = loudnorm_init,
+    .activate = loudnorm_activate,
+    .uninit = loudnorm_uninit,
+    FILTER_INPUTS(loudnorm_inputs),
+    FILTER_OUTPUTS(loudnorm_outputs),
+    FILTER_SINGLE_SAMPLEFMT(AV_SAMPLE_FMT_DBL),
+};
-- 
2.42.1

[-- Attachment #3: Type: text/plain, Size: 251 bytes --]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".