On 4/17/2025 2:07 PM, Andreas Rheinhardt wrote:
> gkhayat@spectre-music.com:
>> From: Guillaume Khayat <gkhayat@spectre-music.com>
>>
>> Improve performance (+17%) of ebur_128 filter using AVX2 and FMA instruction in the body of the filter_frame function.
>>
>> ## Benchmark
>>
>> Tested with hyperfine
>>
>> hyperfine --warmup 2 "./ffmpeg_reference -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -" "./ffmpeg_avx -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -"
>> Benchmark 1: ./ffmpeg_reference -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -
>>   Time (mean ± σ): 7.118 s ± 0.037 s [User: 9.114 s, System: 1.038 s]
>>   Range (min … max): 7.073 s … 7.177 s 10 runs
>>
>> Benchmark 2: ./ffmpeg_avx -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -
>>   Time (mean ± σ): 6.073 s ± 0.108 s [User: 7.903 s, System: 1.058 s]
>>   Range (min … max): 5.955 s … 6.327 s 10 runs
>>
>> Summary
>>   ./ffmpeg_avx -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null - ran
>>     1.17 ± 0.02 times faster than ./ffmpeg_reference -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -
>>
>> ## Tests
>>
>> - all FATE tests pass, tested on Darwin/arm64 and Linux/x86_64 w/ AVX2/FMA support
>> - On AVX2/FMA-capable system, all test files from the EBU yield the exact same output values (I/LRA) after and before optimization. See https://tech.ebu.ch/publications/ebu_loudness_test_set
>>
>> Disclaimer: this is my first ever patch submission to FFmpeg, and first ever time using git send-email to submit a patch anywhere.
>>
>> Signed-off-by: Cesar Matheus <cesar.matheus@telecom-paris.fr>
>> Signed-off-by: Guillaume Khayat <gkhayat@spectre-music.com>
>> ---
>>  libavfilter/f_ebur128.c | 246 ++++++++++++++++++++++++++++++++++------
>>  1 file changed, 214 insertions(+), 32 deletions(-)
>>
>> diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
>> index 768f062bac..e305b0a3ce 100644
>> --- a/libavfilter/f_ebur128.c
>> +++ b/libavfilter/f_ebur128.c
>> @@ -28,7 +28,7 @@
>>
>>  #include <float.h>
>>  #include <math.h>
>> -
>> +#include "libavutil/intmath.h"
>>  #include "libavutil/avassert.h"
>>  #include "libavutil/channel_layout.h"
>>  #include "libavutil/dict.h"
>> @@ -199,7 +199,7 @@ static const AVOption ebur128_options[] = {
>>  };
>>
>>  AVFILTER_DEFINE_CLASS(ebur128);
>> -
>> +#define MIN(a, b) ((a) < (b) ? (a) : (b))
>>  static const uint8_t graph_colors[] = {
>>      0xdd, 0x66, 0x66, // value above 1LU non reached below -1LU (impossible)
>>      0x66, 0x66, 0xdd, // value below 1LU non reached below -1LU
>> @@ -628,13 +628,61 @@ static int gate_update(struct integrator *integ, double power,
>>
>>  static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
>>  {
>> -    int i, ch, idx_insample, ret;
>> +
>> +    int i, ch, idx_insample, ret,bin_id_400,bin_id_3000;
>>      AVFilterContext *ctx = inlink->dst;
>>      EBUR128Context *ebur128 = ctx->priv;
>>      const int nb_channels = ebur128->nb_channels;
>>      const int nb_samples = insamples->nb_samples;
>>      const double *samples = (double *)insamples->data[0];
>>      AVFrame *pic;
>> +
>> +#if HAVE_AVX2_EXTERNAL && HAVE_AVX2
>
> This is completely wrong: This only checks whether your assembler
> supports AVX2 and whether it was not disabled in configure. But this
> does not imply that the CPU where this code runs is actually capable of
> AVX2; I don't even know whether this check ensures that the compiler
> understands __m256d.
> For actual runtime support you need to check via av_get_cpu_flags(). See
> how other DSP code does it.
>
> - Andreas

This also needs to be written in NASM syntax assembly, not Intel intrinsics, and it should be in a separate file in the x86/ folder, using function pointers like every other SIMD implementation.
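
To make the runtime-dispatch point concrete, here is a minimal standalone sketch of the function-pointer pattern. It is not FFmpeg code: the struct, the function names, and the unrolled "fast" routine are all hypothetical, and the GCC/Clang builtin `__builtin_cpu_supports()` only stands in for what would be a check of `av_get_cpu_flags()` against `AV_CPU_FLAG_AVX2` / `AV_CPU_FLAG_FMA3` in the tree. In a real patch the fast routine would be a NASM function in its own file, not C.

```c
#include <stddef.h>

/* Hypothetical DSP context: the filter calls through this pointer and
 * never tests CPU features inside filter_frame() itself. */
typedef struct EBUR128DSPSketch {
    double (*sum_squares)(const double *samples, size_t n);
} EBUR128DSPSketch;

/* Portable reference implementation, always available. */
static double sum_squares_c(const double *samples, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += samples[i] * samples[i];
    return acc;
}

/* Stand-in for the SIMD routine: four independent accumulators, the way a
 * 256-bit register holds four doubles. The real version would be NASM. */
static double sum_squares_unrolled4(const double *samples, size_t n)
{
    double acc[4] = { 0.0, 0.0, 0.0, 0.0 };
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; k++)
            acc[k] += samples[i + k] * samples[i + k];
    for (; i < n; i++)          /* scalar tail for the leftover samples */
        acc[0] += samples[i] * samples[i];
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

/* Runtime check (assumption: GCC/Clang on x86). Inside FFmpeg this would be
 * a test of the flags returned by av_get_cpu_flags() instead. */
static int cpu_has_avx2_fma(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return 0;                   /* non-x86 or unknown compiler: stay on C */
#endif
}

/* Called once at filter init: start from the C version, then override the
 * pointer only if the CPU running the binary reports the features. */
static void ebur128_dsp_init_sketch(EBUR128DSPSketch *dsp)
{
    dsp->sum_squares = sum_squares_c;
    if (cpu_has_avx2_fma())
        dsp->sum_squares = sum_squares_unrolled4;
}
```

The filter would call `ebur128_dsp_init_sketch()` once during init and then use `dsp.sum_squares(samples, n)` in the hot loop. The build-time `HAVE_*` macros only decide whether the fast routine gets compiled in at all. Whether it gets used is decided at runtime on the machine that runs the binary.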