On 4/17/2025 2:07 PM, Andreas Rheinhardt wrote:
> gkhayat@spectre-music.com:
>> From: Guillaume Khayat <gkhayat@spectre-music.com>
>>
>> Improve performance (+17%) of the ebur128 filter using AVX2 and FMA instructions in the body of the filter_frame function.
>>
>> ## Benchmark
>>
>> Tested with hyperfine
>>
>> hyperfine --warmup 2 "./ffmpeg_reference -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -" "./ffmpeg_avx -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -"
>> Benchmark 1: ./ffmpeg_reference -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -
>>    Time (mean ± σ):      7.118 s ±  0.037 s    [User: 9.114 s, System: 1.038 s]
>>    Range (min … max):    7.073 s …  7.177 s    10 runs
>>   
>> Benchmark 2: ./ffmpeg_avx -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -
>>    Time (mean ± σ):      6.073 s ±  0.108 s    [User: 7.903 s, System: 1.058 s]
>>    Range (min … max):    5.955 s …  6.327 s    10 runs
>>   
>> Summary
>>    ./ffmpeg_avx -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null - ran
>>      1.17 ± 0.02 times faster than ./ffmpeg_reference -i ~/test.wav -vn -af ebur128=peak=none:framelog=quiet -f null -
>>
>> ## Tests
>>
>> - All FATE tests pass, tested on Darwin/arm64 and Linux/x86_64 w/ AVX2/FMA support
>> - On an AVX2/FMA-capable system, all test files from the EBU yield the exact same output values (I/LRA) before and after the optimization. See https://tech.ebu.ch/publications/ebu_loudness_test_set
>>
>> Disclaimer: this is my first ever patch submission to FFmpeg, and first ever time using git send-email to submit a patch anywhere.
>>
>> Signed-off-by: Cesar Matheus <cesar.matheus@telecom-paris.fr>
>> Signed-off-by: Guillaume Khayat <gkhayat@spectre-music.com>
>> ---
>>   libavfilter/f_ebur128.c | 246 ++++++++++++++++++++++++++++++++++------
>>   1 file changed, 214 insertions(+), 32 deletions(-)
>>
>> diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
>> index 768f062bac..e305b0a3ce 100644
>> --- a/libavfilter/f_ebur128.c
>> +++ b/libavfilter/f_ebur128.c
>> @@ -28,7 +28,7 @@
>>   
>>   #include <float.h>
>>   #include <math.h>
>> -
>> +#include "libavutil/intmath.h"
>>   #include "libavutil/avassert.h"
>>   #include "libavutil/channel_layout.h"
>>   #include "libavutil/dict.h"
>> @@ -199,7 +199,7 @@ static const AVOption ebur128_options[] = {
>>   };
>>   
>>   AVFILTER_DEFINE_CLASS(ebur128);
>> -
>> +#define MIN(a, b) ((a) < (b) ? (a) : (b))
>>   static const uint8_t graph_colors[] = {
>>       0xdd, 0x66, 0x66,   // value above 1LU non reached below -1LU (impossible)
>>       0x66, 0x66, 0xdd,   // value below 1LU non reached below -1LU
>> @@ -628,13 +628,61 @@ static int gate_update(struct integrator *integ, double power,
>>   
>>   static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
>>   {
>> -    int i, ch, idx_insample, ret;
>> +
>> +    int i, ch, idx_insample, ret,bin_id_400,bin_id_3000;
>>       AVFilterContext *ctx = inlink->dst;
>>       EBUR128Context *ebur128 = ctx->priv;
>>       const int nb_channels = ebur128->nb_channels;
>>       const int nb_samples  = insamples->nb_samples;
>>       const double *samples = (double *)insamples->data[0];
>>       AVFrame *pic;
>> +
>> +#if HAVE_AVX2_EXTERNAL && HAVE_AVX2
> 
> This is completely wrong: This only checks whether your assembler
> supports AVX2 and whether it was not disabled in configure. But this
> does not imply that the CPU where this code runs is actually capable of
> AVX2; I don't even know whether this check ensures that the compiler
> understands __m256d.
> For actual runtime support you need to check via av_get_cpu_flags(). See
> how other DSP code does it.
> 
> - Andreas

This also needs to be written in NASM syntax assembly, not Intel 
intrinsics, and it should be in a separate file in the x86/ folder, 
using function pointers like every other SIMD implementation.
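
For reference, the general shape of that runtime dispatch can be sketched in plain C. This is only a minimal standalone sketch, not the actual patch: the EBUR128DSPSketch struct, the sum_squares kernel, and the SKETCH_CPU_FLAG_* values are hypothetical stand-ins chosen so that it compiles without FFmpeg headers. In-tree, the flags would come from av_get_cpu_flags() (libavutil/cpu.h), and the AVX2 entry would be an assembly function living under x86/ rather than the C placeholder used here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the AV_CPU_FLAG_* bits from libavutil/cpu.h.
 * The values are made up; in FFmpeg, test the result of av_get_cpu_flags(). */
#define SKETCH_CPU_FLAG_AVX2 (1 << 0)
#define SKETCH_CPU_FLAG_FMA3 (1 << 1)

/* Hypothetical DSP context: one function pointer per optimizable kernel. */
typedef struct EBUR128DSPSketch {
    double (*sum_squares)(const double *src, size_t n);
} EBUR128DSPSketch;

/* Portable C reference implementation, always available. */
static double sum_squares_c(const double *src, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += src[i] * src[i];
    return sum;
}

/* Placeholder for what would be an assembly routine in a separate file.
 * It just forwards to the C version so that the sketch runs anywhere. */
static double sum_squares_avx2_placeholder(const double *src, size_t n)
{
    return sum_squares_c(src, n);
}

/* Called once at filter setup. cpu_flags describes the CPU the code is
 * running on, not what the build toolchain happens to support. */
static void ebur128_dsp_init_sketch(EBUR128DSPSketch *dsp, int cpu_flags)
{
    dsp->sum_squares = sum_squares_c;

    /* Override only when the running CPU reports both features. */
    if ((cpu_flags & SKETCH_CPU_FLAG_AVX2) &&
        (cpu_flags & SKETCH_CPU_FLAG_FMA3))
        dsp->sum_squares = sum_squares_avx2_placeholder;
}
```

With this layout, filter_frame would only ever call dsp.sum_squares(...), so the per-frame code needs no #if on CPU features, and a machine without AVX2 falls back to the C path instead of crashing on an illegal instruction.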