
From: Lynne <dev@lynne.ee>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx
Date: Sun, 25 Sep 2022 23:46:42 +0200 (CEST)
Message-ID: <NCqbTHG--3-2@lynne.ee>
In-Reply-To: <AS8P250MB0744B23538CC2C2BD1ABA8498F539@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM>

Sep 25, 2022, 23:17 by andreas.rheinhardt@outlook.com:

> Lynne:
>
>> Sep 25, 2022, 14:34 by andreas.rheinhardt@outlook.com:
>>
>>> Lynne:
>>>
>>>> Sep 24, 2022, 23:57 by dev@lynne.ee:
>>>>
>>>>> Sep 24, 2022, 21:40 by martin@martin.st:
>>>>>
>>>>>> What about ac3dsp then - that one seems like it's fairly optimized for arm?
>>>>>>
>>>>> Haven't touched them, they're still being used. Unfortunately, for AC3,
>>>>> the full MDCT optimizations in lavc do make a difference, and the overall
>>>>> decoder becomes 15% slower with this patch on aarch64 with lavu/tx's
>>>>> asm disabled, and 7% slower with lavu/tx's asm enabled. I do plan to write
>>>>> aarch64 NEON SIMD code for the MDCT in a month or so, unless someone beats
>>>>> me to it, which should make the decoder at least 10% faster with lavu/tx.
>>>>>
>>>>
>>>> I'd just like to add that this was for the float version of the ac3 decoder. The fixed-point
>>>> version is a few percent faster with the patch on an A53, and quite a bit
>>>> more accurate.
>>>> The lavc fixed-point FFT code also has some weird large spikes in cycle counts
>>>> for some transform sizes, so the figure above is an average, but the dips
>>>> went from 117x realtime to 78x realtime, which on a slower CPU may
>>>> be the difference between stuttering and realtime playback.
>>>> On this CPU, the fixed-point version is 23% slower than the float version,
>>>> but on a CPU with slower float ops, it would make more sense to pick that
>>>> decoder over the float version.
>>>> The two decoders produce nearly identical results, minus a few rounding
>>>> errors, since AC3 is inherently a fixed-point codec. The only differences
>>>> are the transforms themselves and the extra ops needed to convert
>>>> the 25-bit ints to floats in the float decoder.
>>>>
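A minimal sketch of that extra conversion step, assuming (hypothetically) that the fixed-point coefficients are signed Q24 values, i.e. 25 bits including the sign, mapped onto [-1, 1). The actual AC3 scaling and the names q24_to_float and fits_25bit are illustrative assumptions, not the decoder's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format: signed Q24, 1 sign bit + 24 fraction bits = 25 bits. */
#define Q24_ONE (1 << 24)

/* Returns non-zero if v is representable in 25 signed bits. */
static int fits_25bit(int32_t v)
{
    return v >= -Q24_ONE && v < Q24_ONE;
}

/* Convert n Q24 coefficients to float, applying a gain.
 * Every integer with |v| <= 2^24 is exact in single precision, and with
 * gain == 1.0f the scale is a power of two, so the conversion is exact. */
static void q24_to_float(float *dst, const int32_t *src, size_t n, float gain)
{
    const float scale = gain / (float)Q24_ONE;

    for (size_t i = 0; i < n; i++) {
        assert(fits_25bit(src[i]));
        dst[i] = (float)src[i] * scale;
    }
}
```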
>>>
>>> 1. You forgot to remove the mdct15 requirements from configure in this whole
>>> patchset.
>>> 2. You forgot to update the FATE references for several tests; e.g. when
>>> applying only the ac3 patch, I get this:
>>>
>>
>> I know. durandal pointed it out the day I sent them. I'll send them again
>> later.
>> I'm planning to just push the Opus patch in a day with the mdct15
>> line in configure gone.
>>
>>
>>> As the above shows, the difference between the reference files and the
>>> decoded output becomes larger in several tests, i.e. the reference files
>>> won't be usable later on. If the new float and fixed-point decoders
>>> do indeed produce nearly identical output, then one could write
>>> tests that decode the same file with both the floating-point and the
>>> fixed-point decoder, check that both are nearly identical, and print a
>>> checksum of the output of the fixed-point decoder.
>>>
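A sketch of the comparison step such a test would need. The function names, the 16-bit PCM format for both outputs, the tolerance, and the use of FNV-1a as the checksum are all assumptions for illustration; FATE has its own checksum and comparison tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every pair of samples differs by at most tol, else 0. */
static int nearly_identical(const int16_t *a, const int16_t *b, size_t n, int tol)
{
    assert(tol >= 0);
    for (size_t i = 0; i < n; i++) {
        int d = a[i] - b[i];          /* promoted to int, cannot overflow */
        if (d < 0)
            d = -d;
        if (d > tol)
            return 0;
    }
    return 1;
}

/* 32-bit FNV-1a over the samples serialized as little-endian bytes,
 * so the result does not depend on the host's byte order. */
static uint32_t checksum_s16le(const int16_t *s, size_t n)
{
    uint32_t h = 2166136261u;         /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        uint16_t u = (uint16_t)s[i];
        uint8_t bytes[2] = { (uint8_t)(u & 0xff), (uint8_t)(u >> 8) };
        for (int k = 0; k < 2; k++) {
            h ^= bytes[k];
            h *= 16777619u;           /* FNV prime */
        }
    }
    return h;
}
```

A test along those lines would decode the same sample with both decoders (converting both outputs to 16-bit PCM, which is itself an assumption), fail if nearly_identical() returns 0, and print checksum_s16le() of the fixed-point output as the reference value.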
>>
>> I have a standalone program I've hacked on as I need to for the fixed-point
>> transforms: https://0x0.st/oWxO.c
>> The square root of the squared rounding error across the entire range
>> (1 to 21 bits) of transforms from 32pt to 1024pt is 6.855655 for lavu and
>> 7.141428 for lavc, so lavc is slightly worse. If you extend the range
>> to 22 bits, the 1024pt transform in lavc explodes while lavu is still fine,
>> which shows that lavu has a greater range.
>> The rounding errors are a lesser problem than hitting the max range,
>> because then you get huge spikes in the output.
>> I can further reduce the error in lavu at the cost of speed, but I think
>> this is sufficient.
>>
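One way to reason about a range limit like this is a simple headroom bound. As an illustration only (the thread does not describe how lavc or lavu scale internally): for an unscaled N-point transform of real input with |x| < 2^bits, the output can grow by up to a factor of N, i.e. log2(N) bits, so it is only guaranteed to fit in an int32 when bits + log2(N) <= 31. For 1024 points that allows 21 bits but not 22, which is numerically consistent with the 1024pt figure above. Past that point the sum wraps around, which is what a "huge spike" looks like. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 of a power of two. */
static unsigned ilog2_u(unsigned n)
{
    unsigned l = 0;
    while (n >>= 1)
        l++;
    return l;
}

/* Hypothetical headroom check: inputs satisfy |x| < 2^bits, so an unscaled
 * N-point real transform gives |X[k]| < N * 2^bits = 2^(bits + log2 N).
 * That stays within int32 when bits + log2 N <= 31. */
static int headroom_ok(unsigned bits, unsigned n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    return bits + ilog2_u(n) <= 31;
}

/* What overflow looks like: the sum wraps to a large value of the opposite
 * sign. Done in uint32_t because signed overflow is undefined in C; the
 * conversion back is implementation-defined before C23 (two's complement
 * on GCC and Clang). */
static int32_t wrapping_add(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;
    return (int32_t)u;
}
```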
>>
>>> Also note that there is currently no test that directly verifies your
>>> claims of greater accuracy. One could write such a test by encoding a
>>> file with ac3-fixed and decoding it again (with the fixed-point decoder)
>>> and printing the PSNR of input and output. No encoding test does this
>>> at the moment.
>>>
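A sketch of the error measure such a round-trip test could print. It assumes 16-bit PCM and a hypothetical delay parameter to line the decoded stream up with the input, since MDCT-based codecs delay their output; the actual value for ac3-fixed is not given here. PSNR in dB would then be 10 * log10(32767^2 / mse), taking 32767 as the peak value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between the original PCM and the decoded PCM.
 * 'decoded' must hold at least n + delay samples; 'delay' is a
 * hypothetical parameter compensating for codec latency. */
static double mse_s16(const int16_t *orig, const int16_t *decoded,
                      size_t n, size_t delay)
{
    assert(n > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = (double)orig[i] - (double)decoded[i + delay];
        acc += e * e;
    }
    return acc / (double)n;
}
```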
>>
>> I'm not writing that, but I like the idea. The point of fixed-point decoders
>> isn't bitexactness but speed on slow hardware, so we shouldn't be testing
>> an MD5.
>>
>
> Are your fixed-point transforms bitexact across all arches/cpuflags?
>

As much as libavcodec's are. This is because we use a float value for the MDCT scale,
and we calculate the exptabs and FFT tables with floats before converting
them to ints during init. If issues arise, we could special-case them, though since
libavcodec's transforms haven't needed that, lavu's shouldn't need it either.
Since the FFT tables are always constant, they would benefit from hardcoding,
as that would take local machine precision out of the equation. The actual
constants are quantized versions of the computed floats, which also leaves a fair amount of leeway.
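A minimal sketch of the float-to-int step described above. It assumes (hypothetically) Q31 table entries, round-half-away-from-zero, and saturation at +1.0; lavu's actual rounding rule is not stated here, and the floating-point values are taken as input rather than generated with cos()/sin(). The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize a double in [-1, 1] to Q31: round half away from zero, then
 * saturate, since +1.0 itself is not representable in Q31. */
static int32_t q31_from_double(double v)
{
    assert(v >= -1.0 && v <= 1.0);
    double s = v * 2147483648.0;          /* scale by 2^31 */
    s += (s >= 0.0) ? 0.5 : -0.5;         /* bias, then truncate below */
    if (s >= 2147483647.0)
        return INT32_MAX;
    if (s <= -2147483648.0)
        return INT32_MIN;
    return (int32_t)s;                    /* truncation toward zero */
}

/* Quantize a precomputed floating-point table (e.g. twiddles computed
 * elsewhere during init) into Q31 integers. */
static void build_q31_table(int32_t *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = q31_from_double(src[i]);
}
```

This also shows where local precision can matter: 1.5/2^31 quantizes to 2, while 1.4999/2^31 quantizes to 1. If two machines computed a table value that landed on different sides of such a half-LSB boundary, their integer tables would differ in the last bit. Hardcoded tables avoid that.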
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".