Re: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx
Date: Sun, 25 Sep 2022 23:17:28 +0200
Message-ID: <AS8P250MB0744B23538CC2C2BD1ABA8498F539@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM>
In-Reply-To: <NCqThVx--3-2@lynne.ee>

Lynne:
> Sep 25, 2022, 14:34 by andreas.rheinhardt@outlook.com:
> 
>> Lynne:
>>
>>> Sep 24, 2022, 23:57 by dev@lynne.ee:
>>>
>>>> Sep 24, 2022, 21:40 by martin@martin.st:
>>>>
>>>>> What about ac3dsp then - that one seems like it's fairly optimized for arm?
>>>>>
>>>> Haven't touched them, they're still being used. Unfortunately, for AC3,
>>>> the full MDCT optimizations in lavc do make a difference and the overall
>>>> decoder becomes 15% slower with this patch on for aarch64 with lavu/tx's
>>>> asm disabled and 7% slower with lavu/tx's asm enabled. I do plan to write
>>>> aarch64 MDCT NEON SIMD code in a month or so, unless someone is faster,
>>>> which should make the decoder at least 10% faster with lavu/tx.
>>>>
>>>
>>> I'd just like to add this was for the float version of the ac3 decoder. The fixed-point
>>> version is a few percent faster with the patch on an A53, and quite a bit
>>> more accurate.
>>> The lavc fixed-point FFT code also has some weird large spikes in #cycles
>>> for some transform sizes, so the figure above is an average, but the dips
>>> went from 117x realtime to 78x realtime, which on a slower CPU may
>>> be the difference between stuttering and realtime playback.
>>> On this CPU, the fixed-point version is 23% slower than the float version,
>>> but on a CPU with slower float ops, it would make more sense to pick that
>>> decoder up than the float version.
>>> The 2 decoders produce nearly identical results, minus a few rounding
>>> errors, since AC3 is inherently a fixed-point codec. The only differences
>>> are the transforms themselves, and the extra ops needed to convert
>>> the 25-bit ints to floats in the float decoder.
>>>
>>
>> 1. You forgot to remove mdct15 requirements from configure in this whole
>> patchset.
>> 2. You forgot to update the FATE references for several tests; e.g. when
>> only applying the ac3 patch, then I get this:
>>
> 
> I know. durandal pointed it out the day I sent them. I'll send them again
> later.
> I'm planning to just push the Opus patch in a day with the mdct15
> line in configure gone.
> 
> 
>> As the above shows, the difference between the reference files and the
>> decoded output becomes larger in several tests, i.e. the reference files
>> won't be usable later on. If the new float and fixed-point decoders
>> indeed produce nearly identical output, then one could write
>> tests that decode the same file with both the floating point and the
>> fixed point decoder, check that both are nearly identical and print a
>> checksum of the output of the fixed point decoder.
>>
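
A minimal sketch of the check proposed above, written in Python rather than
as an actual FATE test. It assumes the output of both decoders has already
been dumped as raw, native-endian signed 16-bit PCM. The function names and
the default tolerance of 2 are made up for illustration, and the MD5 is only
a stand-in for whatever checksum the test would print.

```python
import hashlib
from array import array


def max_abs_diff(a: bytes, b: bytes) -> int:
    """Largest per-sample difference between two s16 PCM buffers."""
    sa, sb = array("h"), array("h")
    sa.frombytes(a)
    sb.frombytes(b)
    if len(sa) != len(sb):
        raise ValueError("decoder outputs differ in length")
    return max((abs(x - y) for x, y in zip(sa, sb)), default=0)


def check_decoders(float_pcm: bytes, fixed_pcm: bytes, tolerance: int = 2) -> str:
    """Fail if the outputs drift apart, else return a checksum of the
    fixed-point output (hypothetical tolerance, in s16 sample units)."""
    diff = max_abs_diff(float_pcm, fixed_pcm)
    if diff > tolerance:
        raise ValueError(f"outputs differ by {diff}, allowed {tolerance}")
    return hashlib.md5(fixed_pcm).hexdigest()
```

Only the fixed-point checksum is recorded, so the reference stays stable as
long as the fixed-point output does, while the float output merely has to
stay within the tolerance.
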
> 
> I have a standalone program I've hacked on as I need to for the fixed-point
> transforms: https://0x0.st/oWxO.c
> The square root of the squared rounding error across the entire range
> (1 to 21 bits) of transforms from 32pt to 1024pt is 6.855655 for lavu and
> 7.141428 for lavc, which is slightly worse. If you extend the range
> to 22 bits, the 1024pt transform in lavc explodes, while lavu is still fine,
> thus showing a greater range.
> The rounding errors are a lesser problem than hitting the max range,
> because then you get huge spikes in the output.
> I can further reduce the error in lavu at the cost of speed, but I think
> this is sufficient.
> 
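
To make the metric above concrete, here is a rough sketch of a
"root of the squared error" measure. It is not the linked program; all three
helpers are hypothetical, and the error is assumed to be summed over all
output coefficients before taking the square root. The last helper shows one
way huge spikes can arise once values exceed the available range, assuming a
32-bit accumulator that wraps around.

```python
import math


def root_squared_error(reference, measured):
    """Square root of the summed squared differences of two sequences."""
    if len(reference) != len(measured):
        raise ValueError("sequences differ in length")
    return math.sqrt(sum((r - m) ** 2 for r, m in zip(reference, measured)))


def quantize(values, frac_bits):
    """Round each value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return [round(v * scale) / scale for v in values]


def wrap_int32(x):
    """Emulate two's-complement wraparound of a 32-bit accumulator."""
    return (x + 2**31) % 2**32 - 2**31
```

For example, wrap_int32(2**31) yields -2**31: a value one step past the
largest representable integer flips to the most negative one, which is a far
larger error than any rounding step.
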
> 
>> Also note that there is currently no test that directly verifies your
>> claims of greater accuracy. One could write such a test by encoding a
>> file with ac3-fixed and decoding it again (with the fixed point decoder)
>> and printing the PSNR of input and output. No encoding test does this
>> at the moment.
>>
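
For reference, the PSNR in such a test could be computed roughly as below.
This is a sketch only: it assumes both streams are raw, native-endian signed
16-bit PCM and already time-aligned (any encoder delay removed), and the
function name is made up.

```python
import math
from array import array


def psnr_s16(original: bytes, decoded: bytes) -> float:
    """PSNR in dB between two s16 PCM buffers, over their common length."""
    a, b = array("h"), array("h")
    a.frombytes(original)
    b.frombytes(decoded)
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("no samples to compare")
    # zip() stops at the shorter buffer, matching n above.
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / n
    if mse == 0:
        return math.inf  # identical output
    peak = 32767  # largest positive s16 sample value
    return 10 * math.log10(peak * peak / mse)
```

A test would then compare the printed value against a lower bound instead of
requiring bit-identical output.
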
> 
> I'm not writing that, but I like the idea. The point of fixed-point decoders
> isn't bitexactness, but speed on slow hardware, so we shouldn't be testing
> an MD5.

Are your fixed-point transforms bitexact across all arches/cpuflags?

- Andreas
