On Tue, Jul 19, 2022 at 09:41:17PM -0700, Chris Phlipot wrote:
> Add a new version of yadif_filter_line performed using packed bytes
> instead of the packed words used by the current implementation. As
> a result, this implementation runs almost 2x as fast as the current
> fastest SSSE3 implementation.
>
> This implementation is created from scratch based on the C code, with
> the goal of keeping all intermediate values within 8 bits so that
> the vectorized code can be computed using packed bytes. The differences
> are as follows:
> - Use algorithms to compute avg and abs difference using only 8-bit
> intermediate values.
> - Reworked the mode 1 code by applying various mathematical identities
> to keep all intermediate values within 8 bits.
> - Attempt to compute the spatial score using only 8 bits. The actual
> spatial score fits within this range 97% (content dependent) of the
> time for the entire 128-bit xmm vector. In the case that the spatial
> score needs more than 8 bits to be represented, we detect this case
> and recompute the spatial score using 16-bit packed words instead.
>
> In 3% of cases the spatial_score will need more than 8 bits to store,
> so we have a slow path, where the spatial score is computed using
> packed words instead.
>
> This implementation is currently limited to x86_64 due to the number
> of registers required. x86_32 is possible, but the performance benefit
> over the existing SSSE3 implementation is not as great, due to all of the
> stack spills that would result from having far fewer registers. ASM was
> not generated for the 32-bit variant due to limited ROI, as most AVX
> users are likely on a 64-bit OS at this point and 32-bit users would
> lose out on most of the performance benefit.
>
> Signed-off-by: Chris Phlipot

There's no need to support 32-bit, but the FFmpeg build must not break on Linux x86-32:

src/libavfilter/x86/vf_yadif_x64.asm:145: error: impossible combination of address sizes
src/libavfilter/x86/vf_yadif_x64.asm:145: error: invalid effective address
src/libavfilter/x86/vf_yadif_x64.asm:146: error: impossible combination of address sizes
src//libavutil/x86/x86inc.asm:1399: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1717: ... from macro `vmovdqu' defined here
[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler. -- Albert Einstein
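The quoted patch description says averages and absolute differences are computed with only 8-bit intermediates, and that the spatial score falls back to 16-bit words when it does not fit in 8 bits. The scalar C sketch below illustrates common ways to do that; it is not the patch's code, and the patch may use different identities. The helpers emulate the SSE2 byte instructions pavgb, psubusb and paddusb one lane at a time. score3 is a hypothetical, simplified stand-in for the spatial score (a sum of three absolute differences, which can reach 3 * 255 = 765 and therefore overflow a byte).

```c
#include <assert.h>
#include <stdint.h>

/* Emulates one lane of pavgb: average rounded up, (a + b + 1) >> 1. */
static uint8_t pavgb_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Floor average (a + b) >> 1 built from pavgb: subtract 1 when a + b
 * is odd, which is exactly when the low bits of a and b differ. */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(pavgb_u8(a, b) - ((a ^ b) & 1));
}

/* Emulates one lane of psubusb: unsigned subtraction clamped at 0. */
static uint8_t subs_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* |a - b| without leaving 8 bits: one of the two clamped
 * differences is always 0, so OR-ing them yields the other one. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(subs_u8(a, b) | subs_u8(b, a));
}

/* Emulates one lane of paddusb: unsigned addition clamped at 255. */
static uint8_t adds_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Hypothetical score: sum of three absolute differences between the
 * line above (up) and the line below (down). The 8-bit fast path
 * saturates, so a result of 255 may have been clipped; in that case
 * *wide_needed is set and the exact sum is recomputed in 16 bits. */
static unsigned score3(const uint8_t up[3], const uint8_t down[3],
                       int *wide_needed)
{
    uint8_t s = 0;
    for (int i = 0; i < 3; i++)
        s = adds_u8(s, absdiff_u8(up[i], down[i]));

    *wide_needed = (s == 255);
    if (!*wide_needed)
        return s;                   /* fast path: the sum fit in 8 bits */

    uint16_t w = 0;                 /* slow path: 16-bit accumulation */
    for (int i = 0; i < 3; i++)
        w += absdiff_u8(up[i], down[i]);
    return w;
}

/* Exhaustively compares the 8-bit helpers with plain int arithmetic. */
static void check_helpers(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            assert(avg_floor_u8(a, b) == (a + b) >> 1);
            assert(absdiff_u8(a, b) == (a > b ? a - b : b - a));
        }
}
```

Calling check_helpers() confirms both 8-bit identities over all 65536 input pairs. For score3, inputs {0, 0, 0} and {200, 200, 200} saturate the byte sum at 255, set the flag, and return the exact value 600 from the 16-bit path. Treating an exact 255 as "possibly clipped" is a conservative choice in this sketch: it occasionally takes the slow path unnecessarily, but never returns a wrong score.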