From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 5/5] avfilter/vf_yadif: Add x86_64 avx yadif asm
Date: Wed, 20 Jul 2022 15:16:50 +0200
Message-ID: <20220720131650.GX2088045@pb2>
In-Reply-To: <20220720044117.1282961-5-cphlipot0@gmail.com>
On Tue, Jul 19, 2022 at 09:41:17PM -0700, Chris Phlipot wrote:
> Add a new version of yadif_filter_line performed using packed bytes
> instead of the packed words used by the current implementation. As
> a result this implementation runs almost 2x as fast as the current
> fastest SSSE3 implementation.
>
> This implementation is created from scratch based on the C code, with
> the goal of keeping all intermediate values within 8-bits so that
> the vectorized code can be computed using packed bytes. The differences
> are as follows:
> - Use algorithms to compute avg and abs difference using only 8-bit
> intermediate values.
> - Reworked the mode 1 code by applying various mathematical identities
> to keep all intermediate values within 8-bits.
> - Attempt to compute the spatial score using only 8-bits. The actual
> spatial score fits within this range 97% (content dependent) of the
> time for the entire 128-bit xmm vector. In the case that spatial
> score needs more than 8-bits to be represented, we detect this case,
> and recompute the spatial score using 16-bit packed words instead.
>
> In 3% of cases the spatial_score will need more than 8 bits to store,
> so we have a slow path, where the spatial score is computed using
> packed words instead.
>
> This implementation is currently limited to x86_64 due to the number
> of registers required. x86_32 is possible, but the performance benefit
> over the existing SSSE3 implementation is not as great, due to all of the
> stack spills that would result from having far fewer registers. ASM was
> not generated for the 32-bit variant due to limited ROI, as most AVX
> users are likely on 64-bit OS at this point and 32-bit users would
> lose out on most of the performance benefit.
>
> Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
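The byte-only arithmetic described in the quoted commit message can be illustrated in scalar C. This is a minimal sketch of well-known identities, not the code from the patch: every function name below is hypothetical, and the saturation check in sum3_fits_u8 is only an assumption about how a "does not fit in 8 bits" case could be detected before falling back to 16-bit words.

```c
#include <stdint.h>

/* Saturating unsigned 8-bit subtract and add: scalar analogues of the
 * SSE2 psubusb / paddusb instructions. Names are hypothetical. */
static uint8_t subus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

static uint8_t addus_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* |a - b| without widening: one of the two saturating differences is
 * always zero, so OR-ing them yields the absolute difference. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(subus_u8(a, b) | subus_u8(b, a));
}

/* floor((a + b) / 2) without forming the 9-bit sum,
 * using a + b == 2 * (a & b) + (a ^ b). */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* (a + b + 1) >> 1, the rounding that pavgb performs,
 * using a + b == 2 * (a | b) - (a ^ b). */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Sum three absolute differences with saturating adds. A result of 255
 * may be a clamped value, so report it as "does not fit": the caller
 * would then recompute with 16-bit intermediates (the slow path).
 * Assumption: this is one possible detection scheme, not necessarily
 * the one used in the patch. */
static int sum3_fits_u8(uint8_t d0, uint8_t d1, uint8_t d2, uint8_t *out)
{
    uint8_t s = addus_u8(addus_u8(d0, d1), d2);
    *out = s;
    return s != 255;
}
```

Treating 255 as "does not fit" is conservative: a sum of exactly 255 also takes the slow path, which costs some speed in rare cases but never produces a wrong value.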
There's no need to support 32-bit, but the FFmpeg build must not break
on Linux x86-32:
src/libavfilter/x86/vf_yadif_x64.asm:145: error: impossible combination of address sizes
src/libavfilter/x86/vf_yadif_x64.asm:145: error: invalid effective address
src/libavfilter/x86/vf_yadif_x64.asm:146: error: impossible combination of address sizes
src//libavutil/x86/x86inc.asm:1399: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1717: ... from macro `vmovdqu' defined here
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
Thread overview: 7+ messages
2022-07-20 4:41 [FFmpeg-devel] [PATCH 1/5] avfilter/vf_yadif: Fix edge size when MAX_ALIGN is < 4 Chris Phlipot
2022-07-20 4:41 ` [FFmpeg-devel] [PATCH 2/5] avfilter/vf_yadif: Allow alignment to be configurable Chris Phlipot
2022-07-20 4:41 ` [FFmpeg-devel] [PATCH 3/5] avfilter/vf_yadif: reformat code to improve readability Chris Phlipot
2022-07-20 4:41 ` [FFmpeg-devel] [PATCH 4/5] avfilter/vf_yadif: Process more pixels using filter_line Chris Phlipot
2022-07-20 4:41 ` [FFmpeg-devel] [PATCH 5/5] avfilter/vf_yadif: Add x86_64 avx yadif asm Chris Phlipot
2022-07-20 13:16 ` Michael Niedermayer [this message]
2022-07-21 2:30 ` Chris Phlipot