From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 20 Jul 2022 15:16:50 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20220720131650.GX2088045@pb2>
References: <20220720044117.1282961-1-cphlipot0@gmail.com>
 <20220720044117.1282961-5-cphlipot0@gmail.com>
MIME-Version: 1.0
In-Reply-To: <20220720044117.1282961-5-cphlipot0@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH 5/5] avfilter/vf_yadif: Add x86_64 avx yadif asm
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Content-Type: multipart/mixed; boundary="===============6908439200065615549=="

--===============6908439200065615549==
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="3rzwkXnKuCXxoL5m"
Content-Disposition: inline

--3rzwkXnKuCXxoL5m
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jul 19, 2022 at 09:41:17PM -0700, Chris Phlipot wrote:
> Add a new version of
> yadif_filter_line performed using packed bytes
> instead of the packed words used by the current implementation. As
> a result this implementation runs almost 2x as fast as the current
> fastest SSSE3 implementation.
>
> This implementation is created from scratch based on the C code, with
> the goal of keeping all intermediate values within 8 bits so that
> the vectorized code can be computed using packed bytes. The differences
> are as follows:
> - Use algorithms to compute avg and abs difference using only 8-bit
>   intermediate values.
> - Rework the mode 1 code by applying various mathematical identities
>   to keep all intermediate values within 8 bits.
> - Attempt to compute the spatial score using only 8 bits. The actual
>   spatial score fits within this range 97% (content dependent) of the
>   time for the entire 128-bit xmm vector. In the case that the spatial
>   score needs more than 8 bits to be represented, we detect this case
>   and recompute the spatial score using 16-bit packed words instead.
>
> In 3% of cases the spatial_score will need more than 8 bits to store,
> so we have a slow path, where the spatial score is computed using
> packed words instead.
>
> This implementation is currently limited to x86_64 due to the number
> of registers required. x86_32 is possible, but the performance benefit
> over the existing SSSE3 implementation is not as great, due to all of the
> stack spills that would result from having far fewer registers. ASM was
> not generated for the 32-bit variant due to limited ROI, as most AVX
> users are likely on a 64-bit OS at this point and 32-bit users would
> lose out on most of the performance benefit.
>
> Signed-off-by: Chris Phlipot

There's no need to support 32-bit, but the FFmpeg build must not break on Linux x86-32:

src/libavfilter/x86/vf_yadif_x64.asm:145: error: impossible combination of address sizes
src/libavfilter/x86/vf_yadif_x64.asm:145: error: invalid effective address
src/libavfilter/x86/vf_yadif_x64.asm:146: error: impossible combination of address sizes
src//libavutil/x86/x86inc.asm:1399: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1717: ... from macro `vmovdqu' defined here

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein

--3rzwkXnKuCXxoL5m
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iF0EABEIAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCYtgAOgAKCRBhHseHBAsP
qwwMAJ4lpjDsHTuUKScNBT5FT2DzImQirgCdHJfjWsVQfA1wruh1wNm0fLNMUM8=
=Rjjr
-----END PGP SIGNATURE-----

--3rzwkXnKuCXxoL5m--

--===============6908439200065615549==--