From: Lynne via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
To: ffmpeg-devel@ffmpeg.org
Cc: Lynne <dev@lynne.ee>
Subject: Re: [FFmpeg-devel] [PATCH] lpc: rewrite lpc_compute_autocorr in external asm
Date: Sun, 26 May 2024 01:24:43 +0200
Message-ID: <b9f4b827-e73a-42aa-b83a-3dc266d4629b@lynne.ee>
In-Reply-To: <72761d42-0f8f-4abb-8476-6832d14b0774@gmail.com>

On 26/05/2024 00:31, James Almer wrote:
> On 5/25/2024 5:57 PM, Lynne via ffmpeg-devel wrote:
>> The inline asm function had issues running under checkasm.
>> So I came to finish what I started, and wrote the last part
>> of LPC computation in assembly.
>>
>> autocorr_10_c: 135525.8
>> autocorr_10_sse2: 50729.8
>> autocorr_10_fma3: 19007.8
>> autocorr_30_c: 390100.8
>> autocorr_30_sse2: 142478.8
>> autocorr_30_fma3: 50559.8
>> autocorr_32_c: 407058.3
>> autocorr_32_sse2: 151633.3
>> autocorr_32_fma3: 50517.3
>> ---
>>  libavcodec/x86/lpc.asm    | 91 +++++++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/lpc_init.c | 87 ++++---------------------------------
>>  2 files changed, 100 insertions(+), 78 deletions(-)
>>
>> diff --git a/libavcodec/x86/lpc.asm b/libavcodec/x86/lpc.asm
>> index a585c17ef5..790841b7f4 100644
>> --- a/libavcodec/x86/lpc.asm
>> +++ b/libavcodec/x86/lpc.asm
>> @@ -32,6 +32,8 @@ dec_tab_sse2: times 2 dq -2.0
>>  dec_tab_scalar: times 2 dq -1.0
>>  seq_tab_sse2: dq 1.0, 0.0
>> +autoc_init_tab: times 4 dq 1.0
>> +
>>  SECTION .text
>>  %macro APPLY_WELCH_FN 0
>> @@ -261,3 +263,92 @@ APPLY_WELCH_FN
>>  INIT_YMM avx2
>>  APPLY_WELCH_FN
>>  %endif
>> +
>> +%macro COMPUTE_AUTOCORR_FN 0
>> +cglobal lpc_compute_autocorr, 4, 7, 8, data, len, lag, autoc, lag_p, data_l, len_p
>
> Already mentioned, but it should be 3 not 8.

Already done, as said on IRC not 10 minutes after I submitted it.

>> +
>> +    shl lagd, 3
>> +    shl lenq, 3
>> +    xor lag_pq, lag_pq
>> +
>> +.lag_l:
>> +    movaps m8, [autoc_init_tab]
>
> m2
>
>> +
>> +    mov len_pq, lag_pq
>> +
>> +    lea data_lq, [lag_pq + mmsize - 8]
>> +    neg data_lq                     ; -j - mmsize
>> +    add data_lq, dataq              ; data[-j - mmsize]
>> +.len_l:
>> +    ; We waste the upper value here on SSE2,
>> +    ; but we use it on AVX.
>> +    movupd xm0, [dataq + len_pq]    ; data[i]
>
> movsd

Fixed.

>> +    movupd m1, [data_lq + len_pq]   ; data[i - j]
>> +
>> +%if cpuflag(avx)
>
> %if mmsize == 32 here and everywhere else.

Done.

>> +    vbroadcastsd m0, xm0
>
> This is AVX2. AVX only has memory input argument. So use that and save
> the movsd from above for the FMA3 version.
>
>> +    vperm2f128 m1, m1, m1, 0x01
>
> Aren't you loading 16 extra bytes for no reason if you're just going to
> use the upper 16 bytes from the load above?

Lane swapped, like you mentioned.

>> +%endif
>> +
>> +    shufpd m0, m0, m0, 1100b
>
> The last argument has two bits, not four. What you're doing here is a
> splat/broadcast, so you don't need it for FMA3.
>
>> +    shufpd m1, m1, m1, 0101b
>
> The upper two bits of imm8 are ignored.

Intentional. Not ignored on FMA3.

>> +
>> +%if cpuflag(fma3)
>> +    fmaddpd m8, m0, m1, m8          ; sum += data[i]*data[i-j]
>> +%else
>> +    mulpd m0, m1
>> +    addpd m8, m0                    ; sum += data[i]*data[i-j]
>> +%endif
>> +
>> +    add len_pq, 8
>> +    cmp len_pq, lenq
>> +    jl .len_l
>> +
>> +    movups [autocq + lag_pq], m8    ; autoc[j] = sum
>> +    add lag_pq, mmsize
>> +    cmp lag_pq, lagq
>> +    jl .lag_l
>> +
>> +    ; The tail computation is guaranteed never to happen
>> +    ; as long as we're doing multiples of 4, rather than 2.
>> +    ; It is trivial to convert this to avx if ever needed.
>> +%if !cpuflag(avx)
>
> This doesn't seem to be tested as is. Maybe the checkasm should try
> other lag values?

That's for the checkasm patch. You can trigger this check with
fate-alac-16-lpc-orders as-is.
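
Editorial note on the shufpd exchange above: the disagreement about the immediate seems to come down to register width. The following is a minimal scalar sketch in C, not FFmpeg code, of how SHUFPD picks elements from a 2-bit (xmm) or 4-bit (ymm) immediate. The function name shufpd_model and its parameters are hypothetical, and it assumes that "FMA3" in the reply refers to the build that uses 256-bit ymm registers.

```c
#include <stddef.h>

/* Hypothetical scalar model of SHUFPD / VSHUFPD element selection.
 * n is the number of doubles per register: 2 for xmm, 4 for ymm.
 * Illustrative only; this is not part of the patch under review. */
static void shufpd_model(double *dst, const double *a, const double *b,
                         size_t n, unsigned imm8)
{
    double tmp[4];

    for (size_t lane = 0; lane < n; lane += 2) {
        /* Within each 128-bit lane, the even result element is taken
         * from the first source and the odd one from the second source.
         * One immediate bit per result element selects low or high. */
        tmp[lane]     = a[lane + ((imm8 >> lane) & 1)];
        tmp[lane + 1] = b[lane + ((imm8 >> (lane + 1)) & 1)];
    }

    /* Copy through a temporary so dst may alias a or b, as in the patch. */
    for (size_t i = 0; i < n; i++)
        dst[i] = tmp[i];
}
```

With n = 2 only bits 0 and 1 of the immediate are read, so 1100b behaves like 00b (a splat of the low element) and 0101b behaves like 01b (a swap). With n = 4 the upper two bits control the second 128-bit lane, so 1100b splats the low element of lane 0 and the high element of lane 1, while 0101b swaps the pair inside each lane.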
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".