From: "Rémi Denis-Courmont" <remi@remlab.net>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] lpc: rewrite lpc_compute_autocorr in external asm
Date: Sun, 26 May 2024 08:45:01 +0300
Message-ID: <7066008.51mDsFpV5M@basile.remlab.net>
In-Reply-To: <72761d42-0f8f-4abb-8476-6832d14b0774@gmail.com>
On Sunday 26 May 2024 at 01:31:18 EEST, James Almer wrote:
> On 5/25/2024 5:57 PM, Lynne via ffmpeg-devel wrote:
> > The inline asm function had issues running under checkasm.
> > So I came to finish what I started, and wrote the last part
> > of LPC computation in assembly.
> >
> > autocorr_10_c: 135525.8
> > autocorr_10_sse2: 50729.8
> > autocorr_10_fma3: 19007.8
> > autocorr_30_c: 390100.8
> > autocorr_30_sse2: 142478.8
> > autocorr_30_fma3: 50559.8
> > autocorr_32_c: 407058.3
> > autocorr_32_sse2: 151633.3
> > autocorr_32_fma3: 50517.3
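The timings above are easier to compare as ratios. Below is a small C sketch that computes each optimised version's speedup over the C baseline. The struct and function names are hypothetical. Because each speedup divides one checkasm timing by another, the time unit cancels out.

```c
#include <assert.h>
#include <stdio.h>

/* checkasm --bench timings quoted above (lower is faster). */
struct autocorr_bench {
    int lag;
    double c, sse2, fma3;
};

static const struct autocorr_bench benches[] = {
    { 10, 135525.8,  50729.8, 19007.8 },
    { 30, 390100.8, 142478.8, 50559.8 },
    { 32, 407058.3, 151633.3, 50517.3 },
};

/* Ratio of baseline time to optimised time; the unit cancels out. */
static double speedup(double baseline, double optimised)
{
    assert(baseline > 0.0 && optimised > 0.0);
    return baseline / optimised;
}

/* Prints the SSE2 and FMA3 speedups over C for each benchmarked lag. */
static void print_speedups(void)
{
    for (size_t k = 0; k < sizeof(benches) / sizeof(benches[0]); k++)
        printf("lag %2d: sse2 %.2fx, fma3 %.2fx\n", benches[k].lag,
               speedup(benches[k].c, benches[k].sse2),
               speedup(benches[k].c, benches[k].fma3));
}
```

When called, print_speedups() reports about 2.7x for SSE2 and between 7.1x and 8.1x for FMA3 across the three lag values.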
> > ---
> >
> > libavcodec/x86/lpc.asm | 91 +++++++++++++++++++++++++++++++++++++++
> > libavcodec/x86/lpc_init.c | 87 ++++---------------------------------
> > 2 files changed, 100 insertions(+), 78 deletions(-)
> >
> > diff --git a/libavcodec/x86/lpc.asm b/libavcodec/x86/lpc.asm
> > index a585c17ef5..790841b7f4 100644
> > --- a/libavcodec/x86/lpc.asm
> > +++ b/libavcodec/x86/lpc.asm
> > @@ -32,6 +32,8 @@ dec_tab_sse2: times 2 dq -2.0
> >
> > dec_tab_scalar: times 2 dq -1.0
> > seq_tab_sse2: dq 1.0, 0.0
> >
> > +autoc_init_tab: times 4 dq 1.0
> > +
> >
> > SECTION .text
> >
> > %macro APPLY_WELCH_FN 0
> >
> > @@ -261,3 +263,92 @@ APPLY_WELCH_FN
> >
> > INIT_YMM avx2
> > APPLY_WELCH_FN
> > %endif
> >
> > +
> > +%macro COMPUTE_AUTOCORR_FN 0
> > +cglobal lpc_compute_autocorr, 4, 7, 8, data, len, lag, autoc, lag_p, data_l, len_p
> Already mentioned, but the XMM register count (the fourth cglobal argument)
> should be 3, not 8.
>
> > +
> > + shl lagd, 3
> > + shl lenq, 3
> > + xor lag_pq, lag_pq
> > +
> > +.lag_l:
> > + movaps m8, [autoc_init_tab]
>
> Use m2 instead of m8.
>
> > +
> > + mov len_pq, lag_pq
> > +
> > + lea data_lq, [lag_pq + mmsize - 8]
> > + neg data_lq ; -j - mmsize
> > + add data_lq, dataq ; data[-j - mmsize]
> > +.len_l:
> > + ; We waste the upper value here on SSE2,
> > + ; but we use it on AVX.
> > + movupd xm0, [dataq + len_pq] ; data[i]
>
> Use movsd; only the low double is needed.
>
> > + movupd m1, [data_lq + len_pq] ; data[i - j]
> > +
> > +%if cpuflag(avx)
>
> Use %if mmsize == 32 here and everywhere else.
>
> > + vbroadcastsd m0, xm0
>
> This is AVX2. On AVX, vbroadcastsd only accepts a memory source operand.
> So use that form, which also lets the FMA3 version skip the movsd from above.
>
> > + vperm2f128 m1, m1, m1, 0x01
>
> Aren't you loading 16 extra bytes for no reason if you're just going to
> use the upper 16 bytes from the load above?
>
> > +%endif
> > +
> > + shufpd m0, m0, m0, 1100b
>
> The last argument has two bits, not four. What you're doing here is a
> splat/broadcast, so you don't need it for FMA3.
>
> > + shufpd m1, m1, m1, 0101b
>
> The upper two bits of imm8 are ignored.
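The immediate rules discussed above can be checked with a small C model. The helpers below are hypothetical; they are not FFmpeg code or compiler intrinsics. They follow the documented selection rules of SHUFPD, VSHUFPD and VPERM2F128, and the last one models only the 0x01 immediate used in the patch.

```c
#include <assert.h>

/* Hypothetical model of 128-bit SHUFPD: dst[0] comes from src1, dst[1] from
 * src2, and only imm8 bits 0-1 are used (upper bits are ignored). */
static void shufpd_128(double dst[2], const double src1[2],
                       const double src2[2], unsigned imm8)
{
    assert(imm8 <= 0xFF);
    double lo = src1[imm8 & 1];
    double hi = src2[(imm8 >> 1) & 1];
    dst[0] = lo;   /* temporaries allow dst to alias src1/src2 */
    dst[1] = hi;
}

/* Hypothetical model of 256-bit VSHUFPD: each 128-bit lane is shuffled on its
 * own, bits 0-1 selecting within the low lane and bits 2-3 within the high one. */
static void shufpd_256(double dst[4], const double src1[4],
                       const double src2[4], unsigned imm8)
{
    assert(imm8 <= 0xFF);
    double t[4] = {
        src1[0 + ((imm8 >> 0) & 1)],
        src2[0 + ((imm8 >> 1) & 1)],
        src1[2 + ((imm8 >> 2) & 1)],
        src2[2 + ((imm8 >> 3) & 1)],
    };
    for (int k = 0; k < 4; k++)
        dst[k] = t[k];
}

/* Hypothetical model of VPERM2F128 dst, src, src, 0x01 only: swap the lanes. */
static void perm2f128_swap(double dst[4], const double src[4])
{
    double t[4] = { src[2], src[3], src[0], src[1] };
    for (int k = 0; k < 4; k++)
        dst[k] = t[k];
}
```

With src1 == src2, the immediates 1100b and 00b give the same 128-bit result, a splat of the low element. On 256-bit registers, bits 2-3 do take effect. Applying perm2f128_swap and then shufpd_256(..., 0x5) to a loaded {a0, a1, a2, a3} yields {a3, a2, a1, a0}. If the 32-byte load ends at data[i - j], as the data_l arithmetic suggests, this lines up data[i-j], data[i-j-1], data[i-j-2] and data[i-j-3] against the broadcast data[i].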
>
> > +
> > +%if cpuflag(fma3)
> > + fmaddpd m8, m0, m1, m8 ; sum += data[i]*data[i-j]
> > +%else
> > + mulpd m0, m1
> > + addpd m8, m0 ; sum += data[i]*data[i-j]
> > +%endif
> > +
> > + add len_pq, 8
> > + cmp len_pq, lenq
> > + jl .len_l
> > +
> > + movups [autocq + lag_pq], m8 ; autoc[j] = sum
> > + add lag_pq, mmsize
> > + cmp lag_pq, lagq
> > + jl .lag_l
> > +
> > + ; The tail computation is guaranteed never to happen
> > + ; as long as we're doing multiples of 4, rather than 2.
> > + ; It is trivial to convert this to avx if ever needed.
> > +%if !cpuflag(avx)
>
> This doesn't seem to be tested as is. Maybe checkasm should try other lag
> values?
Uh, my patch tests 10, 30 and 32, so I'm not sure what you think is missing
here.
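The exchange above concerns which lag values exercise which code path. The C sketch below is illustrative only; all names are hypothetical, and it is neither FFmpeg's C reference nor the checkasm API. It models the SSE2 loop's two-lags-per-iteration structure and compares it against a plain scalar model.

The sketch makes three assumptions:
- The accumulators start at 1.0, as autoc_init_tab suggests.
- The output is autoc[0] through autoc[lag-1]. The real function's output count is not visible in the quoted hunk.
- data[-1] is a zeroed sample, because at i = j the paired load also touches data[-1].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar model: autoc[j] = 1.0 + sum over i in [j, len) of data[i] * data[i - j],
 * for j in [0, lag). The 1.0 start value mirrors autoc_init_tab. */
static void autocorr_scalar(const double *data, ptrdiff_t len, int lag,
                            double *autoc)
{
    for (int j = 0; j < lag; j++) {
        double sum = 1.0;
        for (ptrdiff_t i = j; i < len; i++)
            sum += data[i] * data[i - j];
        autoc[j] = sum;
    }
}

/* Two-lanes-per-iteration model of the SSE2 loop: both lanes start at i = j,
 * so lane 1 also reads data[-1]. ASSUMPTION: that sample is zero. */
static void autocorr_two_lane(const double *data, ptrdiff_t len, int lag,
                              double *autoc)
{
    for (int j = 0; j < lag; j += 2) {
        double sum0 = 1.0, sum1 = 1.0;
        for (ptrdiff_t i = j; i < len; i++) {
            sum0 += data[i] * data[i - j];
            sum1 += data[i] * data[i - j - 1];
        }
        autoc[j] = sum0;
        if (j + 1 < lag)        /* odd lag: the last pair has only one lane */
            autoc[j + 1] = sum1;
    }
}

/* Hypothetical checkasm-style check: do both models agree for this lag? */
static int autocorr_models_agree(int lag)
{
    enum { PAD = 1, LEN = 64, MAX_LAG = 33 };
    double buf[PAD + LEN], ref[MAX_LAG], out[MAX_LAG];
    const double *data = buf + PAD;

    assert(lag > 0 && lag <= MAX_LAG);
    memset(buf, 0, sizeof(buf));                      /* zeroed data[-1] */
    for (int i = 0; i < LEN; i++)
        buf[PAD + i] = (double)((i * 37) % 17) - 8.0;  /* deterministic signal */

    autocorr_scalar(data, LEN, lag, ref);
    autocorr_two_lane(data, LEN, lag, out);
    for (int j = 0; j < lag; j++) {
        double diff = ref[j] - out[j];
        if (diff > 1e-9 || diff < -1e-9)
            return 0;
    }
    return 1;
}
```

In this two-lane model, every even lag (10, 30, 32) fills each pair of lanes. Only an odd lag exercises the partial final pair.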
--
レミ・デニ-クールモン
http://www.remlab.net/
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".