From: Alan Kelly <alankelly-at-google.com@ffmpeg.org> To: ffmpeg-devel@ffmpeg.org Cc: Alan Kelly <alankelly@google.com> Subject: [FFmpeg-devel] [PATCH 2/4] libswscale: Avx2 hscale can process any input of size which is a multiple of 4. Date: Mon, 10 Jan 2022 15:58:34 +0100 Message-ID: <20220110145836.3449558-2-alankelly@google.com> (raw) In-Reply-To: <20220110145836.3449558-1-alankelly@google.com> The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. --- libswscale/x86/scale_avx2.asm | 48 +++++++++++++++++++++++++++++++++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm index 20acdbd633..dc42abb100 100644 --- a/libswscale/x86/scale_avx2.asm +++ b/libswscale/x86/scale_avx2.asm @@ -53,6 +53,9 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize, mova m14, [four] shr fltsized, 2 %endif + cmp wq, 16 + jl .tail_loop + mov countq, 0x10 .loop: movu m1, [fltposq] movu m2, [fltposq+32] @@ -97,11 +100,52 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize, vpsrad m6, 7 vpackssdw m5, m5, m6 vpermd m5, m15, m5 - vmovdqu [dstq + countq * 2], m5 + vmovdqu [dstq], m5 + add dstq, 0x20 add fltposq, 0x40 add countq, 0x10 cmp countq, wq - jl .loop + jle .loop + + sub countq, 0x10 + cmp countq, wq + jge .end + +.tail_loop: + movu xm1, [fltposq] +%ifidn %1, X4 + pxor xm9, xm9 + pxor xm10, xm10 + xor innerq, innerq +.tail_innerloop: +%endif + vpcmpeqd xm13, xm13 + vpgatherdd xm3,[srcmemq + xm1], xm13 + vpunpcklbw xm5, xm3, xm0 + vpunpckhbw xm6, xm3, xm0 + vpmaddwd xm5, xm5, [filterq] + vpmaddwd xm6, xm6, [filterq + 16] + add filterq, 0x20 +%ifidn %1, X4 + paddd xm9, xm5 + paddd xm10, xm6 + paddd xm1, xm14 + add innerq, 1 + cmp innerq, fltsizeq + jl .tail_innerloop + vphaddd xm5, xm9, xm10 +%else + vphaddd xm5, xm5, xm6 +%endif + vpsrad xm5, 7 + vpackssdw xm5, xm5, xm5 + vmovq [dstq], xm5 + add dstq, 0x8 + add fltposq, 0x10 + add countq, 0x4 + cmp countq, wq + jl .tail_loop +.end: REP_RET %endmacro -- 2.34.1.575.g55b058a8bb-goog _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2022-01-10 14:59 UTC|newest] Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-01-10 14:58 [FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients Alan Kelly 2022-01-10 14:58 ` Alan Kelly [this message] 2022-01-10 14:58 ` [FFmpeg-devel] [PATCH 3/4] libswscale: Enable hscale_avx2 for input sizes which ar emultiples of 4 Alan Kelly 2022-01-10 14:58 ` [FFmpeg-devel] [PATCH 4/4] checkasm/sw_scale: hscale does not requires cpuflag test Alan Kelly 2022-02-02 12:21 ` [FFmpeg-devel] [PATCH 1/4] libswscale: Re-factor ff_shuffle_filter_coefficients Alan Kelly 2022-02-03 14:11 ` Michael Niedermayer 2022-02-09 9:17 ` Alan Kelly
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20220110145836.3449558-2-alankelly@google.com \ --to=alankelly-at-google.com@ffmpeg.org \ --cc=alankelly@google.com \ --cc=ffmpeg-devel@ffmpeg.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git