From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 5/6] lavc/apv: AVX2 transquant for x86-64
Date: Sat, 19 Apr 2025 18:16:45 -0300
Message-ID: <ca34f4f4-0620-442d-b5b5-b68c7c4e8ee7@gmail.com>
In-Reply-To: <20250419190712.1265201-6-sw@jkqxz.net>

On 4/19/2025 4:07 PM, Mark Thompson wrote:
> diff --git a/libavcodec/x86/apv_dsp.asm b/libavcodec/x86/apv_dsp.asm
> new file mode 100644
> index 0000000000..0329089f45
> --- /dev/null
> +++ b/libavcodec/x86/apv_dsp.asm
> @@ -0,0 +1,243 @@
> +;************************************************************************
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +;******************************************************************************
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION .text
> +
> +align 32
> +const tmatrixh

SECTION_RODATA 32

tmatrixh: dw ...
tmatrixv: dw ...

etc.
Add only functions to SECTION .text

> +    dw 64, 89, 84, 75, 64, 50, 35, 18
> +    dw 64, 75, 35, -18, -64, -89, -84, -50
> +    dw 64, 50, -35, -89, -64, 18, 84, 75
> +    dw 64, 18, -84, -50, 64, 75, -35, -89
> +    dw 64, -18, -84, 50, 64, -75, -35, 89
> +    dw 64, -50, -35, 89, -64, -18, 84, -75
> +    dw 64, -75, 35, 18, -64, 89, -84, 50
> +    dw 64, -89, 84, -75, 64, -50, 35, -18
> +const tmatrixv
> +    dw 64, 89, 84, 75, 64, 50, 35, 18
> +    dw 64, -18, -84, 50, 64, -75, -35, 89
> +    dw 64, 75, 35, -18, -64, -89, -84, -50
> +    dw 64, -50, -35, 89, -64, -18, 84, -75
> +    dw 64, 50, -35, -89, -64, 18, 84, 75
> +    dw 64, -75, 35, 18, -64, 89, -84, 50
> +    dw 64, 18, -84, -50, 64, 75, -35, -89
> +    dw 64, -89, 84, -75, 64, -50, 35, -18
> +
> +; Memory targets for vpbroadcastd (register version requires AVX512).
> +const one
> +    dd 1

There's pd_1 defined in constants.c, and you can include it here with
cextern pd_1

> +const sixtyfour
> +    dd 64
> +
> +; void ff_apv_decode_transquant_avx2(void *output,
> +;                                    ptrdiff_t pitch,
> +;                                    const int16_t *input,
> +;                                    const int16_t *qmatrix,
> +;                                    int64_t bit_depth,
> +;                                    int64_t qp_shift);
> +
> +INIT_YMM avx2
> +
> +cglobal apv_decode_transquant, 6, 6, 16, output, pitch, input, qmatrix, bit_depth, qp_shift
> +
> +    ; Load input and dequantise
> +
> +    lea rax, [bit_depthq - 2]

Are you sure you're not overwriting a passed in argument with this? rax
is different on Unix64, x86_32, and Win64 ABIs.
You have qp_shift free after the mov to xm8 if you need a tmp register.

In general, you should use the names you gave the registers, or the r$
aliases from x86inc.

> +    movq xm8, qp_shiftq

Both bit_depth and this fit in an int, so unless there's a real reason
to use int64_t in the prototype, you can change them to int and read 32
bits from the registers.

> +    movq xm9, rax
> +    vpbroadcastd m10, [one]
> +    vpslld m10, m10, xm9
> +    vpsrld m10, m10, 1

No need to add the v prefix to pre-AVX instructions.
x86inc will do its magic and emit the VEX-encoded version for them as
required.
Similarly, if dst and src1 are the same, you can remove one of them and
x86inc will also handle it, so just do:

pslld m10, xm9

And so on. This matters because x86inc can then yell at you if you misuse
an instruction in some cases, and because the correct register is used
if you use SWAP and other x86inc helpers.

> +
> +    ; m8 = scalar qp_shift
> +    ; m9 = scalar bd_shift
> +    ; m10 = vector 1 << (bd_shift - 1)
> +    ; m11 = qmatrix load
> +%macro LOAD_AND_DEQUANT 2 ; (xmm input, constant offset)
> +    vpmovsxwd m%1, [inputq + %2]
> +    vpmovsxwd m11, [qmatrixq + %2]
> +    vpmulld m%1, m%1, m11

Can't you use pmaddwd here, seeing it's 16bit x 16bit -> 32bit? pmulld
is super slow, like 10 cycles vs 3 or fewer for every other integer
multiply instruction.

> +    vpslld m%1, m%1, xm8
> +    vpaddd m%1, m%1, m10
> +    vpsrad m%1, m%1, xm9
> +    vpackssdw m%1, m%1, m%1
> +%endmacro
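
The pmaddwd question in the review can be checked with a scalar model. The C sketch below is not from the patch or the review. It models one 32-bit lane of pmaddwd and compares it with the 32-bit multiply that pmovsxwd + pmulld performs. It assumes, as an illustration only, that the coefficient would be zero-extended into the lane (as pmovzxwd does) so that its upper 16-bit half is zero and drops out of the sum. All function names are hypothetical.

```c
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a signed value without relying on
 * implementation-defined narrowing conversions. */
static int16_t as_i16(uint16_t v)
{
    return v < 0x8000 ? (int16_t)v : (int16_t)((int32_t)v - 0x10000);
}

/* Scalar model of one 32-bit lane of pmaddwd: each 16-bit half is
 * multiplied as a signed value and the two products are added.
 * The single wrapping case (all four halves equal to -32768) is not
 * modelled and does not arise in the use below. */
static int32_t pmaddwd_lane(uint32_t a, uint32_t b)
{
    int32_t lo = (int32_t)as_i16((uint16_t)(a & 0xffff)) *
                 as_i16((uint16_t)(b & 0xffff));
    int32_t hi = (int32_t)as_i16((uint16_t)(a >> 16)) *
                 as_i16((uint16_t)(b >> 16));
    return lo + hi;
}

/* Reference: product of two sign-extended 16-bit values, which is what
 * pmovsxwd followed by pmulld yields per lane. */
static int32_t mul_pmulld(int16_t coeff, int16_t q)
{
    return (int32_t)coeff * (int32_t)q;
}

/* Assumed alternative: zero-extend the coefficient (high half = 0),
 * sign-extend the quantiser entry, then a single pmaddwd. */
static int32_t mul_pmaddwd(int16_t coeff, int16_t q)
{
    uint32_t a = (uint16_t)coeff;        /* models pmovzxwd */
    uint32_t b = (uint32_t)(int32_t)q;   /* models pmovsxwd */
    return pmaddwd_lane(a, b);
}

/* Compare both forms for every coefficient and a sample of quantiser
 * values that includes INT16_MIN and INT16_MAX; returns mismatches. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int32_t c = INT16_MIN; c <= INT16_MAX; c++)
        for (int32_t q = INT16_MIN; q <= INT16_MAX; q += 257)
            bad += mul_pmulld((int16_t)c, (int16_t)q) !=
                   mul_pmaddwd((int16_t)c, (int16_t)q);
    return bad;
}
```

Under that assumption the two forms agree for every pair tested, so count_mismatches() returns 0. The zero-extension matters: if both operands were sign-extended, two negative inputs would each carry 0xffff in the upper half, and pmaddwd would add a spurious (-1) * (-1) = 1 to the product.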