From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 5/6] lavc/apv: AVX2 transquant for x86-64
Date: Sat, 19 Apr 2025 18:16:45 -0300
Message-ID: <ca34f4f4-0620-442d-b5b5-b68c7c4e8ee7@gmail.com> (raw)
In-Reply-To: <20250419190712.1265201-6-sw@jkqxz.net>
On 4/19/2025 4:07 PM, Mark Thompson wrote:
> diff --git a/libavcodec/x86/apv_dsp.asm b/libavcodec/x86/apv_dsp.asm
> new file mode 100644
> index 0000000000..0329089f45
> --- /dev/null
> +++ b/libavcodec/x86/apv_dsp.asm
> @@ -0,0 +1,243 @@
> +;************************************************************************
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +;******************************************************************************
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION .text
> +
> +align 32
> +const tmatrixh
Put the constants in a read-only data section instead:
SECTION_RODATA 32
tmatrixh: dw ...
tmatrixv: dw ...
etc. Only add functions to SECTION .text.
> + dw 64, 89, 84, 75, 64, 50, 35, 18
> + dw 64, 75, 35, -18, -64, -89, -84, -50
> + dw 64, 50, -35, -89, -64, 18, 84, 75
> + dw 64, 18, -84, -50, 64, 75, -35, -89
> + dw 64, -18, -84, 50, 64, -75, -35, 89
> + dw 64, -50, -35, 89, -64, -18, 84, -75
> + dw 64, -75, 35, 18, -64, 89, -84, 50
> + dw 64, -89, 84, -75, 64, -50, 35, -18
> +const tmatrixv
> + dw 64, 89, 84, 75, 64, 50, 35, 18
> + dw 64, -18, -84, 50, 64, -75, -35, 89
> + dw 64, 75, 35, -18, -64, -89, -84, -50
> + dw 64, -50, -35, 89, -64, -18, 84, -75
> + dw 64, 50, -35, -89, -64, 18, 84, 75
> + dw 64, -75, 35, 18, -64, 89, -84, 50
> + dw 64, 18, -84, -50, 64, 75, -35, -89
> + dw 64, -89, 84, -75, 64, -50, 35, -18
> +
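For reference, the rows of tmatrixh appear to be the columns of the 8-point integer DCT basis (built from 64, 89, 84, 75, 50, 35 and 18), so row n holds the weights that produce output sample n of a 1-D inverse transform. A minimal scalar sketch in C, assuming that layout and deliberately omitting the intermediate rounding and shifts a real inverse transform needs; idct8_row_unscaled is a hypothetical name:

```c
#include <stdint.h>

/* Copied from the quoted tmatrixh table.  Assumption: row n holds the
 * weights for output sample n of an 8-point inverse transform. */
static const int16_t tmatrixh[8][8] = {
    { 64,  89,  84,  75,  64,  50,  35,  18 },
    { 64,  75,  35, -18, -64, -89, -84, -50 },
    { 64,  50, -35, -89, -64,  18,  84,  75 },
    { 64,  18, -84, -50,  64,  75, -35, -89 },
    { 64, -18, -84,  50,  64, -75, -35,  89 },
    { 64, -50, -35,  89, -64, -18,  84, -75 },
    { 64, -75,  35,  18, -64,  89, -84,  50 },
    { 64, -89,  84, -75,  64, -50,  35, -18 },
};

/* Hypothetical helper: unscaled 1-D inverse transform of one row of
 * 8 coefficients.  No rounding or shifting is applied, so the outputs
 * are the raw dot products (they always fit in 32 bits). */
static void idct8_row_unscaled(int32_t out[8], const int16_t in[8])
{
    for (int n = 0; n < 8; n++) {
        int32_t sum = 0;
        for (int k = 0; k < 8; k++)
            sum += (int32_t)tmatrixh[n][k] * in[k];
        out[n] = sum;
    }
}
```

A DC-only row (in[0] = 1, everything else 0) gives 64 in every output, and a unit coefficient in position 1 reproduces the first cosine basis function, 89, 75, 50, 18, -18, -50, -75, -89, which is a cheap way to sanity-check the table.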
> +; Memory targets for vpbroadcastd (register version requires AVX512).
> +const one
> + dd 1
There's pd_1 defined in constants.c, and you can include it here with
cextern pd_1
> +const sixtyfour
> + dd 64
> +
> +; void ff_apv_decode_transquant_avx2(void *output,
> +; ptrdiff_t pitch,
> +; const int16_t *input,
> +; const int16_t *qmatrix,
> +; int64_t bit_depth,
> +; int64_t qp_shift);
> +
> +INIT_YMM avx2
> +
> +cglobal apv_decode_transquant, 6, 6, 16, output, pitch, input, qmatrix, bit_depth, qp_shift
> +
> + ; Load input and dequantise
> +
> + lea rax, [bit_depthq - 2]
Are you sure you're not overwriting a passed-in argument with this? The role
of rax differs between the Unix64, x86_32, and Win64 ABIs. You have qp_shift
free after the mov to xm8 if you need a temporary register.
In general, you should use the names you gave the registers, or the r0, r1,
... aliases from x86inc.
> + movq xm8, qp_shiftq
Both bit_depth and this fit in an int, so unless there's a real reason
to use int64_t in the prototype, you can change them to int and read 32
bits from the registers.
> + movq xm9, rax
> + vpbroadcastd m10, [one]
> + vpslld m10, m10, xm9
> + vpsrld m10, m10, 1
No need to add the v prefix to pre-AVX instructions. x86inc will do its
magic and emit the VEX-encoded version for them as required.
Similarly, if dst and src1 are the same, you can omit one of them and
x86inc will also handle it, so just write:
pslld m10, xm9
and so on. This is important so that x86inc can yell at you if you misuse
an instruction in some cases, and, when you use SWAP and other x86inc
helpers, so that the correct register is used.
> +
> + ; m8 = scalar qp_shift
> + ; m9 = scalar bd_shift
> + ; m10 = vector 1 << (bd_shift - 1)
> + ; m11 = qmatrix load
> +%macro LOAD_AND_DEQUANT 2 ; (xmm input, constant offset)
> + vpmovsxwd m%1, [inputq + %2]
> + vpmovsxwd m11, [qmatrixq + %2]
> + vpmulld m%1, m%1, m11
Can't you use pmaddwd here, seeing it's 16-bit x 16-bit -> 32-bit? pmulld
is super slow, around 10 cycles versus 3 or fewer for every other integer
multiply instruction.
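To illustrate why pmaddwd can stand in here: it multiplies pairs of signed 16-bit words and adds the two adjacent products into each 32-bit dword, so if the upper word of every dword in one operand is zero, the result is just the exact 16x16 -> 32-bit product of the lower words. One way to get that (an assumption, not something the patch spells out) is to load qmatrix with vpmovzxwd instead of vpmovsxwd. A scalar C model of one dword lane; both function names are hypothetical:

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of pmaddwd: each operand's two words
 * are treated as signed int16, multiplied pairwise, and the products
 * summed.  Summing in int64_t and truncating to 32 bits mimics the
 * hardware's wrap in the single overflow case (all four words -32768). */
static int32_t pmaddwd_lane(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(a & 0xFFFF), a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFF), b_hi = (int16_t)(b >> 16);
    int64_t sum = (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;
    return (int32_t)(uint32_t)sum;
}

/* x is sign-extended into a dword (as vpmovsxwd does) and q is
 * zero-extended (as vpmovzxwd would do).  The zero upper word of q
 * cancels the second product, leaving exactly x * q. */
static int32_t mul16x16_via_pmaddwd(int16_t x, int16_t q)
{
    uint32_t x_dword = (uint32_t)(int32_t)x; /* upper word = sign bits */
    uint32_t q_dword = (uint16_t)q;          /* upper word = 0 */
    return pmaddwd_lane(x_dword, q_dword);
}
```

The lower words are still interpreted as signed by pmaddwd, so this holds for any int16 values of x and q; the model only shows the arithmetic is equivalent, not how the loads should be scheduled.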
> + vpslld m%1, m%1, xm8
> + vpaddd m%1, m%1, m10
> + vpsrad m%1, m%1, xm9
> + vpackssdw m%1, m%1, m%1
> +%endmacro
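For comparison, a scalar C model of what the quoted setup code and LOAD_AND_DEQUANT compute per coefficient, taking bit_depth and qp_shift as int as suggested above. It mirrors the asm as written (bd_shift = bit_depth - 2 from the lea, rounding constant (1 << bd_shift) >> 1, logical left shift, 32-bit wrapping add, arithmetic right shift, then signed saturation as packssdw does). It is a sketch for testing under those assumptions, not the decoder's actual C path, and sat_int16 and dequant_ref are hypothetical names:

```c
#include <stdint.h>

/* Signed saturation to 16 bits, as performed by packssdw. */
static int16_t sat_int16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of the setup code plus LOAD_AND_DEQUANT for
 * n coefficients.  Assumes 3 <= bit_depth <= 33 (so 1 <= bd_shift <= 31)
 * and that >> on a negative int32_t is an arithmetic shift, as it is on
 * mainstream compilers. */
static void dequant_ref(int16_t *out, const int16_t *input,
                        const int16_t *qmatrix, int n,
                        int bit_depth, int qp_shift)
{
    int bd_shift = bit_depth - 2;                        /* lea rax, [bit_depthq - 2] */
    uint32_t round = (1u << bd_shift) >> 1;              /* pslld m10 / psrld m10, 1 */

    for (int i = 0; i < n; i++) {
        int32_t v = (int32_t)input[i] * qmatrix[i];      /* vpmulld (fits in 32 bits) */
        v = (int32_t)((uint32_t)v << qp_shift);          /* vpslld: logical shift */
        v = (int32_t)((uint32_t)v + round);              /* vpaddd: wraps like the asm */
        v >>= bd_shift;                                  /* vpsrad: arithmetic shift */
        out[i] = sat_int16(v);                           /* vpackssdw */
    }
}
```

A model like this can be compared lane by lane against the SIMD output in a checkasm-style test, which would also catch a mismatch if the multiply is switched to pmaddwd.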
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".