Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512
@ 2022-10-21  3:41 Kieran Kunhya
  2022-10-21 13:57 ` Henrik Gramner
  0 siblings, 1 reply; 3+ messages in thread
From: Kieran Kunhya @ 2022-10-21  3:41 UTC
  To: FFmpeg development discussions and patches

Hi,

Please see attached an attempt to optimise the 8-bit input to v210enc to
reduce the number of shuffles.
This comes at the cost of having to extract the middle element, perform
a DWORD shift on it, and then reinsert it.
I have added a few comments but any other ideas are welcome.

Crude benchmarks on Intel(R) Xeon(R) D-2123IT:

Before:

v210_planar_pack_8_ssse3: 316.5
v210_planar_pack_8_avx: 319.0
v210_planar_pack_8_avx2: 223.0

After:

v210_planar_pack_8_ssse3: 321.0
v210_planar_pack_8_avx: 326.0
v210_planar_pack_8_avx2: 217.0
v210_planar_pack_8_avx512: 211.0

Regards,
Kieran Kunhya

[-- Attachment #2: 0001-RFC-v210enc-optimisations-and-initial-AVX-512.patch --]
[-- Type: application/octet-stream, Size: 4642 bytes --]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

* Re: [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512
  2022-10-21  3:41 [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512 Kieran Kunhya
@ 2022-10-21 13:57 ` Henrik Gramner
  2022-10-26 13:51   ` James Darnley
  0 siblings, 1 reply; 3+ messages in thread
From: Henrik Gramner @ 2022-10-21 13:57 UTC
  To: FFmpeg development discussions and patches

On Fri, Oct 21, 2022 at 5:41 AM Kieran Kunhya <kierank@obe.tv> wrote:
>
> Hi,
>
> Please see attached an attempt to optimise the 8-bit input to v210enc to
> reduce the number of shuffles.
> This comes at the cost of having to extract the middle element, perform
> a DWORD shift on it, and then reinsert it.
> I have added a few comments but any other ideas are welcome.

Random untested idea:

A: db 32,  0, 48, -1,  1, 33,  2, -1, 49,  3, 34, -1,  4, 50,  5, -1
   db 35,  6, 51, -1,  7, 36,  8, -1, 52,  9, 37, -1, 10, 53, 11, -1
   db 38, 12, 54, -1, 13, 39, 14, -1, 55, 15, 40, -1, 16, 56, 17, -1
   db 41, 18, 57, -1, 19, 42, 20, -1, 58, 21, 43, -1, 22, 59, 23, -1
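B: db  4,  0, 64,  0 ; pmaddubsw multipliers: low sample to bit 2, high sample to bit 22
C: dd 0x000ff000     ; mask for the centre sample (bits 12-19)

[...]

mova              m2, [A]
vpbroadcastd      m3, [B]
vpbroadcastd      m6, [C]

[...]

.loop:
    movu         ym1, [yq]
    vinserti32x4  m1, [uq], 2
    vinserti32x4  m1, [vq], 3
    CLIPUB        m1, m4, m5
    vpermb        m1, m2, m1 ; interleave y/u/v into v210 order (index -1 = don't care)
    pmaddubsw     m0, m1, m3 ; scale low and high samples, zero the other bytes
    pslld         m1, 4      ; move the centre sample to bits 12-19
    vpternlogd    m0, m1, m6, 0xd8 ; C ? B : A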
    movu      [dstq], m0
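
The vpternlogd immediate is an 8-entry truth table indexed by one bit each
from the destination (A), the first source (B) and the second source (C),
with A as the most significant index bit. A minimal Python sketch for
deriving an immediate from a boolean expression follows; the helper name
ternlog_imm8 is made up for illustration and is not part of any library.

```python
def ternlog_imm8(fn):
    """Build the vpternlog immediate for a 3-input bitwise function fn(a, b, c)."""
    imm = 0
    for a in (0, 1):          # bit from the destination operand
        for b in (0, 1):      # bit from the first source operand
            for c in (0, 1):  # bit from the second source operand
                if fn(a, b, c):
                    imm |= 1 << ((a << 2) | (b << 1) | c)
    return imm


# Take bits from B where the mask C is set, otherwise keep A.
select_on_c = ternlog_imm8(lambda a, b, c: b if c else a)
# Take bits from B where A is set, otherwise take C.
select_on_a = ternlog_imm8(lambda a, b, c: b if a else c)
print(hex(select_on_c), hex(select_on_a))
```

With m0 as destination, m1 as the shifted copy and m6 as the mask, the
merge "centre sample where the mask is set, multiplied samples elsewhere"
is C ? B : A, which this helper evaluates to 0xd8; 0xca is the variant
that selects on the destination instead.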

I guess it could also be scaled to ymm if you're a big Skylake fan :P
(in which case you'd probably want to reorder the shuffle indices so
that chroma comes first, i.e. movq [u] + movhps [v] + vinserti32x4
[y])
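
For anyone wanting to sanity-check the permute + multiply + shift + select
idea without AVX-512 hardware, here is a rough scalar model in Python. It
is only a sketch: the function names are hypothetical, clipping is left
out (inputs are assumed to be clipped already), and the multipliers, shift
and mask are assumptions of this sketch, chosen so that the result matches
the usual v210 layout of three 10-bit fields per dword with 8-bit samples
shifted left by 2 (i.e. << 2, << 12, << 22).

```python
# Byte indices into a 64-byte register laid out as y[0..31] | u[0..15] | v[0..15].
# -1 marks the unused top byte of each dword.
PERM = [
    32,  0, 48, -1,  1, 33,  2, -1, 49,  3, 34, -1,  4, 50,  5, -1,
    35,  6, 51, -1,  7, 36,  8, -1, 52,  9, 37, -1, 10, 53, 11, -1,
    38, 12, 54, -1, 13, 39, 14, -1, 55, 15, 40, -1, 16, 56, 17, -1,
    41, 18, 57, -1, 19, 42, 20, -1, 58, 21, 43, -1, 22, 59, 23, -1,
]

MULT = (4, 0, 64, 0)  # assumed pmaddubsw multipliers (low to bit 2, high to bit 22)
SHIFT = 4             # assumed dword shift for the centre sample
MASK = 0xFF << 12     # assumed mask selecting the centre sample after the shift


def load_register(y, u, v):
    """Model movu + 2x vinserti32x4: pad each plane to its lane width."""
    def lane(samples, size):
        data = list(samples[:size])
        return data + [0] * (size - len(data))
    return lane(y, 32) + lane(u, 16) + lane(v, 16)


def pack_simd_model(y, u, v):
    """Pack 24 luma and 12+12 chroma 8-bit samples into 16 v210 dwords."""
    reg = load_register(y, u, v)
    perm = [reg[i & 63] for i in PERM]  # vpermb only uses the low 6 index bits
    out = []
    for k in range(0, 64, 4):
        b = perm[k:k + 4]
        # pmaddubsw: two 16-bit sums of byte products per dword
        lo = b[0] * MULT[0] + b[1] * MULT[1]
        hi = b[2] * MULT[2] + b[3] * MULT[3]
        madd = lo | (hi << 16)
        # pslld on the permuted bytes, viewed as a little-endian dword
        dword = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)
        shifted = (dword << SHIFT) & 0xFFFFFFFF
        # bitwise select: shifted where MASK is set, madd elsewhere
        out.append((shifted & MASK) | (madd & ~MASK & 0xFFFFFFFF))
    return out


def pack_reference(y, u, v):
    """Straightforward v210 packing: three 10-bit fields per dword."""
    out = []
    for i in range(0, 24, 6):  # 6 pixels -> 4 dwords
        j = i // 2
        out += [
            (u[j] << 2) | (y[i] << 12) | (v[j] << 22),
            (y[i + 1] << 2) | (u[j + 1] << 12) | (y[i + 2] << 22),
            (v[j + 1] << 2) | (y[i + 3] << 12) | (u[j + 2] << 22),
            (y[i + 4] << 2) | (v[j + 2] << 12) | (y[i + 5] << 22),
        ]
    return out


y = [(37 * i + 5) % 256 for i in range(32)]
u = [(91 * i + 17) % 256 for i in range(16)]
v = [255 - (13 * i) % 256 for i in range(16)]
assert pack_simd_model(y, u, v) == pack_reference(y, u, v)
```

The -1 entries end up reading byte 63 of the register, which is harmless
here because that byte is multiplied by zero and falls outside the mask
after the shift.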

* Re: [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512
  2022-10-21 13:57 ` Henrik Gramner
@ 2022-10-26 13:51   ` James Darnley
  0 siblings, 0 replies; 3+ messages in thread
From: James Darnley @ 2022-10-26 13:51 UTC
  To: ffmpeg-devel

> I guess it could also be scaled to ymm if you're a big Skylake fan :P
> (in which case you'd probably want to reorder the shuffle indices so
> that chroma comes first, i.e. movq [u] + movhps [v] + vinserti32x4 [y])

What shuffle or permute did you have in mind when you suggested this for 
Skylake?  Without the permute I'm not sure how the change in ordering 
helps.  Aren't we stuck with data in separate lanes?  I'm probably 
missing something though.
