From: "Rémi Denis-Courmont" <remi@remlab.net>
To: "ffmpeg-devel@ffmpeg.org" <ffmpeg-devel@ffmpeg.org>
Cc: Michael Platzer <michael.platzer@axelera.ai>
Subject: Re: [FFmpeg-devel] RISC-V vector DSP functions: Motivation for commit 446b009
Date: Fri, 19 Jan 2024 19:14:02 +0200
Message-ID: <2165884.x4K2pz0Oi7@basile.remlab.net>
In-Reply-To: <DB8P194MB0839826005206AA13F7CA7AC97702@DB8P194MB0839.EURP194.PROD.OUTLOOK.COM>
Hi,
On Friday, 19 January 2024, 17:30:00 EET, Michael Platzer via ffmpeg-devel
wrote:
> Commit 446b0090cbb66ee614dcf6ca79c78dc8eb7f0e37 by Remi Denis-Courmont has
> replaced RISC-V vector loads and stores with negative stride with vrgather
> (generalized permutation within vector registers) instructions in order to
> reverse the elements in a vector register. The commit message explains that
> this change was done, but it does not explain why.
It was faster on what was the best approximation of real hardware available at the
time, i.e. a Sipeed Lichee Pi4A board. There are no benchmarks in the commit
because I don't like to publish benchmarks collected from prototypes.
Nevertheless I think the commit message hints enough that anybody could easily
guess that it was a performance optimisation, if I'm being honest.
This is not exactly surprising: typical hardware can only access so many
memory addresses simultaneously (i.e. one or maybe two), so indexed loads and
strided loads are bound to be much slower than unit-strided loads.
Maybe you have access to special hardware that is able to optimise the special
case of strides equal to minus one to reduce the number of memory accesses.
But I didn't back then, and as a matter of fact, I still don't. Hardware
donations are welcome.
> I fail to see what could possibly have motivated this change.
> The RISC-V vector loads and stores support negative stride values for use
> cases such as this one.
[Citation required]
> Using vrgather instead replaces the more specific operation with a more
> generic one,
That is a very subjective and unsubstantiated assertion. This feels a bit
hypocritical while you are attacking me for not providing justification.
As far as I can tell, neither instruction is specific to reversing vector
element order. An actual real-life specific instruction exists on Arm in the
form of vector-reverse. I don't know any ISA with load-reverse or store-
reverse.
> which is likely to be less performant on most HW architectures.
Would you care to define "most architectures"? I only know one commercially
available hardware architecture as of today, Kendryte K230 SoC with T-Head
C908 CPU, so I can't make much sense of your sentence here.
> In addition, it requires to setup an index vector,
That is irrelevant since in this loop, the vector bank is not a bottleneck.
The loop can run with maximal LMUL either way. And besides, the loop turned
out to be faster with a smaller multiplier.
> thus raising dynamic instruction count.
It adds only one instruction (reverse subtraction) in the main loop, and even
that could be optimised away if relevant.
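
For illustration only, and not taken from the commit: the two ways of
reversing a vector discussed in this message can be modelled in scalar C
roughly as below. The function names, MAX_VL, and the mapping of each loop to
an RVV mnemonic are assumptions made for this sketch. It models semantics
only and says nothing about performance on any particular hardware.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 16 /* hypothetical upper bound on elements per vector */

/* Approach A: model of a strided load with a stride of minus one element
 * (in the spirit of vlse32.v). Element i is fetched from its own address,
 * walking downwards from `last`. */
static void reverse_by_negative_stride(float *vd, const float *last, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = last[-(ptrdiff_t)i];
}

/* Approach B: model of a unit-stride load followed by a register permutation.
 * The index vector is built from an element-index vector and one reverse
 * subtraction, then used to gather from the loaded register. */
static void reverse_by_gather(float *vd, const float *first, size_t vl)
{
    size_t idx[MAX_VL];
    float tmp[MAX_VL];

    assert(vl <= MAX_VL);
    for (size_t i = 0; i < vl; i++)
        idx[i] = i;                  /* like vid.v */
    for (size_t i = 0; i < vl; i++)
        idx[i] = (vl - 1) - idx[i];  /* like vrsub.vx: (vl - 1) - idx */
    for (size_t i = 0; i < vl; i++)
        tmp[i] = first[i];           /* like a unit-stride vle32.v */
    for (size_t i = 0; i < vl; i++)
        vd[i] = tmp[idx[i]];         /* like vrgather.vv */
}

/* Returns 1 if both models produce the same reversed vector, 0 otherwise. */
int reversal_models_agree(void)
{
    const float src[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    float a[8], b[8];

    reverse_by_negative_stride(a, &src[7], 8);
    reverse_by_gather(b, src, 8);
    for (size_t i = 0; i < 8; i++)
        if (a[i] != b[i] || a[i] != src[7 - i])
            return 0;
    return 1;
}
```

Both models yield the same result. The difference is where the reordering
happens: approach A reorders at the memory interface, one address per
element, while approach B loads contiguously and reorders inside the vector
register file.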
--
レミ・デニ-クールモン
http://www.remlab.net/