Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: "Rémi Denis-Courmont" <remi@remlab.net>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 1/2] lavc/flacdsp: R-V V flac_wasted32
Date: Sun, 12 May 2024 22:41:00 +0300
Message-ID: <3598016.umlc2LYTmm@basile.remlab.net>
In-Reply-To: <74a36742-14bd-469e-a43b-272d2e5b3ca9@gmail.com>

On Sunday, 12 May 2024 at 21:37:28 EEST, James Almer wrote:
> Not sure if you're taking it into account, but the minimum blocksize is
> 16

Granted, this only fills a single 8-vector register group (v8-v15), so only a
quarter of the register bank (v0-v31), which is unusually low. But with 128-bit
vectors, that already adds up to 32 ints per iteration. IIUC, the x86
implementation handles only half as many.
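
To spell out that arithmetic, here is a rough sketch in plain C. The helper
name elems_per_group is made up for illustration; VLEN, LMUL and SEW are the
usual RVV terms for the vector register width, the register group multiplier
and the element width.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of FFmpeg): number of elements that fit in
 * one RVV register group, i.e. VLEN * LMUL / SEW.
 */
static unsigned elems_per_group(unsigned vlen_bits, unsigned lmul,
                                unsigned sew_bits)
{
    /* The element width must be non-zero and divide the group size evenly. */
    assert(sew_bits != 0 && (vlen_bits * lmul) % sew_bits == 0);
    return vlen_bits * lmul / sew_bits;
}
```

With the numbers above, elems_per_group(128, 8, 32) gives the 32 ints per
iteration; halving the group to 4 registers would give 16.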

> and the buffer is always allocated for max_blocksize plus padding,

RVV really wants element-size alignment, so 32/64-bit here. Beyond that, it 
really does not care. (I think Arm SVE works the same way?)

> so you should be able to do more samples per loop than this.

In my experience, this particular hardware would likely exhibit marginally
better performance with only *half* as many samples per iteration. I just
don't want to overfit to this relatively early and low-end hardware design. In
fact, Yuechi already has newer, better hardware.

> Same for wasted33.

The wasted33 kernel actually already uses three eighths (v8-v11, v16-v23) of
the bank, for 16 ints per iteration. I doubt that unrolling explicitly would
help.

The performance (~1.5x) is pretty disappointing, to be sure. The root cause is
RVV's notorious lack of widening left shifts. FWIW, Zvbb only adds the
unsigned variant, which is not what we need here. Plus, there is no
commercially available hardware with Zvbb yet. So in the end, we have an extra
size conversion, followed by a 64-bit shift and 64-bit element stores, which
are half as fast as 32-bit ones on a weighted basis.

Maybe a widening signed multiplication would be faster, though; I will try
that.
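
For reference, the equivalence this relies on can be sketched in scalar C.
This is only an illustration, not the FFmpeg code nor the RVV kernel: the
function names are invented, and the multiply variant assumes
0 <= wasted <= 30 so that the factor 2^wasted still fits in a signed 32-bit
operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference: sign-extend each 32-bit sample to 64 bits, then
 * shift left. The shift is done on an unsigned value because left-shifting
 * a negative signed value is undefined in C; the conversion back to int64_t
 * assumes the usual two's complement behaviour.
 */
static void widen_then_shift(int64_t *dst, const int32_t *src,
                             int wasted, size_t len)
{
    assert(wasted >= 0 && wasted <= 30);
    for (size_t i = 0; i < len; i++)
        dst[i] = (int64_t)((uint64_t)(int64_t)src[i] << wasted);
}

/*
 * Hypothetical alternative: a single signed 32x32->64-bit multiplication by
 * 2^wasted. The product is at most 2^31 * 2^30 in magnitude, so it cannot
 * overflow 64 bits.
 */
static void widening_multiply(int64_t *dst, const int32_t *src,
                              int wasted, size_t len)
{
    assert(wasted >= 0 && wasted <= 30);
    const int32_t factor = (int32_t)1 << wasted;

    for (size_t i = 0; i < len; i++)
        dst[i] = (int64_t)src[i] * factor;
}
```

Both loops produce identical output for any input within that range; whether
the multiply form is actually faster on a given core is a separate question
that only measurements can answer.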

-- 
Rémi Denis-Courmont
http://www.remlab.net/



