From: "Rémi Denis-Courmont" <remi@remlab.net>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 1/2] lavc/flacdsp: R-V V flac_wasted32
Date: Sun, 12 May 2024 22:41:00 +0300
Message-ID: <3598016.umlc2LYTmm@basile.remlab.net>
In-Reply-To: <74a36742-14bd-469e-a43b-272d2e5b3ca9@gmail.com>

On Sunday, 12 May 2024, 21:37:28 EEST, James Almer wrote:
> Not sure if you're taking it into account, but the minimum blocksize is
> 16

Granted, this only fills a single vector group of 8 registers (v8-v15),
so only a quarter of the register bank (v0-v31), which is unusually low.
But that already adds up to 32 ints per iteration with 128-bit vectors.
IIUC, the x86 implementation processes only half as many.

> and the buffer is always allocated for max_blocksize plus padding,

RVV really wants element-size alignment, so 32/64-bit here. Beyond that,
it really does not care. (I think Arm SVE works the same way?)

> so you should be able to do more samples per loop than this.

In my experience, this particular hardware would likely exhibit
marginally better performance with only *half* as many samples per
iteration. I just don't want to overfit to this relatively early and
low-end hardware design. In fact, Yuechi already has newer, better
hardware.

> Same for wasted33.

The wasted33 kernel actually already uses three eighths (v8-v11,
v16-v23) of the bank, for 16 ints per iteration. I doubt that unrolling
explicitly would help.

The performance (~1.5x) is pretty disappointing, to be sure. The root
cause is RVV's notorious lack of widening left shifts. FWIW, Zvbb only
adds the unsigned variant, which is not what we need here. Plus, there is
no commercially available hardware with Zvbb yet.

So in the end, we have an extra size conversion, followed by a 64-bit
shift and 64-bit element stores, which are half as fast as 32-bit ones on
a weighted basis. Maybe widening signed multiplication would be faster
though; I will try that.
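The last point can be illustrated in scalar C. This is only a sketch under stated assumptions, not the FFmpeg kernel or the RVV code: the function names are hypothetical, and `wasted` is assumed to lie in the range 0..31. It shows that sign-extending a 32-bit sample and shifting it left gives the same 64-bit result as a widening signed multiplication by 2^wasted, which is why the multiplication is a candidate replacement for the conversion plus shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend to 64 bits, then shift left. The shift is performed on the
 * unsigned type because left-shifting a negative signed value is
 * undefined behaviour in C. Assumes 0 <= wasted <= 31. */
static int64_t widen_then_shift(int32_t x, unsigned wasted)
{
    return (int64_t)((uint64_t)(int64_t)x << wasted);
}

/* Widening signed multiplication by 2^wasted. With a 32-bit input and
 * wasted <= 31, the product always fits in 64 bits. */
static int64_t widen_mul(int32_t x, unsigned wasted)
{
    return (int64_t)x * ((int64_t)1 << wasted);
}

/* Hypothetical self-check: returns 1 if both forms agree on a few
 * representative samples for every shift count in 0..31. */
static int forms_agree(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 12345, -12345, INT32_MAX, INT32_MIN
    };

    for (unsigned wasted = 0; wasted < 32; wasted++)
        for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            if (widen_then_shift(samples[i], wasted) !=
                widen_mul(samples[i], wasted))
                return 0;
    return 1;
}
```

Whether the multiplication form is actually faster on a given RVV implementation is a separate question that only measurement can answer.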
-- 
Rémi Denis-Courmont
http://www.remlab.net/

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".