* [FFmpeg-devel] [RFC] performance tuning, memcpy
@ 2025-11-05 7:48 Thilo Schunck via ffmpeg-devel
2025-11-11 0:17 ` [FFmpeg-devel] " Niklas Haas via ffmpeg-devel
0 siblings, 1 reply; 2+ messages in thread
From: Thilo Schunck via ffmpeg-devel @ 2025-11-05 7:48 UTC
To: ffmpeg-devel; +Cc: Thilo Schunck
Hi Team!
Apologies if I am breaking the submission rules, but as of now I don't know better :-)
I found that "hwdownload" is quite slow on ARM.
It turns out this is caused by image_copy_plane in imgutils.c, which does a memcpy loop:
    for (;height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
As a quick-and-dirty PoC, I created 4 threads and split the copy between them. In my case this improved the frame rate from about 26 to 51 fps:
./ffmpeg -hide_banner -hwaccel v4l2request -hwaccel_output_format drm_prime \
-threads 4 \
-i ../Big_Buck_Bunny_720_10s_10MB.mp4 \
-filter_complex "[0:v]hwdownload,format=nv12[myOut]" -map "[myOut]" \
-f null -
Maybe someone is interested in this improvement once the code is cleaned up.
My PoC hard-codes 4 threads, which is certainly bad ...
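To illustrate the idea, here is a minimal sketch of splitting the row copy into bands, one per thread. It is not the actual PoC patch: the names copy_plane_threaded, CopyJob and MAX_COPY_THREADS are hypothetical, plain pthreads are an assumption, and the thread count is a parameter instead of a hard-coded 4.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_COPY_THREADS 16 /* hypothetical upper bound for this sketch */

/* Hypothetical job description: one contiguous band of rows of a plane. */
typedef struct CopyJob {
    uint8_t       *dst;
    const uint8_t *src;
    ptrdiff_t      dst_linesize;
    ptrdiff_t      src_linesize;
    size_t         bytewidth;
    int            rows;
} CopyJob;

/* Same memcpy loop as in the message, restricted to one band. */
static void *copy_rows(void *arg)
{
    const CopyJob *job = arg;
    uint8_t       *dst = job->dst;
    const uint8_t *src = job->src;

    for (int y = 0; y < job->rows; y++) {
        memcpy(dst, src, job->bytewidth);
        dst += job->dst_linesize;
        src += job->src_linesize;
    }
    return NULL;
}

/* Copy a plane using up to nb_threads bands (assumed interface). */
static void copy_plane_threaded(uint8_t *dst, ptrdiff_t dst_linesize,
                                const uint8_t *src, ptrdiff_t src_linesize,
                                size_t bytewidth, int height, int nb_threads)
{
    pthread_t threads[MAX_COPY_THREADS];
    CopyJob   jobs[MAX_COPY_THREADS];
    int       started[MAX_COPY_THREADS];

    if (height <= 0)
        return;
    assert(dst && src);

    if (nb_threads < 1)                nb_threads = 1;
    if (nb_threads > MAX_COPY_THREADS) nb_threads = MAX_COPY_THREADS;
    if (nb_threads > height)           nb_threads = height;

    for (int i = 0; i < nb_threads; i++) {
        /* Band i covers rows [start, end); the bands tile the plane exactly. */
        int start = (int)((int64_t)height * i / nb_threads);
        int end   = (int)((int64_t)height * (i + 1) / nb_threads);

        jobs[i] = (CopyJob){
            .dst          = dst + start * dst_linesize,
            .src          = src + start * src_linesize,
            .dst_linesize = dst_linesize,
            .src_linesize = src_linesize,
            .bytewidth    = bytewidth,
            .rows         = end - start,
        };

        /* The last band runs on the calling thread; if a thread cannot be
         * created, that band is copied inline as well. */
        started[i] = i < nb_threads - 1 &&
                     pthread_create(&threads[i], NULL, copy_rows, &jobs[i]) == 0;
        if (!started[i])
            copy_rows(&jobs[i]);
    }

    for (int i = 0; i < nb_threads; i++)
        if (started[i])
            pthread_join(threads[i], NULL);
}
```

In this sketch the threads are created and joined on every call, so the function keeps no shared state, at the cost of thread setup overhead for each copy. A persistent worker pool would avoid that overhead but needs more care.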
By the way, this may apply to other locations as well.
Also, specific to ARM, there is a tuned memcpy replacement:
https://github.com/simonjhall/copies-and-fills/
which also speeds up ffmpeg (and of course everything else).
Best from Germany
Thilo
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
* [FFmpeg-devel] Re: [RFC] performance tuning, memcpy
2025-11-05 7:48 [FFmpeg-devel] [RFC] performance tuning, memcpy Thilo Schunck via ffmpeg-devel
@ 2025-11-11 0:17 ` Niklas Haas via ffmpeg-devel
0 siblings, 0 replies; 2+ messages in thread
From: Niklas Haas via ffmpeg-devel @ 2025-11-11 0:17 UTC
To: ffmpeg-devel; +Cc: Niklas Haas
On Monday, November 10th, 2025 at 3:09 PM, Thilo Schunck via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:
>
>
> Hi Team!
>
> Apologies for maybe breaking submit rules but as of now I don't know better :-)
>
> I figured out on arm "hwdownload" is quite slow.
> I turns out this is caused by imgutils.c image_copy_plane which does a memcpy loop
>
>     for (;height > 0; height--) {
>         memcpy(dst, src, bytewidth);
>         dst += dst_linesize;
>         src += src_linesize;
>     }
>
> As a POC, quick'n dirty I create 4 threads and split the copy. In my case this improved fps from about ~26 to 51
>
> ./ffmpeg -hide_banner -hwaccel v4l2request -hwaccel_output_format drm_prime \
> -threads 4 \
> -i ../Big_Buck_Bunny_720_10s_10MB.mp4 \
> -filter_complex "[0:v]hwdownload,format=nv12[myOut]" -map "[myOut]" \
> -f null -
>
> Maybe someone is interested in this improvement with cleaned code.
> My PoC uses hard coded 4 threads which is for sure bad ...
I could see `hwcontext_drm` specifically using an internal `AVSliceThread` for `RAM<->VRAM` transfers. We have to be careful about thread safety, though, as `av_hwframe_transfer_data` may be called from multiple threads. So in the worst case, we would need to create and tear down the `AVSliceThread` for each frame being transferred.
As a possible alternative, `vf_hwdownload` could become frame-threaded, though that would only improve throughput, not latency.