Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [RFC] performance tuning, memcpy
@ 2025-11-05  7:48 Thilo Schunck via ffmpeg-devel
  2025-11-11  0:17 ` [FFmpeg-devel] " Niklas Haas via ffmpeg-devel
  0 siblings, 1 reply; 2+ messages in thread
From: Thilo Schunck via ffmpeg-devel @ 2025-11-05  7:48 UTC
  To: ffmpeg-devel; +Cc: Thilo Schunck

Hi Team!

Apologies if I am breaking the submission rules; I don't know them well yet :-)

I found that "hwdownload" is quite slow on ARM.
It turns out this is caused by image_copy_plane in imgutils.c, which copies each plane with a per-row memcpy loop:

    for (; height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }

As a quick-and-dirty PoC, I created 4 threads and split the copy across them. In my case this improved throughput from about 26 to 51 fps with the following command:

./ffmpeg -hide_banner -hwaccel v4l2request -hwaccel_output_format drm_prime \
 -threads 4 \
 -i ../Big_Buck_Bunny_720_10s_10MB.mp4 \
 -filter_complex "[0:v]hwdownload,format=nv12[myOut]" -map "[myOut]"  \
 -f null -

Maybe someone is interested in this improvement once the code is cleaned up.
My PoC hard-codes 4 threads, which is certainly not acceptable as-is ...
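
For illustration only, here is a minimal sketch of how the row loop could be split across a configurable number of threads. It is not the PoC itself: copy_plane_threaded, CopySlice and MAX_COPY_THREADS are hypothetical names, and plain pthreads are assumed rather than any FFmpeg threading helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_COPY_THREADS 16 /* hypothetical upper bound for this sketch */

/* One horizontal band of rows handed to a single worker. */
typedef struct CopySlice {
    uint8_t       *dst;
    const uint8_t *src;
    ptrdiff_t      dst_linesize;
    ptrdiff_t      src_linesize;
    size_t         bytewidth;
    int            rows;
} CopySlice;

/* Same per-row memcpy loop as before, restricted to one band. */
static void *copy_slice_worker(void *arg)
{
    const CopySlice *s   = arg;
    uint8_t         *dst = s->dst;
    const uint8_t   *src = s->src;

    for (int y = 0; y < s->rows; y++) {
        memcpy(dst, src, s->bytewidth);
        dst += s->dst_linesize;
        src += s->src_linesize;
    }
    return NULL;
}

/* Split the plane into nb_threads bands of rows and copy them in parallel.
 * The last band runs on the calling thread; a band whose thread cannot be
 * created is copied inline, so the copy always completes. */
static void copy_plane_threaded(uint8_t *dst, ptrdiff_t dst_linesize,
                                const uint8_t *src, ptrdiff_t src_linesize,
                                size_t bytewidth, int height, int nb_threads)
{
    pthread_t threads[MAX_COPY_THREADS];
    CopySlice slices[MAX_COPY_THREADS];
    int       started = 0;

    assert(height >= 0);
    if (nb_threads < 1)
        nb_threads = 1;
    if (nb_threads > MAX_COPY_THREADS)
        nb_threads = MAX_COPY_THREADS;
    if (nb_threads > height)
        nb_threads = height > 0 ? height : 1;

    for (int i = 0; i < nb_threads; i++) {
        int start = (int)((int64_t)height * i / nb_threads);
        int end   = (int)((int64_t)height * (i + 1) / nb_threads);

        slices[i] = (CopySlice){
            .dst          = dst + start * dst_linesize,
            .src          = src + start * src_linesize,
            .dst_linesize = dst_linesize,
            .src_linesize = src_linesize,
            .bytewidth    = bytewidth,
            .rows         = end - start,
        };

        if (i == nb_threads - 1 ||
            pthread_create(&threads[started], NULL,
                           copy_slice_worker, &slices[i]) != 0)
            copy_slice_worker(&slices[i]); /* run this band inline */
        else
            started++;
    }

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
}
```

Because this sketch keeps no shared state and creates and joins its threads on every call, it can be called from several threads at once, but it pays the thread setup cost for each plane copied. A persistent worker pool would avoid that cost.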

By the way, the same approach may apply to other copy loops in the code base as well.


Also, specific to ARM, there is a tuned memcpy replacement:
https://github.com/simonjhall/copies-and-fills/
which also speeds up ffmpeg (and of course everything else).


 Best from Germany
     Thilo

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org


* [FFmpeg-devel] Re: [RFC] performance tuning, memcpy
  2025-11-05  7:48 [FFmpeg-devel] [RFC] performance tuning, memcpy Thilo Schunck via ffmpeg-devel
@ 2025-11-11  0:17 ` Niklas Haas via ffmpeg-devel
  0 siblings, 0 replies; 2+ messages in thread
From: Niklas Haas via ffmpeg-devel @ 2025-11-11  0:17 UTC
  To: ffmpeg-devel; +Cc: Niklas Haas

On Monday, November 10th, 2025 at 3:09 PM, Thilo Schunck via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:

> 
> 
> Hi Team!
> 
> Apologies for maybe breaking submit rules but as of now I don't know better :-)
> 
> I figured out on arm "hwdownload" is quite slow.
> It turns out this is caused by imgutils.c image_copy_plane which does a memcpy loop
> 
> for (;height > 0; height--) {
> 
> 
> memcpy(dst, src, bytewidth);
> 
> dst += dst_linesize;
> src += src_linesize;
> }
> 
> As a POC, quick'n dirty I create 4 threads and split the copy. In my case this improved fps from about ~26 to 51
> 
> ./ffmpeg -hide_banner -hwaccel v4l2request -hwaccel_output_format drm_prime \
> -threads 4 \
> -i ../Big_Buck_Bunny_720_10s_10MB.mp4 \
> -filter_complex "[0:v]hwdownload,format=nv12[myOut]" -map "[myOut]" \
> -f null -
> 
> Maybe someone is interested in this improvement with cleaned code.
> My PoC uses hard coded 4 threads which is for sure bad ...

I could see `hwcontext_drm` specifically using an internal `AVSliceThread` for RAM<->VRAM transfers. We have to be careful about thread safety, though, as `av_hwframe_transfer_data` may be called from multiple threads. So in the worst case, we would need to create and tear down the `AVSliceThread` for each frame being transferred.

As a possible alternative, `vf_hwdownload` could become frame-threaded, though that would only improve throughput, not latency.

