From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] slicethread: Limit the automatic number of threads to 16
Date: Tue, 6 Sep 2022 23:53:48 +0300 (EEST)
Message-ID: <4d712553-76b1-3bfb-9c91-1061d87365f2@martin.st>
In-Reply-To: <trinity-7a28de0c-6ad3-4d0e-9f76-41d8316c10d7-1662493821677@3c-app-gmx-bs29>

On Tue, 6 Sep 2022, Lukas Fellechner wrote:

> There are really two separate issues here:
>
> 1. Running out of address space in 32-bit processes
>
> It probably makes sense to limit auto threads to 16, but it should only
> be done in 32-bit processes.

FWIW, this was my first approach, until Andreas pointed out that we have
such caps for automatic numbers of threads already in all other places
where we pick an automatic number of threads - including
libavcodec/pthread_slice.c, where the limit already today is 16 threads.

Also FWIW, this patch was already pushed, after being OK'd by Andreas on
irc.

> A 64-bit process should never run out of address space. We should not
> cripple high end machines running 64-bit applications.
>
>
> Sidenotes about "it does not make sense to have more than 16 slices":
> On 8K video, when using 32 threads, each thread will process 256 lines
> or about 1MP (> FullHD!). Sure makes sense to me. But even for sw decoding
> 4K video, having more than 16 threads on a powerful machine makes sense.
>
> Intel's next desktop CPUs will have up to 24 physical cores. The
> proposed change would limit them to use only 16 cores, even on 64-bit.
>
>
> 2. Spawning too many threads when "auto" is used in multiple places
>
> This can indeed be an efficiency problem, although probably not major.
> Since usually only one part of the pipeline is active at any time,
> many of the threads will be sleeping, consuming very little resources.

For 32 bit processes running out of address space, yes, the issue is with
"auto" being used in many places at once.
But in general, allowing arbitrarily high numbers of auto threads isn't
beneficial - the optimal cap of threads depends a lot on the content at
hand. The system I'm testing on has 160 cores - and it's quite certain
that doing slice threading with 160 slices doesn't make sense.

Maybe the cap of 16 is indeed too low - I don't mind raising it to 32 or
something like that. Ideally, the auto mechanism would factor in the
resolution of the content.

Just for argument's sake - here's the output from 'time ffmpeg ...' for a
fairly straightforward transcode (decode, transpose, scale, encode), 1080p
input 10bit, 720p output 8bit, explicitly setting the number of threads
("ffmpeg -threads N -i input -threads N -filter_threads N output").

threads  real        user        sys
12       0m25.079s   5m22.318s   0m5.047s
16       0m19.967s   6m3.607s    0m9.112s
20       0m20.853s   6m21.841s   0m28.829s
24       0m20.642s   6m28.022s   1m1.262s
32       0m29.785s   6m8.442s    4m45.290s
64       1m0.808s    6m31.065s   40m44.598s

I'm not testing this with 160 threads for each stage, since 64 already was
painfully slow - while you suggest that using threads==cores always should
be preferred, regardless of the number of cores.

The optimum here seems to be somewhere between 16 and 20.

Also, in these cases, the decoder and encoder both warn that "Application
has requested N threads. Using a thread count greater than 16 is not
recommended" (see libavcodec/pthread.c).

I can also test with only varying the -filter_threads parameter, while
keeping the decoder and encoder threads fixed at 16:

threads  real        user        sys
16       0m20.303s   6m5.425s    0m12.954s
20       0m20.862s   6m12.625s   0m21.860s
24       0m20.445s   6m20.734s   0m21.111s
32       0m21.216s   6m15.926s   0m42.264s
64       0m20.687s   6m39.544s   0m59.204s

Not quite as dramatic in this case, but (on this particular test clip,
mostly determined by the resolution) we still don't gain anything above 16
threads.
On a larger test clip, the optimum number of slice threads probably is a
bit higher. But always using up to the number of cores isn't really
healthy.

// Martin
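The first set of timings in the message above (all three thread counts set to N) can be checked with a few lines of arithmetic. The following is a minimal sketch, not anything from FFmpeg itself: the helper names `to_seconds` and `summarize` are illustrative assumptions, and the only data used are the `time` values quoted in the message. It converts each value to seconds, then reports wall-clock time, total CPU time (user + sys), and the share of CPU time spent in the kernel.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '5m22.318s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# threads -> (real, user, sys), copied from the first set of timings
RUNS = {
    12: ("0m25.079s", "5m22.318s", "0m5.047s"),
    16: ("0m19.967s", "6m3.607s", "0m9.112s"),
    20: ("0m20.853s", "6m21.841s", "0m28.829s"),
    24: ("0m20.642s", "6m28.022s", "1m1.262s"),
    32: ("0m29.785s", "6m8.442s", "4m45.290s"),
    64: ("1m0.808s", "6m31.065s", "40m44.598s"),
}


def summarize(runs):
    """Return per-thread-count wall time, CPU time and kernel share."""
    rows = {}
    for threads, (real, user, sys_time) in runs.items():
        r, u, s = (to_seconds(v) for v in (real, user, sys_time))
        rows[threads] = {
            "real": r,               # wall-clock seconds
            "cpu": u + s,            # total CPU seconds consumed
            "sys_share": s / (u + s),  # fraction of CPU time in the kernel
        }
    return rows


SUMMARY = summarize(RUNS)
# Thread count with the lowest wall-clock time in this data set
FASTEST = min(SUMMARY, key=lambda n: SUMMARY[n]["real"])

if __name__ == "__main__":
    for threads, row in sorted(SUMMARY.items()):
        print(f"{threads:3d}  real={row['real']:8.3f}s  "
              f"cpu={row['cpu']:9.3f}s  sys={row['sys_share']:.1%}")
    print("fastest wall-clock time at", FASTEST, "threads")
```

On this data the lowest wall-clock time is at 16 threads, and at 64 threads most of the CPU time is system time, which is consistent with the message's reading that the optimum for this clip lies around 16 to 20 threads.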