From: Timo Rothenpieler <timo@rothenpieler.org>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
Date: Sat, 22 Feb 2025 14:16:09 +0100
Message-ID: <e2b9873e-ddfc-4d0e-83cf-4c2317f59a98@rothenpieler.org>
In-Reply-To: <0385232f-61a8-48a1-a426-4ea210ea2957@gmail.com>

On 22.02.2025 03:52, Scott Theisen wrote:
> On 2/21/25 08:26, Timo Rothenpieler wrote:
>> On 20.02.2025 21:37, Scott Theisen wrote:
>>> The default value of CuvidContext::nb_surfaces was reduced from 25 to 5
>>> (as (CUVID_MAX_DISPLAY_DELAY + 1)) in
>>> 402d98c9d467dff6931d906ebb732b9a00334e0b.
>>>
>>> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY with
>>> double rate deinterlacing. ctx->nb_surfaces is
>>> CUVID_DEFAULT_NUM_SURFACES = (CUVID_MAX_DISPLAY_DELAY + 1) by default,
>>> in which case cuvid_is_buffer_full() will always return true and
>>> cuvid_output_frame() will never read any data since it will not call
>>> ff_decode_get_packet().
>>
>> It's been way too long since I looked at all that code, and I didn't
>> even write most of the code involved:
>> https://github.com/FFmpeg/FFmpeg/commit/bddb2343b6e594e312dadb5d21b408702929ae04
>> https://github.com/FFmpeg/FFmpeg/commit/402d98c9d467dff6931d906ebb732b9a00334e0b
>>
>> But doesn't this instead mean that the logic in cuvid_is_buffer_full
>> is flawed somehow?
>
> I think it is the number of frames ready to send to the driver + the
> number of frames in queue in the driver >= the number of decoded frame
> buffers. However, it doesn't actually know how many frames are in queue
> in the driver and assumes the maximum.

Not sure if I understand you right, but the way it works is that
av_fifo_can_read(ctx->frame_queue) returns how many frames have already
been returned from cuvid and are ready for cuviddec.c to return them.
To that number, the maximum possible number of delayed frames is added,
which could be returned by the decoder without feeding in any more input
frames. If that number reaches the desired amount of surfaces to buffer,
cuvid_is_buffer_full() will report that its buffer is full, and
cuviddec.c will stop fetching new input.

>> Just increasing the default number of surfaces does not seem like the
>> correct fix or sensible, since it will increase VRAM usage by
>> potentially quite a bit for all users.
>>
>
> The changes to cuvid_handle_video_sequence() from
> 402d98c9d467dff6931d906ebb732b9a00334e0b will increase nb_surfaces once
> data has been read.

Only if the decoder reports that it will potentially buffer even more
frames.

>> From looking at this a bit, the issue will only happen when
>> deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then,
>> and will always claim the buffer is full.
>> And from my understanding, it's correct in making that claim. Due to
>> the display delay, it could in theory happen that the moment cuvid
>> starts outputting frames, there will be more output available than
>> what fits into ctx->frame_queue, since it delayed by 4 frames, which
>> results in 8 surfaces, but the queue only fits 5.
>>
>> So to me it looks like that the correct fix would be to double the
>> size of the frame_queue when deinterlacing, not unconditionally.
>
> There is nothing stopping deint_mode or drop_second_field from being
> changed after cuvid_decode_init() is called, so it doesn't necessarily
> know it will deinterlace.
>
> Regardless, 402d98c9d467dff6931d906ebb732b9a00334e0b reduced
> CUVID_DEFAULT_NUM_SURFACES from 25 to *only 5* to not break playback
> entirely. I don't think the intention was to break playback for double
> rate deinterlacing while allowing playback for only single rate
> deinterlacing.
>
> Also, if AV_CODEC_FLAG_LOW_DELAY is set, then only one output surface is
> needed, but there are still 5.

The structs stored in the ctx->frame_queue aren't what's using all the
memory. It's the frames buffered by cuvid itself, which are referred to
by that buffer, so having it be larger than what cuvid will actually
buffer doesn't hurt all that much.
But yeah, it could be shrunk in this case.

What this whole dance is actually trying to accomplish is to keep the
number of "ready but not-yet-returned frames" from ever exceeding the
maximum that cuvid allows, as set via
ulNumDecodeSurfaces/ulMaxNumDecodeSurfaces, which is determined and
stored in ctx->nb_surfaces during cuvid_handle_video_sequence().

In the default mode of operation, the buffer_full indicator will indeed
stop pulling new input the moment even one frame is returned. But that's
fine, since by that point a bunch of input has already been consumed and
a decent delay has been built up.
In low-delay mode, frames are pretty much returned the moment it's
possible.

So looking at all this, I still think the core of the issue is incorrect
handling of deinterlacing.
CUVID treats a deinterlaced frame as one internal frame, but it's stored
in the frame_queue as two frames.

So in the case of deinterlacing without drop_second_field, the size of
that queue needs to be doubled, but nb_surfaces must stay the same, since
for cuvid itself it's still just one frame.
And in turn, the is_buffer_full function has to be adjusted to multiply
nb_surfaces by two if deinterlacing and not drop_second_field.

Changing that stuff at runtime is no problem, since to change anything,
cuvid_handle_video_sequence() has to re-run, which updates all these
sizes and will resize the fifo accordingly.
And when turning it off, the only thing that happens is that buffer_full
will report it's full immediately, and some frames need to be read out
before accepting input again.
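
For illustration, the check described above and the proposed adjustment
can be sketched in plain C. This is a simplified model under stated
assumptions, not the code in cuviddec.c: SketchState, is_full_current()
and is_full_proposed() are hypothetical names, the queued field stands in
for av_fifo_can_read(ctx->frame_queue), and 4 and 5 are the display delay
and default surface count quoted in this thread. Keeping the delay doubled
in the proposed variant is an assumption; the text above only says that
nb_surfaces has to be multiplied by two.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted in the thread: display delay 4, default surfaces 4 + 1. */
#define MAX_DISPLAY_DELAY    4
#define DEFAULT_NUM_SURFACES (MAX_DISPLAY_DELAY + 1)

/* Hypothetical stand-in for the relevant decoder state. */
typedef struct {
    int  queued;            /* models av_fifo_can_read(ctx->frame_queue) */
    int  nb_surfaces;       /* decode surfaces as seen by cuvid */
    int  display_delay;     /* maximum number of delayed frames */
    bool deinterlace;       /* deinterlacing enabled */
    bool drop_second_field; /* single-rate output when true */
} SketchState;

/* Double-rate output: each decoded frame becomes two queue entries. */
static bool double_rate(const SketchState *s)
{
    return s->deinterlace && !s->drop_second_field;
}

/* Check as described in the thread: the delay is doubled for
 * double-rate deinterlacing, but the surface count is not. */
static bool is_full_current(const SketchState *s)
{
    int delay = s->display_delay;

    assert(s->nb_surfaces > 0);
    if (double_rate(s))
        delay *= 2;
    return s->queued + delay >= s->nb_surfaces;
}

/* Proposed adjustment (assumption: delay stays doubled): compare
 * against twice nb_surfaces, leaving nb_surfaces itself unchanged. */
static bool is_full_proposed(const SketchState *s)
{
    int delay    = s->display_delay;
    int capacity = s->nb_surfaces;

    assert(s->nb_surfaces > 0);
    if (double_rate(s)) {
        delay    *= 2;
        capacity *= 2;
    }
    return s->queued + delay >= capacity;
}
```

With an empty queue and double-rate deinterlacing, the first variant
evaluates 0 + 8 >= 5 and reports full before any input is read. The
second variant compares against 10 and only reports full once two
entries are queued.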

There's also the edge case of "half a frame" having already been
returned: the queue is potentially no longer considered full, but all
decode surfaces are still in use, since the other half of that
deinterlaced frame is still in the queue.
So special care must be taken to not report buffer-free too early.

>>
>> nb_surfaces is also used to determine the size of the key_frame array,
>> which would then also be pointlessly doubled. But not like a handful
>> of extra ints would hurt that much though.
>> Alternatively the size-doubling could not be reflected in nb_surfaces,
>> but that would make the logic in various other places be more
>> complicated.
>>
>>> ---
>>>
>>> I think part of the problem might be that cuvid_is_buffer_full() does
>>> not know how many frames are actually in the driver's queue and
>>> assumes it is the maximum, even if none have yet been added.
>>>
>>> This was preventing any frames from being decoded using NVDEC with
>>> MythTV for some streams. See
>>> https://github.com/MythTV/mythtv/issues/1039
>>
>> I'd highly recommend to not use cuviddec anymore, but instead use nvdec.
>> cuviddec only still exists to sanity-check nvdec against it at times.
>
> .*_cuvid are FFCodecs while .*_nvdec are FFHWAccels. I don't know what
> would be required to change to .*_nvdec and the .*_cuvid FFCodecs work
> fine with this change.

You just use the normal native decoder and turn on hwaccel, to very
broadly summarize it.
That would also allow using a plethora of other hwaccels with the same
code. It's pretty much how ffmpeg.c -hwaccel auto works.

If you already support things like vaapi/vdpau/d3d11va, you should
already have the code necessary to also use nvdec with minimal changes.
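
The "half a frame" edge case mentioned earlier in this reply can be
illustrated with a small C sketch. It is one possible way to avoid
reporting buffer-free too early, not something proposed verbatim in the
thread: it counts occupied decode surfaces by rounding the number of
queue entries up to whole frames. The names surfaces_in_use() and
is_full_surface_units() are hypothetical, and queued_entries stands in
for av_fifo_can_read(ctx->frame_queue).

```c
#include <assert.h>
#include <stdbool.h>

/* Number of decode surfaces still referenced by the queue. With
 * double-rate deinterlacing, two queue entries share one surface, so a
 * single leftover field still occupies a whole surface (round up). */
static int surfaces_in_use(int queued_entries, bool double_rate)
{
    assert(queued_entries >= 0);
    return double_rate ? (queued_entries + 1) / 2 : queued_entries;
}

/* Buffer-full check expressed in units of decode surfaces rather than
 * queue entries, so a half-returned frame still counts as one surface. */
static bool is_full_surface_units(int queued_entries, int display_delay,
                                  int nb_surfaces, bool double_rate)
{
    return surfaces_in_use(queued_entries, double_rate) + display_delay
           >= nb_surfaces;
}
```

With 5 surfaces, a display delay of 4 and one leftover field in the
queue, this check evaluates 1 + 4 >= 5 and still reports full, whereas a
comparison in queue entries (1 + 8 against 10) would already report free
space.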