[FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed

* [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
@ 2025-02-20 20:37 Scott Theisen
  2025-02-21 13:26 ` Timo Rothenpieler
  0 siblings, 1 reply; 11+ messages in thread
From: Scott Theisen @ 2025-02-20 20:37 UTC (permalink / raw)
  To: ffmpeg-devel

The default value of CuvidContext::nb_surfaces was reduced from 25 to 5 (as
(CUVID_MAX_DISPLAY_DELAY + 1)) in 402d98c9d467dff6931d906ebb732b9a00334e0b.

In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY with double
rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
(CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case cuvid_is_buffer_full()
will always return true and cuvid_output_frame() will never read any data since
it will not call ff_decode_get_packet().
---

I think part of the problem might be that cuvid_is_buffer_full() does not know
how many frames are actually in the driver's queue and assumes it is the
maximum, even if none have yet been added.

This was preventing any frames from being decoded using NVDEC with MythTV for
some streams.  See https://github.com/MythTV/mythtv/issues/1039

---
 libavcodec/cuviddec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index 67076a1752..05dcafab6e 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -120,7 +120,7 @@ typedef struct CuvidParsedFrame
 #define CUVID_MAX_DISPLAY_DELAY (4)
 
 // Actual pool size will be determined by parser.
-#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
+#define CUVID_DEFAULT_NUM_SURFACES ((2 * CUVID_MAX_DISPLAY_DELAY) + 1)
 
 static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* format)
 {
-- 
2.43.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
  2025-02-20 20:37 [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES Scott Theisen
@ 2025-02-21 13:26 ` Timo Rothenpieler
  2025-02-22  2:52   ` Scott Theisen
  0 siblings, 1 reply; 11+ messages in thread
From: Timo Rothenpieler @ 2025-02-21 13:26 UTC (permalink / raw)
  To: ffmpeg-devel

On 20.02.2025 21:37, Scott Theisen wrote:
> The default value of CuvidContext::nb_surfaces was reduced from 25 to 5 (as
> (CUVID_MAX_DISPLAY_DELAY + 1)) in 402d98c9d467dff6931d906ebb732b9a00334e0b.
> 
> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY with double
> rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
> (CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case cuvid_is_buffer_full()
> will always return true and cuvid_output_frame() will never read any data since
> it will not call ff_decode_get_packet().

It's been way too long since I looked at all that code, and I didn't 
even write most of the code involved:
> https://github.com/FFmpeg/FFmpeg/commit/bddb2343b6e594e312dadb5d21b408702929ae04
> https://github.com/FFmpeg/FFmpeg/commit/402d98c9d467dff6931d906ebb732b9a00334e0b

But doesn't this instead mean that the logic in cuvid_is_buffer_full is 
flawed somehow?
Just increasing the default number of surfaces does not seem like the 
correct fix or sensible, since it will increase VRAM usage by 
potentially quite a bit for all users.

 From looking at this a bit, the issue will only happen when 
deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then, and 
will always claim the buffer is full.
And from my understanding, it's correct in making that claim. Due to the 
display delay, it could in theory happen that the moment cuvid starts 
outputting frames, there will be more output available than what fits 
into ctx->frame_queue, since it delayed by 4 frames, which results in 8 
surfaces, but the queue only fits 5.

So to me it looks like that the correct fix would be to double the size 
of the frame_queue when deinterlacing, not unconditionally.

nb_surfaces is also used to determine the size of the key_frame array, 
which would then also be pointlessly doubled. But not like a handful of 
extra ints would hurt that much though.
Alternatively the size-doubling could not be reflected in nb_surfaces, 
but that would make the logic in various other places be more complicated.

> ---
> 
> I think part of the problem might be that cuvid_is_buffer_full() does not know
> how many frames are actually in the driver's queue and assumes it is the
> maximum, even if none have yet been added.
> 
> This was preventing any frames from being decoded using NVDEC with MythTV for
> some streams.  See https://github.com/MythTV/mythtv/issues/1039

I'd highly recommend to not use cuviddec anymore, but instead use nvdec.
cuviddec only still exists to sanity-check nvdec against it at times.

> ---
>   libavcodec/cuviddec.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
> index 67076a1752..05dcafab6e 100644
> --- a/libavcodec/cuviddec.c
> +++ b/libavcodec/cuviddec.c
> @@ -120,7 +120,7 @@ typedef struct CuvidParsedFrame
>   #define CUVID_MAX_DISPLAY_DELAY (4)
>   
>   // Actual pool size will be determined by parser.
> -#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
> +#define CUVID_DEFAULT_NUM_SURFACES ((2 * CUVID_MAX_DISPLAY_DELAY) + 1)
>   
>   static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* format)
>   {

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
  2025-02-21 13:26 ` Timo Rothenpieler
@ 2025-02-22  2:52   ` Scott Theisen
  2025-02-22 13:16     ` Timo Rothenpieler
  0 siblings, 1 reply; 11+ messages in thread
From: Scott Theisen @ 2025-02-22  2:52 UTC (permalink / raw)
  To: ffmpeg-devel

On 2/21/25 08:26, Timo Rothenpieler wrote:
> On 20.02.2025 21:37, Scott Theisen wrote:
>> The default value of CuvidContext::nb_surfaces was reduced from 25 to 
>> 5 (as
>> (CUVID_MAX_DISPLAY_DELAY + 1)) in 
>> 402d98c9d467dff6931d906ebb732b9a00334e0b.
>>
>> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY 
>> with double
>> rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
>> (CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case 
>> cuvid_is_buffer_full()
>> will always return true and cuvid_output_frame() will never read any 
>> data since
>> it will not call ff_decode_get_packet().
>
> It's been way too long since I looked at all that code, and I didn't 
> even write most of the code involved:
>> https://github.com/FFmpeg/FFmpeg/commit/bddb2343b6e594e312dadb5d21b408702929ae04 
>>
>> https://github.com/FFmpeg/FFmpeg/commit/402d98c9d467dff6931d906ebb732b9a00334e0b 
>>
>
> But doesn't this instead mean that the logic in cuvid_is_buffer_full 
> is flawed somehow?

I think it is the number of frames ready to send to the driver + the 
number of frames in queue in the driver >= the number of decoded frame 
buffers.  However, it doesn't actually know how many frames are in queue 
in the driver and assumes the maximum.

> Just increasing the default number of surfaces does not seem like the 
> correct fix or sensible, since it will increase VRAM usage by 
> potentially quite a bit for all users.
>

The changes to cuvid_handle_video_sequence() from 
402d98c9d467dff6931d906ebb732b9a00334e0b will increase nb_surfaces once 
data has been read.

> From looking at this a bit, the issue will only happen when 
> deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then, 
> and will always claim the buffer is full.
> And from my understanding, it's correct in making that claim. Due to 
> the display delay, it could in theory happen that the moment cuvid 
> starts outputting frames, there will be more output available than 
> what fits into ctx->frame_queue, since it delayed by 4 frames, which 
> results in 8 surfaces, but the queue only fits 5.
>
> So to me it looks like that the correct fix would be to double the 
> size of the frame_queue when deinterlacing, not unconditionally.

There is nothing stopping deint_mode or drop_second_field from being 
changed after cuvid_decode_init() is called, so it doesn't necessarily 
know it will deinterlace.

Regardless, 402d98c9d467dff6931d906ebb732b9a00334e0b reduced 
CUVID_DEFAULT_NUM_SURFACES from 25 to *only 5* to not break playback 
entirely.  I don't think the intention was to break playback for double 
rate deinterlacing while allowing playback for only single rate 
deinterlacing.

Also, if AV_CODEC_FLAG_LOW_DELAY is set, then only one output surface is 
needed, but there are still 5.

>
> nb_surfaces is also used to determine the size of the key_frame array, 
> which would then also be pointlessly doubled. But not like a handful 
> of extra ints would hurt that much though.
> Alternatively the size-doubling could not be reflected in nb_surfaces, 
> but that would make the logic in various other places be more 
> complicated.
>
>> ---
>>
>> I think part of the problem might be that cuvid_is_buffer_full() does 
>> not know
>> how many frames are actually in the driver's queue and assumes it is the
>> maximum, even if none have yet been added.
>>
>> This was preventing any frames from being decoded using NVDEC with 
>> MythTV for
>> some streams.  See https://github.com/MythTV/mythtv/issues/1039
>
> I'd highly recommend to not use cuviddec anymore, but instead use nvdec.
> cuviddec only still exists to sanity-check nvdec against it at times.

.*_cuvid are FFCodecs while .*_nvdec are FFHWAccels.  I don't know what 
would be required to change to .*_nvdec and the .*_cuvid FFCodecs work 
fine with this change.

>
>> ---
>>   libavcodec/cuviddec.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
>> index 67076a1752..05dcafab6e 100644
>> --- a/libavcodec/cuviddec.c
>> +++ b/libavcodec/cuviddec.c
>> @@ -120,7 +120,7 @@ typedef struct CuvidParsedFrame
>>   #define CUVID_MAX_DISPLAY_DELAY (4)
>>     // Actual pool size will be determined by parser.
>> -#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
>> +#define CUVID_DEFAULT_NUM_SURFACES ((2 * CUVID_MAX_DISPLAY_DELAY) + 1)
>>     static int CUDAAPI cuvid_handle_video_sequence(void *opaque, 
>> CUVIDEOFORMAT* format)
>>   {
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
  2025-02-22  2:52   ` Scott Theisen
@ 2025-02-22 13:16     ` Timo Rothenpieler
  2025-02-25  5:36       ` Scott Theisen
  0 siblings, 1 reply; 11+ messages in thread
From: Timo Rothenpieler @ 2025-02-22 13:16 UTC (permalink / raw)
  To: ffmpeg-devel

On 22.02.2025 03:52, Scott Theisen wrote:
> On 2/21/25 08:26, Timo Rothenpieler wrote:
>> On 20.02.2025 21:37, Scott Theisen wrote:
>>> The default value of CuvidContext::nb_surfaces was reduced from 25 to 
>>> 5 (as
>>> (CUVID_MAX_DISPLAY_DELAY + 1)) in 
>>> 402d98c9d467dff6931d906ebb732b9a00334e0b.
>>>
>>> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY 
>>> with double
>>> rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
>>> (CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case 
>>> cuvid_is_buffer_full()
>>> will always return true and cuvid_output_frame() will never read any 
>>> data since
>>> it will not call ff_decode_get_packet().
>>
>> It's been way too long since I looked at all that code, and I didn't 
>> even write most of the code involved:
>>> https://github.com/FFmpeg/FFmpeg/commit/ 
>>> bddb2343b6e594e312dadb5d21b408702929ae04
>>> https://github.com/FFmpeg/FFmpeg/ 
>>> commit/402d98c9d467dff6931d906ebb732b9a00334e0b
>>
>> But doesn't this instead mean that the logic in cuvid_is_buffer_full 
>> is flawed somehow?
> 
> I think it is the number of frames ready to send to the driver + the 
> number of frames in queue in the driver >= the number of decoded frame 
> buffers.  However, it doesn't actually know how many frames are in queue 
> in the driver and assumes the maximum.

Not sure if I understand you right, but the way it works is that 
av_fifo_can_read(ctx->frame_queue) returns how many frames have already 
been returned from cuvid and are ready for cuviddec.c to return them.

To that number, the maximum possible number of delayed frames is added, 
which could be returned by the decoder without feeding in any more input 
frames.

If that number reaches the desired amount of surfaces to buffer, 
cuvid_is_buffer_full() will report that its buffer is full, and 
cuviddec.c will stop fetching new input.

>> Just increasing the default number of surfaces does not seem like the 
>> correct fix or sensible, since it will increase VRAM usage by 
>> potentially quite a bit for all users.
>>
> 
> The changes to cuvid_handle_video_sequence() from 
> 402d98c9d467dff6931d906ebb732b9a00334e0b will increase nb_surfaces once 
> data has been read.

Only if the decoder reports that it will potentially buffer even more 
frames.

>> From looking at this a bit, the issue will only happen when 
>> deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then, 
>> and will always claim the buffer is full.
>> And from my understanding, it's correct in making that claim. Due to 
>> the display delay, it could in theory happen that the moment cuvid 
>> starts outputting frames, there will be more output available than 
>> what fits into ctx->frame_queue, since it delayed by 4 frames, which 
>> results in 8 surfaces, but the queue only fits 5.
>>
>> So to me it looks like that the correct fix would be to double the 
>> size of the frame_queue when deinterlacing, not unconditionally.
> 
> There is nothing stopping deint_mode or drop_second_field from being 
> changed after cuvid_decode_init() is called, so it doesn't necessarily 
> know it will deinterlace.
> 
> Regardless, 402d98c9d467dff6931d906ebb732b9a00334e0b reduced 
> CUVID_DEFAULT_NUM_SURFACES from 25 to *only 5* to not break playback 
> entirely.  I don't think the intention was to break playback for double 
> rate deinterlacing while allowing playback for only single rate 
> deinterlacing.
> 
> Also, if AV_CODEC_FLAG_LOW_DELAY is set, then only one output surface is 
> needed, but there are still 5.

The structs stored in the ctx->frame_queue aren't what's using all the 
memory.
It's the frames buffered by cuvid itself, which are referred to by that 
buffer, so having it be larger than what cuvid will actually buffer 
doesn't hurt all that much.
But yeah, it could be shrunk in this case.

What this whole dance is actually trying to accomplish is to prevent the 
number of "ready but not-yet-returned frames" to never exceed the max 
value possible by cuvid set via 
ulNumDecodeSurfaces/ulMaxNumDecodeSurfaces, which is determined and 
stored in ctx->nb_surfaces during cuvid_handle_video_sequence().

In the default mode of operation, the buffer_full indicator will indeed 
stop pulling new input the moment even one frame is returned. But that's 
fine, since at that point already a bunch of input has been consumed, 
and a decent delay has been built up already.
In low-delay mode frames are pretty much returned the moment it's possible.

So looking at all this, I still think the core of the issue is incorrect 
handling of deinterlacing in all this.
CUVID treats a deinterlaced frame as one internal frame, but it's stored 
in the frame_queue as two frames.

So in the case of deinterlacing without drop_second_field, the size of 
that queue needs to be doubled, but nb_surfaces must stay the same, 
since for cuvid itself it's still just one frame.
And in turn the is_buffer_full function has to be adjusted to multiply 
nb_surfaces by two if deinterlacing and not drop_second_field.

Changing that stuff at runtime is no problem, since to change anything, 
cuvid_handle_video_sequence() has to re-run, which updates all these 
sizes and will resize the fifo accordingly.
And when turning it off, the only thing that happens is that buffer_full 
will report it's full immediately, and some frames need to be read out 
before accepting input again.

There's also the edge case of "half a frame" having already been 
returned, so the queue potentially no longer being considered full, but 
still all decode surfaces are in use, since the other half of that 
deinterlaced frame is still in the queue.
So special care must be taken to not report buffer-free too early.

>>
>> nb_surfaces is also used to determine the size of the key_frame array, 
>> which would then also be pointlessly doubled. But not like a handful 
>> of extra ints would hurt that much though.
>> Alternatively the size-doubling could not be reflected in nb_surfaces, 
>> but that would make the logic in various other places be more 
>> complicated.
>>
>>> ---
>>>
>>> I think part of the problem might be that cuvid_is_buffer_full() does 
>>> not know
>>> how many frames are actually in the driver's queue and assumes it is the
>>> maximum, even if none have yet been added.
>>>
>>> This was preventing any frames from being decoded using NVDEC with 
>>> MythTV for
>>> some streams.  See https://github.com/MythTV/mythtv/issues/1039
>>
>> I'd highly recommend to not use cuviddec anymore, but instead use nvdec.
>> cuviddec only still exists to sanity-check nvdec against it at times.
> 
> .*_cuvid are FFCodecs while .*_nvdec are FFHWAccels.  I don't know what 
> would be required to change to .*_nvdec and the .*_cuvid FFCodecs work 
> fine with this change.

You just use the normal native decoder and turn on hwaccel, to very 
broadly summarize it.
Would also allow using a plethora of other hwaccels with the same code. 
Pretty much how ffmpeg.c -hwaccel auto works.
If you already support things like vaapi/vdpau/d3d11va, you should 
already have the code necessary to also use nvdec with minimal changes.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
  2025-02-22 13:16     ` Timo Rothenpieler
@ 2025-02-25  5:36       ` Scott Theisen
  2025-02-25 18:10         ` Timo Rothenpieler
  2025-02-25 18:43         ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing Timo Rothenpieler
  0 siblings, 2 replies; 11+ messages in thread
From: Scott Theisen @ 2025-02-25  5:36 UTC (permalink / raw)
  To: ffmpeg-devel

On 2/22/25 08:16, Timo Rothenpieler wrote:
> On 22.02.2025 03:52, Scott Theisen wrote:
>> On 2/21/25 08:26, Timo Rothenpieler wrote:
>>> On 20.02.2025 21:37, Scott Theisen wrote:
>>>> The default value of CuvidContext::nb_surfaces was reduced from 25 
>>>> to 5 (as
>>>> (CUVID_MAX_DISPLAY_DELAY + 1)) in 
>>>> 402d98c9d467dff6931d906ebb732b9a00334e0b.
>>>>
>>>> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY 
>>>> with double
>>>> rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
>>>> (CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case 
>>>> cuvid_is_buffer_full()
>>>> will always return true and cuvid_output_frame() will never read 
>>>> any data since
>>>> it will not call ff_decode_get_packet().
>>>
>>> It's been way too long since I looked at all that code, and I didn't 
>>> even write most of the code involved:
>>>> https://github.com/FFmpeg/FFmpeg/commit/ 
>>>> bddb2343b6e594e312dadb5d21b408702929ae04
>>>> https://github.com/FFmpeg/FFmpeg/ 
>>>> commit/402d98c9d467dff6931d906ebb732b9a00334e0b
>>>
>>> But doesn't this instead mean that the logic in cuvid_is_buffer_full 
>>> is flawed somehow?
>>
>> I think it is the number of frames ready to send to the driver + the 
>> number of frames in queue in the driver >= the number of decoded 
>> frame buffers.  However, it doesn't actually know how many frames are 
>> in queue in the driver and assumes the maximum.
>
> Not sure if I understand you right, but the way it works is that 
> av_fifo_can_read(ctx->frame_queue) returns how many frames have 
> already been returned from cuvid and are ready for cuviddec.c to 
> return them.
>
> To that number, the maximum possible number of delayed frames is 
> added, which could be returned by the decoder without feeding in any 
> more input frames.
>
> If that number reaches the desired amount of surfaces to buffer, 
> cuvid_is_buffer_full() will report that its buffer is full, and 
> cuviddec.c will stop fetching new input.

I think the point I was trying to make is that since it doesn't track 
how many frames have been sent to and received from the driver it always 
assumes the maximum number of delayed frames are available in the 
driver, which is obviously not correct when no frames have been sent to 
the driver.

>
>>> Just increasing the default number of surfaces does not seem like 
>>> the correct fix or sensible, since it will increase VRAM usage by 
>>> potentially quite a bit for all users.
>>>
>>
>> The changes to cuvid_handle_video_sequence() from 
>> 402d98c9d467dff6931d906ebb732b9a00334e0b will increase nb_surfaces 
>> once data has been read.
>
> Only if the decoder reports that it will potentially buffer even more 
> frames.
>
>>> From looking at this a bit, the issue will only happen when 
>>> deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then, 
>>> and will always claim the buffer is full.
>>> And from my understanding, it's correct in making that claim. Due to 
>>> the display delay, it could in theory happen that the moment cuvid 
>>> starts outputting frames, there will be more output available than 
>>> what fits into ctx->frame_queue, since it delayed by 4 frames, which 
>>> results in 8 surfaces, but the queue only fits 5.
>>>
>>> So to me it looks like that the correct fix would be to double the 
>>> size of the frame_queue when deinterlacing, not unconditionally.
>>
>> There is nothing stopping deint_mode or drop_second_field from being 
>> changed after cuvid_decode_init() is called, so it doesn't 
>> necessarily know it will deinterlace.
>>
>> Regardless, 402d98c9d467dff6931d906ebb732b9a00334e0b reduced 
>> CUVID_DEFAULT_NUM_SURFACES from 25 to *only 5* to not break playback 
>> entirely.  I don't think the intention was to break playback for 
>> double rate deinterlacing while allowing playback for only single 
>> rate deinterlacing.
>>
>> Also, if AV_CODEC_FLAG_LOW_DELAY is set, then only one output surface 
>> is needed, but there are still 5.
>
> The structs stored in the ctx->frame_queue aren't what's using all the 
> memory.
> It's the frames buffered by cuvid itself, which are referred to by 
> that buffer, so having it be larger than what cuvid will actually 
> buffer doesn't hurt all that much.
> But yeah, it could be shrunk in this case.
>
> What this whole dance is actually trying to accomplish is to prevent 
> the number of "ready but not-yet-returned frames" to never exceed the 
> max value possible by cuvid set via 
> ulNumDecodeSurfaces/ulMaxNumDecodeSurfaces, which is determined and 
> stored in ctx->nb_surfaces during cuvid_handle_video_sequence().
>
> In the default mode of operation, the buffer_full indicator will 
> indeed stop pulling new input the moment even one frame is returned. 
> But that's fine, since at that point already a bunch of input has been 
> consumed, and a decent delay has been built up already.
> In low-delay mode frames are pretty much returned the moment it's 
> possible.
>
> So looking at all this, I still think the core of the issue is 
> incorrect handling of deinterlacing in all this.
> CUVID treats a deinterlaced frame as one internal frame, but it's 
> stored in the frame_queue as two frames.
>
> So in the case of deinterlacing without drop_second_field, the size of 
> that queue needs to be doubled, but nb_surfaces must stay the same, 
> since for cuvid itself it's still just one frame.
> And in turn the is_buffer_full function has to be adjusted to multiply 
> nb_surfaces by two if deinterlacing and not drop_second_field.

So you think it needs to be something like this would work?:
```
diff --git a/mythtv/external/FFmpeg/libavcodec/cuviddec.c 
b/mythtv/external/FFmpeg/libavcodec/cuviddec.c
index 81ac54297e..535ff7afb5 100644
--- a/mythtv/external/FFmpeg/libavcodec/cuviddec.c
+++ b/mythtv/external/FFmpeg/libavcodec/cuviddec.c
@@ -317,13 +317,13 @@ static int CUDAAPI 
cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
          return 0;
      }

-    fifo_size_inc = ctx->nb_surfaces;
+    fifo_size_inc = av_fifo_can_read(ctx->frame_queue) + 
av_fifo_can_write(ctx->frame_queue);
      ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, 
format->min_num_decode_surfaces + 3);

      if (avctx->extra_hw_frames > 0)
          ctx->nb_surfaces += avctx->extra_hw_frames;

-    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
+    fifo_size_inc = (ctx->nb_surfaces * 2 ) - fifo_size_inc; // copy * 
2 logic from cuvid_is_buffer_full()?
      if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, 
fifo_size_inc) < 0) {
          av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on 
video sequence callback\n");
          ctx->internal_error = AVERROR(ENOMEM);
@@ -417,10 +417,11 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
      CuvidContext *ctx = avctx->priv_data;

      int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
+    int output_frames = 1;
      if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && 
!ctx->drop_second_field)
-        delay *= 2;
+        output_frames = 2;

-    return av_fifo_can_read(ctx->frame_queue) + delay >= ctx->nb_surfaces;
+    return av_fifo_can_read(ctx->frame_queue) + (delay * output_frames) 
 >= (ctx->nb_surfaces * output_frames); // should this be >= 
av_fifo_can_read(ctx->frame_queue) + av_fifo_can_write(ctx->frame_queue) ?
  }

  static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket 
*avpkt)
@@ -899,7 +900,7 @@ static av_cold int cuvid_decode_init(AVCodecContext 
*avctx)
      if(ctx->nb_surfaces < 0)
          ctx->nb_surfaces = CUVID_DEFAULT_NUM_SURFACES;

-    ctx->frame_queue = av_fifo_alloc2(ctx->nb_surfaces, 
sizeof(CuvidParsedFrame), 0);
+    ctx->frame_queue = av_fifo_alloc2(ctx->nb_surfaces * 2, 
sizeof(CuvidParsedFrame), 0);
      if (!ctx->frame_queue) {
          ret = AVERROR(ENOMEM);
          goto error;
```

It is odd that there is no function to return AVFifo::nb_elems 
directly.  I think ctx->nb_surfaces was incorrectly used instead of 
AVFifo::nb_elems.

Does frame_queue even need to grow, since ulMaxDisplayDelay is fixed at 
4 (or 0)?

>
> Changing that stuff at runtime is no problem, since to change 
> anything, cuvid_handle_video_sequence() has to re-run, which updates 
> all these sizes and will resize the fifo accordingly.
> And when turning it off, the only thing that happens is that 
> buffer_full will report it's full immediately, and some frames need to 
> be read out before accepting input again.
>
> There's also the edge case of "half a frame" having already been 
> returned, so the queue potentially no longer being considered full, 
> but still all decode surfaces are in use, since the other half of that 
> deinterlaced frame is still in the queue.
> So special care must be taken to not report buffer-free too early.
>

I'm not understanding you here.  Aren't the elements in frame_queue 
already extracted from the driver and owned by FFmpeg?

>>>
>>> nb_surfaces is also used to determine the size of the key_frame 
>>> array, which would then also be pointlessly doubled. But not like a 
>>> handful of extra ints would hurt that much though.
>>> Alternatively the size-doubling could not be reflected in 
>>> nb_surfaces, but that would make the logic in various other places 
>>> be more complicated.
>>>
>>>> ---
>>>>
>>>> I think part of the problem might be that cuvid_is_buffer_full() 
>>>> does not know
>>>> how many frames are actually in the driver's queue and assumes it 
>>>> is the
>>>> maximum, even if none have yet been added.
>>>>
>>>> This was preventing any frames from being decoded using NVDEC with 
>>>> MythTV for
>>>> some streams.  See https://github.com/MythTV/mythtv/issues/1039
>>>
>>> I'd highly recommend to not use cuviddec anymore, but instead use 
>>> nvdec.
>>> cuviddec only still exists to sanity-check nvdec against it at times.
>>
>> .*_cuvid are FFCodecs while .*_nvdec are FFHWAccels.  I don't know 
>> what would be required to change to .*_nvdec and the .*_cuvid 
>> FFCodecs work fine with this change.
>
> You just use the normal native decoder and turn on hwaccel, to very 
> broadly summarize it.
> Would also allow using a plethora of other hwaccels with the same 
> code. Pretty much how ffmpeg.c -hwaccel auto works.
> If you already support things like vaapi/vdpau/d3d11va, you should 
> already have the code necessary to also use nvdec with minimal changes.
>

There is code for using VAAPI and VDPAU in MythTV, I'll have to 
investigate later.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES
  2025-02-25  5:36       ` Scott Theisen
@ 2025-02-25 18:10         ` Timo Rothenpieler
  2025-02-25 18:43         ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing Timo Rothenpieler
  1 sibling, 0 replies; 11+ messages in thread
From: Timo Rothenpieler @ 2025-02-25 18:10 UTC (permalink / raw)
  To: ffmpeg-devel

On 25.02.2025 06:36, Scott Theisen wrote:
> On 2/22/25 08:16, Timo Rothenpieler wrote:
>> On 22.02.2025 03:52, Scott Theisen wrote:
>>> On 2/21/25 08:26, Timo Rothenpieler wrote:
>>>> On 20.02.2025 21:37, Scott Theisen wrote:
>>>>> The default value of CuvidContext::nb_surfaces was reduced from 25 
>>>>> to 5 (as
>>>>> (CUVID_MAX_DISPLAY_DELAY + 1)) in 
>>>>> 402d98c9d467dff6931d906ebb732b9a00334e0b.
>>>>>
>>>>> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY 
>>>>> with double
>>>>> rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
>>>>> (CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case 
>>>>> cuvid_is_buffer_full()
>>>>> will always return true and cuvid_output_frame() will never read 
>>>>> any data since
>>>>> it will not call ff_decode_get_packet().
>>>>
>>>> It's been way too long since I looked at all that code, and I didn't 
>>>> even write most of the code involved:
>>>>> https://github.com/FFmpeg/FFmpeg/commit/ 
>>>>> bddb2343b6e594e312dadb5d21b408702929ae04
>>>>> https://github.com/FFmpeg/FFmpeg/ 
>>>>> commit/402d98c9d467dff6931d906ebb732b9a00334e0b
>>>>
>>>> But doesn't this instead mean that the logic in cuvid_is_buffer_full 
>>>> is flawed somehow?
>>>
>>> I think it is the number of frames ready to send to the driver + the 
>>> number of frames in queue in the driver >= the number of decoded 
>>> frame buffers.  However, it doesn't actually know how many frames are 
>>> in queue in the driver and assumes the maximum.
>>
>> Not sure if I understand you right, but the way it works is that 
>> av_fifo_can_read(ctx->frame_queue) returns how many frames have 
>> already been returned from cuvid and are ready for cuviddec.c to 
>> return them.
>>
>> To that number, the maximum possible number of delayed frames is 
>> added, which could be returned by the decoder without feeding in any 
>> more input frames.
>>
>> If that number reaches the desired amount of surfaces to buffer, 
>> cuvid_is_buffer_full() will report that its buffer is full, and 
>> cuviddec.c will stop fetching new input.
> 
> I think the point I was trying to make is that since it doesn't track 
> how many frames have been sent to and received from the driver it always 
> assumes the maximum number of delayed frames are available in the 
> driver, which is obviously not correct when no frames have been sent to 
> the driver.
> 
>>
>>>> Just increasing the default number of surfaces does not seem like 
>>>> the correct fix or sensible, since it will increase VRAM usage by 
>>>> potentially quite a bit for all users.
>>>>
>>>
>>> The changes to cuvid_handle_video_sequence() from 
>>> 402d98c9d467dff6931d906ebb732b9a00334e0b will increase nb_surfaces 
>>> once data has been read.
>>
>> Only if the decoder reports that it will potentially buffer even more 
>> frames.
>>
>>>> From looking at this a bit, the issue will only happen when 
>>>> deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then, 
>>>> and will always claim the buffer is full.
>>>> And from my understanding, it's correct in making that claim. Due to 
>>>> the display delay, it could in theory happen that the moment cuvid 
>>>> starts outputting frames, there will be more output available than 
>>>> what fits into ctx->frame_queue, since it delayed by 4 frames, which 
>>>> results in 8 surfaces, but the queue only fits 5.
>>>>
>>>> So to me it looks like that the correct fix would be to double the 
>>>> size of the frame_queue when deinterlacing, not unconditionally.
>>>
>>> There is nothing stopping deint_mode or drop_second_field from being 
>>> changed after cuvid_decode_init() is called, so it doesn't 
>>> necessarily know it will deinterlace.
>>>
>>> Regardless, 402d98c9d467dff6931d906ebb732b9a00334e0b reduced 
>>> CUVID_DEFAULT_NUM_SURFACES from 25 to *only 5* to not break playback 
>>> entirely.  I don't think the intention was to break playback for 
>>> double rate deinterlacing while allowing playback for only single 
>>> rate deinterlacing.
>>>
>>> Also, if AV_CODEC_FLAG_LOW_DELAY is set, then only one output surface 
>>> is needed, but there are still 5.
>>
>> The structs stored in the ctx->frame_queue aren't what's using all the 
>> memory.
>> It's the frames buffered by cuvid itself, which are referred to by 
>> that buffer, so having it be larger than what cuvid will actually 
>> buffer doesn't hurt all that much.
>> But yeah, it could be shrunk in this case.
>>
>> What this whole dance is actually trying to accomplish is to prevent 
>> the number of "ready but not-yet-returned frames" to never exceed the 
>> max value possible by cuvid set via ulNumDecodeSurfaces/ 
>> ulMaxNumDecodeSurfaces, which is determined and stored in ctx- 
>> >nb_surfaces during cuvid_handle_video_sequence().
>>
>> In the default mode of operation, the buffer_full indicator will 
>> indeed stop pulling new input the moment even one frame is returned. 
>> But that's fine, since at that point already a bunch of input has been 
>> consumed, and a decent delay has been built up already.
>> In low-delay mode frames are pretty much returned the moment it's 
>> possible.
>>
>> So looking at all this, I still think the core of the issue is 
>> incorrect handling of deinterlacing in all this.
>> CUVID treats a deinterlaced frame as one internal frame, but it's 
>> stored in the frame_queue as two frames.
>>
>> So in the case of deinterlacing without drop_second_field, the size of 
>> that queue needs to be doubled, but nb_surfaces must stay the same, 
>> since for cuvid itself it's still just one frame.
>> And in turn the is_buffer_full function has to be adjusted to multiply 
>> nb_surfaces by two if deinterlacing and not drop_second_field.
> 
> So you think it needs to be something like this would work?:
> ```
> diff --git a/mythtv/external/FFmpeg/libavcodec/cuviddec.c b/mythtv/ 
> external/FFmpeg/libavcodec/cuviddec.c
> index 81ac54297e..535ff7afb5 100644
> --- a/mythtv/external/FFmpeg/libavcodec/cuviddec.c
> +++ b/mythtv/external/FFmpeg/libavcodec/cuviddec.c
> @@ -317,13 +317,13 @@ static int CUDAAPI 
> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>           return 0;
>       }
> 
> -    fifo_size_inc = ctx->nb_surfaces;
> +    fifo_size_inc = av_fifo_can_read(ctx->frame_queue) + 
> av_fifo_can_write(ctx->frame_queue);
>       ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format- 
>  >min_num_decode_surfaces + 3);
> 
>       if (avctx->extra_hw_frames > 0)
>           ctx->nb_surfaces += avctx->extra_hw_frames;
> 
> -    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
> +    fifo_size_inc = (ctx->nb_surfaces * 2 ) - fifo_size_inc; // copy * 
> 2 logic from cuvid_is_buffer_full()?
>       if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, 
> fifo_size_inc) < 0) {
>           av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on 
> video sequence callback\n");
>           ctx->internal_error = AVERROR(ENOMEM);
> @@ -417,10 +417,11 @@ static int cuvid_is_buffer_full(AVCodecContext 
> *avctx)
>       CuvidContext *ctx = avctx->priv_data;
> 
>       int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
> +    int output_frames = 1;
>       if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && !ctx- 
>  >drop_second_field)
> -        delay *= 2;
> +        output_frames = 2;
> 
> -    return av_fifo_can_read(ctx->frame_queue) + delay >= ctx->nb_surfaces;
> +    return av_fifo_can_read(ctx->frame_queue) + (delay * output_frames) 
>  >= (ctx->nb_surfaces * output_frames); // should this be >= 
> av_fifo_can_read(ctx->frame_queue) + av_fifo_can_write(ctx->frame_queue) ?
>   }
> 
>   static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket 
> *avpkt)
> @@ -899,7 +900,7 @@ static av_cold int cuvid_decode_init(AVCodecContext 
> *avctx)
>       if(ctx->nb_surfaces < 0)
>           ctx->nb_surfaces = CUVID_DEFAULT_NUM_SURFACES;
> 
> -    ctx->frame_queue = av_fifo_alloc2(ctx->nb_surfaces, 
> sizeof(CuvidParsedFrame), 0);
> +    ctx->frame_queue = av_fifo_alloc2(ctx->nb_surfaces * 2, 
> sizeof(CuvidParsedFrame), 0);
>       if (!ctx->frame_queue) {
>           ret = AVERROR(ENOMEM);
>           goto error;
> ```

Something like that, I'm looking into it as well.
Been too long since I looked at that code.

> It is odd that there is no function to return AVFifo::nb_elems 
> directly.  I think ctx->nb_surfaces was incorrectly used instead of 
> AVFifo::nb_elems.
> 
> Does frame_queue even need to grow, since ulMaxDisplayDelay is fixed at 
> 4 (or 0)?

The frame queue contains up to ulMaxNumDecodeSurfaces surfaces, not up 
to ulMaxDisplayDelay. ulMaxDisplayDelay indicated how many more frames 
might be in-flight and be spontaneously dumped out all at once by cuvid 
on the next input packet, so the queue needs to be kept free enough to 
contain them all.

So: The frame_queue needs to be able to contain up to 
ulMaxNumDecodeSurfaces frames. When deinterlacing and not dropping the 
second field, each DecodeSurface produces two frames, so the queue needs 
to double in size then.

>>
>> Changing that stuff at runtime is no problem, since to change 
>> anything, cuvid_handle_video_sequence() has to re-run, which updates 
>> all these sizes and will resize the fifo accordingly.
>> And when turning it off, the only thing that happens is that 
>> buffer_full will report it's full immediately, and some frames need to 
>> be read out before accepting input again.
>>
>> There's also the edge case of "half a frame" having already been 
>> returned, so the queue potentially no longer being considered full, 
>> but still all decode surfaces are in use, since the other half of that 
>> deinterlaced frame is still in the queue.
>> So special care must be taken to not report buffer-free too early.
>>
> 
> I'm not understanding you here.  Aren't the elements in frame_queue 
> already extracted from the driver and owned by FFmpeg?

No, the ctx->frame_queue frames are still cuvid frames and will stay 
that until cuvid_output_frame() removes them from the queue and 
downloads them into FFmpeg owned memory.

Thus the amount of distinct frames in ctx->frame_queue must never exceed 
that of ulMaxNumDecodeSurfaces.
And new packets can only be fed to cuvid while the frame_queue still has 
space to accept up to ulMaxDisplayDelay (*2 when deinterlacing) of new 
frames, which might spontaneously come out of cuvid at any moment.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing
  2025-02-25  5:36       ` Scott Theisen
  2025-02-25 18:10         ` Timo Rothenpieler
@ 2025-02-25 18:43         ` Timo Rothenpieler
  2025-02-25 23:15           ` Timo Rothenpieler
  2025-02-26  2:42           ` Scott Theisen
  1 sibling, 2 replies; 11+ messages in thread
From: Timo Rothenpieler @ 2025-02-25 18:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Timo Rothenpieler

---
 libavcodec/cuviddec.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index 67076a1752..312742fb8c 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -131,7 +131,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     CUVIDDECODECREATEINFO cuinfo;
     int surface_fmt;
     int chroma_444;
-    int fifo_size_inc;
+    int old_nb_surfaces, fifo_size_inc, fifo_size_mul = 1;
 
     int old_width = avctx->width;
     int old_height = avctx->height;
@@ -349,20 +349,24 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
         return 0;
     }
 
-    fifo_size_inc = ctx->nb_surfaces;
-    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
+    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field) {
+        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 1});
+        fifo_size_mul = 2;
+    }
 
+    old_nb_surfaces = ctx->nb_surfaces;
+    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
     if (avctx->extra_hw_frames > 0)
         ctx->nb_surfaces += avctx->extra_hw_frames;
 
-    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
+    fifo_size_inc = ctx->nb_surfaces * fifo_size_mul - av_fifo_can_read(ctx->frame_queue) - av_fifo_can_write(ctx->frame_queue);
     if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, fifo_size_inc) < 0) {
         av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on video sequence callback\n");
         ctx->internal_error = AVERROR(ENOMEM);
         return 0;
     }
 
-    if (fifo_size_inc > 0 && av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
+    if (ctx->nb_surfaces > old_nb_surfaces && av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
         av_log(avctx, AV_LOG_ERROR, "Failed to grow key frame array on video sequence callback\n");
         ctx->internal_error = AVERROR(ENOMEM);
         return 0;
@@ -374,9 +378,6 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     cuinfo.bitDepthMinus8 = format->bit_depth_luma_minus8;
     cuinfo.DeinterlaceMode = ctx->deint_mode_current;
 
-    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field)
-        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 1});
-
     ctx->internal_error = CHECK_CU(ctx->cvdl->cuvidCreateDecoder(&ctx->cudecoder, &cuinfo));
     if (ctx->internal_error < 0)
         return 0;
@@ -448,11 +449,12 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
 {
     CuvidContext *ctx = avctx->priv_data;
 
-    int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
+    int mult = 1;
     if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field)
-        delay *= 2;
+        mult = 2;
 
-    return av_fifo_can_read(ctx->frame_queue) + delay >= ctx->nb_surfaces;
+    // "- mult + 1" ensures that the buffer is still signalled full if one half-frame has already been returned when deinterlacing.
+    return av_fifo_can_read(ctx->frame_queue) + (ctx->cuparseinfo.ulMaxDisplayDelay * mult) >= ctx->nb_surfaces * mult - mult + 1;
 }
 
 static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket *avpkt)
-- 
2.45.3

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing
  2025-02-25 18:43         ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing Timo Rothenpieler
@ 2025-02-25 23:15           ` Timo Rothenpieler
  2025-02-26  2:42           ` Scott Theisen
  1 sibling, 0 replies; 11+ messages in thread
From: Timo Rothenpieler @ 2025-02-25 23:15 UTC (permalink / raw)
  To: ffmpeg-devel, Scott Theisen

On 25.02.2025 19:43, Timo Rothenpieler wrote:
> ---
>   libavcodec/cuviddec.c | 24 +++++++++++++-----------
>   1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
> index 67076a1752..312742fb8c 100644
> --- a/libavcodec/cuviddec.c
> +++ b/libavcodec/cuviddec.c
> @@ -131,7 +131,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>       CUVIDDECODECREATEINFO cuinfo;
>       int surface_fmt;
>       int chroma_444;
> -    int fifo_size_inc;
> +    int old_nb_surfaces, fifo_size_inc, fifo_size_mul = 1;
>   
>       int old_width = avctx->width;
>       int old_height = avctx->height;
> @@ -349,20 +349,24 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>           return 0;
>       }
>   
> -    fifo_size_inc = ctx->nb_surfaces;
> -    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
> +    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field) {
> +        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 1});
> +        fifo_size_mul = 2;
> +    }
>   
> +    old_nb_surfaces = ctx->nb_surfaces;
> +    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
>       if (avctx->extra_hw_frames > 0)
>           ctx->nb_surfaces += avctx->extra_hw_frames;
>   
> -    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
> +    fifo_size_inc = ctx->nb_surfaces * fifo_size_mul - av_fifo_can_read(ctx->frame_queue) - av_fifo_can_write(ctx->frame_queue);
>       if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, fifo_size_inc) < 0) {
>           av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on video sequence callback\n");
>           ctx->internal_error = AVERROR(ENOMEM);
>           return 0;
>       }
>   
> -    if (fifo_size_inc > 0 && av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
> +    if (ctx->nb_surfaces > old_nb_surfaces && av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
>           av_log(avctx, AV_LOG_ERROR, "Failed to grow key frame array on video sequence callback\n");
>           ctx->internal_error = AVERROR(ENOMEM);
>           return 0;
> @@ -374,9 +378,6 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>       cuinfo.bitDepthMinus8 = format->bit_depth_luma_minus8;
>       cuinfo.DeinterlaceMode = ctx->deint_mode_current;
>   
> -    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field)
> -        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 1});
> -
>       ctx->internal_error = CHECK_CU(ctx->cvdl->cuvidCreateDecoder(&ctx->cudecoder, &cuinfo));
>       if (ctx->internal_error < 0)
>           return 0;
> @@ -448,11 +449,12 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
>   {
>       CuvidContext *ctx = avctx->priv_data;
>   
> -    int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
> +    int mult = 1;
>       if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field)
> -        delay *= 2;
> +        mult = 2;
>   
> -    return av_fifo_can_read(ctx->frame_queue) + delay >= ctx->nb_surfaces;
> +    // "- mult + 1" ensures that the buffer is still signalled full if one half-frame has already been returned when deinterlacing.
> +    return av_fifo_can_read(ctx->frame_queue) + (ctx->cuparseinfo.ulMaxDisplayDelay * mult) >= ctx->nb_surfaces * mult - mult + 1;
>   }
>   
>   static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket *avpkt)

This fixes decoding+deinterlacing with the ffmpeg CLI for me. Can you 
verify if it also fixes it in your case?
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing
  2025-02-25 18:43         ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing Timo Rothenpieler
  2025-02-25 23:15           ` Timo Rothenpieler
@ 2025-02-26  2:42           ` Scott Theisen
  2025-02-26 17:17             ` Scott Theisen
  2025-02-26 19:58             ` Timo Rothenpieler
  1 sibling, 2 replies; 11+ messages in thread
From: Scott Theisen @ 2025-02-26  2:42 UTC (permalink / raw)
  To: ffmpeg-devel

On 2/25/25 13:43, Timo Rothenpieler wrote:
> ---
>   libavcodec/cuviddec.c | 24 +++++++++++++-----------
>   1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
> index 67076a1752..312742fb8c 100644
> --- a/libavcodec/cuviddec.c
> +++ b/libavcodec/cuviddec.c
> @@ -131,7 +131,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>       CUVIDDECODECREATEINFO cuinfo;
>       int surface_fmt;
>       int chroma_444;
> -    int fifo_size_inc;
> +    int old_nb_surfaces, fifo_size_inc, fifo_size_mul = 1;
>   
>       int old_width = avctx->width;
>       int old_height = avctx->height;
> @@ -349,20 +349,24 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>           return 0;
>       }
>   
> -    fifo_size_inc = ctx->nb_surfaces;
> -    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
> +    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field) {
> +        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 1});
> +        fifo_size_mul = 2;
> +    }
>   
> +    old_nb_surfaces = ctx->nb_surfaces;
> +    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
>       if (avctx->extra_hw_frames > 0)
>           ctx->nb_surfaces += avctx->extra_hw_frames;
>   
> -    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
> +    fifo_size_inc = ctx->nb_surfaces * fifo_size_mul - av_fifo_can_read(ctx->frame_queue) - av_fifo_can_write(ctx->frame_queue);
>       if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, fifo_size_inc) < 0) {
>           av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on video sequence callback\n");
>           ctx->internal_error = AVERROR(ENOMEM);
>           return 0;
>       }
>   
> -    if (fifo_size_inc > 0 && av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
> +    if (ctx->nb_surfaces > old_nb_surfaces && av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
>           av_log(avctx, AV_LOG_ERROR, "Failed to grow key frame array on video sequence callback\n");
>           ctx->internal_error = AVERROR(ENOMEM);
>           return 0;
> @@ -374,9 +378,6 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>       cuinfo.bitDepthMinus8 = format->bit_depth_luma_minus8;
>       cuinfo.DeinterlaceMode = ctx->deint_mode_current;
>   
> -    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field)
> -        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 1});
> -
>       ctx->internal_error = CHECK_CU(ctx->cvdl->cuvidCreateDecoder(&ctx->cudecoder, &cuinfo));
>       if (ctx->internal_error < 0)
>           return 0;
> @@ -448,11 +449,12 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
>   {
>       CuvidContext *ctx = avctx->priv_data;
>   
> -    int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
> +    int mult = 1;
>       if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && !ctx->drop_second_field)
> -        delay *= 2;
> +        mult = 2;
>   
> -    return av_fifo_can_read(ctx->frame_queue) + delay >= ctx->nb_surfaces;
> +    // "- mult + 1" ensures that the buffer is still signalled full if one half-frame has already been returned when deinterlacing.
> +    return av_fifo_can_read(ctx->frame_queue) + (ctx->cuparseinfo.ulMaxDisplayDelay * mult) >= ctx->nb_surfaces * mult - mult + 1;

I think this is clearer:
return ((av_fifo_can_read(ctx->frame_queue) + mult - 1) / mult) + 
ctx->cuparseinfo.ulMaxDisplayDelay >= ctx->nb_surfaces

Integer ceiling division to get the number of referenced surfaces in 
frame_queue.

However, when going from mult = 2 to 1, it thinks the buffer is more 
full than it really is, which probably isn't a problem. Unfortunately, 
when going from mult = 1 to 2, if there is more than one frame in 
frame_queue, it will think there are less surfaces referenced than there 
are, which may be a problem.

To avoid that, on transitions you would have to drain frame_queue until 
it is empty or peek at what is in frame_queue to actually count the 
number of referenced surfaces.

>   }
>   
>   static int cuvid_decode_packet(AVCodecContext *avctx, const AVPacket *avpkt)

I'll apply this to MythTV for testing.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing
  2025-02-26  2:42           ` Scott Theisen
@ 2025-02-26 17:17             ` Scott Theisen
  2025-02-26 19:58             ` Timo Rothenpieler
  1 sibling, 0 replies; 11+ messages in thread
From: Scott Theisen @ 2025-02-26 17:17 UTC (permalink / raw)
  To: ffmpeg-devel

On 2025-02-25 21:42:13, Scott Theisen wrote:
> On 2/25/25 13:43, Timo Rothenpieler wrote:
>> ---
>>   libavcodec/cuviddec.c | 24 +++++++++++++-----------
>>   1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
>> index 67076a1752..312742fb8c 100644
>> --- a/libavcodec/cuviddec.c
>> +++ b/libavcodec/cuviddec.c
>> @@ -131,7 +131,7 @@ static int CUDAAPI 
>> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>>       CUVIDDECODECREATEINFO cuinfo;
>>       int surface_fmt;
>>       int chroma_444;
>> -    int fifo_size_inc;
>> +    int old_nb_surfaces, fifo_size_inc, fifo_size_mul = 1;
>>         int old_width = avctx->width;
>>       int old_height = avctx->height;
>> @@ -349,20 +349,24 @@ static int CUDAAPI 
>> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>>           return 0;
>>       }
>>   -    fifo_size_inc = ctx->nb_surfaces;
>> -    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, 
>> format->min_num_decode_surfaces + 3);
>> +    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave && 
>> !ctx->drop_second_field) {
>> +        avctx->framerate = av_mul_q(avctx->framerate, 
>> (AVRational){2, 1});
>> +        fifo_size_mul = 2;
>> +    }
>>   +    old_nb_surfaces = ctx->nb_surfaces;
>> +    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, 
>> format->min_num_decode_surfaces + 3);
>>       if (avctx->extra_hw_frames > 0)
>>           ctx->nb_surfaces += avctx->extra_hw_frames;
>>   -    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
>> +    fifo_size_inc = ctx->nb_surfaces * fifo_size_mul - 
>> av_fifo_can_read(ctx->frame_queue) - 
>> av_fifo_can_write(ctx->frame_queue);
>>       if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, 
>> fifo_size_inc) < 0) {
>>           av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on 
>> video sequence callback\n");
>>           ctx->internal_error = AVERROR(ENOMEM);
>>           return 0;
>>       }
>>   -    if (fifo_size_inc > 0 && av_reallocp_array(&ctx->key_frame, 
>> ctx->nb_surfaces, sizeof(int)) < 0) {
>> +    if (ctx->nb_surfaces > old_nb_surfaces && 
>> av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
>>           av_log(avctx, AV_LOG_ERROR, "Failed to grow key frame array 
>> on video sequence callback\n");
>>           ctx->internal_error = AVERROR(ENOMEM);
>>           return 0;
>> @@ -374,9 +378,6 @@ static int CUDAAPI 
>> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>>       cuinfo.bitDepthMinus8 = format->bit_depth_luma_minus8;
>>       cuinfo.DeinterlaceMode = ctx->deint_mode_current;
>>   -    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave 
>> && !ctx->drop_second_field)
>> -        avctx->framerate = av_mul_q(avctx->framerate, 
>> (AVRational){2, 1});
>> -
>>       ctx->internal_error = 
>> CHECK_CU(ctx->cvdl->cuvidCreateDecoder(&ctx->cudecoder, &cuinfo));
>>       if (ctx->internal_error < 0)
>>           return 0;
>> @@ -448,11 +449,12 @@ static int cuvid_is_buffer_full(AVCodecContext 
>> *avctx)
>>   {
>>       CuvidContext *ctx = avctx->priv_data;
>>   -    int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
>> +    int mult = 1;
>>       if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && 
>> !ctx->drop_second_field)
>> -        delay *= 2;
>> +        mult = 2;
>>   -    return av_fifo_can_read(ctx->frame_queue) + delay >= 
>> ctx->nb_surfaces;
>> +    // "- mult + 1" ensures that the buffer is still signalled full 
>> if one half-frame has already been returned when deinterlacing.
>> +    return av_fifo_can_read(ctx->frame_queue) + 
>> (ctx->cuparseinfo.ulMaxDisplayDelay * mult) >= ctx->nb_surfaces * 
>> mult - mult + 1;
>
> I think this is clearer:
> return ((av_fifo_can_read(ctx->frame_queue) + mult - 1) / mult) + 
> ctx->cuparseinfo.ulMaxDisplayDelay >= ctx->nb_surfaces
>
> Integer ceiling division to get the number of referenced surfaces in 
> frame_queue.
>
> However, when going from mult = 2 to 1, it thinks the buffer is more 
> full than it really is, which probably isn't a problem. Unfortunately, 
> when going from mult = 1 to 2, if there is more than one frame in 
> frame_queue, it will think there are less surfaces referenced than 
> there are, which may be a problem.
>
> To avoid that, on transitions you would have to drain frame_queue 
> until it is empty or peek at what is in frame_queue to actually count 
> the number of referenced surfaces.
>
>>   }
>>     static int cuvid_decode_packet(AVCodecContext *avctx, const 
>> AVPacket *avpkt)
>
> I'll apply this to MythTV for testing.

Using this patch instead of my original suggestion also resolves the 
playback issue with MythTV.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing
  2025-02-26  2:42           ` Scott Theisen
  2025-02-26 17:17             ` Scott Theisen
@ 2025-02-26 19:58             ` Timo Rothenpieler
  1 sibling, 0 replies; 11+ messages in thread
From: Timo Rothenpieler @ 2025-02-26 19:58 UTC (permalink / raw)
  To: ffmpeg-devel

On 26.02.2025 03:42, Scott Theisen wrote:
> On 2/25/25 13:43, Timo Rothenpieler wrote:
>> ---
>>   libavcodec/cuviddec.c | 24 +++++++++++++-----------
>>   1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
>> index 67076a1752..312742fb8c 100644
>> --- a/libavcodec/cuviddec.c
>> +++ b/libavcodec/cuviddec.c
>> @@ -131,7 +131,7 @@ static int CUDAAPI 
>> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>>       CUVIDDECODECREATEINFO cuinfo;
>>       int surface_fmt;
>>       int chroma_444;
>> -    int fifo_size_inc;
>> +    int old_nb_surfaces, fifo_size_inc, fifo_size_mul = 1;
>>       int old_width = avctx->width;
>>       int old_height = avctx->height;
>> @@ -349,20 +349,24 @@ static int CUDAAPI 
>> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>>           return 0;
>>       }
>> -    fifo_size_inc = ctx->nb_surfaces;
>> -    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format- 
>> >min_num_decode_surfaces + 3);
>> +    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave 
>> && !ctx->drop_second_field) {
>> +        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 
>> 1});
>> +        fifo_size_mul = 2;
>> +    }
>> +    old_nb_surfaces = ctx->nb_surfaces;
>> +    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format- 
>> >min_num_decode_surfaces + 3);
>>       if (avctx->extra_hw_frames > 0)
>>           ctx->nb_surfaces += avctx->extra_hw_frames;
>> -    fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
>> +    fifo_size_inc = ctx->nb_surfaces * fifo_size_mul - 
>> av_fifo_can_read(ctx->frame_queue) - av_fifo_can_write(ctx->frame_queue);
>>       if (fifo_size_inc > 0 && av_fifo_grow2(ctx->frame_queue, 
>> fifo_size_inc) < 0) {
>>           av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on 
>> video sequence callback\n");
>>           ctx->internal_error = AVERROR(ENOMEM);
>>           return 0;
>>       }
>> -    if (fifo_size_inc > 0 && av_reallocp_array(&ctx->key_frame, ctx- 
>> >nb_surfaces, sizeof(int)) < 0) {
>> +    if (ctx->nb_surfaces > old_nb_surfaces && av_reallocp_array(&ctx- 
>> >key_frame, ctx->nb_surfaces, sizeof(int)) < 0) {
>>           av_log(avctx, AV_LOG_ERROR, "Failed to grow key frame array 
>> on video sequence callback\n");
>>           ctx->internal_error = AVERROR(ENOMEM);
>>           return 0;
>> @@ -374,9 +378,6 @@ static int CUDAAPI 
>> cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
>>       cuinfo.bitDepthMinus8 = format->bit_depth_luma_minus8;
>>       cuinfo.DeinterlaceMode = ctx->deint_mode_current;
>> -    if (ctx->deint_mode_current != cudaVideoDeinterlaceMode_Weave 
>> && !ctx->drop_second_field)
>> -        avctx->framerate = av_mul_q(avctx->framerate, (AVRational){2, 
>> 1});
>> -
>>       ctx->internal_error = CHECK_CU(ctx->cvdl- 
>> >cuvidCreateDecoder(&ctx->cudecoder, &cuinfo));
>>       if (ctx->internal_error < 0)
>>           return 0;
>> @@ -448,11 +449,12 @@ static int cuvid_is_buffer_full(AVCodecContext 
>> *avctx)
>>   {
>>       CuvidContext *ctx = avctx->priv_data;
>> -    int delay = ctx->cuparseinfo.ulMaxDisplayDelay;
>> +    int mult = 1;
>>       if (ctx->deint_mode != cudaVideoDeinterlaceMode_Weave && !ctx- 
>> >drop_second_field)
>> -        delay *= 2;
>> +        mult = 2;
>> -    return av_fifo_can_read(ctx->frame_queue) + delay >= ctx- 
>> >nb_surfaces;
>> +    // "- mult + 1" ensures that the buffer is still signalled full 
>> if one half-frame has already been returned when deinterlacing.
>> +    return av_fifo_can_read(ctx->frame_queue) + (ctx- 
>> >cuparseinfo.ulMaxDisplayDelay * mult) >= ctx->nb_surfaces * mult - 
>> mult + 1;
> 
> I think this is clearer:
> return ((av_fifo_can_read(ctx->frame_queue) + mult - 1) / mult) + ctx- 
>  >cuparseinfo.ulMaxDisplayDelay >= ctx->nb_surfaces

Yeah, though I've used a shift instead, purely as a minimal speed 
optimization.

> Integer ceiling division to get the number of referenced surfaces in 
> frame_queue.
> 
> However, when going from mult = 2 to 1, it thinks the buffer is more 
> full than it really is, which probably isn't a problem. Unfortunately, 
> when going from mult = 1 to 2, if there is more than one frame in 
> frame_queue, it will think there are less surfaces referenced than there 
> are, which may be a problem.

That's not the case, only new frames added to the queue are be doubled.


Patch is applied now, thanks!
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-02-26 19:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-20 20:37 [FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES Scott Theisen
2025-02-21 13:26 ` Timo Rothenpieler
2025-02-22  2:52   ` Scott Theisen
2025-02-22 13:16     ` Timo Rothenpieler
2025-02-25  5:36       ` Scott Theisen
2025-02-25 18:10         ` Timo Rothenpieler
2025-02-25 18:43         ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: correctly handle buffer size and status when deinterlacing Timo Rothenpieler
2025-02-25 23:15           ` Timo Rothenpieler
2025-02-26  2:42           ` Scott Theisen
2025-02-26 17:17             ` Scott Theisen
2025-02-26 19:58             ` Timo Rothenpieler

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git