* [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback
[not found] <CAJZRZVnbFVOQaUuvA=F7qUmr+P_wfrjJnga2cq=tdNtm4Ky0sQ@mail.gmail.com>
@ 2023-06-05 7:30 ` Roman Arzumanyan
2023-06-05 10:19 ` Anton Khirnov
0 siblings, 1 reply; 5+ messages in thread
From: Roman Arzumanyan @ 2023-06-05 7:30 UTC (permalink / raw)
To: ffmpeg-devel
[-- Attachment #1: Type: text/plain, Size: 550 bytes --]
Hello,
This patch reduces vRAM usage by cuvid decoder implementation.
The number of surfaces used for decoding is updated within the parser
sequence decode callback.
Also the "surfaces" AVDictionary option specific to cuvid was removed in
favor of "extra_hw_surfaces".
vRAM consumption was tested on various videos and savings are between 1%
for 360p resolution up to 21% for some 1080p H.264 videos.
Decoding performance was tested on various H.264 and H.265 videos in
different resolutions from 360p and higher, no performance penalty was
found.
[-- Attachment #2: 0001-libavcodec-cuviddec-determine-DPB-size-from-within-c.patch --]
[-- Type: application/x-patch, Size: 3864 bytes --]
[-- Attachment #3: Type: text/plain, Size: 251 bytes --]
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback
2023-06-05 7:30 ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback Roman Arzumanyan
@ 2023-06-05 10:19 ` Anton Khirnov
2023-06-05 12:29 ` Roman Arzumanyan
0 siblings, 1 reply; 5+ messages in thread
From: Anton Khirnov @ 2023-06-05 10:19 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Quoting Roman Arzumanyan (2023-06-05 09:30:07)
> Hello,
>
> This patch reduces vRAM usage by cuvid decoder implementation.
> The number of surfaces used for decoding is updated within the parser
> sequence decode callback.
> Also the "surfaces" AVDictionary option specific to cuvid was removed in
> favor of "extra_hw_surfaces".
This can break existing workflows, you should deprecated the option
instead and only remove it after some time has passed.
>
> vRAM consumption was tested on various videos and savings are between 1%
> for 360p resolution up to 21% for some 1080p H.264 videos.
> Decoding performance was tested on various H.264 and H.265 videos in
> different resolutions from 360p and higher, no performance penalty was
> found.
>
> From 32a1b016e88fa40b983318d4583750ef250a78d9 Mon Sep 17 00:00:00 2001
> From: Roman Arzumanyan <r.arzumanyan@visionlabs.ai>
> Date: Thu, 1 Jun 2023 11:17:39 +0300
> Subject: [PATCH] libavcodec/cuviddec: determine DPB size from within cuvid
> parser
>
> ---
> libavcodec/cuviddec.c | 29 +++++++++++++++++++++++++++--
> 1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
> index 3d43bbd466..759ed49870 100644
> --- a/libavcodec/cuviddec.c
> +++ b/libavcodec/cuviddec.c
> @@ -115,6 +115,12 @@ typedef struct CuvidParsedFrame
>
> #define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)
>
> +// NV recommends [2;4] range
> +#define CUVID_MAX_DISPLAY_DELAY (4)
> +
> +// Actual DPB size will be determined by parser.
> +#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
> +
> static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* format)
> {
> AVCodecContext *avctx = opaque;
> @@ -309,6 +315,25 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
> return 0;
> }
>
> + if (ctx->nb_surfaces < format->min_num_decode_surfaces + 3)
> + ctx->nb_surfaces = format->min_num_decode_surfaces + 3;
FFMAX()
> +
> + if (avctx->extra_hw_frames > 0)
> + ctx->nb_surfaces += avctx->extra_hw_frames;
> +
> + if (0 > av_fifo_realloc2(ctx->frame_queue, ctx->nb_surfaces * sizeof(CuvidParsedFrame))) {
this is the old deprecated AVFifoBuffer API, you cannot use it with
AVFifo objects
you should also forward the actual error code
> + av_log(avctx, AV_LOG_ERROR, "Failed to recreate frame queue on video sequence callback\n");
> + ctx->internal_error = AVERROR(EINVAL);
> + return 0;
> + }
> +
> + ctx->key_frame = av_realloc_array(ctx->key_frame, ctx->nb_surfaces, sizeof(int));
> + if (!ctx->key_frame) {
> + av_log(avctx, AV_LOG_ERROR, "Failed to recreate key frame queue on video sequence callback\n");
> + ctx->internal_error = AVERROR(EINVAL);
Leaks key_frame on failure and should be ENOMEM.
--
Anton Khirnov
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback
2023-06-05 10:19 ` Anton Khirnov
@ 2023-06-05 12:29 ` Roman Arzumanyan
2023-06-06 13:35 ` Timo Rothenpieler
0 siblings, 1 reply; 5+ messages in thread
From: Roman Arzumanyan @ 2023-06-05 12:29 UTC (permalink / raw)
To: FFmpeg development discussions and patches
[-- Attachment #1: Type: text/plain, Size: 3802 bytes --]
Thanks for the review, Anton. Please find the updated patch attached.
BTW - what do you mean by "you should also forward the actual error code"?
Within the cuvid_handle_video_sequence() function in case of error,
function still returns 0 and sets internal context error, so I kept
behavior intact. Do I miss something?
пн, 5 июн. 2023 г. в 13:19, Anton Khirnov <anton@khirnov.net>:
> Quoting Roman Arzumanyan (2023-06-05 09:30:07)
> > Hello,
> >
> > This patch reduces vRAM usage by cuvid decoder implementation.
> > The number of surfaces used for decoding is updated within the parser
> > sequence decode callback.
> > Also the "surfaces" AVDictionary option specific to cuvid was removed in
> > favor of "extra_hw_surfaces".
>
> This can break existing workflows, you should deprecated the option
> instead and only remove it after some time has passed.
>
> >
> > vRAM consumption was tested on various videos and savings are between 1%
> > for 360p resolution up to 21% for some 1080p H.264 videos.
> > Decoding performance was tested on various H.264 and H.265 videos in
> > different resolutions from 360p and higher, no performance penalty was
> > found.
> >
> > From 32a1b016e88fa40b983318d4583750ef250a78d9 Mon Sep 17 00:00:00 2001
> > From: Roman Arzumanyan <r.arzumanyan@visionlabs.ai>
> > Date: Thu, 1 Jun 2023 11:17:39 +0300
> > Subject: [PATCH] libavcodec/cuviddec: determine DPB size from within
> cuvid
> > parser
> >
> > ---
> > libavcodec/cuviddec.c | 29 +++++++++++++++++++++++++++--
> > 1 file changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
> > index 3d43bbd466..759ed49870 100644
> > --- a/libavcodec/cuviddec.c
> > +++ b/libavcodec/cuviddec.c
> > @@ -115,6 +115,12 @@ typedef struct CuvidParsedFrame
> >
> > #define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)
> >
> > +// NV recommends [2;4] range
> > +#define CUVID_MAX_DISPLAY_DELAY (4)
> > +
> > +// Actual DPB size will be determined by parser.
> > +#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
> > +
> > static int CUDAAPI cuvid_handle_video_sequence(void *opaque,
> CUVIDEOFORMAT* format)
> > {
> > AVCodecContext *avctx = opaque;
> > @@ -309,6 +315,25 @@ static int CUDAAPI cuvid_handle_video_sequence(void
> *opaque, CUVIDEOFORMAT* form
> > return 0;
> > }
> >
> > + if (ctx->nb_surfaces < format->min_num_decode_surfaces + 3)
> > + ctx->nb_surfaces = format->min_num_decode_surfaces + 3;
>
> FFMAX()
>
> > +
> > + if (avctx->extra_hw_frames > 0)
> > + ctx->nb_surfaces += avctx->extra_hw_frames;
> > +
> > + if (0 > av_fifo_realloc2(ctx->frame_queue, ctx->nb_surfaces *
> sizeof(CuvidParsedFrame))) {
>
> this is the old deprecated AVFifoBuffer API, you cannot use it with
> AVFifo objects
>
> you should also forward the actual error code
>
> > + av_log(avctx, AV_LOG_ERROR, "Failed to recreate frame queue on
> video sequence callback\n");
> > + ctx->internal_error = AVERROR(EINVAL);
> > + return 0;
> > + }
> > +
> > + ctx->key_frame = av_realloc_array(ctx->key_frame, ctx->nb_surfaces,
> sizeof(int));
> > + if (!ctx->key_frame) {
> > + av_log(avctx, AV_LOG_ERROR, "Failed to recreate key frame queue
> on video sequence callback\n");
> > + ctx->internal_error = AVERROR(EINVAL);
>
> Leaks key_frame on failure and should be ENOMEM.
>
> --
> Anton Khirnov
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
[-- Attachment #2: 0001-libavcodec-cuviddec-determine-amount-of-decoded-surf.patch --]
[-- Type: text/x-patch, Size: 4469 bytes --]
From 4aeff36c9d79c046e726bcc18a311754ace5c47e Mon Sep 17 00:00:00 2001
From: Roman Arzumanyan <r.arzumanyan@visionlabs.ai>
Date: Thu, 1 Jun 2023 11:17:39 +0300
Subject: [PATCH] libavcodec/cuviddec: determine amount of decoded surfaces
from within cuvid parser
---
libavcodec/cuviddec.c | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)
diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index 3d43bbd466..ef25c1d470 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -115,6 +115,12 @@ typedef struct CuvidParsedFrame
#define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)
+// NV recommends [2;4] range
+#define CUVID_MAX_DISPLAY_DELAY (4)
+
+// Actual pool size will be determined by parser.
+#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
+
static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* format)
{
AVCodecContext *avctx = opaque;
@@ -124,6 +130,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
CUVIDDECODECREATEINFO cuinfo;
int surface_fmt;
int chroma_444;
+ int fifo_size_inc;
int old_width = avctx->width;
int old_height = avctx->height;
@@ -309,6 +316,25 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
return 0;
}
+ fifo_size_inc = ctx->nb_surfaces;
+ ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);
+
+ if (0 < avctx->extra_hw_frames)
+ ctx->nb_surfaces += avctx->extra_hw_frames;
+
+ fifo_size_inc = ctx->nb_surfaces - fifo_size_inc;
+ if (0 < fifo_size_inc && 0 > av_fifo_grow2(ctx->frame_queue, fifo_size_inc)) {
+ av_log(avctx, AV_LOG_ERROR, "Failed to grow frame queue on video sequence callback\n");
+ ctx->internal_error = AVERROR(ENOMEM);
+ return 0;
+ }
+
+ if (0 < fifo_size_inc && 0 > av_reallocp_array(&ctx->key_frame, ctx->nb_surfaces, sizeof(int))) {
+ av_log(avctx, AV_LOG_ERROR, "Failed to grow key frame array on video sequence callback\n");
+ ctx->internal_error = AVERROR(ENOMEM);
+ return 0;
+ }
+
cuinfo.ulNumDecodeSurfaces = ctx->nb_surfaces;
cuinfo.ulNumOutputSurfaces = 1;
cuinfo.ulCreationFlags = cudaVideoCreate_PreferCUVID;
@@ -846,6 +872,10 @@ static av_cold int cuvid_decode_init(AVCodecContext *avctx)
goto error;
}
+ // respect the deprecated "surfaces" option if non-default value is given by user;
+ if(25 == ctx->nb_surfaces)
+ ctx->nb_surfaces = CUVID_DEFAULT_NUM_SURFACES;
+
ctx->frame_queue = av_fifo_alloc2(ctx->nb_surfaces, sizeof(CuvidParsedFrame), 0);
if (!ctx->frame_queue) {
ret = AVERROR(ENOMEM);
@@ -993,7 +1023,7 @@ static av_cold int cuvid_decode_init(AVCodecContext *avctx)
}
ctx->cuparseinfo.ulMaxNumDecodeSurfaces = ctx->nb_surfaces;
- ctx->cuparseinfo.ulMaxDisplayDelay = (avctx->flags & AV_CODEC_FLAG_LOW_DELAY) ? 0 : 4;
+ ctx->cuparseinfo.ulMaxDisplayDelay = (avctx->flags & AV_CODEC_FLAG_LOW_DELAY) ? 0 : CUVID_MAX_DISPLAY_DELAY;
ctx->cuparseinfo.pUserData = avctx;
ctx->cuparseinfo.pfnSequenceCallback = cuvid_handle_video_sequence;
ctx->cuparseinfo.pfnDecodePicture = cuvid_handle_picture_decode;
@@ -1097,7 +1127,7 @@ static const AVOption options[] = {
{ "bob", "Bob deinterlacing", 0, AV_OPT_TYPE_CONST, { .i64 = cudaVideoDeinterlaceMode_Bob }, 0, 0, VD, "deint" },
{ "adaptive", "Adaptive deinterlacing", 0, AV_OPT_TYPE_CONST, { .i64 = cudaVideoDeinterlaceMode_Adaptive }, 0, 0, VD, "deint" },
{ "gpu", "GPU to be used for decoding", OFFSET(cu_gpu), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, VD },
- { "surfaces", "Maximum surfaces to be used for decoding", OFFSET(nb_surfaces), AV_OPT_TYPE_INT, { .i64 = 25 }, 0, INT_MAX, VD },
+ { "surfaces", "Maximum surfaces to be used for decoding", OFFSET(nb_surfaces), AV_OPT_TYPE_INT, { .i64 = 25 }, 0, INT_MAX, VD | AV_OPT_FLAG_DEPRECATED },
{ "drop_second_field", "Drop second field when deinterlacing", OFFSET(drop_second_field), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VD },
{ "crop", "Crop (top)x(bottom)x(left)x(right)", OFFSET(crop_expr), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, VD },
{ "resize", "Resize (width)x(height)", OFFSET(resize_expr), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, VD },
--
2.34.1
[-- Attachment #3: Type: text/plain, Size: 251 bytes --]
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback
2023-06-05 12:29 ` Roman Arzumanyan
@ 2023-06-06 13:35 ` Timo Rothenpieler
2023-08-08 21:55 ` Michael Diesel
0 siblings, 1 reply; 5+ messages in thread
From: Timo Rothenpieler @ 2023-06-06 13:35 UTC (permalink / raw)
To: ffmpeg-devel
On 05.06.2023 14:29, Roman Arzumanyan wrote:
> Thanks for the review, Anton. Please find the updated patch attached.
> BTW - what do you mean by "you should also forward the actual error code"?
> Within the cuvid_handle_video_sequence() function in case of error,
> function still returns 0 and sets internal context error, so I kept
> behavior intact. Do I miss something?
applied with minor amendments.
Turned the if-statements around, we don't do them that way around in
ffmpeg code.
And switched the surfaces option to have a default of -1, to not rely on
a magic value of 25. That could surprise users who happen to request 25
surfaces.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback
2023-06-06 13:35 ` Timo Rothenpieler
@ 2023-08-08 21:55 ` Michael Diesel
0 siblings, 0 replies; 5+ messages in thread
From: Michael Diesel @ 2023-08-08 21:55 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: r.arzumanyan, timo
Hello ,
after this patch I get stutter in the output video when using h264_cuvid
as decoder
With "surfaces" as -1 will result :
CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces = 5 =
CUVID_DEFAULT_NUM_SURFACES
CUVIDDECODECREATEINFO::ulNumDecodeSurfaces = 8 =
format->min_num_decode_surfaces + 3
As per NVDEC Video Decoder API Programming Guide / Creating a parser ,
the two values should be equal.
cuvid_handle_video_sequence/pfnSequenceCallback should return a value :
">1: succeeded, and driver should override
CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces with this return value"
Something like:
if(ctx->cuparseinfo.ulMaxNumDecodeSurfaces !=
cuinfo.ulNumDecodeSurfaces)
return cuinfo.ulNumDecodeSurfaces;
else
return 1;
This solves partially the problem , there's still occasional stutter , a
value of 8 is not sufficient.
On my test system 12 is the magic number.
Timo Rothenpieler wrote:
> applied with minor amendments.
>
> Turned the if-statements around, we don't do them that way around in
> ffmpeg code.
> And switched the surfaces option to have a default of -1, to not rely on
> a magic value of 25. That could surprise users who happen to request 25
> surfaces.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-08-08 21:55 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAJZRZVnbFVOQaUuvA=F7qUmr+P_wfrjJnga2cq=tdNtm4Ky0sQ@mail.gmail.com>
2023-06-05 7:30 ` [FFmpeg-devel] [PATCH] avcodec/cuviddec: update amount of decoder surfaces from within sequence decode callback Roman Arzumanyan
2023-06-05 10:19 ` Anton Khirnov
2023-06-05 12:29 ` Roman Arzumanyan
2023-06-06 13:35 ` Timo Rothenpieler
2023-08-08 21:55 ` Michael Diesel
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git