From: Timo Rothenpieler <timo@rothenpieler.org> To: ffmpeg-devel@ffmpeg.org Subject: Re: [FFmpeg-devel] [PATCH] avfilter: add pad_cuda filter Date: Tue, 17 Jun 2025 23:20:25 +0200 Message-ID: <924e11e3-4475-4457-9c65-9f5862be25f0@rothenpieler.org> (raw) In-Reply-To: <20250616233931.192291-1-jestrada.list@gmail.com> On 17.06.2025 01:39, Jorge Estrada wrote: > From: Jorge Estrada <--global> > > This patch adds the pad_cuda video filter. A filter similar to the existing pad filter but accelerated by CUDA. > > The filter shares the same options as the software pad filter. > > Example usage: > ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+100:h=ih+100:x=-1:y=-1:color=red" out.mp4 > --- > Changelog | 1 + > configure | 2 + > doc/filters.texi | 82 +++++ > libavfilter/Makefile | 1 + > libavfilter/allfilters.c | 1 + > libavfilter/version.h | 2 +- > libavfilter/vf_pad_cuda.c | 650 +++++++++++++++++++++++++++++++++++++ > libavfilter/vf_pad_cuda.cu | 65 ++++ > 8 files changed, 803 insertions(+), 1 deletion(-) > create mode 100644 libavfilter/vf_pad_cuda.c > create mode 100644 libavfilter/vf_pad_cuda.cu > > diff --git a/Changelog b/Changelog > index 4217449438..ad7906acf9 100644 > --- a/Changelog > +++ b/Changelog > @@ -18,6 +18,7 @@ version <next>: > - APV encoding support through a libopenapv wrapper > - VVC decoder supports all content of SCC (Screen Content Coding): > IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform > +- pad_cuda filter > > > version 7.1: > diff --git a/configure b/configure > index 534b443f7d..e68d612867 100755 > --- a/configure > +++ b/configure > @@ -3357,6 +3357,8 @@ thumbnail_cuda_filter_deps_any="cuda_nvcc cuda_llvm" > transpose_npp_filter_deps="ffnvcodec libnpp" > overlay_cuda_filter_deps="ffnvcodec" > overlay_cuda_filter_deps_any="cuda_nvcc cuda_llvm" > +pad_cuda_filter_deps="ffnvcodec" > +pad_cuda_filter_deps_any="cuda_nvcc cuda_llvm" > sharpen_npp_filter_deps="ffnvcodec libnpp" > > 
ddagrab_filter_deps="d3d11va IDXGIOutput1 DXGI_OUTDUPL_FRAME_INFO" > diff --git a/doc/filters.texi b/doc/filters.texi > index f32fc23c70..68be03c1ab 100644 > --- a/doc/filters.texi > +++ b/doc/filters.texi > @@ -26682,6 +26682,88 @@ See @ref{framesync}. > > This filter also supports the @ref{framesync} options. > > + > +@anchor{pad_cuda} > +@subsection pad_cuda > +Add paddings to an input video stream using CUDA. > + > +This filter is the CUDA-accelerated version of the @ref{pad} filter. It accepts CUDA frames as input and the following options: > + > +@table @option > +@item width, w > +@item height, h > +Specify an expression for the size of the output image with the paddings. If the value for @var{width} or @var{height} is 0, the corresponding input size is used for the output. > + > +The default value for both @var{width} and @var{height} is 0. > + > +@item x > +@item y > +Specify the offsets to place the input image at within the padded area, with respect to the top/left border of the output image. > + > +If @var{x} or @var{y} evaluates to a negative number, the input image will be centered on the padded area. > + > +The default value for both @var{x} and @var{y} is 0. > + > +@item color > +Specify the color of the padded area. For the syntax of this option, check the @ref{color syntax,,"Color" section in the ffmpeg-utils manual,ffmpeg-utils}. > + > +The default value of @var{color} is "black". > + > +@item eval > +Specify when to evaluate the @var{width}, @var{height}, @var{x}, and @var{y} expressions. > + > +It accepts the following values: > +@table @samp > +@item init > +Only evaluate expressions once during the filter initialization. (default) > +@item frame > +Evaluate expressions for each incoming frame. > +@end table > + > +@item aspect > +Pad to a specified aspect ratio instead of a fixed resolution. 
> +@end table > + > +The values for the @var{width}, @var{height}, @var{x}, and @var{y} options are expressions containing the following constants: > +@table @var > +@item in_w, iw > +The input video width. > +@item in_h, ih > +The input video height. > +@item out_w, ow > +The output width (the size of the padded area), as specified by the @var{width} expression. > +@item out_h, oh > +The output height (the size of the padded area), as specified by the @var{height} expression. > +@item x > +@item y > +The x and y offsets as specified by the @var{x} and @var{y} expressions. > +@item a > +Same as @code{iw / ih}. > +@item sar > +The input sample aspect ratio. > +@item dar > +The input display aspect ratio, equivalent to @code{(iw / ih) * sar}. > +@item hsub > +@item vsub > +The horizontal and vertical chroma subsample values. > +@end table > + > +@subsubsection Examples > + > +@itemize > +@item > +Add a 200-pixel black border to all sides of a video frame: > +@example > +ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+400:h=ih+400:x=200:y=200" -c:v h264_nvenc out.mp4 > +@end example > + > +@item > +Pad the input video to a 16:9 aspect ratio, filling with the color "blue": > +@example > +ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=ih*16/9/sar:h=ih:x=(ow-iw)/2:y=(oh-ih)/2:color=blue" -c:v h264_nvenc out.mp4 > +@end example > +@end itemize > + I'm not sure if it's necessary to fully document this filter, given it's identical to vf_pad. You could just say that it's identical to vf_pad usage, and then only describe any potential differences.
> @anchor{scale_cuda} > @subsection scale_cuda > > diff --git a/libavfilter/Makefile b/libavfilter/Makefile > index 97f8f17272..9e9153f5b0 100644 > --- a/libavfilter/Makefile > +++ b/libavfilter/Makefile > @@ -422,6 +422,7 @@ OBJS-$(CONFIG_OVERLAY_VAAPI_FILTER) += vf_overlay_vaapi.o framesync.o v > OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER) += vf_overlay_vulkan.o vulkan.o vulkan_filter.o > OBJS-$(CONFIG_OWDENOISE_FILTER) += vf_owdenoise.o > OBJS-$(CONFIG_PAD_FILTER) += vf_pad.o > +OBJS-$(CONFIG_PAD_CUDA_FILTER) += vf_pad_cuda.o vf_pad_cuda.ptx.o cuda/load_helper.o > OBJS-$(CONFIG_PAD_OPENCL_FILTER) += vf_pad_opencl.o opencl.o opencl/pad.o > OBJS-$(CONFIG_PALETTEGEN_FILTER) += vf_palettegen.o palette.o > OBJS-$(CONFIG_PALETTEUSE_FILTER) += vf_paletteuse.o framesync.o palette.o > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c > index 3bc045b28f..409099bf1f 100644 > --- a/libavfilter/allfilters.c > +++ b/libavfilter/allfilters.c > @@ -395,6 +395,7 @@ extern const FFFilter ff_vf_overlay_vulkan; > extern const FFFilter ff_vf_overlay_cuda; > extern const FFFilter ff_vf_owdenoise; > extern const FFFilter ff_vf_pad; > +extern const FFFilter ff_vf_pad_cuda; > extern const FFFilter ff_vf_pad_opencl; > extern const FFFilter ff_vf_palettegen; > extern const FFFilter ff_vf_paletteuse; > diff --git a/libavfilter/version.h b/libavfilter/version.h > index d5a6bc143a..1e884d9b44 100644 > --- a/libavfilter/version.h > +++ b/libavfilter/version.h > @@ -31,7 +31,7 @@ > > #include "version_major.h" > > -#define LIBAVFILTER_VERSION_MINOR 0 > +#define LIBAVFILTER_VERSION_MINOR 1 > #define LIBAVFILTER_VERSION_MICRO 100 > > > diff --git a/libavfilter/vf_pad_cuda.c b/libavfilter/vf_pad_cuda.c > new file mode 100644 > index 0000000000..a9a9036d71 > --- /dev/null > +++ b/libavfilter/vf_pad_cuda.c > @@ -0,0 +1,650 @@ > +/* > + * This file is part of FFmpeg. 
> + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > +/** > + * @file > + * CUDA video padding filter > + */ > + > +#include <float.h> > +#include <vector_types.h> > + > +#include "filters.h" > +#include "libavutil/avstring.h" > +#include "libavutil/common.h" > +#include "libavutil/cuda_check.h" > +#include "libavutil/eval.h" > +#include "libavutil/hwcontext.h" > +#include "libavutil/hwcontext_cuda_internal.h" > +#include "libavutil/imgutils.h" > +#include "libavutil/internal.h" > +#include "libavutil/opt.h" > +#include "libavutil/pixdesc.h" > + > +#include "cuda/load_helper.h" > + > +#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, device_hwctx->internal->cuda_dl, x) > +#define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) ) > +#define BLOCK_X 32 > +#define BLOCK_Y 16 > + > +static const enum AVPixelFormat supported_formats[] = { > + AV_PIX_FMT_YUV420P, > + AV_PIX_FMT_YUV444P, > + AV_PIX_FMT_YUVA420P, > + AV_PIX_FMT_YUVA444P, > + AV_PIX_FMT_NV12, > +}; > + > +typedef struct CUDAPadContext { > + const AVClass *class; > + > + AVBufferRef *frames_ctx; > + > + int w, h; ///< output dimensions, a value of 0 will result in the input size > + int x, y; ///< offsets of the input area with respect to the padded area > + int in_w, in_h; ///< width and height for the padded input 
video > + > + char *w_expr; ///< width expression > + char *h_expr; ///< height expression > + char *x_expr; ///< x offset expression > + char *y_expr; ///< y offset expression > + > + uint8_t rgba_color[4]; ///< color for the padding area > + uint8_t parsed_color[4]; > + AVRational aspect; > + > + int eval_mode; > + > + int last_out_w, last_out_h; ///< used to evaluate the prior output width and height with the incoming frame > + > + AVCUDADeviceContext *hwctx; > + CUmodule cu_module; > + CUfunction cu_func_planar; > + CUfunction cu_func_uv; > +} CUDAPadContext; > + > +static const char *const var_names[] = { > + "in_w", "iw", > + "in_h", "ih", > + "out_w", "ow", > + "out_h", "oh", > + "x", > + "y", > + "a", > + "sar", > + "dar", > + "hsub", > + "vsub", > + NULL > +}; > + > +enum { > + VAR_IN_W, > + VAR_IW, > + VAR_IN_H, > + VAR_IH, > + VAR_OUT_W, > + VAR_OW, > + VAR_OUT_H, > + VAR_OH, > + VAR_X, > + VAR_Y, > + VAR_A, > + VAR_SAR, > + VAR_DAR, > + VAR_HSUB, > + VAR_VSUB, > + VARS_NB > +}; > + > +enum EvalMode { > + EVAL_MODE_INIT, > + EVAL_MODE_FRAME, > + EVAL_MODE_NB > +}; > + > +static int eval_expr(AVFilterContext *ctx) > +{ > + CUDAPadContext *s = ctx->priv; > + AVFilterLink *inlink = ctx->inputs[0]; > + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format); > + > + double var_values[VARS_NB], res; > + char *expr; > + int ret; > + > + var_values[VAR_IN_W] = var_values[VAR_IW] = s->in_w; > + var_values[VAR_IN_H] = var_values[VAR_IH] = s->in_h; > + var_values[VAR_OUT_W] = var_values[VAR_OW] = NAN; > + var_values[VAR_OUT_H] = var_values[VAR_OH] = NAN; > + var_values[VAR_A] = (double)s->in_w / s->in_h; > + var_values[VAR_SAR] = inlink->sample_aspect_ratio.num ? 
> + (double)inlink->sample_aspect_ratio.num / > + inlink->sample_aspect_ratio.den : 1; > + var_values[VAR_DAR] = var_values[VAR_A] * var_values[VAR_SAR]; > + var_values[VAR_HSUB] = 1 << desc->log2_chroma_w; > + var_values[VAR_VSUB] = 1 << desc->log2_chroma_h; > + > + expr = s->w_expr; > + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx); > + if (ret < 0) > + goto fail; > + > + s->w = res; > + if (s->w < 0) { > + av_log(ctx, AV_LOG_ERROR, "Width expression is negative.\n"); > + ret = AVERROR(EINVAL); > + goto fail; > + } > + > + var_values[VAR_OUT_W] = var_values[VAR_OW] = s->w; > + > + expr = s->h_expr; > + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx); > + if (ret < 0) > + goto fail; > + > + s->h = res; > + if (s->h < 0) { > + av_log(ctx, AV_LOG_ERROR, "Height expression is negative.\n"); > + ret = AVERROR(EINVAL); > + goto fail; > + } > + var_values[VAR_OUT_H] = var_values[VAR_OH] = s->h; > + > + if (!s->h) > + s->h = s->in_h; > + > + var_values[VAR_OUT_H] = var_values[VAR_OH] = s->h; > + > + > + expr = s->w_expr; > + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx); > + if (ret < 0) > + goto fail; > + > + s->w = res; > + if (s->w < 0) { > + av_log(ctx, AV_LOG_ERROR, "Width expression is negative.\n"); > + ret = AVERROR(EINVAL); > + goto fail; > + } > + if (!s->w) > + s->w = s->in_w; > + > + var_values[VAR_OUT_W] = var_values[VAR_OW] = s->w; > + > + > + expr = s->x_expr; > + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx); > + if (ret < 0) > + goto fail; > + > + s->x = res; > + > + > + expr = s->y_expr; > + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx); > + if (ret < 0) > + goto fail; > + > + s->y = res; > + > + if (s->x < 0 || s->x + s->in_w > s->w) { > + s->x = (s->w - s->in_w) / 2; > + 
av_log(ctx, AV_LOG_VERBOSE, "centering X offset.\n"); > + } > + > + if (s->y < 0 || s->y + s->in_h > s->h) { > + s->y = (s->h - s->in_h) / 2; > + av_log(ctx, AV_LOG_VERBOSE, "centering Y offset.\n"); > + } > + > + s->w = av_clip(s->w, 1, INT_MAX); > + s->h = av_clip(s->h, 1, INT_MAX); > + > + if (s->w < s->in_w || s->h < s->in_h) { > + av_log(ctx, AV_LOG_ERROR, "Padded size < input size.\n"); > + return AVERROR(EINVAL); > + } > + > + av_log(ctx, AV_LOG_DEBUG, > + "w:%d h:%d -> w:%d h:%d x:%d y:%d color:0x%02X%02X%02X%02X\n", > + inlink->w, inlink->h, s->w, s->h, s->x, s->y, s->rgba_color[0], > + s->rgba_color[1], s->rgba_color[2], s->rgba_color[3]); > + > + return 0; > + > +fail: > + av_log(ctx, AV_LOG_ERROR, "Error evaluating '%s'\n", expr); > + return ret; > +} Something I'm wondering with this being the 4th effective duplication of this exact logic, if it couldn't be pulled out of vf_pad somehow, so all the other pad filters can re-use it. That's not a problem for this patch to solve though. 
> +static int cuda_pad_alloc_out_frames_ctx(AVFilterContext *ctx, AVBufferRef **out_frames_ctx, const int width, const int height) > +{ > + AVFilterLink *inlink = ctx->inputs[0]; > + FilterLink *inl = ff_filter_link(inlink); > + AVHWFramesContext *in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data; > + int ret; > + > + *out_frames_ctx = av_hwframe_ctx_alloc(in_frames_ctx->device_ref); > + if (!*out_frames_ctx) { > + return AVERROR(ENOMEM); > + } > + > + AVHWFramesContext *out_fc = (AVHWFramesContext *)(*out_frames_ctx)->data; > + out_fc->format = AV_PIX_FMT_CUDA; > + out_fc->sw_format = in_frames_ctx->sw_format; > + > + out_fc->width = FFALIGN(width, 32); > + out_fc->height = FFALIGN(height, 32); > + > + ret = av_hwframe_ctx_init(*out_frames_ctx); > + if (ret < 0) { > + av_log(ctx, AV_LOG_ERROR, "Failed to init output ctx\n"); > + av_buffer_unref(out_frames_ctx); > + return ret; > + } > + > + return 0; > +} > + > +static av_cold int cuda_pad_init(AVFilterContext *ctx) > +{ > + CUDAPadContext *s = ctx->priv; > + if (!s) { > + av_log(ctx, AV_LOG_ERROR, "Failed to allocate CUDAPadContext.\n"); > + return AVERROR(ENOMEM); > + } I don't think I've ever seen a check like this. Pretty sure if the allocation for this failed, we'd have never gotten here. > + s->last_out_w = -1; > + s->last_out_h = -1; > + > + return 0; > +} > + > +static av_cold void cuda_pad_uninit(AVFilterContext *ctx) > +{ > + CUDAPadContext *s = ctx->priv; > + av_buffer_unref(&s->frames_ctx); > + > + if (s->cu_module) { > + s->hwctx->internal->cuda_dl->cuModuleUnload(s->cu_module); Missing push/pop current context here. 
> + s->cu_module = NULL; > + } > +} > + > +static av_cold int cuda_pad_load_functions(AVFilterContext *ctx) > +{ > + CUDAPadContext *s = ctx->priv; > + CudaFunctions *cu = s->hwctx->internal->cuda_dl; > + CUcontext dummy_cu_ctx; > + int ret; > + > + AVCUDADeviceContext *device_hwctx = s->hwctx; > + > + extern const unsigned char ff_vf_pad_cuda_ptx_data[]; > + extern const unsigned int ff_vf_pad_cuda_ptx_len; > + > + ret = CHECK_CU(cu->cuCtxPushCurrent(device_hwctx->cuda_ctx)); > + if (ret < 0) > + return ret; > + > + ret = ff_cuda_load_module(ctx, device_hwctx, &s->cu_module, > + ff_vf_pad_cuda_ptx_data, ff_vf_pad_cuda_ptx_len); > + if (ret < 0) { > + av_log(ctx, AV_LOG_ERROR, "Failed to load CUDA module\n"); > + goto end; > + } > + > + ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_planar, s->cu_module, "pad_planar_cuda")); > + if (ret < 0) { > + av_log(ctx, AV_LOG_ERROR, "Failed to load pad_planar_cuda\n"); > + goto end; > + } > + > + ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_uv, s->cu_module, "pad_uv_cuda")); > + if (ret < 0) > + av_log(ctx, AV_LOG_ERROR, "Failed to load pad_uv_cuda\n"); > + > +end: > + CHECK_CU(cu->cuCtxPopCurrent(&dummy_cu_ctx)); > + > + return ret; > +} > + > +static int cuda_pad_config_props(AVFilterLink *outlink) > +{ > + AVFilterContext *ctx = outlink->src; > + CUDAPadContext *s = ctx->priv; > + > + AVFilterLink *inlink = ctx->inputs[0]; > + FilterLink *inl = ff_filter_link(inlink); > + > + FilterLink *ol = ff_filter_link(outlink); > + > + AVHWFramesContext *in_frames_ctx; > + int format_supported = 0; > + int ret; > + > + s->in_w = inlink->w; > + s->in_h = inlink->h; > + ret = eval_expr(ctx); > + if (ret < 0) > + return ret; > + > + if (!inl->hw_frames_ctx) { > + av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n"); > + return AVERROR(EINVAL); > + } > + > + in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data; > + s->hwctx = in_frames_ctx->device_ctx->hwctx; > + > + for (int i = 0; i < 
FF_ARRAY_ELEMS(supported_formats); i++) { > + if (in_frames_ctx->sw_format == supported_formats[i]) { > + format_supported = 1; > + break; > + } > + } > + if (!format_supported) { > + av_log(ctx, AV_LOG_ERROR, "Unsupported input format.\n"); > + return AVERROR(EINVAL); > + } > + > + uint8_t R = s->rgba_color[0]; > + uint8_t G = s->rgba_color[1]; > + uint8_t B = s->rgba_color[2]; > + > + int Y = (( 66 * R + 129 * G + 25 * B + 128) >> 8) + 16; > + int U = ((-38 * R - 74 * G + 112 * B + 128) >> 8) + 128; > + int V = ((112 * R - 94 * G - 18 * B + 128) >> 8) + 128; > + s->parsed_color[0] = av_clip_uint8(Y); > + s->parsed_color[1] = av_clip_uint8(U); > + s->parsed_color[2] = av_clip_uint8(V); > + s->parsed_color[3] = s->rgba_color[3]; > + > + ret = cuda_pad_alloc_out_frames_ctx(ctx, &s->frames_ctx, s->w, s->h); > + if (ret < 0) > + return ret; > + > + ol->hw_frames_ctx = av_buffer_ref(s->frames_ctx); > + if (!ol->hw_frames_ctx) > + return AVERROR(ENOMEM); > + > + outlink->w = s->w; > + outlink->h = s->h; > + outlink->time_base = inlink->time_base; > + outlink->format = AV_PIX_FMT_CUDA; > + > + s->last_out_w = s->w; > + s->last_out_h = s->h; > + > + ret = cuda_pad_load_functions(ctx); > + if (ret < 0) > + return ret; > + > + return 0; > +} > + > +static int cuda_pad_pad(AVFilterContext *ctx, AVFrame *out, const AVFrame *in) > +{ > + CUDAPadContext *s = ctx->priv; > + FilterLink *inl = ff_filter_link(ctx->inputs[0]); > + > + AVHWFramesContext *in_frames_ctx = > + (AVHWFramesContext *)inl->hw_frames_ctx->data; > + const AVPixFmtDescriptor *desc_in = > + av_pix_fmt_desc_get(in_frames_ctx->sw_format); I wouldn't linebreak both of those either > + > + CudaFunctions *cu = s->hwctx->internal->cuda_dl; > + AVCUDADeviceContext *device_hwctx = s->hwctx; > + int ret; > + > + > + const int nb_planes = av_pix_fmt_count_planes(in_frames_ctx->sw_format); > + for (int plane = 0; plane < nb_planes; plane++) { > + > + int hsub = (plane == 1 || plane == 2) ? 
desc_in->log2_chroma_w : 0; > + int vsub = (plane == 1 || plane == 2) ? desc_in->log2_chroma_h : 0; > + > + if (in_frames_ctx->sw_format == AV_PIX_FMT_NV12 && plane == 1) { > + hsub = desc_in->log2_chroma_w; > + vsub = desc_in->log2_chroma_h; > + } > + > + int src_w = AV_CEIL_RSHIFT(s->in_w, hsub); > + int src_h = AV_CEIL_RSHIFT(s->in_h, vsub); > + > + int dst_w = AV_CEIL_RSHIFT(s->w, hsub); > + int dst_h = AV_CEIL_RSHIFT(s->h, vsub); > + > + int y_plane_offset = AV_CEIL_RSHIFT(s->y, vsub); > + int x_plane_offset = AV_CEIL_RSHIFT(s->x, hsub); > + > + if (x_plane_offset + src_w > dst_w || y_plane_offset + src_h > dst_h) { > + av_log(ctx, AV_LOG_ERROR, > + "ROI out of bounds in plane %d: offset=(%d,%d) in=(%dx%d) " > + "out=(%dx%d)\n", > + plane, x_plane_offset, y_plane_offset, src_w, src_h, dst_w, dst_h); > + return AVERROR(EINVAL); > + } > + > + CUfunction cuda_func; > + uchar2 fill_val_uv; > + uint8_t fill_val_planar; > + void *p_fill_val; > + > + if (in_frames_ctx->sw_format == AV_PIX_FMT_NV12 && plane == 1) { > + cuda_func = s->cu_func_uv; > + fill_val_uv.x = s->parsed_color[1]; > + fill_val_uv.y = s->parsed_color[2]; > + p_fill_val = &fill_val_uv; > + } else { > + cuda_func = s->cu_func_planar; > + fill_val_planar = s->parsed_color[plane]; > + p_fill_val = &fill_val_planar; > + } > + > + CUdeviceptr d_dst = (CUdeviceptr)out->data[plane]; > + CUdeviceptr d_src = (CUdeviceptr)in->data[plane]; > + > + int src_linesize = in->linesize[plane]; > + > + void *kernel_args[] = { > + &d_dst, &out->linesize[plane], &dst_w, &dst_h, > + &d_src, &src_linesize, &src_w, &src_h, > + &x_plane_offset, &y_plane_offset, p_fill_val > + }; > + > + unsigned int grid_x = DIV_UP(dst_w, BLOCK_X); > + unsigned int grid_y = DIV_UP(dst_h, BLOCK_Y); > + > + ret = CHECK_CU(cu->cuLaunchKernel(cuda_func, grid_x, grid_y, 1, > + BLOCK_X, BLOCK_Y, 1, > + 0, s->hwctx->stream, kernel_args, NULL)); > + > + if (ret < 0) { > + av_log(ctx, AV_LOG_ERROR, "Failed to launch kernel for plane %d\n", plane); > 
+ return ret; > + } > + } > + > + return 0; > +} > + > +static int cuda_pad_filter_frame(AVFilterLink *inlink, AVFrame *in) > +{ > + AVFilterContext *ctx = inlink->dst; > + CUDAPadContext *s = ctx->priv; > + AVFilterLink *outlink = ctx->outputs[0]; > + > + FilterLink *outl = ff_filter_link(outlink); > + > + AVHWFramesContext *out_frames_ctx = > + (AVHWFramesContext *)outl->hw_frames_ctx->data; nit, but it's not worth it to linebreak here imo > + AVCUDADeviceContext *device_hwctx = out_frames_ctx->device_ctx->hwctx; > + > + int ret; > + > + if (s->eval_mode == EVAL_MODE_FRAME) { > + s->in_w = in->width; > + s->in_h = in->height; > + s->aspect = in->sample_aspect_ratio; > + > + ret = eval_expr(ctx); > + if (ret < 0) { > + av_frame_free(&in); > + return ret; > + } > + } > + > + > + if (s->x == 0 && s->y == 0 && > + s->w == in->width && s->h == in->height) { > + av_log(ctx, AV_LOG_DEBUG, "No border. Passing the frame unmodified.\n"); > + s->last_out_w = s->w; > + s->last_out_h = s->h; > + return ff_filter_frame(outlink, in); > + } > + > + > + if (s->w != s->last_out_w || s->h != s->last_out_h) { > + > + av_buffer_unref(&s->frames_ctx); > + > + ret = cuda_pad_alloc_out_frames_ctx(ctx, &s->frames_ctx, s->w, s->h); > + if (ret < 0) > + return ret; > + > + av_buffer_unref(&outl->hw_frames_ctx); > + outl->hw_frames_ctx = av_buffer_ref(s->frames_ctx); > + if (!outl->hw_frames_ctx) { > + av_frame_free(&in); > + av_log(ctx, AV_LOG_ERROR, "Failed to allocate output frame context.\n"); > + return AVERROR(ENOMEM); > + } > + outlink->w = s->w; > + outlink->h = s->h; > + > + s->last_out_w = s->w; > + s->last_out_h = s->h; > + } > + > + AVFrame *out = av_frame_alloc(); > + if (!out) { > + av_frame_free(&in); > + av_log(ctx, AV_LOG_ERROR, "Failed to allocate output AVFrame.\n"); > + return AVERROR(ENOMEM); > + } > + ret = av_hwframe_get_buffer(outl->hw_frames_ctx, out, 0); > + if (ret < 0) { > + av_log(ctx, AV_LOG_ERROR, "Unable to get output buffer: %s\n", > + av_err2str(ret)); > + 
av_frame_free(&out); > + av_frame_free(&in); > + return ret; > + } > + > + CUcontext dummy; > + ret = CHECK_CU(device_hwctx->internal->cuda_dl->cuCtxPushCurrent( > + device_hwctx->cuda_ctx)); > + if (ret < 0) { > + av_frame_free(&out); > + av_frame_free(&in); > + return ret; > + } > + > + ret = cuda_pad_pad(ctx, out, in); > + > + CHECK_CU(device_hwctx->internal->cuda_dl->cuCtxPopCurrent(&dummy)); > + > + if (ret < 0) { > + av_frame_free(&out); > + av_frame_free(&in); > + return ret; > + } > + > + av_frame_copy_props(out, in); > + out->width = s->w; > + out->height = s->h; > + > + > + av_reduce(&out->sample_aspect_ratio.num, &out->sample_aspect_ratio.den, > + (int64_t)in->sample_aspect_ratio.num * out->height * in->width, > + (int64_t)in->sample_aspect_ratio.den * out->width * in->height, > + INT_MAX); > + > + av_frame_free(&in); > + return ff_filter_frame(outlink, out); > +} > + > +#define OFFSET(x) offsetof(CUDAPadContext, x) > +#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM) > + > +static const AVOption cuda_pad_options[] = { > + { "width", "set the pad area width expression", OFFSET(w_expr), AV_OPT_TYPE_STRING, {.str = "iw"}, 0, 0, FLAGS }, > + { "w", "set the pad area width expression", OFFSET(w_expr), AV_OPT_TYPE_STRING, {.str = "iw"}, 0, 0, FLAGS }, > + { "height", "set the pad area height expression", OFFSET(h_expr), AV_OPT_TYPE_STRING, {.str = "ih"}, 0, 0, FLAGS }, > + { "h", "set the pad area height expression", OFFSET(h_expr), AV_OPT_TYPE_STRING, {.str = "ih"}, 0, 0, FLAGS }, > + { "x", "set the x offset expression for the input image position", OFFSET(x_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS }, > + { "y", "set the y offset expression for the input image position", OFFSET(y_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS }, > + { "color", "set the color of the padded area border", OFFSET(rgba_color), AV_OPT_TYPE_COLOR, {.str = "black"}, .flags = FLAGS }, > + { "eval", "specify when to evaluate expressions", 
OFFSET(eval_mode), AV_OPT_TYPE_INT, {.i64 = EVAL_MODE_INIT}, 0, EVAL_MODE_NB-1, FLAGS, .unit = "eval" }, > + { "init", "eval expressions once during initialization", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_INIT}, .flags = FLAGS, .unit = "eval" }, > + { "frame", "eval expressions during initialization and per-frame", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_FRAME}, .flags = FLAGS, .unit = "eval" }, > + { "aspect", "pad to fit an aspect instead of a resolution", OFFSET(aspect), AV_OPT_TYPE_RATIONAL, {.dbl = 0}, 0, DBL_MAX, FLAGS }, > + { NULL } > +}; > + > +static const AVClass cuda_pad_class = { > + .class_name = "pad_cuda", > + .item_name = av_default_item_name, > + .option = cuda_pad_options, > + .version = LIBAVUTIL_VERSION_INT, > +}; > + > +static const AVFilterPad cuda_pad_inputs[] = {{ > + .name = "default", > + .type = AVMEDIA_TYPE_VIDEO, > + .filter_frame = cuda_pad_filter_frame > +}}; > + > +static const AVFilterPad cuda_pad_outputs[] = {{ > + .name = "default", > + .type = AVMEDIA_TYPE_VIDEO, > + .config_props = cuda_pad_config_props, > +}}; > + > +const FFFilter ff_vf_pad_cuda = { > + .p.name = "pad_cuda", > + .p.description = NULL_IF_CONFIG_SMALL("CUDA-based GPU padding filter"), > + .init = cuda_pad_init, > + .uninit = cuda_pad_uninit, > + > + .p.priv_class = &cuda_pad_class, > + > + FILTER_INPUTS(cuda_pad_inputs), > + FILTER_OUTPUTS(cuda_pad_outputs), > + > + FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA), > + > + .priv_size = sizeof(CUDAPadContext), > + .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE, > +}; > \ No newline at end of file > diff --git a/libavfilter/vf_pad_cuda.cu b/libavfilter/vf_pad_cuda.cu > new file mode 100644 > index 0000000000..4524372e17 > --- /dev/null > +++ b/libavfilter/vf_pad_cuda.cu > @@ -0,0 +1,65 @@ > +/* > + * This file is part of FFmpeg. 
> + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > + > +extern "C" { > + > +__global__ void pad_planar_cuda(unsigned char* dst, int dst_pitch, int dst_w, int dst_h, > + const unsigned char* src, int src_pitch, int src_w, int src_h, > + int roi_x, int roi_y, unsigned char fill_val) > +{ > + const int x = blockIdx.x * blockDim.x + threadIdx.x; > + const int y = blockIdx.y * blockDim.y + threadIdx.y; > + > + if (x >= dst_w || y >= dst_h) { > + return; > + } > + > + if (x >= roi_x && x < (roi_x + src_w) && y >= roi_y && y < (roi_y + src_h)) { > + const int src_x = x - roi_x; > + const int src_y = y - roi_y; > + dst[y * dst_pitch + x] = src[src_y * src_pitch + src_x]; > + } else { > + dst[y * dst_pitch + x] = fill_val; > + } > +} > + > +__global__ void pad_uv_cuda(uchar2* dst, int dst_pitch, int dst_w, int dst_h, > + const uchar2* src, int src_pitch, int src_w, int src_h, > + int roi_x, int roi_y, uchar2 fill_val) > +{ > + const int x = blockIdx.x * blockDim.x + threadIdx.x; > + const int y = blockIdx.y * blockDim.y + threadIdx.y; > + > + if (x >= dst_w || y >= dst_h) { > + return; > + } > + > + const int dst_pitch_uchar2 = dst_pitch / sizeof(uchar2); > + const int src_pitch_uchar2 = src_pitch / sizeof(uchar2); > + > + if (x >= roi_x && x < (roi_x + src_w) 
&& y >= roi_y && y < (roi_y + src_h)) { > + const int src_x = x - roi_x; > + const int src_y = y - roi_y; > + dst[y * dst_pitch_uchar2 + x] = src[src_y * src_pitch_uchar2 + src_x]; > + } else { > + dst[y * dst_pitch_uchar2 + x] = fill_val; > + } > +} You could make a macro, or a templated function (this is C++ after all) plus a macro, to easily define this function for any datatype. That'd reduce code duplication here a bit and would also make it easy to add 10/12 bit pixel format support later on. > +} > \ No newline at end of file _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Thread overview: 2+ messages
2025-06-16 23:39 Jorge Estrada
2025-06-17 21:20 ` Timo Rothenpieler [this message]