* [FFmpeg-devel] [PATCH] avfilter: add pad_cuda filter
@ 2025-06-16 23:39 Jorge Estrada
2025-06-17 21:20 ` Timo Rothenpieler
0 siblings, 1 reply; 2+ messages in thread
From: Jorge Estrada @ 2025-06-16 23:39 UTC (permalink / raw)
To: ffmpeg-devel
From: Jorge Estrada <--global>
This patch adds the pad_cuda video filter. A filter similar to the existing pad filter but accelerated by CUDA.
The filter shares the same options as the software pad filter.
Example usage:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+100:h=ih+100:x=-1:y=-1:color=red" out.mp4
---
Changelog | 1 +
configure | 2 +
doc/filters.texi | 82 +++++
libavfilter/Makefile | 1 +
libavfilter/allfilters.c | 1 +
libavfilter/version.h | 2 +-
libavfilter/vf_pad_cuda.c | 650 +++++++++++++++++++++++++++++++++++++
libavfilter/vf_pad_cuda.cu | 65 ++++
8 files changed, 803 insertions(+), 1 deletion(-)
create mode 100644 libavfilter/vf_pad_cuda.c
create mode 100644 libavfilter/vf_pad_cuda.cu
diff --git a/Changelog b/Changelog
index 4217449438..ad7906acf9 100644
--- a/Changelog
+++ b/Changelog
@@ -18,6 +18,7 @@ version <next>:
- APV encoding support through a libopenapv wrapper
- VVC decoder supports all content of SCC (Screen Content Coding):
IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
+- pad_cuda filter
version 7.1:
diff --git a/configure b/configure
index 534b443f7d..e68d612867 100755
--- a/configure
+++ b/configure
@@ -3357,6 +3357,8 @@ thumbnail_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
transpose_npp_filter_deps="ffnvcodec libnpp"
overlay_cuda_filter_deps="ffnvcodec"
overlay_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+pad_cuda_filter_deps="ffnvcodec"
+pad_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
sharpen_npp_filter_deps="ffnvcodec libnpp"
ddagrab_filter_deps="d3d11va IDXGIOutput1 DXGI_OUTDUPL_FRAME_INFO"
diff --git a/doc/filters.texi b/doc/filters.texi
index f32fc23c70..68be03c1ab 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26682,6 +26682,88 @@ See @ref{framesync}.
This filter also supports the @ref{framesync} options.
+
+@anchor{pad_cuda}
+@subsection pad_cuda
+Add paddings to an input video stream using CUDA.
+
+This filter is the CUDA-accelerated version of the @ref{pad} filter. It accepts CUDA frames as input and the following options:
+
+@table @option
+@item width, w
+@item height, h
+Specify an expression for the size of the output image with the paddings. If the value for @var{width} or @var{height} is 0, the corresponding input size is used for the output.
+
+The default value for both @var{width} and @var{height} is 0.
+
+@item x
+@item y
+Specify the offsets to place the input image at within the padded area, with respect to the top/left border of the output image.
+
+If @var{x} or @var{y} evaluates to a negative number, the input image will be centered on the padded area.
+
+The default value for both @var{x} and @var{y} is 0.
+
+@item color
+Specify the color of the padded area. For the syntax of this option, check the @ref{color syntax,,"Color" section in the ffmpeg-utils manual,ffmpeg-utils}.
+
+The default value of @var{color} is "black".
+
+@item eval
+Specify when to evaluate the @var{width}, @var{height}, @var{x}, and @var{y} expressions.
+
+It accepts the following values:
+@table @samp
+@item init
+Only evaluate expressions once during the filter initialization. (default)
+@item frame
+Evaluate expressions for each incoming frame.
+@end table
+
+@item aspect
+Pad to a specified aspect ratio instead of a fixed resolution.
+@end table
+
+The value for the @var{width}, @var{height}, @var{x}, and @var{y} options are expressions containing the following constants:
+@table @var
+@item in_w, iw
+The input video width.
+@item in_h, ih
+The input video height.
+@item out_w, ow
+The output width (the size of the padded area), as specified by the @var{width} expressions.
+@item out_h, oh
+The output height (the size of the padded area), as specified by the @var{height} expressions.
+@item x
+@item y
+The x and y offsets as specified by the @var{x} and @var{y} expressions.
+@item a
+Same as @code{iw / ih}.
+@item sar
+The input sample aspect ratio.
+@item dar
+The input display aspect ratio, equivalent to @code{(iw / ih) * sar}.
+@item hsub
+@item vsub
+The horizontal and vertical chroma subsample values.
+@end table
+
+@subsubsection Examples
+
+@itemize
+@item
+Add a 200-pixel black border to all sides of a video frame:
+@example
+ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+400:h=ih+400:x=200:y=200" -c:v h264_nvenc out.mp4
+@end example
+
+@item
+Pad the input video to a 16:9 aspect ratio, filling with the color "blue":
+@example
+ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=ih*16/9/sar:h=ih:x=(ow-iw)/2:y=(oh-ih)/2:color=blue" -c:v h264_nvenc out.mp4
+@end example
+@end itemize
+
@anchor{scale_cuda}
@subsection scale_cuda
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 97f8f17272..9e9153f5b0 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -422,6 +422,7 @@ OBJS-$(CONFIG_OVERLAY_VAAPI_FILTER) += vf_overlay_vaapi.o framesync.o v
OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER) += vf_overlay_vulkan.o vulkan.o vulkan_filter.o
OBJS-$(CONFIG_OWDENOISE_FILTER) += vf_owdenoise.o
OBJS-$(CONFIG_PAD_FILTER) += vf_pad.o
+OBJS-$(CONFIG_PAD_CUDA_FILTER) += vf_pad_cuda.o vf_pad_cuda.ptx.o cuda/load_helper.o
OBJS-$(CONFIG_PAD_OPENCL_FILTER) += vf_pad_opencl.o opencl.o opencl/pad.o
OBJS-$(CONFIG_PALETTEGEN_FILTER) += vf_palettegen.o palette.o
OBJS-$(CONFIG_PALETTEUSE_FILTER) += vf_paletteuse.o framesync.o palette.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 3bc045b28f..409099bf1f 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -395,6 +395,7 @@ extern const FFFilter ff_vf_overlay_vulkan;
extern const FFFilter ff_vf_overlay_cuda;
extern const FFFilter ff_vf_owdenoise;
extern const FFFilter ff_vf_pad;
+extern const FFFilter ff_vf_pad_cuda;
extern const FFFilter ff_vf_pad_opencl;
extern const FFFilter ff_vf_palettegen;
extern const FFFilter ff_vf_paletteuse;
diff --git a/libavfilter/version.h b/libavfilter/version.h
index d5a6bc143a..1e884d9b44 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -31,7 +31,7 @@
#include "version_major.h"
-#define LIBAVFILTER_VERSION_MINOR 0
+#define LIBAVFILTER_VERSION_MINOR 1
#define LIBAVFILTER_VERSION_MICRO 100
diff --git a/libavfilter/vf_pad_cuda.c b/libavfilter/vf_pad_cuda.c
new file mode 100644
index 0000000000..a9a9036d71
--- /dev/null
+++ b/libavfilter/vf_pad_cuda.c
@@ -0,0 +1,650 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * CUDA video padding filter
+ */
+
+#include <float.h>
+#include <vector_types.h>
+
+#include "filters.h"
+#include "libavutil/avstring.h"
+#include "libavutil/common.h"
+#include "libavutil/cuda_check.h"
+#include "libavutil/eval.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "cuda/load_helper.h"
+
+#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, device_hwctx->internal->cuda_dl, x)
+#define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) )
+#define BLOCK_X 32
+#define BLOCK_Y 16
+
+static const enum AVPixelFormat supported_formats[] = {
+ AV_PIX_FMT_YUV420P,
+ AV_PIX_FMT_YUV444P,
+ AV_PIX_FMT_YUVA420P,
+ AV_PIX_FMT_YUVA444P,
+ AV_PIX_FMT_NV12,
+};
+
+typedef struct CUDAPadContext {
+ const AVClass *class;
+
+ AVBufferRef *frames_ctx;
+
+ int w, h; ///< output dimensions, a value of 0 will result in the input size
+ int x, y; ///< offsets of the input area with respect to the padded area
+ int in_w, in_h; ///< width and height for the padded input video
+
+ char *w_expr; ///< width expression
+ char *h_expr; ///< height expression
+ char *x_expr; ///< x offset expression
+ char *y_expr; ///< y offset expression
+
+ uint8_t rgba_color[4]; ///< color for the padding area
+ uint8_t parsed_color[4];
+ AVRational aspect;
+
+ int eval_mode;
+
+ int last_out_w, last_out_h; ///< used to evaluate the prior output width and height with the incoming frame
+
+ AVCUDADeviceContext *hwctx;
+ CUmodule cu_module;
+ CUfunction cu_func_planar;
+ CUfunction cu_func_uv;
+} CUDAPadContext;
+
+static const char *const var_names[] = {
+ "in_w", "iw",
+ "in_h", "ih",
+ "out_w", "ow",
+ "out_h", "oh",
+ "x",
+ "y",
+ "a",
+ "sar",
+ "dar",
+ "hsub",
+ "vsub",
+ NULL
+};
+
+enum {
+ VAR_IN_W,
+ VAR_IW,
+ VAR_IN_H,
+ VAR_IH,
+ VAR_OUT_W,
+ VAR_OW,
+ VAR_OUT_H,
+ VAR_OH,
+ VAR_X,
+ VAR_Y,
+ VAR_A,
+ VAR_SAR,
+ VAR_DAR,
+ VAR_HSUB,
+ VAR_VSUB,
+ VARS_NB
+};
+
+enum EvalMode {
+ EVAL_MODE_INIT,
+ EVAL_MODE_FRAME,
+ EVAL_MODE_NB
+};
+
+static int eval_expr(AVFilterContext *ctx)
+{
+ CUDAPadContext *s = ctx->priv;
+ AVFilterLink *inlink = ctx->inputs[0];
+ const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format);
+
+ double var_values[VARS_NB], res;
+ char *expr;
+ int ret;
+
+ var_values[VAR_IN_W] = var_values[VAR_IW] = s->in_w;
+ var_values[VAR_IN_H] = var_values[VAR_IH] = s->in_h;
+ var_values[VAR_OUT_W] = var_values[VAR_OW] = NAN;
+ var_values[VAR_OUT_H] = var_values[VAR_OH] = NAN;
+ var_values[VAR_A] = (double)s->in_w / s->in_h;
+ var_values[VAR_SAR] = inlink->sample_aspect_ratio.num ?
+ (double)inlink->sample_aspect_ratio.num /
+ inlink->sample_aspect_ratio.den : 1;
+ var_values[VAR_DAR] = var_values[VAR_A] * var_values[VAR_SAR];
+ var_values[VAR_HSUB] = 1 << desc->log2_chroma_w;
+ var_values[VAR_VSUB] = 1 << desc->log2_chroma_h;
+
+ expr = s->w_expr;
+ ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
+ if (ret < 0)
+ goto fail;
+
+ s->w = res;
+ if (s->w < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Width expression is negative.\n");
+ ret = AVERROR(EINVAL);
+ goto fail;
+ }
+
+ var_values[VAR_OUT_W] = var_values[VAR_OW] = s->w;
+
+ expr = s->h_expr;
+ ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
+ if (ret < 0)
+ goto fail;
+
+ s->h = res;
+ if (s->h < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Height expression is negative.\n");
+ ret = AVERROR(EINVAL);
+ goto fail;
+ }
+ var_values[VAR_OUT_H] = var_values[VAR_OH] = s->h;
+
+ if (!s->h)
+ s->h = s->in_h;
+
+ var_values[VAR_OUT_H] = var_values[VAR_OH] = s->h;
+
+
+ expr = s->w_expr;
+ ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
+ if (ret < 0)
+ goto fail;
+
+ s->w = res;
+ if (s->w < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Width expression is negative.\n");
+ ret = AVERROR(EINVAL);
+ goto fail;
+ }
+ if (!s->w)
+ s->w = s->in_w;
+
+ var_values[VAR_OUT_W] = var_values[VAR_OW] = s->w;
+
+
+ expr = s->x_expr;
+ ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
+ if (ret < 0)
+ goto fail;
+
+ s->x = res;
+
+
+ expr = s->y_expr;
+ ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
+ if (ret < 0)
+ goto fail;
+
+ s->y = res;
+
+ if (s->x < 0 || s->x + s->in_w > s->w) {
+ s->x = (s->w - s->in_w) / 2;
+ av_log(ctx, AV_LOG_VERBOSE, "centering X offset.\n");
+ }
+
+ if (s->y < 0 || s->y + s->in_h > s->h) {
+ s->y = (s->h - s->in_h) / 2;
+ av_log(ctx, AV_LOG_VERBOSE, "centering Y offset.\n");
+ }
+
+ s->w = av_clip(s->w, 1, INT_MAX);
+ s->h = av_clip(s->h, 1, INT_MAX);
+
+ if (s->w < s->in_w || s->h < s->in_h) {
+ av_log(ctx, AV_LOG_ERROR, "Padded size < input size.\n");
+ return AVERROR(EINVAL);
+ }
+
+ av_log(ctx, AV_LOG_DEBUG,
+ "w:%d h:%d -> w:%d h:%d x:%d y:%d color:0x%02X%02X%02X%02X\n",
+ inlink->w, inlink->h, s->w, s->h, s->x, s->y, s->rgba_color[0],
+ s->rgba_color[1], s->rgba_color[2], s->rgba_color[3]);
+
+ return 0;
+
+fail:
+ av_log(ctx, AV_LOG_ERROR, "Error evaluating '%s'\n", expr);
+ return ret;
+}
+
+static int cuda_pad_alloc_out_frames_ctx(AVFilterContext *ctx, AVBufferRef **out_frames_ctx, const int width, const int height)
+{
+ AVFilterLink *inlink = ctx->inputs[0];
+ FilterLink *inl = ff_filter_link(inlink);
+ AVHWFramesContext *in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data;
+ int ret;
+
+ *out_frames_ctx = av_hwframe_ctx_alloc(in_frames_ctx->device_ref);
+ if (!*out_frames_ctx) {
+ return AVERROR(ENOMEM);
+ }
+
+ AVHWFramesContext *out_fc = (AVHWFramesContext *)(*out_frames_ctx)->data;
+ out_fc->format = AV_PIX_FMT_CUDA;
+ out_fc->sw_format = in_frames_ctx->sw_format;
+
+ out_fc->width = FFALIGN(width, 32);
+ out_fc->height = FFALIGN(height, 32);
+
+ ret = av_hwframe_ctx_init(*out_frames_ctx);
+ if (ret < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to init output ctx\n");
+ av_buffer_unref(out_frames_ctx);
+ return ret;
+ }
+
+ return 0;
+}
+
+static av_cold int cuda_pad_init(AVFilterContext *ctx)
+{
+ CUDAPadContext *s = ctx->priv;
+ if (!s) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to allocate CUDAPadContext.\n");
+ return AVERROR(ENOMEM);
+ }
+
+ s->last_out_w = -1;
+ s->last_out_h = -1;
+
+ return 0;
+}
+
+static av_cold void cuda_pad_uninit(AVFilterContext *ctx)
+{
+ CUDAPadContext *s = ctx->priv;
+ av_buffer_unref(&s->frames_ctx);
+
+ if (s->cu_module) {
+ s->hwctx->internal->cuda_dl->cuModuleUnload(s->cu_module);
+ s->cu_module = NULL;
+ }
+}
+
+static av_cold int cuda_pad_load_functions(AVFilterContext *ctx)
+{
+ CUDAPadContext *s = ctx->priv;
+ CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+ CUcontext dummy_cu_ctx;
+ int ret;
+
+ AVCUDADeviceContext *device_hwctx = s->hwctx;
+
+ extern const unsigned char ff_vf_pad_cuda_ptx_data[];
+ extern const unsigned int ff_vf_pad_cuda_ptx_len;
+
+ ret = CHECK_CU(cu->cuCtxPushCurrent(device_hwctx->cuda_ctx));
+ if (ret < 0)
+ return ret;
+
+ ret = ff_cuda_load_module(ctx, device_hwctx, &s->cu_module,
+ ff_vf_pad_cuda_ptx_data, ff_vf_pad_cuda_ptx_len);
+ if (ret < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to load CUDA module\n");
+ goto end;
+ }
+
+ ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_planar, s->cu_module, "pad_planar_cuda"));
+ if (ret < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to load pad_planar_cuda\n");
+ goto end;
+ }
+
+ ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_uv, s->cu_module, "pad_uv_cuda"));
+ if (ret < 0)
+ av_log(ctx, AV_LOG_ERROR, "Failed to load pad_uv_cuda\n");
+
+end:
+ CHECK_CU(cu->cuCtxPopCurrent(&dummy_cu_ctx));
+
+ return ret;
+}
+
+static int cuda_pad_config_props(AVFilterLink *outlink)
+{
+ AVFilterContext *ctx = outlink->src;
+ CUDAPadContext *s = ctx->priv;
+
+ AVFilterLink *inlink = ctx->inputs[0];
+ FilterLink *inl = ff_filter_link(inlink);
+
+ FilterLink *ol = ff_filter_link(outlink);
+
+ AVHWFramesContext *in_frames_ctx;
+ int format_supported = 0;
+ int ret;
+
+ s->in_w = inlink->w;
+ s->in_h = inlink->h;
+ ret = eval_expr(ctx);
+ if (ret < 0)
+ return ret;
+
+ if (!inl->hw_frames_ctx) {
+ av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
+ return AVERROR(EINVAL);
+ }
+
+ in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data;
+ s->hwctx = in_frames_ctx->device_ctx->hwctx;
+
+ for (int i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++) {
+ if (in_frames_ctx->sw_format == supported_formats[i]) {
+ format_supported = 1;
+ break;
+ }
+ }
+ if (!format_supported) {
+ av_log(ctx, AV_LOG_ERROR, "Unsupported input format.\n");
+ return AVERROR(EINVAL);
+ }
+
+ uint8_t R = s->rgba_color[0];
+ uint8_t G = s->rgba_color[1];
+ uint8_t B = s->rgba_color[2];
+
+ int Y = (( 66 * R + 129 * G + 25 * B + 128) >> 8) + 16;
+ int U = ((-38 * R - 74 * G + 112 * B + 128) >> 8) + 128;
+ int V = ((112 * R - 94 * G - 18 * B + 128) >> 8) + 128;
+ s->parsed_color[0] = av_clip_uint8(Y);
+ s->parsed_color[1] = av_clip_uint8(U);
+ s->parsed_color[2] = av_clip_uint8(V);
+ s->parsed_color[3] = s->rgba_color[3];
+
+ ret = cuda_pad_alloc_out_frames_ctx(ctx, &s->frames_ctx, s->w, s->h);
+ if (ret < 0)
+ return ret;
+
+ ol->hw_frames_ctx = av_buffer_ref(s->frames_ctx);
+ if (!ol->hw_frames_ctx)
+ return AVERROR(ENOMEM);
+
+ outlink->w = s->w;
+ outlink->h = s->h;
+ outlink->time_base = inlink->time_base;
+ outlink->format = AV_PIX_FMT_CUDA;
+
+ s->last_out_w = s->w;
+ s->last_out_h = s->h;
+
+ ret = cuda_pad_load_functions(ctx);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int cuda_pad_pad(AVFilterContext *ctx, AVFrame *out, const AVFrame *in)
+{
+ CUDAPadContext *s = ctx->priv;
+ FilterLink *inl = ff_filter_link(ctx->inputs[0]);
+
+ AVHWFramesContext *in_frames_ctx =
+ (AVHWFramesContext *)inl->hw_frames_ctx->data;
+ const AVPixFmtDescriptor *desc_in =
+ av_pix_fmt_desc_get(in_frames_ctx->sw_format);
+
+ CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+ AVCUDADeviceContext *device_hwctx = s->hwctx;
+ int ret;
+
+
+ const int nb_planes = av_pix_fmt_count_planes(in_frames_ctx->sw_format);
+ for (int plane = 0; plane < nb_planes; plane++) {
+
+ int hsub = (plane == 1 || plane == 2) ? desc_in->log2_chroma_w : 0;
+ int vsub = (plane == 1 || plane == 2) ? desc_in->log2_chroma_h : 0;
+
+ if (in_frames_ctx->sw_format == AV_PIX_FMT_NV12 && plane == 1) {
+ hsub = desc_in->log2_chroma_w;
+ vsub = desc_in->log2_chroma_h;
+ }
+
+ int src_w = AV_CEIL_RSHIFT(s->in_w, hsub);
+ int src_h = AV_CEIL_RSHIFT(s->in_h, vsub);
+
+ int dst_w = AV_CEIL_RSHIFT(s->w, hsub);
+ int dst_h = AV_CEIL_RSHIFT(s->h, vsub);
+
+ int y_plane_offset = AV_CEIL_RSHIFT(s->y, vsub);
+ int x_plane_offset = AV_CEIL_RSHIFT(s->x, hsub);
+
+ if (x_plane_offset + src_w > dst_w || y_plane_offset + src_h > dst_h) {
+ av_log(ctx, AV_LOG_ERROR,
+ "ROI out of bounds in plane %d: offset=(%d,%d) in=(%dx%d) "
+ "out=(%dx%d)\n",
+ plane, x_plane_offset, y_plane_offset, src_w, src_h, dst_w, dst_h);
+ return AVERROR(EINVAL);
+ }
+
+ CUfunction cuda_func;
+ uchar2 fill_val_uv;
+ uint8_t fill_val_planar;
+ void *p_fill_val;
+
+ if (in_frames_ctx->sw_format == AV_PIX_FMT_NV12 && plane == 1) {
+ cuda_func = s->cu_func_uv;
+ fill_val_uv.x = s->parsed_color[1];
+ fill_val_uv.y = s->parsed_color[2];
+ p_fill_val = &fill_val_uv;
+ } else {
+ cuda_func = s->cu_func_planar;
+ fill_val_planar = s->parsed_color[plane];
+ p_fill_val = &fill_val_planar;
+ }
+
+ CUdeviceptr d_dst = (CUdeviceptr)out->data[plane];
+ CUdeviceptr d_src = (CUdeviceptr)in->data[plane];
+
+ int src_linesize = in->linesize[plane];
+
+ void *kernel_args[] = {
+ &d_dst, &out->linesize[plane], &dst_w, &dst_h,
+ &d_src, &src_linesize, &src_w, &src_h,
+ &x_plane_offset, &y_plane_offset, p_fill_val
+ };
+
+ unsigned int grid_x = DIV_UP(dst_w, BLOCK_X);
+ unsigned int grid_y = DIV_UP(dst_h, BLOCK_Y);
+
+ ret = CHECK_CU(cu->cuLaunchKernel(cuda_func, grid_x, grid_y, 1,
+ BLOCK_X, BLOCK_Y, 1,
+ 0, s->hwctx->stream, kernel_args, NULL));
+
+ if (ret < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to launch kernel for plane %d\n", plane);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static int cuda_pad_filter_frame(AVFilterLink *inlink, AVFrame *in)
+{
+ AVFilterContext *ctx = inlink->dst;
+ CUDAPadContext *s = ctx->priv;
+ AVFilterLink *outlink = ctx->outputs[0];
+
+ FilterLink *outl = ff_filter_link(outlink);
+
+ AVHWFramesContext *out_frames_ctx =
+ (AVHWFramesContext *)outl->hw_frames_ctx->data;
+ AVCUDADeviceContext *device_hwctx = out_frames_ctx->device_ctx->hwctx;
+
+ int ret;
+
+ if (s->eval_mode == EVAL_MODE_FRAME) {
+ s->in_w = in->width;
+ s->in_h = in->height;
+ s->aspect = in->sample_aspect_ratio;
+
+ ret = eval_expr(ctx);
+ if (ret < 0) {
+ av_frame_free(&in);
+ return ret;
+ }
+ }
+
+
+ if (s->x == 0 && s->y == 0 &&
+ s->w == in->width && s->h == in->height) {
+ av_log(ctx, AV_LOG_DEBUG, "No border. Passing the frame unmodified.\n");
+ s->last_out_w = s->w;
+ s->last_out_h = s->h;
+ return ff_filter_frame(outlink, in);
+ }
+
+
+ if (s->w != s->last_out_w || s->h != s->last_out_h) {
+
+ av_buffer_unref(&s->frames_ctx);
+
+ ret = cuda_pad_alloc_out_frames_ctx(ctx, &s->frames_ctx, s->w, s->h);
+ if (ret < 0)
+ return ret;
+
+ av_buffer_unref(&outl->hw_frames_ctx);
+ outl->hw_frames_ctx = av_buffer_ref(s->frames_ctx);
+ if (!outl->hw_frames_ctx) {
+ av_frame_free(&in);
+ av_log(ctx, AV_LOG_ERROR, "Failed to allocate output frame context.\n");
+ return AVERROR(ENOMEM);
+ }
+ outlink->w = s->w;
+ outlink->h = s->h;
+
+ s->last_out_w = s->w;
+ s->last_out_h = s->h;
+ }
+
+ AVFrame *out = av_frame_alloc();
+ if (!out) {
+ av_frame_free(&in);
+ av_log(ctx, AV_LOG_ERROR, "Failed to allocate output AVFrame.\n");
+ return AVERROR(ENOMEM);
+ }
+ ret = av_hwframe_get_buffer(outl->hw_frames_ctx, out, 0);
+ if (ret < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Unable to get output buffer: %s\n",
+ av_err2str(ret));
+ av_frame_free(&out);
+ av_frame_free(&in);
+ return ret;
+ }
+
+ CUcontext dummy;
+ ret = CHECK_CU(device_hwctx->internal->cuda_dl->cuCtxPushCurrent(
+ device_hwctx->cuda_ctx));
+ if (ret < 0) {
+ av_frame_free(&out);
+ av_frame_free(&in);
+ return ret;
+ }
+
+ ret = cuda_pad_pad(ctx, out, in);
+
+ CHECK_CU(device_hwctx->internal->cuda_dl->cuCtxPopCurrent(&dummy));
+
+ if (ret < 0) {
+ av_frame_free(&out);
+ av_frame_free(&in);
+ return ret;
+ }
+
+ av_frame_copy_props(out, in);
+ out->width = s->w;
+ out->height = s->h;
+
+
+ av_reduce(&out->sample_aspect_ratio.num, &out->sample_aspect_ratio.den,
+ (int64_t)in->sample_aspect_ratio.num * out->height * in->width,
+ (int64_t)in->sample_aspect_ratio.den * out->width * in->height,
+ INT_MAX);
+
+ av_frame_free(&in);
+ return ff_filter_frame(outlink, out);
+}
+
+#define OFFSET(x) offsetof(CUDAPadContext, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+
+static const AVOption cuda_pad_options[] = {
+ { "width", "set the pad area width expression", OFFSET(w_expr), AV_OPT_TYPE_STRING, {.str = "iw"}, 0, 0, FLAGS },
+ { "w", "set the pad area width expression", OFFSET(w_expr), AV_OPT_TYPE_STRING, {.str = "iw"}, 0, 0, FLAGS },
+ { "height", "set the pad area height expression", OFFSET(h_expr), AV_OPT_TYPE_STRING, {.str = "ih"}, 0, 0, FLAGS },
+ { "h", "set the pad area height expression", OFFSET(h_expr), AV_OPT_TYPE_STRING, {.str = "ih"}, 0, 0, FLAGS },
+ { "x", "set the x offset expression for the input image position", OFFSET(x_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS },
+ { "y", "set the y offset expression for the input image position", OFFSET(y_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS },
+ { "color", "set the color of the padded area border", OFFSET(rgba_color), AV_OPT_TYPE_COLOR, {.str = "black"}, .flags = FLAGS },
+ { "eval", "specify when to evaluate expressions", OFFSET(eval_mode), AV_OPT_TYPE_INT, {.i64 = EVAL_MODE_INIT}, 0, EVAL_MODE_NB-1, FLAGS, .unit = "eval" },
+ { "init", "eval expressions once during initialization", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_INIT}, .flags = FLAGS, .unit = "eval" },
+ { "frame", "eval expressions during initialization and per-frame", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_FRAME}, .flags = FLAGS, .unit = "eval" },
+ { "aspect", "pad to fit an aspect instead of a resolution", OFFSET(aspect), AV_OPT_TYPE_RATIONAL, {.dbl = 0}, 0, DBL_MAX, FLAGS },
+ { NULL }
+};
+
+static const AVClass cuda_pad_class = {
+ .class_name = "pad_cuda",
+ .item_name = av_default_item_name,
+ .option = cuda_pad_options,
+ .version = LIBAVUTIL_VERSION_INT,
+};
+
+static const AVFilterPad cuda_pad_inputs[] = {{
+ .name = "default",
+ .type = AVMEDIA_TYPE_VIDEO,
+ .filter_frame = cuda_pad_filter_frame
+}};
+
+static const AVFilterPad cuda_pad_outputs[] = {{
+ .name = "default",
+ .type = AVMEDIA_TYPE_VIDEO,
+ .config_props = cuda_pad_config_props,
+}};
+
+const FFFilter ff_vf_pad_cuda = {
+ .p.name = "pad_cuda",
+ .p.description = NULL_IF_CONFIG_SMALL("CUDA-based GPU padding filter"),
+ .init = cuda_pad_init,
+ .uninit = cuda_pad_uninit,
+
+ .p.priv_class = &cuda_pad_class,
+
+ FILTER_INPUTS(cuda_pad_inputs),
+ FILTER_OUTPUTS(cuda_pad_outputs),
+
+ FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
+
+ .priv_size = sizeof(CUDAPadContext),
+ .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
\ No newline at end of file
diff --git a/libavfilter/vf_pad_cuda.cu b/libavfilter/vf_pad_cuda.cu
new file mode 100644
index 0000000000..4524372e17
--- /dev/null
+++ b/libavfilter/vf_pad_cuda.cu
@@ -0,0 +1,65 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+
+extern "C" {
+
+__global__ void pad_planar_cuda(unsigned char* dst, int dst_pitch, int dst_w, int dst_h,
+ const unsigned char* src, int src_pitch, int src_w, int src_h,
+ int roi_x, int roi_y, unsigned char fill_val)
+{
+ const int x = blockIdx.x * blockDim.x + threadIdx.x;
+ const int y = blockIdx.y * blockDim.y + threadIdx.y;
+
+ if (x >= dst_w || y >= dst_h) {
+ return;
+ }
+
+ if (x >= roi_x && x < (roi_x + src_w) && y >= roi_y && y < (roi_y + src_h)) {
+ const int src_x = x - roi_x;
+ const int src_y = y - roi_y;
+ dst[y * dst_pitch + x] = src[src_y * src_pitch + src_x];
+ } else {
+ dst[y * dst_pitch + x] = fill_val;
+ }
+}
+
+__global__ void pad_uv_cuda(uchar2* dst, int dst_pitch, int dst_w, int dst_h,
+ const uchar2* src, int src_pitch, int src_w, int src_h,
+ int roi_x, int roi_y, uchar2 fill_val)
+{
+ const int x = blockIdx.x * blockDim.x + threadIdx.x;
+ const int y = blockIdx.y * blockDim.y + threadIdx.y;
+
+ if (x >= dst_w || y >= dst_h) {
+ return;
+ }
+
+ const int dst_pitch_uchar2 = dst_pitch / sizeof(uchar2);
+ const int src_pitch_uchar2 = src_pitch / sizeof(uchar2);
+
+ if (x >= roi_x && x < (roi_x + src_w) && y >= roi_y && y < (roi_y + src_h)) {
+ const int src_x = x - roi_x;
+ const int src_y = y - roi_y;
+ dst[y * dst_pitch_uchar2 + x] = src[src_y * src_pitch_uchar2 + src_x];
+ } else {
+ dst[y * dst_pitch_uchar2 + x] = fill_val;
+ }
+}
+
+}
\ No newline at end of file
--
2.34.1
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avfilter: add pad_cuda filter
2025-06-16 23:39 [FFmpeg-devel] [PATCH] avfilter: add pad_cuda filter Jorge Estrada
@ 2025-06-17 21:20 ` Timo Rothenpieler
0 siblings, 0 replies; 2+ messages in thread
From: Timo Rothenpieler @ 2025-06-17 21:20 UTC (permalink / raw)
To: ffmpeg-devel
On 17.06.2025 01:39, Jorge Estrada wrote:
> From: Jorge Estrada <--global>
>
> This patch adds the pad_cuda video filter. A filter similar to the existing pad filter but accelerated by CUDA.
>
> The filter shares the same options as the software pad filter.
>
> Example usage:
> ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+100:h=ih+100:x=-1:y=-1:color=red" out.mp4
> ---
> Changelog | 1 +
> configure | 2 +
> doc/filters.texi | 82 +++++
> libavfilter/Makefile | 1 +
> libavfilter/allfilters.c | 1 +
> libavfilter/version.h | 2 +-
> libavfilter/vf_pad_cuda.c | 650 +++++++++++++++++++++++++++++++++++++
> libavfilter/vf_pad_cuda.cu | 65 ++++
> 8 files changed, 803 insertions(+), 1 deletion(-)
> create mode 100644 libavfilter/vf_pad_cuda.c
> create mode 100644 libavfilter/vf_pad_cuda.cu
>
> diff --git a/Changelog b/Changelog
> index 4217449438..ad7906acf9 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -18,6 +18,7 @@ version <next>:
> - APV encoding support through a libopenapv wrapper
> - VVC decoder supports all content of SCC (Screen Content Coding):
> IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
> +- pad_cuda filter
>
>
> version 7.1:
> diff --git a/configure b/configure
> index 534b443f7d..e68d612867 100755
> --- a/configure
> +++ b/configure
> @@ -3357,6 +3357,8 @@ thumbnail_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
> transpose_npp_filter_deps="ffnvcodec libnpp"
> overlay_cuda_filter_deps="ffnvcodec"
> overlay_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
> +pad_cuda_filter_deps="ffnvcodec"
> +pad_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
> sharpen_npp_filter_deps="ffnvcodec libnpp"
>
> ddagrab_filter_deps="d3d11va IDXGIOutput1 DXGI_OUTDUPL_FRAME_INFO"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index f32fc23c70..68be03c1ab 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -26682,6 +26682,88 @@ See @ref{framesync}.
>
> This filter also supports the @ref{framesync} options.
>
> +
> +@anchor{pad_cuda}
> +@subsection pad_cuda
> +Add paddings to an input video stream using CUDA.
> +
> +This filter is the CUDA-accelerated version of the @ref{pad} filter. It accepts CUDA frames as input and the following options:
> +
> +@table @option
> +@item width, w
> +@item height, h
> +Specify an expression for the size of the output image with the paddings. If the value for @var{width} or @var{height} is 0, the corresponding input size is used for the output.
> +
> +The default value for both @var{width} and @var{height} is 0.
> +
> +@item x
> +@item y
> +Specify the offsets to place the input image at within the padded area, with respect to the top/left border of the output image.
> +
> +If @var{x} or @var{y} evaluates to a negative number, the input image will be centered on the padded area.
> +
> +The default value for both @var{x} and @var{y} is 0.
> +
> +@item color
> +Specify the color of the padded area. For the syntax of this option, check the @ref{color syntax,,"Color" section in the ffmpeg-utils manual,ffmpeg-utils}.
> +
> +The default value of @var{color} is "black".
> +
> +@item eval
> +Specify when to evaluate the @var{width}, @var{height}, @var{x}, and @var{y} expressions.
> +
> +It accepts the following values:
> +@table @samp
> +@item init
> +Only evaluate expressions once during the filter initialization. (default)
> +@item frame
> +Evaluate expressions for each incoming frame.
> +@end table
> +
> +@item aspect
> +Pad to a specified aspect ratio instead of a fixed resolution.
> +@end table
> +
> +The value for the @var{width}, @var{height}, @var{x}, and @var{y} options are expressions containing the following constants:
> +@table @var
> +@item in_w, iw
> +The input video width.
> +@item in_h, ih
> +The input video height.
> +@item out_w, ow
> +The output width (the size of the padded area), as specified by the @var{width} expressions.
> +@item out_h, oh
> +The output height (the size of the padded area), as specified by the @var{height} expressions.
> +@item x
> +@item y
> +The x and y offsets as specified by the @var{x} and @var{y} expressions.
> +@item a
> +Same as @code{iw / ih}.
> +@item sar
> +The input sample aspect ratio.
> +@item dar
> +The input display aspect ratio, equivalent to @code{(iw / ih) * sar}.
> +@item hsub
> +@item vsub
> +The horizontal and vertical chroma subsample values.
> +@end table
> +
> +@subsubsection Examples
> +
> +@itemize
> +@item
> +Add a 200-pixel black border to all sides of a video frame:
> +@example
> +ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+400:h=ih+400:x=200:y=200" -c:v h264_nvenc out.mp4
> +@end example
> +
> +@item
> +Pad the input video to a 16:9 aspect ratio, filling with the color "blue":
> +@example
> +ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=ih*16/9/sar:h=ih:x=(ow-iw)/2:y=(oh-ih)/2:color=blue" -c:v h264_nvenc out.mp4
> +@end example
> +@end itemize
> +
I'm not sure if it's neccesary to fully document this filter, given it's
identical to vf_pad.
You could just say that it's identical to vf_pad usage, and then only
describe any potential differences.
> @anchor{scale_cuda}
> @subsection scale_cuda
>
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 97f8f17272..9e9153f5b0 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -422,6 +422,7 @@ OBJS-$(CONFIG_OVERLAY_VAAPI_FILTER) += vf_overlay_vaapi.o framesync.o v
> OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER) += vf_overlay_vulkan.o vulkan.o vulkan_filter.o
> OBJS-$(CONFIG_OWDENOISE_FILTER) += vf_owdenoise.o
> OBJS-$(CONFIG_PAD_FILTER) += vf_pad.o
> +OBJS-$(CONFIG_PAD_CUDA_FILTER) += vf_pad_cuda.o vf_pad_cuda.ptx.o cuda/load_helper.o
> OBJS-$(CONFIG_PAD_OPENCL_FILTER) += vf_pad_opencl.o opencl.o opencl/pad.o
> OBJS-$(CONFIG_PALETTEGEN_FILTER) += vf_palettegen.o palette.o
> OBJS-$(CONFIG_PALETTEUSE_FILTER) += vf_paletteuse.o framesync.o palette.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 3bc045b28f..409099bf1f 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -395,6 +395,7 @@ extern const FFFilter ff_vf_overlay_vulkan;
> extern const FFFilter ff_vf_overlay_cuda;
> extern const FFFilter ff_vf_owdenoise;
> extern const FFFilter ff_vf_pad;
> +extern const FFFilter ff_vf_pad_cuda;
> extern const FFFilter ff_vf_pad_opencl;
> extern const FFFilter ff_vf_palettegen;
> extern const FFFilter ff_vf_paletteuse;
> diff --git a/libavfilter/version.h b/libavfilter/version.h
> index d5a6bc143a..1e884d9b44 100644
> --- a/libavfilter/version.h
> +++ b/libavfilter/version.h
> @@ -31,7 +31,7 @@
>
> #include "version_major.h"
>
> -#define LIBAVFILTER_VERSION_MINOR 0
> +#define LIBAVFILTER_VERSION_MINOR 1
> #define LIBAVFILTER_VERSION_MICRO 100
>
>
> diff --git a/libavfilter/vf_pad_cuda.c b/libavfilter/vf_pad_cuda.c
> new file mode 100644
> index 0000000000..a9a9036d71
> --- /dev/null
> +++ b/libavfilter/vf_pad_cuda.c
> @@ -0,0 +1,650 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * CUDA video padding filter
> + */
> +
> +#include <float.h>
> +#include <vector_types.h>
> +
> +#include "filters.h"
> +#include "libavutil/avstring.h"
> +#include "libavutil/common.h"
> +#include "libavutil/cuda_check.h"
> +#include "libavutil/eval.h"
> +#include "libavutil/hwcontext.h"
> +#include "libavutil/hwcontext_cuda_internal.h"
> +#include "libavutil/imgutils.h"
> +#include "libavutil/internal.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/pixdesc.h"
> +
> +#include "cuda/load_helper.h"
> +
> +#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, device_hwctx->internal->cuda_dl, x)
> +#define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) )
> +#define BLOCK_X 32
> +#define BLOCK_Y 16
> +
> +static const enum AVPixelFormat supported_formats[] = {
> + AV_PIX_FMT_YUV420P,
> + AV_PIX_FMT_YUV444P,
> + AV_PIX_FMT_YUVA420P,
> + AV_PIX_FMT_YUVA444P,
> + AV_PIX_FMT_NV12,
> +};
> +
> +typedef struct CUDAPadContext {
> + const AVClass *class;
> +
> + AVBufferRef *frames_ctx;
> +
> + int w, h; ///< output dimensions, a value of 0 will result in the input size
> + int x, y; ///< offsets of the input area with respect to the padded area
> + int in_w, in_h; ///< width and height for the padded input video
> +
> + char *w_expr; ///< width expression
> + char *h_expr; ///< height expression
> + char *x_expr; ///< x offset expression
> + char *y_expr; ///< y offset expression
> +
> + uint8_t rgba_color[4]; ///< color for the padding area
> + uint8_t parsed_color[4];
> + AVRational aspect;
> +
> + int eval_mode;
> +
> + int last_out_w, last_out_h; ///< used to evaluate the prior output width and height with the incoming frame
> +
> + AVCUDADeviceContext *hwctx;
> + CUmodule cu_module;
> + CUfunction cu_func_planar;
> + CUfunction cu_func_uv;
> +} CUDAPadContext;
> +
> +static const char *const var_names[] = {
> + "in_w", "iw",
> + "in_h", "ih",
> + "out_w", "ow",
> + "out_h", "oh",
> + "x",
> + "y",
> + "a",
> + "sar",
> + "dar",
> + "hsub",
> + "vsub",
> + NULL
> +};
> +
> +enum {
> + VAR_IN_W,
> + VAR_IW,
> + VAR_IN_H,
> + VAR_IH,
> + VAR_OUT_W,
> + VAR_OW,
> + VAR_OUT_H,
> + VAR_OH,
> + VAR_X,
> + VAR_Y,
> + VAR_A,
> + VAR_SAR,
> + VAR_DAR,
> + VAR_HSUB,
> + VAR_VSUB,
> + VARS_NB
> +};
> +
> +enum EvalMode {
> + EVAL_MODE_INIT,
> + EVAL_MODE_FRAME,
> + EVAL_MODE_NB
> +};
> +
> +static int eval_expr(AVFilterContext *ctx)
> +{
> + CUDAPadContext *s = ctx->priv;
> + AVFilterLink *inlink = ctx->inputs[0];
> + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format);
> +
> + double var_values[VARS_NB], res;
> + char *expr;
> + int ret;
> +
> + var_values[VAR_IN_W] = var_values[VAR_IW] = s->in_w;
> + var_values[VAR_IN_H] = var_values[VAR_IH] = s->in_h;
> + var_values[VAR_OUT_W] = var_values[VAR_OW] = NAN;
> + var_values[VAR_OUT_H] = var_values[VAR_OH] = NAN;
> + var_values[VAR_A] = (double)s->in_w / s->in_h;
> + var_values[VAR_SAR] = inlink->sample_aspect_ratio.num ?
> + (double)inlink->sample_aspect_ratio.num /
> + inlink->sample_aspect_ratio.den : 1;
> + var_values[VAR_DAR] = var_values[VAR_A] * var_values[VAR_SAR];
> + var_values[VAR_HSUB] = 1 << desc->log2_chroma_w;
> + var_values[VAR_VSUB] = 1 << desc->log2_chroma_h;
> +
> + expr = s->w_expr;
> + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
> + if (ret < 0)
> + goto fail;
> +
> + s->w = res;
> + if (s->w < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Width expression is negative.\n");
> + ret = AVERROR(EINVAL);
> + goto fail;
> + }
> +
> + var_values[VAR_OUT_W] = var_values[VAR_OW] = s->w;
> +
> + expr = s->h_expr;
> + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
> + if (ret < 0)
> + goto fail;
> +
> + s->h = res;
> + if (s->h < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Height expression is negative.\n");
> + ret = AVERROR(EINVAL);
> + goto fail;
> + }
> + var_values[VAR_OUT_H] = var_values[VAR_OH] = s->h;
> +
> + if (!s->h)
> + s->h = s->in_h;
> +
> + var_values[VAR_OUT_H] = var_values[VAR_OH] = s->h;
> +
> +
> + expr = s->w_expr;
> + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
> + if (ret < 0)
> + goto fail;
> +
> + s->w = res;
> + if (s->w < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Width expression is negative.\n");
> + ret = AVERROR(EINVAL);
> + goto fail;
> + }
> + if (!s->w)
> + s->w = s->in_w;
> +
> + var_values[VAR_OUT_W] = var_values[VAR_OW] = s->w;
> +
> +
> + expr = s->x_expr;
> + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
> + if (ret < 0)
> + goto fail;
> +
> + s->x = res;
> +
> +
> + expr = s->y_expr;
> + ret = av_expr_parse_and_eval(&res, expr, var_names, var_values, NULL, NULL, NULL, NULL, NULL, 0, ctx);
> + if (ret < 0)
> + goto fail;
> +
> + s->y = res;
> +
> + if (s->x < 0 || s->x + s->in_w > s->w) {
> + s->x = (s->w - s->in_w) / 2;
> + av_log(ctx, AV_LOG_VERBOSE, "centering X offset.\n");
> + }
> +
> + if (s->y < 0 || s->y + s->in_h > s->h) {
> + s->y = (s->h - s->in_h) / 2;
> + av_log(ctx, AV_LOG_VERBOSE, "centering Y offset.\n");
> + }
> +
> + s->w = av_clip(s->w, 1, INT_MAX);
> + s->h = av_clip(s->h, 1, INT_MAX);
> +
> + if (s->w < s->in_w || s->h < s->in_h) {
> + av_log(ctx, AV_LOG_ERROR, "Padded size < input size.\n");
> + return AVERROR(EINVAL);
> + }
> +
> + av_log(ctx, AV_LOG_DEBUG,
> + "w:%d h:%d -> w:%d h:%d x:%d y:%d color:0x%02X%02X%02X%02X\n",
> + inlink->w, inlink->h, s->w, s->h, s->x, s->y, s->rgba_color[0],
> + s->rgba_color[1], s->rgba_color[2], s->rgba_color[3]);
> +
> + return 0;
> +
> +fail:
> + av_log(ctx, AV_LOG_ERROR, "Error evaluating '%s'\n", expr);
> + return ret;
> +}
Something I'm wondering with this being the 4th effective duplication of
this exact logic, if it couldn't be pulled out of vf_pad somehow, so all
the other pad filters can re-use it.
That's not a problem for this patch to solve though.
> +static int cuda_pad_alloc_out_frames_ctx(AVFilterContext *ctx, AVBufferRef **out_frames_ctx, const int width, const int height)
> +{
> + AVFilterLink *inlink = ctx->inputs[0];
> + FilterLink *inl = ff_filter_link(inlink);
> + AVHWFramesContext *in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data;
> + int ret;
> +
> + *out_frames_ctx = av_hwframe_ctx_alloc(in_frames_ctx->device_ref);
> + if (!*out_frames_ctx) {
> + return AVERROR(ENOMEM);
> + }
> +
> + AVHWFramesContext *out_fc = (AVHWFramesContext *)(*out_frames_ctx)->data;
> + out_fc->format = AV_PIX_FMT_CUDA;
> + out_fc->sw_format = in_frames_ctx->sw_format;
> +
> + out_fc->width = FFALIGN(width, 32);
> + out_fc->height = FFALIGN(height, 32);
> +
> + ret = av_hwframe_ctx_init(*out_frames_ctx);
> + if (ret < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to init output ctx\n");
> + av_buffer_unref(out_frames_ctx);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static av_cold int cuda_pad_init(AVFilterContext *ctx)
> +{
> + CUDAPadContext *s = ctx->priv;
> + if (!s) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to allocate CUDAPadContext.\n");
> + return AVERROR(ENOMEM);
> + }
I don't think I've ever seen a check like this.
Pretty sure if the allocation for this failed, we'd have never gotten here.
> + s->last_out_w = -1;
> + s->last_out_h = -1;
> +
> + return 0;
> +}
> +
> +static av_cold void cuda_pad_uninit(AVFilterContext *ctx)
> +{
> + CUDAPadContext *s = ctx->priv;
> + av_buffer_unref(&s->frames_ctx);
> +
> + if (s->cu_module) {
> + s->hwctx->internal->cuda_dl->cuModuleUnload(s->cu_module);
Missing push/pop current context here.
> + s->cu_module = NULL;
> + }
> +}
> +
> +static av_cold int cuda_pad_load_functions(AVFilterContext *ctx)
> +{
> + CUDAPadContext *s = ctx->priv;
> + CudaFunctions *cu = s->hwctx->internal->cuda_dl;
> + CUcontext dummy_cu_ctx;
> + int ret;
> +
> + AVCUDADeviceContext *device_hwctx = s->hwctx;
> +
> + extern const unsigned char ff_vf_pad_cuda_ptx_data[];
> + extern const unsigned int ff_vf_pad_cuda_ptx_len;
> +
> + ret = CHECK_CU(cu->cuCtxPushCurrent(device_hwctx->cuda_ctx));
> + if (ret < 0)
> + return ret;
> +
> + ret = ff_cuda_load_module(ctx, device_hwctx, &s->cu_module,
> + ff_vf_pad_cuda_ptx_data, ff_vf_pad_cuda_ptx_len);
> + if (ret < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to load CUDA module\n");
> + goto end;
> + }
> +
> + ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_planar, s->cu_module, "pad_planar_cuda"));
> + if (ret < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to load pad_planar_cuda\n");
> + goto end;
> + }
> +
> + ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_uv, s->cu_module, "pad_uv_cuda"));
> + if (ret < 0)
> + av_log(ctx, AV_LOG_ERROR, "Failed to load pad_uv_cuda\n");
> +
> +end:
> + CHECK_CU(cu->cuCtxPopCurrent(&dummy_cu_ctx));
> +
> + return ret;
> +}
> +
> +static int cuda_pad_config_props(AVFilterLink *outlink)
> +{
> + AVFilterContext *ctx = outlink->src;
> + CUDAPadContext *s = ctx->priv;
> +
> + AVFilterLink *inlink = ctx->inputs[0];
> + FilterLink *inl = ff_filter_link(inlink);
> +
> + FilterLink *ol = ff_filter_link(outlink);
> +
> + AVHWFramesContext *in_frames_ctx;
> + int format_supported = 0;
> + int ret;
> +
> + s->in_w = inlink->w;
> + s->in_h = inlink->h;
> + ret = eval_expr(ctx);
> + if (ret < 0)
> + return ret;
> +
> + if (!inl->hw_frames_ctx) {
> + av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
> + return AVERROR(EINVAL);
> + }
> +
> + in_frames_ctx = (AVHWFramesContext *)inl->hw_frames_ctx->data;
> + s->hwctx = in_frames_ctx->device_ctx->hwctx;
> +
> + for (int i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++) {
> + if (in_frames_ctx->sw_format == supported_formats[i]) {
> + format_supported = 1;
> + break;
> + }
> + }
> + if (!format_supported) {
> + av_log(ctx, AV_LOG_ERROR, "Unsupported input format.\n");
> + return AVERROR(EINVAL);
> + }
> +
> + uint8_t R = s->rgba_color[0];
> + uint8_t G = s->rgba_color[1];
> + uint8_t B = s->rgba_color[2];
> +
> + int Y = (( 66 * R + 129 * G + 25 * B + 128) >> 8) + 16;
> + int U = ((-38 * R - 74 * G + 112 * B + 128) >> 8) + 128;
> + int V = ((112 * R - 94 * G - 18 * B + 128) >> 8) + 128;
> + s->parsed_color[0] = av_clip_uint8(Y);
> + s->parsed_color[1] = av_clip_uint8(U);
> + s->parsed_color[2] = av_clip_uint8(V);
> + s->parsed_color[3] = s->rgba_color[3];
> +
> + ret = cuda_pad_alloc_out_frames_ctx(ctx, &s->frames_ctx, s->w, s->h);
> + if (ret < 0)
> + return ret;
> +
> + ol->hw_frames_ctx = av_buffer_ref(s->frames_ctx);
> + if (!ol->hw_frames_ctx)
> + return AVERROR(ENOMEM);
> +
> + outlink->w = s->w;
> + outlink->h = s->h;
> + outlink->time_base = inlink->time_base;
> + outlink->format = AV_PIX_FMT_CUDA;
> +
> + s->last_out_w = s->w;
> + s->last_out_h = s->h;
> +
> + ret = cuda_pad_load_functions(ctx);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static int cuda_pad_pad(AVFilterContext *ctx, AVFrame *out, const AVFrame *in)
> +{
> + CUDAPadContext *s = ctx->priv;
> + FilterLink *inl = ff_filter_link(ctx->inputs[0]);
> +
> + AVHWFramesContext *in_frames_ctx =
> + (AVHWFramesContext *)inl->hw_frames_ctx->data;
> + const AVPixFmtDescriptor *desc_in =
> + av_pix_fmt_desc_get(in_frames_ctx->sw_format);
I wouldn't linebreak both of those either
> +
> + CudaFunctions *cu = s->hwctx->internal->cuda_dl;
> + AVCUDADeviceContext *device_hwctx = s->hwctx;
> + int ret;
> +
> +
> + const int nb_planes = av_pix_fmt_count_planes(in_frames_ctx->sw_format);
> + for (int plane = 0; plane < nb_planes; plane++) {
> +
> + int hsub = (plane == 1 || plane == 2) ? desc_in->log2_chroma_w : 0;
> + int vsub = (plane == 1 || plane == 2) ? desc_in->log2_chroma_h : 0;
> +
> + if (in_frames_ctx->sw_format == AV_PIX_FMT_NV12 && plane == 1) {
> + hsub = desc_in->log2_chroma_w;
> + vsub = desc_in->log2_chroma_h;
> + }
> +
> + int src_w = AV_CEIL_RSHIFT(s->in_w, hsub);
> + int src_h = AV_CEIL_RSHIFT(s->in_h, vsub);
> +
> + int dst_w = AV_CEIL_RSHIFT(s->w, hsub);
> + int dst_h = AV_CEIL_RSHIFT(s->h, vsub);
> +
> + int y_plane_offset = AV_CEIL_RSHIFT(s->y, vsub);
> + int x_plane_offset = AV_CEIL_RSHIFT(s->x, hsub);
> +
> + if (x_plane_offset + src_w > dst_w || y_plane_offset + src_h > dst_h) {
> + av_log(ctx, AV_LOG_ERROR,
> + "ROI out of bounds in plane %d: offset=(%d,%d) in=(%dx%d) "
> + "out=(%dx%d)\n",
> + plane, x_plane_offset, y_plane_offset, src_w, src_h, dst_w, dst_h);
> + return AVERROR(EINVAL);
> + }
> +
> + CUfunction cuda_func;
> + uchar2 fill_val_uv;
> + uint8_t fill_val_planar;
> + void *p_fill_val;
> +
> + if (in_frames_ctx->sw_format == AV_PIX_FMT_NV12 && plane == 1) {
> + cuda_func = s->cu_func_uv;
> + fill_val_uv.x = s->parsed_color[1];
> + fill_val_uv.y = s->parsed_color[2];
> + p_fill_val = &fill_val_uv;
> + } else {
> + cuda_func = s->cu_func_planar;
> + fill_val_planar = s->parsed_color[plane];
> + p_fill_val = &fill_val_planar;
> + }
> +
> + CUdeviceptr d_dst = (CUdeviceptr)out->data[plane];
> + CUdeviceptr d_src = (CUdeviceptr)in->data[plane];
> +
> + int src_linesize = in->linesize[plane];
> +
> + void *kernel_args[] = {
> + &d_dst, &out->linesize[plane], &dst_w, &dst_h,
> + &d_src, &src_linesize, &src_w, &src_h,
> + &x_plane_offset, &y_plane_offset, p_fill_val
> + };
> +
> + unsigned int grid_x = DIV_UP(dst_w, BLOCK_X);
> + unsigned int grid_y = DIV_UP(dst_h, BLOCK_Y);
> +
> + ret = CHECK_CU(cu->cuLaunchKernel(cuda_func, grid_x, grid_y, 1,
> + BLOCK_X, BLOCK_Y, 1,
> + 0, s->hwctx->stream, kernel_args, NULL));
> +
> + if (ret < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to launch kernel for plane %d\n", plane);
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int cuda_pad_filter_frame(AVFilterLink *inlink, AVFrame *in)
> +{
> + AVFilterContext *ctx = inlink->dst;
> + CUDAPadContext *s = ctx->priv;
> + AVFilterLink *outlink = ctx->outputs[0];
> +
> + FilterLink *outl = ff_filter_link(outlink);
> +
> + AVHWFramesContext *out_frames_ctx =
> + (AVHWFramesContext *)outl->hw_frames_ctx->data;
nit, but it's not worth it to linebreak here imo
> + AVCUDADeviceContext *device_hwctx = out_frames_ctx->device_ctx->hwctx;
> +
> + int ret;
> +
> + if (s->eval_mode == EVAL_MODE_FRAME) {
> + s->in_w = in->width;
> + s->in_h = in->height;
> + s->aspect = in->sample_aspect_ratio;
> +
> + ret = eval_expr(ctx);
> + if (ret < 0) {
> + av_frame_free(&in);
> + return ret;
> + }
> + }
> +
> +
> + if (s->x == 0 && s->y == 0 &&
> + s->w == in->width && s->h == in->height) {
> + av_log(ctx, AV_LOG_DEBUG, "No border. Passing the frame unmodified.\n");
> + s->last_out_w = s->w;
> + s->last_out_h = s->h;
> + return ff_filter_frame(outlink, in);
> + }
> +
> +
> + if (s->w != s->last_out_w || s->h != s->last_out_h) {
> +
> + av_buffer_unref(&s->frames_ctx);
> +
> + ret = cuda_pad_alloc_out_frames_ctx(ctx, &s->frames_ctx, s->w, s->h);
> + if (ret < 0)
> + return ret;
> +
> + av_buffer_unref(&outl->hw_frames_ctx);
> + outl->hw_frames_ctx = av_buffer_ref(s->frames_ctx);
> + if (!outl->hw_frames_ctx) {
> + av_frame_free(&in);
> + av_log(ctx, AV_LOG_ERROR, "Failed to allocate output frame context.\n");
> + return AVERROR(ENOMEM);
> + }
> + outlink->w = s->w;
> + outlink->h = s->h;
> +
> + s->last_out_w = s->w;
> + s->last_out_h = s->h;
> + }
> +
> + AVFrame *out = av_frame_alloc();
> + if (!out) {
> + av_frame_free(&in);
> + av_log(ctx, AV_LOG_ERROR, "Failed to allocate output AVFrame.\n");
> + return AVERROR(ENOMEM);
> + }
> + ret = av_hwframe_get_buffer(outl->hw_frames_ctx, out, 0);
> + if (ret < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Unable to get output buffer: %s\n",
> + av_err2str(ret));
> + av_frame_free(&out);
> + av_frame_free(&in);
> + return ret;
> + }
> +
> + CUcontext dummy;
> + ret = CHECK_CU(device_hwctx->internal->cuda_dl->cuCtxPushCurrent(
> + device_hwctx->cuda_ctx));
> + if (ret < 0) {
> + av_frame_free(&out);
> + av_frame_free(&in);
> + return ret;
> + }
> +
> + ret = cuda_pad_pad(ctx, out, in);
> +
> + CHECK_CU(device_hwctx->internal->cuda_dl->cuCtxPopCurrent(&dummy));
> +
> + if (ret < 0) {
> + av_frame_free(&out);
> + av_frame_free(&in);
> + return ret;
> + }
> +
> + av_frame_copy_props(out, in);
> + out->width = s->w;
> + out->height = s->h;
> +
> +
> + av_reduce(&out->sample_aspect_ratio.num, &out->sample_aspect_ratio.den,
> + (int64_t)in->sample_aspect_ratio.num * out->height * in->width,
> + (int64_t)in->sample_aspect_ratio.den * out->width * in->height,
> + INT_MAX);
> +
> + av_frame_free(&in);
> + return ff_filter_frame(outlink, out);
> +}
> +
> +#define OFFSET(x) offsetof(CUDAPadContext, x)
> +#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
> +
> +static const AVOption cuda_pad_options[] = {
> + { "width", "set the pad area width expression", OFFSET(w_expr), AV_OPT_TYPE_STRING, {.str = "iw"}, 0, 0, FLAGS },
> + { "w", "set the pad area width expression", OFFSET(w_expr), AV_OPT_TYPE_STRING, {.str = "iw"}, 0, 0, FLAGS },
> + { "height", "set the pad area height expression", OFFSET(h_expr), AV_OPT_TYPE_STRING, {.str = "ih"}, 0, 0, FLAGS },
> + { "h", "set the pad area height expression", OFFSET(h_expr), AV_OPT_TYPE_STRING, {.str = "ih"}, 0, 0, FLAGS },
> + { "x", "set the x offset expression for the input image position", OFFSET(x_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS },
> + { "y", "set the y offset expression for the input image position", OFFSET(y_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS },
> + { "color", "set the color of the padded area border", OFFSET(rgba_color), AV_OPT_TYPE_COLOR, {.str = "black"}, .flags = FLAGS },
> + { "eval", "specify when to evaluate expressions", OFFSET(eval_mode), AV_OPT_TYPE_INT, {.i64 = EVAL_MODE_INIT}, 0, EVAL_MODE_NB-1, FLAGS, .unit = "eval" },
> + { "init", "eval expressions once during initialization", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_INIT}, .flags = FLAGS, .unit = "eval" },
> + { "frame", "eval expressions during initialization and per-frame", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_FRAME}, .flags = FLAGS, .unit = "eval" },
> + { "aspect", "pad to fit an aspect instead of a resolution", OFFSET(aspect), AV_OPT_TYPE_RATIONAL, {.dbl = 0}, 0, DBL_MAX, FLAGS },
> + { NULL }
> +};
> +
> +static const AVClass cuda_pad_class = {
> + .class_name = "pad_cuda",
> + .item_name = av_default_item_name,
> + .option = cuda_pad_options,
> + .version = LIBAVUTIL_VERSION_INT,
> +};
> +
> +static const AVFilterPad cuda_pad_inputs[] = {{
> + .name = "default",
> + .type = AVMEDIA_TYPE_VIDEO,
> + .filter_frame = cuda_pad_filter_frame
> +}};
> +
> +static const AVFilterPad cuda_pad_outputs[] = {{
> + .name = "default",
> + .type = AVMEDIA_TYPE_VIDEO,
> + .config_props = cuda_pad_config_props,
> +}};
> +
> +const FFFilter ff_vf_pad_cuda = {
> + .p.name = "pad_cuda",
> + .p.description = NULL_IF_CONFIG_SMALL("CUDA-based GPU padding filter"),
> + .init = cuda_pad_init,
> + .uninit = cuda_pad_uninit,
> +
> + .p.priv_class = &cuda_pad_class,
> +
> + FILTER_INPUTS(cuda_pad_inputs),
> + FILTER_OUTPUTS(cuda_pad_outputs),
> +
> + FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
> +
> + .priv_size = sizeof(CUDAPadContext),
> + .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> +};
> \ No newline at end of file
> diff --git a/libavfilter/vf_pad_cuda.cu b/libavfilter/vf_pad_cuda.cu
> new file mode 100644
> index 0000000000..4524372e17
> --- /dev/null
> +++ b/libavfilter/vf_pad_cuda.cu
> @@ -0,0 +1,65 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +
> +extern "C" {
> +
> +__global__ void pad_planar_cuda(unsigned char* dst, int dst_pitch, int dst_w, int dst_h,
> + const unsigned char* src, int src_pitch, int src_w, int src_h,
> + int roi_x, int roi_y, unsigned char fill_val)
> +{
> + const int x = blockIdx.x * blockDim.x + threadIdx.x;
> + const int y = blockIdx.y * blockDim.y + threadIdx.y;
> +
> + if (x >= dst_w || y >= dst_h) {
> + return;
> + }
> +
> + if (x >= roi_x && x < (roi_x + src_w) && y >= roi_y && y < (roi_y + src_h)) {
> + const int src_x = x - roi_x;
> + const int src_y = y - roi_y;
> + dst[y * dst_pitch + x] = src[src_y * src_pitch + src_x];
> + } else {
> + dst[y * dst_pitch + x] = fill_val;
> + }
> +}
> +
> +__global__ void pad_uv_cuda(uchar2* dst, int dst_pitch, int dst_w, int dst_h,
> + const uchar2* src, int src_pitch, int src_w, int src_h,
> + int roi_x, int roi_y, uchar2 fill_val)
> +{
> + const int x = blockIdx.x * blockDim.x + threadIdx.x;
> + const int y = blockIdx.y * blockDim.y + threadIdx.y;
> +
> + if (x >= dst_w || y >= dst_h) {
> + return;
> + }
> +
> + const int dst_pitch_uchar2 = dst_pitch / sizeof(uchar2);
> + const int src_pitch_uchar2 = src_pitch / sizeof(uchar2);
> +
> + if (x >= roi_x && x < (roi_x + src_w) && y >= roi_y && y < (roi_y + src_h)) {
> + const int src_x = x - roi_x;
> + const int src_y = y - roi_y;
> + dst[y * dst_pitch_uchar2 + x] = src[src_y * src_pitch_uchar2 + src_x];
> + } else {
> + dst[y * dst_pitch_uchar2 + x] = fill_val;
> + }
> +}
You could make a macro or templated function(this is C++ after all) plus
macro to easily define this function for any datatype.
That'd reduce code doubling here a bit and would also make it easy to
add 10/12 bit pixel format support later on.
> +}
> \ No newline at end of file
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-06-17 21:20 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-16 23:39 [FFmpeg-devel] [PATCH] avfilter: add pad_cuda filter Jorge Estrada
2025-06-17 21:20 ` Timo Rothenpieler
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git