From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 41B604C9EB
	for <ffmpegdev@gitmailbox.com>; Thu,  7 Aug 2025 02:54:05 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id AB89968CD6B;
	Thu,  7 Aug 2025 05:54:01 +0300 (EEST)
Received: from mail-pg1-f178.google.com (mail-pg1-f178.google.com
 [209.85.215.178])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTPS id E552E68BA45
 for <ffmpeg-devel@ffmpeg.org>; Thu,  7 Aug 2025 05:53:54 +0300 (EEST)
Received: by mail-pg1-f178.google.com with SMTP id
 41be03b00d2f7-879d2e419b9so316745a12.2
 for <ffmpeg-devel@ffmpeg.org>; Wed, 06 Aug 2025 19:53:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1754535232; x=1755140032; darn=ffmpeg.org;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:from:to:cc:subject:date:message-id:reply-to;
 bh=W3QIisMqO2to2O+3bNNKGYvnJGgE19cKHP3PO7eIt04=;
 b=DJmx/CsJF+bjsLz2wVnS0HD9LNy/i4fe09PQeW7xCvZJu1mAluAQzqRI0vJL2QAKWd
 DwpXalC057OM0j7QJ0Z1nFmwQB2IPBt+myAQZTmAJKhweRkf9nhxfaYfoDOIua4x67Iu
 j0FZhavhICcdyrdsrDtNOUB9M4cJuZbf2NqG/pk9maW1YrdZ6r51Rkn0avIRoSFjvR7C
 DTgS8sJJJNPh/hcFM1R5r7vk8ogi0MAwdxh0kn5FYAV8YwOwNAL11/HnXmM8SSgXxNj/
 x+Vkvn5ZumtH0bUMhz6S2QT3DG67t4k97EsxqOU0s5Skn4/+DVP/aBlFDgB7uTUYh1mj
 wOvg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1754535232; x=1755140032;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=W3QIisMqO2to2O+3bNNKGYvnJGgE19cKHP3PO7eIt04=;
 b=gGyKXbi3bG56qQssTYzASwjweboWB8dOQoBC+veIRuZRupvTvFXoqeZL6Q8rdPSKN6
 3rvI029LdDu5ZDounwTD+KetQPGRu4/4NaNnBmF+0SCkYs3PtTRc44EwRd4cQBTbW34g
 T9JBV1zMXH4bx6bypyw58U3gftndwSsxGloQ2PHLeqqk8t9NQ+97Go99tdaPrRuQnfg3
 +G+JvSCXwlR8Xx8qagl9sxENg1QOoyikvIEpeI5ZnFfMPkqGG5EjqCJKzpogqgsgC14J
 R/F/UCHU81aSp6j0gDRhvSoHIUtUDK/lIJ+rRJQuEH6VfH1JZHb7GJOEInWzHOcpL0Es
 ztjA==
X-Gm-Message-State: AOJu0Yz4fiabhHomzXS2yJFFZIrc8ONKBACBZfBzOTXvsQN6s9rSZftI
 sGd/q5nYxYUyazN17f+/MoOgdIOuVI7YSom6c6G/Flz4QJv5c/SmD5SVM1Xqtg==
X-Gm-Gg: ASbGncuzk8RhcflmYTxmPpL0N555iWorQt6mKz+5p4rtB63jYdZmeKsQX06F8h5bsAo
 91nN+Jnht5ux89T9JqZW7/7hK0rg+w0aVzYnJSMvpB27oUIQ73i9nM0ESjZ9sebaByCJTAApevr
 hkL/hHzZgzDDMEv52++F5CnO0UJhZHY/9h+EFSaC5uqMDCD1K5B33gdLc+xJTYKLvhA07DfClp4
 QdhhKXlqlZN7TqzZLsptT3fD5YVvgFTPBFwx7e5611GAn7s9DOjksZm0FhrpjBgtJsaWYTx5C8y
 LvcOXrb6cwT3OKzebZ+EdQ7xrh4BXxnM/c8J8qsjTr1JeJTM7i8u2OhTXkGZgz0Qu0qCT/0QHiZ
 12wxRAqwSpUAvrUf1EpfG3SgK8rG+axMosM2CLpUQ
X-Google-Smtp-Source: AGHT+IHiZf70TDz1M6OJk2LscC20X2kxbJgOzNfC7aMEOL4p/HBTHompOsCTZU8cKSjda78G2Newpg==
X-Received: by 2002:a17:903:2a87:b0:23f:f39b:eaf6 with SMTP id
 d9443c01a7336-2429f43b579mr79733935ad.46.1754535232323; 
 Wed, 06 Aug 2025 19:53:52 -0700 (PDT)
Received: from JESTRADA-PC.localdomain ([98.97.30.252])
 by smtp.gmail.com with ESMTPSA id
 d9443c01a7336-241d1ef6832sm170632105ad.25.2025.08.06.19.53.51
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 06 Aug 2025 19:53:51 -0700 (PDT)
From: Jorge Estrada <jestrada.list@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Wed,  6 Aug 2025 19:53:58 -0700
Message-Id: <20250807025358.49096-1-jestrada.list@gmail.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] avfilter: add alphamerge_cuda filter
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Jorge Estrada <jestrada.list@gmail.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20250807025358.49096-1-jestrada.list@gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

This patch adds the alphamerge_cuda video filter

Example usage:
ffmpeg -f lavfi -i "color=c=red:s=1280x720:d=5,format=yuva420p" \
-f lavfi -i "color=c=black:s=1280x720:d=5,geq=lum='255*gt(W/4,hypot(X-W/2,Y-H/2))'" \
-f lavfi -i "color=c=blue:s=1280x720:d=5" \
-filter_complex \
"[0:v]hwupload_cuda[base]; \
 [1:v]hwupload_cuda[mask]; \
 [2:v]hwupload_cuda[bkgd]; \
 [base][mask]alphamerge_cuda[merged]; \
 [bkgd][merged]overlay_cuda" \
-c:v h264_nvenc -y out.mp4
---
 configure                         |   2 +
 doc/filters.texi                  |  21 ++
 libavfilter/Makefile              |   2 +
 libavfilter/allfilters.c          |   1 +
 libavfilter/version.h             |   2 +-
 libavfilter/vf_alphamerge_cuda.c  | 345 ++++++++++++++++++++++++++++++
 libavfilter/vf_alphamerge_cuda.cu |  44 ++++
 7 files changed, 416 insertions(+), 1 deletion(-)
 create mode 100644 libavfilter/vf_alphamerge_cuda.c
 create mode 100644 libavfilter/vf_alphamerge_cuda.cu

diff --git a/configure b/configure
index 30e61c5bb5..268e3c3e28 100755
--- a/configure
+++ b/configure
@@ -3349,6 +3349,8 @@ vaapi_encode_deps="vaapi"
 vulkan_encode_deps="vulkan"
 v4l2_m2m_deps="linux_videodev2_h sem_timedwait"
 
+alphamerge_cuda_filter_deps="ffnvcodec"
+alphamerge_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 bilateral_cuda_filter_deps="ffnvcodec"
 bilateral_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 chromakey_cuda_filter_deps="ffnvcodec"
diff --git a/doc/filters.texi b/doc/filters.texi
index 61ece1d000..e51e11eb28 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26500,6 +26500,27 @@ Note: If FFmpeg detects the Nvidia CUDA Toolkit during configuration, it will en
 @item Configure FFmpeg with @code{--enable-cuda-llvm}. Additional requirement: @code{llvm} lib must be installed.
 @end itemize
 
+@subsection alphamerge_cuda
+Add or replace the alpha component of the primary input stream with the luma plane of a second input stream, using CUDA hardware acceleration.
+
+@subsubsection Examples
+@itemize
+@item
+Apply a circular alpha mask to a red video before overlaying it onto a blue background.
+@example
+ffmpeg -f lavfi -i "color=c=red:s=1280x720:d=5,format=yuva420p" \
+-f lavfi -i "color=c=black:s=1280x720:d=5,geq=lum='255*gt(W/4,hypot(X-W/2,Y-H/2))'" \
+-f lavfi -i "color=c=blue:s=1280x720:d=5" \
+-filter_complex \
+"[0:v]hwupload_cuda[base]; \
+ [1:v]hwupload_cuda[mask]; \
+ [2:v]hwupload_cuda[bkgd]; \
+ [base][mask]alphamerge_cuda[merged]; \
+ [bkgd][merged]overlay_cuda" \
+-c:v h264_nvenc -y out.mp4
+@end example
+@end itemize
+
 @subsection bilateral_cuda
 CUDA accelerated bilateral filter, an edge preserving filter.
 This filter is mathematically accurate thanks to the use of GPU acceleration.
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 9a906bd342..3df6444f83 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -194,6 +194,8 @@ OBJS-$(CONFIG_ANULLSINK_FILTER)              += asink_anullsink.o
 OBJS-$(CONFIG_ADDROI_FILTER)                 += vf_addroi.o
 OBJS-$(CONFIG_ALPHAEXTRACT_FILTER)           += vf_extractplanes.o
 OBJS-$(CONFIG_ALPHAMERGE_FILTER)             += vf_alphamerge.o framesync.o
+OBJS-$(CONFIG_ALPHAMERGE_CUDA_FILTER)        += vf_alphamerge_cuda.o framesync.o vf_alphamerge_cuda.ptx.o \
+                                                cuda/load_helper.o
 OBJS-$(CONFIG_AMPLIFY_FILTER)                += vf_amplify.o
 OBJS-$(CONFIG_ASS_FILTER)                    += vf_subtitles.o
 OBJS-$(CONFIG_ATADENOISE_FILTER)             += vf_atadenoise.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 44b4de2a14..87e9f2bdd4 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -179,6 +179,7 @@ extern const FFFilter ff_asink_anullsink;
 extern const FFFilter ff_vf_addroi;
 extern const FFFilter ff_vf_alphaextract;
 extern const FFFilter ff_vf_alphamerge;
+extern const FFFilter ff_vf_alphamerge_cuda;
 extern const FFFilter ff_vf_amplify;
 extern const FFFilter ff_vf_ass;
 extern const FFFilter ff_vf_atadenoise;
diff --git a/libavfilter/version.h b/libavfilter/version.h
index 7e0eb9af97..4d8f28e512 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -31,7 +31,7 @@
 
 #include "version_major.h"
 
-#define LIBAVFILTER_VERSION_MINOR   3
+#define LIBAVFILTER_VERSION_MINOR   4
 #define LIBAVFILTER_VERSION_MICRO 100
 
 
diff --git a/libavfilter/vf_alphamerge_cuda.c b/libavfilter/vf_alphamerge_cuda.c
new file mode 100644
index 0000000000..b8ab78cb2e
--- /dev/null
+++ b/libavfilter/vf_alphamerge_cuda.c
@@ -0,0 +1,345 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Copy the luma value of the second input into the alpha channel of the first input using CUDA.
+ */
+
+#include "libavutil/internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/cuda_check.h"
+#include "libavutil/mem.h"
+
+#include "avfilter.h"
+#include "filters.h"
+#include "formats.h"
+#include "framesync.h"
+
+#include "cuda/load_helper.h"
+
+#define CHECK_CU(call) FF_CUDA_CHECK_DL(ctx, s->hwctx->internal->cuda_dl, call)
+#define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) )
+
+#define BLOCK_X 32
+#define BLOCK_Y 16
+
+#define MAIN_INPUT 0
+#define ALPHA_INPUT 1
+
+#define ALPHA_PLANE_INDEX 3
+
+static const enum AVPixelFormat supported_main_formats[] = {
+    AV_PIX_FMT_YUVA444P,
+    AV_PIX_FMT_YUVA420P,
+    AV_PIX_FMT_NONE,
+};
+
+static const enum AVPixelFormat supported_alpha_mask_formats[] = {
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_YUVA420P,
+    AV_PIX_FMT_YUVA444P,
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_NONE,
+};
+
+typedef struct AlphaMergeCUDAContext {
+    const AVClass *class;
+
+    enum AVPixelFormat sw_format_main;
+    enum AVPixelFormat sw_format_alpha_mask;
+
+    AVBufferRef *hw_device_ctx;
+    AVCUDADeviceContext *hwctx;
+
+    CUcontext cu_ctx;
+    CUmodule cu_module;
+    CUfunction cu_func_alphamerge_planar;
+    CUstream cu_stream;
+
+    FFFrameSync fs;
+
+    int alpha_plane_idx;
+
+} AlphaMergeCUDAContext;
+
+
+static int format_is_supported(const enum AVPixelFormat supported_formats[], enum AVPixelFormat fmt)
+{
+    for (int i = 0; supported_formats[i] != AV_PIX_FMT_NONE; i++)
+        if (supported_formats[i] == fmt)
+            return 1;
+    return 0;
+}
+
+static int query_formats(const AVFilterContext *ctx,
+                         AVFilterFormatsConfig **cfg_in,
+                         AVFilterFormatsConfig **cfg_out)
+{
+    static const int pix_fmts[] = { AV_PIX_FMT_CUDA, AV_PIX_FMT_NONE };
+    static const int alpha_mask_ranges[] = { AVCOL_RANGE_JPEG };
+    AVFilterFormats *formats = NULL;
+    int ret = 0;
+
+    formats = ff_make_format_list(pix_fmts);
+    if (!formats)
+        return AVERROR(ENOMEM);
+
+    if ((ret = ff_set_common_formats2(ctx, cfg_in, cfg_out, ff_make_format_list(pix_fmts))) < 0)
+        return ret;
+
+    formats = ff_make_format_list(alpha_mask_ranges);
+    if (!formats)
+        return AVERROR(ENOMEM);
+
+    ret = ff_formats_ref(formats, &cfg_in[ALPHA_INPUT]->color_ranges);
+    ff_formats_unref(&formats);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+static int do_alphamerge_cuda(FFFrameSync *fs)
+{
+    AVFilterContext *ctx = fs->parent;
+    AlphaMergeCUDAContext *s = ctx->priv;
+    AVFilterLink *outlink = ctx->outputs[0];
+    AVFrame *main_frame = NULL;
+    AVFrame *alpha_mask_frame = NULL;
+    CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+    CUcontext dummy_cu_ctx;
+    int ret;
+
+    ret = ff_framesync_dualinput_get_writable(fs, &main_frame, &alpha_mask_frame);
+    if (ret < 0)
+        return ret;
+
+    if (!alpha_mask_frame)
+        return ff_filter_frame(outlink, main_frame);
+
+    ret = CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+    if (ret < 0)
+        return ret;
+
+    void *kernel_args[] = {
+        &main_frame->data[s->alpha_plane_idx],
+        &main_frame->linesize[s->alpha_plane_idx],
+        &alpha_mask_frame->data[0],
+        &alpha_mask_frame->linesize[0],
+        &main_frame->width,
+        &main_frame->height
+    };
+    unsigned int grid_x = DIV_UP(main_frame->width, BLOCK_X);
+    unsigned int grid_y = DIV_UP(main_frame->height, BLOCK_Y);
+
+    ret = CHECK_CU(cu->cuLaunchKernel(s->cu_func_alphamerge_planar, grid_x, grid_y, 1,
+                                      BLOCK_X, BLOCK_Y, 1,
+                                      0, s->cu_stream, kernel_args, NULL));
+
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy_cu_ctx));
+
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to launch CUDA kernel\n");
+        return ret;
+    }
+
+    return ff_filter_frame(outlink, main_frame);
+}
+
+
+static int alphamerge_cuda_config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    AlphaMergeCUDAContext *s = ctx->priv;
+
+    AVFilterLink *main_inlink = ctx->inputs[MAIN_INPUT];
+    AVFilterLink *alpha_inlink = ctx->inputs[ALPHA_INPUT];
+
+    FilterLink *main_inl = ff_filter_link(main_inlink);
+    FilterLink *alpha_inl = ff_filter_link(alpha_inlink);
+
+    AVHWFramesContext *main_frames_ctx = (AVHWFramesContext*)main_inl->hw_frames_ctx->data;
+    AVHWFramesContext *alpha_frames_ctx = (AVHWFramesContext*)alpha_inl->hw_frames_ctx->data;
+
+    const AVPixFmtDescriptor *main_desc;
+    CUcontext dummy_cu_ctx;
+    CudaFunctions *cu;
+    int ret = 0;
+
+    extern const unsigned char ff_vf_alphamerge_cuda_ptx_data[];
+    extern const unsigned int ff_vf_alphamerge_cuda_ptx_len;
+
+    s->sw_format_main = main_frames_ctx->sw_format;
+    if (!format_is_supported(supported_main_formats, s->sw_format_main)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported main input software pixel format: %s\n",
+               av_get_pix_fmt_name(s->sw_format_main));
+        return AVERROR(ENOSYS);
+    }
+
+    s->sw_format_alpha_mask = alpha_frames_ctx->sw_format;
+    if (!format_is_supported(supported_alpha_mask_formats, s->sw_format_alpha_mask)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported alpha mask input software pixel format: %s.\n",
+               av_get_pix_fmt_name(s->sw_format_alpha_mask));
+        return AVERROR(ENOSYS);
+    }
+
+    if (main_inlink->w != alpha_inlink->w || main_inlink->h != alpha_inlink->h) {
+        av_log(ctx, AV_LOG_ERROR, "Input frame sizes do not match (%dx%d vs %dx%d).\n",
+               main_inlink->w, main_inlink->h, alpha_inlink->w, alpha_inlink->h);
+        return AVERROR(EINVAL);
+    }
+
+    s->hw_device_ctx = av_buffer_ref(main_frames_ctx->device_ref);
+    if (!s->hw_device_ctx)
+        return AVERROR(ENOMEM);
+
+    s->hwctx = ((AVHWDeviceContext*)s->hw_device_ctx->data)->hwctx;
+    s->cu_ctx = s->hwctx->cuda_ctx;
+    s->cu_stream = s->hwctx->stream;
+    cu = s->hwctx->internal->cuda_dl;
+
+    ret = CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+    if (ret < 0)
+        return ret;
+
+    ret = ff_cuda_load_module(ctx, s->hwctx, &s->cu_module,
+                              ff_vf_alphamerge_cuda_ptx_data, ff_vf_alphamerge_cuda_ptx_len);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to load CUDA module.\n");
+        goto end;
+    }
+
+    ret = CHECK_CU(cu->cuModuleGetFunction(&s->cu_func_alphamerge_planar, s->cu_module, "alphamerge_planar"));
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to get kernel function 'alphamerge_planar'.\n");
+        goto end;
+    }
+
+    main_desc = av_pix_fmt_desc_get(s->sw_format_main);
+    if (!main_desc || !(main_desc->flags & AV_PIX_FMT_FLAG_ALPHA)) {
+        av_log(ctx, AV_LOG_ERROR, "Main input sw_format %s is not a supported format with an alpha channel.\n",
+               av_get_pix_fmt_name(s->sw_format_main));
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+    s->alpha_plane_idx = main_desc->comp[ALPHA_PLANE_INDEX].plane;
+
+    ff_filter_link(outlink)->hw_frames_ctx = av_buffer_ref(main_inl->hw_frames_ctx);
+    if (!ff_filter_link(outlink)->hw_frames_ctx) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    s->fs.time_base = main_inlink->time_base;
+    if ((ret = ff_framesync_init_dualinput(&s->fs, ctx)) < 0)
+        goto end;
+
+    outlink->w = main_inlink->w;
+    outlink->h = main_inlink->h;
+    outlink->time_base = main_inlink->time_base;
+    outlink->sample_aspect_ratio = main_inlink->sample_aspect_ratio;
+    ff_filter_link(outlink)->frame_rate = ff_filter_link(main_inlink)->frame_rate;
+
+    ret = ff_framesync_configure(&s->fs);
+
+end:
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy_cu_ctx));
+    return ret;
+}
+
+static av_cold int alphamerge_cuda_init(AVFilterContext *ctx)
+{
+    AlphaMergeCUDAContext *s = ctx->priv;
+    s->fs.on_event = &do_alphamerge_cuda;
+    return 0;
+}
+
+static av_cold void alphamerge_cuda_uninit(AVFilterContext *ctx)
+{
+    AlphaMergeCUDAContext *s = ctx->priv;
+    CUcontext dummy;
+
+    ff_framesync_uninit(&s->fs);
+
+    if (s->cu_module) {
+        CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+        CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+        if (s->cu_stream)
+            CHECK_CU(cu->cuStreamSynchronize(s->cu_stream));
+
+        CHECK_CU(cu->cuModuleUnload(s->cu_module));
+        CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+    }
+
+    av_buffer_unref(&s->hw_device_ctx);
+}
+
+static int alphamerge_cuda_activate(AVFilterContext *ctx)
+{
+    AlphaMergeCUDAContext *s = ctx->priv;
+    return ff_framesync_activate(&s->fs);
+}
+
+static const AVFilterPad alphamerge_cuda_inputs[] = {
+    {
+        .name = "main",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+    {
+        .name = "alpha",
+        .type = AVMEDIA_TYPE_VIDEO,
+    }
+};
+
+static const AVFilterPad alphamerge_cuda_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+        .config_props = &alphamerge_cuda_config_output,
+    }
+};
+
+static const AVOption alphamerge_cuda_options[] = {
+    { NULL },
+};
+
+FRAMESYNC_DEFINE_CLASS(alphamerge_cuda, AlphaMergeCUDAContext, fs);
+
+const FFFilter ff_vf_alphamerge_cuda = {
+    .p.name          = "alphamerge_cuda",
+    .p.description   = NULL_IF_CONFIG_SMALL("Copy the luma value of the second input into the alpha channel of the first input using CUDA."),
+
+    .priv_size     = sizeof(AlphaMergeCUDAContext),
+    .p.priv_class    = &alphamerge_cuda_class,
+
+    .init          = &alphamerge_cuda_init,
+    .uninit        = &alphamerge_cuda_uninit,
+
+    .activate      = &alphamerge_cuda_activate,
+    FILTER_INPUTS(alphamerge_cuda_inputs),
+    FILTER_OUTPUTS(alphamerge_cuda_outputs),
+    FILTER_QUERY_FUNC2(query_formats),
+    .preinit       = alphamerge_cuda_framesync_preinit,
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
\ No newline at end of file
diff --git a/libavfilter/vf_alphamerge_cuda.cu b/libavfilter/vf_alphamerge_cuda.cu
new file mode 100644
index 0000000000..99a9dc91ec
--- /dev/null
+++ b/libavfilter/vf_alphamerge_cuda.cu
@@ -0,0 +1,44 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+
+template <typename T>
+__device__ void alphamerge_impl(T *dst, int dst_pitch,
+                                const T *src, int src_pitch,
+                                int width, int height)
+{
+    const int x = blockIdx.x * blockDim.x + threadIdx.x;
+    const int y = blockIdx.y * blockDim.y + threadIdx.y;
+
+    if (x < width && y < height) {
+        dst[y * dst_pitch + x] = src[y * src_pitch + x];
+    }
+}
+
+extern "C" {
+    __global__ void alphamerge_planar(unsigned char* main_alpha_plane,
+                                      int main_alpha_linesize,
+                                      const unsigned char* alpha_mask_luma_plane,
+                                      int alpha_mask_luma_linesize,
+                                      int width, int height)
+    {
+        alphamerge_impl<unsigned char>(main_alpha_plane, main_alpha_linesize,
+                                       alpha_mask_luma_plane, alpha_mask_luma_linesize,
+                                       width, height);
+    }
+}
\ No newline at end of file
-- 
2.34.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".