From: Philip Langdale <philipl@overt.org>
To: ffmpeg-devel@ffmpeg.org
Cc: Philip Langdale <philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library
Date: Mon, 2 Jan 2023 15:21:33 -0800
Message-ID: <20230102232133.729217-3-philipl@overt.org>
In-Reply-To: <20230102232133.729217-1-philipl@overt.org>

The NvOFFRUC library provides a GPU accelerated interpolation feature
based on nvidia's Optical Flow functionality. It's able to provide
reasonably high quality realtime interpolation to increase the frame
rate of a video stream - as opposed to vf_framerate, which just does a
linear blend, or vf_minterpolate, which is anything but realtime.

As interesting as that sounds, there are a lot of limitations that mean
this filter is mostly just a toy:

1. The licensing is useless. The library and header are distributed as
   part of the Optical Flow SDK, which has a proprietary EULA, so anyone
   wanting to build the filter must obtain the SDK for both build and
   runtime, and the resulting binaries will be nonfree and
   unredistributable.

2. The NvOFFRUC.h header doesn't even compile in pure C without
   modification.

3. The library can only handle NV12 and "ARGB" (which really means any
   single plane, four channel, 8 bit format). This means it can't help
   with our inevitable future dominated by 10+ bit formats.

4. The pitch handling logic in the library is very inflexible, and it
   assumes that for NV12, the UV plane is contiguous with the Y plane.
   This actually ends up making it incompatible with nvdec output for
   certain frame sizes. To avoid constantly fighting edge cases, I took
   the brute force approach and copy the input and output frames to/from
   CUarrays (which the library can accept) to give me a way to ensure
   the correct layout is used.

5. The library is stateful in an unhelpful way. It is called with one
   input frame and one output buffer, and always interpolates between
   the passed input frame and the frame from the previous call. This
   both requires special handling for the first frame, and also prevents
   generating more than one intermediate frame - if you want to do 3x or
   4x etc. interpolation, this approach doesn't work. So, again, I brute
   forced it by treating every interpolation as a new session, calling
   the library twice with each input frame, even if the first frame
   happens to be the same as the last frame we called it with. This
   allows us to generate as many intermediate frames as we want, but it
   presumably consumes more GPU resources.

6. The library always creates a `NvOFFRUC` directory with an empty log
   file in it in $PWD. What a nuisance.

But with all those caveats and limitations, it does actually work. I
was able to upsample a 24fps file to 144fps (my monitor limit) with
respectable results. In some situations, it starts bogging down, and
I'm not entirely sure where those limits are - certainly I can see it
consuming a significant percentage of GPU resources for large scaling
factors.

The implementation here is heavily based on vf_framerate with the
blending function ripped out and replaced by NvOFFRUC. That means we
get all the nice properties of that filter, such as non-integer
scaling, and downsampling via interpolation as well.

Is this mergeable? No - but it was an interesting exercise and maybe
folks in narrow circumstances may find some genuine use from it.
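For anyone who wants to try it, a minimal invocation might look
something like this (an untested sketch: the file names and the 60fps
target are placeholders, and it assumes the NvOFFRUC library is
loadable at runtime, CUDA-capable hardware, and that you keep frames
on the GPU end to end, e.g. with nvdec and nvenc):

    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
           -vf nvoffruc=fps=60 -c:v h264_nvenc output.mp4

Since the timestamp handling is inherited from vf_framerate, fps can be
any rational rate - nvoffruc=fps=24000/1001, or a rate below the source
for downsampling, should work the same way.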
Signed-off-by: Philip Langdale <philipl@overt.org>
---
 configure                 |   7 +-
 libavfilter/Makefile      |   1 +
 libavfilter/allfilters.c  |   1 +
 libavfilter/vf_nvoffruc.c | 644 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 650 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/vf_nvoffruc.c

diff --git a/configure b/configure
index 675dc84f56..6ea9f89f97 100755
--- a/configure
+++ b/configure
@@ -3691,6 +3691,7 @@ mptestsrc_filter_deps="gpl"
 negate_filter_deps="lut_filter"
 nlmeans_opencl_filter_deps="opencl"
 nnedi_filter_deps="gpl"
+nvoffruc_filter_deps="ffnvcodec nonfree"
 ocr_filter_deps="libtesseract"
 ocv_filter_deps="libopencv"
 openclsrc_filter_deps="opencl"
@@ -6450,9 +6451,9 @@ fi
 if ! disabled ffnvcodec; then
     ffnv_hdr_list="ffnvcodec/nvEncodeAPI.h ffnvcodec/dynlink_cuda.h ffnvcodec/dynlink_cuviddec.h ffnvcodec/dynlink_nvcuvid.h"
     check_pkg_config ffnvcodec "ffnvcodec >= 12.0.16.0" "$ffnv_hdr_list" "" || \
-        check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.2 ffnvcodec < 12.0" "$ffnv_hdr_list" "" || \
-        check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.2 ffnvcodec < 11.1" "$ffnv_hdr_list" "" || \
-        check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.14 ffnvcodec < 8.2" "$ffnv_hdr_list" ""
+        check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.3 ffnvcodec < 12.0" "$ffnv_hdr_list" "" || \
+        check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.3 ffnvcodec < 11.1" "$ffnv_hdr_list" "" || \
+        check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.15 ffnvcodec < 8.2" "$ffnv_hdr_list" ""
 fi
 
 if enabled_all libglslang libshaderc; then
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index cb41ccc622..292597f3a8 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -389,6 +389,7 @@ OBJS-$(CONFIG_NOFORMAT_FILTER)               += vf_format.o
 OBJS-$(CONFIG_NOISE_FILTER)                  += vf_noise.o
 OBJS-$(CONFIG_NORMALIZE_FILTER)              += vf_normalize.o
 OBJS-$(CONFIG_NULL_FILTER)                   += vf_null.o
+OBJS-$(CONFIG_NVOFFRUC_FILTER)               += vf_nvoffruc.o
 OBJS-$(CONFIG_OCR_FILTER)                    += vf_ocr.o
 OBJS-$(CONFIG_OCV_FILTER)                    += vf_libopencv.o
 OBJS-$(CONFIG_OSCILLOSCOPE_FILTER)           += vf_datascope.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 52741b60e4..84f102806e 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -368,6 +368,7 @@ extern const AVFilter ff_vf_noformat;
 extern const AVFilter ff_vf_noise;
 extern const AVFilter ff_vf_normalize;
 extern const AVFilter ff_vf_null;
+extern const AVFilter ff_vf_nvoffruc;
 extern const AVFilter ff_vf_ocr;
 extern const AVFilter ff_vf_ocv;
 extern const AVFilter ff_vf_oscilloscope;
diff --git a/libavfilter/vf_nvoffruc.c b/libavfilter/vf_nvoffruc.c
new file mode 100644
index 0000000000..e3a9f9e553
--- /dev/null
+++ b/libavfilter/vf_nvoffruc.c
@@ -0,0 +1,644 @@
+/*
+ * Copyright (C) 2022 Philip Langdale <philipl@overt.org>
+ * Based on vf_framerate - Copyright (C) 2012 Mark Himsley
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * filter upsamples the frame rate of a source using the nvidia Optical Flow
+ * FRUC library.
+ */
+
+#include <dlfcn.h>
+#include "libavutil/avassert.h"
+#include "libavutil/cuda_check.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "avfilter.h"
+#include "filters.h"
+#include "internal.h"
+
+/*
+ * This cannot be distributed with the filter due to licensing. If you want to
+ * compile this filter, you will need to obtain it from nvidia and then fix it
+ * to work in a pure C environment:
+ * * Remove the `using namespace std;`
+ * * Replace the `bool *` with `void *`
+ */
+#include "NvOFFRUC.h"
+
+typedef struct FRUCContext {
+    const AVClass *class;
+
+    AVCUDADeviceContext *hwctx;
+    AVBufferRef *device_ref;
+
+    CUcontext cu_ctx;
+    CUstream stream;
+    CUarray c0; ///< CUarray for f0
+    CUarray c1; ///< CUarray for f1
+    CUarray cw; ///< CUarray for work
+
+    AVRational dest_frame_rate;
+    int interp_start; ///< start of range to apply interpolation
+    int interp_end;   ///< end of range to apply interpolation
+
+    AVRational srce_time_base; ///< timebase of source
+    AVRational dest_time_base; ///< timebase of destination
+
+    int blend_factor_max;
+    AVFrame *work;
+    enum AVPixelFormat format;
+
+    AVFrame *f0;       ///< last frame
+    AVFrame *f1;       ///< current frame
+    int64_t pts0;      ///< last frame pts in dest_time_base
+    int64_t pts1;      ///< current frame pts in dest_time_base
+    int64_t delta;     ///< pts1 to pts0 delta
+    int flush;         ///< 1 if the filter is being flushed
+    int64_t start_pts; ///< pts of the first output frame
+    int64_t n;         ///< output frame counter
+
+    void *fruc_dl;
+    PtrToFuncNvOFFRUCCreate NvOFFRUCCreate;
+    PtrToFuncNvOFFRUCRegisterResource NvOFFRUCRegisterResource;
+    PtrToFuncNvOFFRUCUnregisterResource NvOFFRUCUnregisterResource;
+    PtrToFuncNvOFFRUCProcess NvOFFRUCProcess;
+    PtrToFuncNvOFFRUCDestroy NvOFFRUCDestroy;
+    NvOFFRUCHandle fruc;
+} FRUCContext;
+
+#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, s->hwctx->internal->cuda_dl, x)
+#define OFFSET(x) offsetof(FRUCContext, x)
+#define V AV_OPT_FLAG_VIDEO_PARAM
+#define F AV_OPT_FLAG_FILTERING_PARAM
+
+static const AVOption nvoffruc_options[] = {
+    {"fps",          "required output frames per second rate", OFFSET(dest_frame_rate), AV_OPT_TYPE_VIDEO_RATE, {.str="50"}, 0, INT_MAX, V|F },
+
+    {"interp_start", "point to start linear interpolation",    OFFSET(interp_start),    AV_OPT_TYPE_INT,        {.i64=15},   0, 255,     V|F },
+    {"interp_end",   "point to end linear interpolation",      OFFSET(interp_end),      AV_OPT_TYPE_INT,        {.i64=240},  0, 255,     V|F },
+
+    {NULL}
+};
+
+AVFILTER_DEFINE_CLASS(nvoffruc);
+
+static int blend_frames(AVFilterContext *ctx, int64_t work_pts)
+{
+    FRUCContext *s = ctx->priv;
+    AVFilterLink *outlink = ctx->outputs[0];
+
+    CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+    CUDA_MEMCPY2D cpy_params = {0,};
+    NvOFFRUC_PROCESS_IN_PARAMS in = {0,};
+    NvOFFRUC_PROCESS_OUT_PARAMS out = {0,};
+    NvOFFRUC_STATUS status;
+
+    int num_channels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
+    int ret;
+    uint64_t ignored;
+
+    // get work-space for output frame
+    s->work = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!s->work)
+        return AVERROR(ENOMEM);
+
+    av_frame_copy_props(s->work, s->f0);
+
+    // Copy the Y (or packed RGB) plane of f0 into its CUarray.
+    cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
+    cpy_params.srcDevice = (CUdeviceptr)s->f0->data[0];
+    cpy_params.srcPitch = s->f0->linesize[0];
+    cpy_params.srcY = 0;
+    cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY;
+    cpy_params.dstArray = s->c0;
+    cpy_params.dstY = 0;
+    cpy_params.WidthInBytes = s->f0->width * num_channels;
+    cpy_params.Height = s->f0->height;
+    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+    if (ret < 0)
+        return ret;
+
+    if (s->f0->data[1]) {
+        // NV12: pack the UV plane directly below Y, as the library expects.
+        cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
+        cpy_params.srcDevice = (CUdeviceptr)s->f0->data[1];
+        cpy_params.srcPitch = s->f0->linesize[1];
+        cpy_params.srcY = 0;
+        cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY;
+        cpy_params.dstArray = s->c0;
+        cpy_params.dstY = s->f0->height;
+        cpy_params.WidthInBytes = s->f0->width * num_channels;
+        cpy_params.Height = s->f0->height / 2;
+        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+        if (ret < 0)
+            return ret;
+    }
+
+    cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
+    cpy_params.srcDevice = (CUdeviceptr)s->f1->data[0];
+    cpy_params.srcPitch = s->f1->linesize[0];
+    cpy_params.srcY = 0;
+    cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY;
+    cpy_params.dstArray = s->c1;
+    cpy_params.dstY = 0;
+    cpy_params.WidthInBytes = s->f1->width * num_channels;
+    cpy_params.Height = s->f1->height;
+    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+    if (ret < 0)
+        return ret;
+
+    if (s->f1->data[1]) {
+        cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
+        cpy_params.srcDevice = (CUdeviceptr)s->f1->data[1];
+        cpy_params.srcPitch = s->f1->linesize[1];
+        cpy_params.srcY = 0;
+        cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY;
+        cpy_params.dstArray = s->c1;
+        cpy_params.dstY = s->f1->height;
+        cpy_params.WidthInBytes = s->f1->width * num_channels;
+        cpy_params.Height = s->f1->height / 2;
+        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+        if (ret < 0)
+            return ret;
+    }
+
+    // The library interpolates between the frames from the previous and
+    // current Process calls, so feed f0 first to (re)establish the pair.
+    // The output requested at pts0 is just a repeat that gets overwritten.
+    in.stFrameDataInput.pFrame = s->c0;
+    in.stFrameDataInput.nTimeStamp = s->pts0;
+    out.stFrameDataOutput.pFrame = s->cw;
+    out.stFrameDataOutput.nTimeStamp = s->pts0;
+    out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
+    status = s->NvOFFRUCProcess(s->fruc, &in, &out);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
+        return AVERROR(ENOSYS);
+    }
+
+    // Now pass f1 and request the intermediate frame at work_pts.
+    in.stFrameDataInput.pFrame = s->c1;
+    in.stFrameDataInput.nTimeStamp = s->pts1;
+    out.stFrameDataOutput.pFrame = s->cw;
+    out.stFrameDataOutput.nTimeStamp = work_pts;
+    out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
+    status = s->NvOFFRUCProcess(s->fruc, &in, &out);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
+        return AVERROR(ENOSYS);
+    }
+
+    // Copy the interpolated result back into the output frame.
+    cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY;
+    cpy_params.srcArray = s->cw;
+    cpy_params.srcY = 0;
+    cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE;
+    cpy_params.dstDevice = (CUdeviceptr)s->work->data[0];
+    cpy_params.dstPitch = s->work->linesize[0];
+    cpy_params.dstY = 0;
+    cpy_params.WidthInBytes = s->work->width * num_channels;
+    cpy_params.Height = s->work->height;
+    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+    if (ret < 0)
+        return ret;
+
+    if (s->work->data[1]) {
+        cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY;
+        cpy_params.srcArray = s->cw;
+        cpy_params.srcY = s->work->height;
+        cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE;
+        cpy_params.dstDevice = (CUdeviceptr)s->work->data[1];
+        cpy_params.dstPitch = s->work->linesize[1];
+        cpy_params.dstY = 0;
+        cpy_params.WidthInBytes = s->work->width * num_channels;
+        cpy_params.Height = s->work->height / 2;
+        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
+static int process_work_frame(AVFilterContext *ctx)
+{
+    FRUCContext *s = ctx->priv;
+    int64_t work_pts;
+    int64_t interpolate, interpolate8;
+    int ret;
+
+    if (!s->f1)
+        return 0;
+    if (!s->f0 && !s->flush)
+        return 0;
+
+    work_pts = s->start_pts + av_rescale_q(s->n, av_inv_q(s->dest_frame_rate), s->dest_time_base);
+
+    if (work_pts >= s->pts1 && !s->flush)
+        return 0;
+
+    if (!s->f0) {
+        av_assert1(s->flush);
+        s->work = s->f1;
+        s->f1 = NULL;
+    } else {
+        if (work_pts >= s->pts1 + s->delta && s->flush)
+            return 0;
+
+        interpolate = av_rescale(work_pts - s->pts0, s->blend_factor_max, s->delta);
+        interpolate8 = av_rescale(work_pts - s->pts0, 256, s->delta);
+        ff_dlog(ctx, "process_work_frame() interpolate: %"PRId64"/256\n", interpolate8);
+        if (interpolate >= s->blend_factor_max || interpolate8 > s->interp_end) {
+            av_log(ctx, AV_LOG_DEBUG, "Matched f1: pts %"PRId64"\n", work_pts);
+            s->work = av_frame_clone(s->f1);
+        } else if (interpolate <= 0 || interpolate8 < s->interp_start) {
+            av_log(ctx, AV_LOG_DEBUG, "Matched f0: pts %"PRId64"\n", work_pts);
+            s->work = av_frame_clone(s->f0);
+        } else {
+            av_log(ctx, AV_LOG_DEBUG, "Unmatched pts: %"PRId64"\n", work_pts);
+            ret = blend_frames(ctx, work_pts);
+            if (ret < 0)
+                return ret;
+        }
+    }
+
+    if (!s->work)
+        return AVERROR(ENOMEM);
+
+    s->work->pts = work_pts;
+    s->n++;
+
+    return 1;
+}
+
+static av_cold int init(AVFilterContext *ctx)
+{
+    FRUCContext *s = ctx->priv;
+    s->start_pts = AV_NOPTS_VALUE;
+
+    // TODO: Need windows equivalent symbol loading
+    s->fruc_dl = dlopen("libNvOFFRUC.so", RTLD_LAZY);
+    if (!s->fruc_dl) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to load FRUC: %s\n", dlerror());
+        return AVERROR(EINVAL);
+    }
+
+    s->NvOFFRUCCreate = (PtrToFuncNvOFFRUCCreate)
+        dlsym(s->fruc_dl, "NvOFFRUCCreate");
+    s->NvOFFRUCRegisterResource = (PtrToFuncNvOFFRUCRegisterResource)
+        dlsym(s->fruc_dl, "NvOFFRUCRegisterResource");
+    s->NvOFFRUCUnregisterResource = (PtrToFuncNvOFFRUCUnregisterResource)
+        dlsym(s->fruc_dl, "NvOFFRUCUnregisterResource");
+    s->NvOFFRUCProcess = (PtrToFuncNvOFFRUCProcess)
+        dlsym(s->fruc_dl, "NvOFFRUCProcess");
+    s->NvOFFRUCDestroy = (PtrToFuncNvOFFRUCDestroy)
+        dlsym(s->fruc_dl, "NvOFFRUCDestroy");
+    return 0;
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    FRUCContext *s = ctx->priv;
+    CUcontext dummy;
+
+    // config_output may never have run, so only touch CUDA state if we
+    // actually acquired a device context.
+    if (s->hwctx) {
+        CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+
+        CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+        if (s->fruc) {
+            NvOFFRUC_UNREGISTER_RESOURCE_PARAM in_param = {
+                .pArrResource = {s->c0, s->c1, s->cw},
+                .uiCount = 3,
+            };
+            NvOFFRUC_STATUS nv_status = s->NvOFFRUCUnregisterResource(s->fruc, &in_param);
+            if (nv_status) {
+                av_log(ctx, AV_LOG_WARNING, "Could not unregister: %d\n", nv_status);
+            }
+            s->NvOFFRUCDestroy(s->fruc);
+        }
+        if (s->c0)
+            CHECK_CU(cu->cuArrayDestroy(s->c0));
+        if (s->c1)
+            CHECK_CU(cu->cuArrayDestroy(s->c1));
+        if (s->cw)
+            CHECK_CU(cu->cuArrayDestroy(s->cw));
+
+        CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+    }
+
+    if (s->fruc_dl)
+        dlclose(s->fruc_dl);
+    av_frame_free(&s->f0);
+    av_frame_free(&s->f1);
+    av_buffer_unref(&s->device_ref);
+}
+
+static const enum AVPixelFormat supported_formats[] = {
+    AV_PIX_FMT_NV12,
+    // Actually any single plane, four channel, 8bit format will work.
+    AV_PIX_FMT_ARGB,
+    AV_PIX_FMT_ABGR,
+    AV_PIX_FMT_RGBA,
+    AV_PIX_FMT_BGRA,
+    AV_PIX_FMT_NONE
+};
+
+static int format_is_supported(enum AVPixelFormat fmt)
+{
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
+        if (supported_formats[i] == fmt)
+            return 1;
+    return 0;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+    int ret, status;
+    AVFilterLink *inlink = ctx->inputs[0];
+    AVFilterLink *outlink = ctx->outputs[0];
+    FRUCContext *s = ctx->priv;
+    AVFrame *inpicref;
+    int64_t pts;
+
+    CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+    CUcontext dummy;
+
+    FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+    CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+retry:
+    ret = process_work_frame(ctx);
+    if (ret < 0) {
+        goto exit;
+    } else if (ret == 1) {
+        ret = ff_filter_frame(outlink, s->work);
+        goto exit;
+    }
+
+    ret = ff_inlink_consume_frame(inlink, &inpicref);
+    if (ret < 0)
+        goto exit;
+
+    if (inpicref) {
+        if (inpicref->interlaced_frame)
+            av_log(ctx, AV_LOG_WARNING, "Interlaced frame found - the output will not be correct.\n");
+
+        if (inpicref->pts == AV_NOPTS_VALUE) {
+            av_log(ctx, AV_LOG_WARNING, "Ignoring frame without PTS.\n");
+            av_frame_free(&inpicref);
+        }
+    }
+
+    if (inpicref) {
+        pts = av_rescale_q(inpicref->pts, s->srce_time_base, s->dest_time_base);
+
+        if (s->f1 && pts == s->pts1) {
+            av_log(ctx, AV_LOG_WARNING, "Ignoring frame with same PTS.\n");
+            av_frame_free(&inpicref);
+        }
+    }
+
+    if (inpicref) {
+        av_frame_free(&s->f0);
+        s->f0 = s->f1;
+        s->pts0 = s->pts1;
+
+        s->f1 = inpicref;
+        s->pts1 = pts;
+        s->delta = s->pts1 - s->pts0;
+
+        if (s->delta < 0) {
+            av_log(ctx, AV_LOG_WARNING, "PTS discontinuity.\n");
+            s->start_pts = s->pts1;
+            s->n = 0;
+            av_frame_free(&s->f0);
+        }
+
+        if (s->start_pts == AV_NOPTS_VALUE)
+            s->start_pts = s->pts1;
+
+        goto retry;
+    }
+
+    if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+        if (!s->flush) {
+            s->flush = 1;
+            goto retry;
+        }
+        ff_outlink_set_status(outlink, status, pts);
+        ret = 0;
+        goto exit;
+    }
+
+    // Pop the context here: FF_FILTER_FORWARD_WANTED can return directly,
+    // and the fall-through return below does not pass the exit label either.
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+
+    FF_FILTER_FORWARD_WANTED(outlink, inlink);
+
+    return FFERROR_NOT_READY;
+
+exit:
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+    return ret;
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    FRUCContext *s = ctx->priv;
+
+    s->srce_time_base = inlink->time_base;
+    s->blend_factor_max = 1 << (8 - 1);
+
+    return 0;
+}
+
+static int config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    AVFilterLink *inlink = outlink->src->inputs[0];
+    AVHWFramesContext *in_frames_ctx;
+    AVHWFramesContext *output_frames;
+    FRUCContext *s = ctx->priv;
+    CudaFunctions *cu;
+    CUcontext dummy;
+    CUDA_ARRAY_DESCRIPTOR desc = {0,};
+    NvOFFRUC_CREATE_PARAM create_param = {0,};
+    NvOFFRUC_REGISTER_RESOURCE_PARAM register_param = {0,};
+    NvOFFRUC_STATUS status;
+    int exact;
+    int ret;
+
+    ff_dlog(ctx, "config_output()\n");
+
+    ff_dlog(ctx,
+            "config_output() input time base:%u/%u (%f)\n",
+            ctx->inputs[0]->time_base.num, ctx->inputs[0]->time_base.den,
+            av_q2d(ctx->inputs[0]->time_base));
+
+    // make sure timebase is small enough to hold the framerate
+
+    exact = av_reduce(&s->dest_time_base.num, &s->dest_time_base.den,
+                      av_gcd((int64_t)s->srce_time_base.num * s->dest_frame_rate.num,
+                             (int64_t)s->srce_time_base.den * s->dest_frame_rate.den),
+                      (int64_t)s->srce_time_base.den * s->dest_frame_rate.num, INT_MAX);
+
+    av_log(ctx, AV_LOG_INFO,
+           "time base:%u/%u -> %u/%u exact:%d\n",
+           s->srce_time_base.num, s->srce_time_base.den,
+           s->dest_time_base.num, s->dest_time_base.den, exact);
+    if (!exact) {
+        av_log(ctx, AV_LOG_WARNING, "Timebase conversion is not exact\n");
+    }
+
+    outlink->frame_rate = s->dest_frame_rate;
+    outlink->time_base = s->dest_time_base;
+
+    ff_dlog(ctx,
+            "config_output() output time base:%u/%u (%f) w:%d h:%d\n",
+            outlink->time_base.num, outlink->time_base.den,
+            av_q2d(outlink->time_base),
+            outlink->w, outlink->h);
+
+    av_log(ctx, AV_LOG_INFO, "fps -> fps:%u/%u\n",
+           s->dest_frame_rate.num, s->dest_frame_rate.den);
+
+    /* check that we have a hw context */
+    if (!inlink->hw_frames_ctx) {
+        av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
+        return AVERROR(EINVAL);
+    }
+    in_frames_ctx = (AVHWFramesContext*)inlink->hw_frames_ctx->data;
+    s->format = in_frames_ctx->sw_format;
+
+    if (!format_is_supported(s->format)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported input format: %s\n",
+               av_get_pix_fmt_name(s->format));
+        return AVERROR(ENOSYS);
+    }
+
+    s->device_ref = av_buffer_ref(in_frames_ctx->device_ref);
+    if (!s->device_ref)
+        return AVERROR(ENOMEM);
+
+    s->hwctx = ((AVHWDeviceContext*)s->device_ref->data)->hwctx;
+    s->cu_ctx = s->hwctx->cuda_ctx;
+    s->stream = s->hwctx->stream;
+    cu = s->hwctx->internal->cuda_dl;
+
+    outlink->hw_frames_ctx = av_hwframe_ctx_alloc(s->device_ref);
+    if (!outlink->hw_frames_ctx)
+        return AVERROR(ENOMEM);
+
+    output_frames = (AVHWFramesContext*)outlink->hw_frames_ctx->data;
+
+    output_frames->format    = AV_PIX_FMT_CUDA;
+    output_frames->sw_format = s->format;
+    output_frames->width     = ctx->inputs[0]->w;
+    output_frames->height    = ctx->inputs[0]->h;
+
+    output_frames->initial_pool_size = 3;
+
+    ret = ff_filter_init_hw_frames(ctx, outlink, 0);
+    if (ret < 0)
+        return ret;
+
+    ret = av_hwframe_ctx_init(outlink->hw_frames_ctx);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to initialise CUDA frame "
+               "context for output: %d\n", ret);
+        return ret;
+    }
+
+    outlink->w = inlink->w;
+    outlink->h = inlink->h;
+
+    ret = CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+    if (ret < 0)
+        return ret;
+
+    // NV12 is packed into a single 1-channel CUarray with the UV plane
+    // stored directly below the Y plane; "ARGB" formats use one 4-channel
+    // CUarray.
+    desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
+    desc.Height = s->format == AV_PIX_FMT_NV12 ? inlink->h * 3 / 2 : inlink->h;
+    desc.Width = inlink->w;
+    desc.NumChannels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
+    ret = CHECK_CU(cu->cuArrayCreate(&s->c0, &desc));
+    if (ret < 0)
+        goto exit;
+    ret = CHECK_CU(cu->cuArrayCreate(&s->c1, &desc));
+    if (ret < 0)
+        goto exit;
+    ret = CHECK_CU(cu->cuArrayCreate(&s->cw, &desc));
+    if (ret < 0)
+        goto exit;
+
+    create_param.uiWidth = inlink->w;
+    create_param.uiHeight = inlink->h;
+    create_param.pDevice = NULL;
+    create_param.eResourceType = CudaResource;
+    create_param.eSurfaceFormat = s->format == AV_PIX_FMT_NV12 ?
+                                  NV12Surface : ARGBSurface;
+    create_param.eCUDAResourceType = CudaResourceCuArray;
+    status = s->NvOFFRUCCreate(&create_param, &s->fruc);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to create: %d\n", status);
+        ret = AVERROR(ENOSYS);
+        goto exit;
+    }
+
+    register_param.pArrResource[0] = s->c0;
+    register_param.pArrResource[1] = s->c1;
+    register_param.pArrResource[2] = s->cw;
+    register_param.uiCount = 3;
+    status = s->NvOFFRUCRegisterResource(s->fruc, &register_param);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to register: %d\n", status);
+        ret = AVERROR(ENOSYS);
+        goto exit;
+    }
+
+    ret = 0;
+exit:
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+    return ret;
+}
+
+static const AVFilterPad nvoffruc_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_input,
+    },
+};
+
+static const AVFilterPad nvoffruc_outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_output,
+    },
+};
+
+const AVFilter ff_vf_nvoffruc = {
+    .name           = "nvoffruc",
+    .description    = NULL_IF_CONFIG_SMALL("Upsample a progressive source to a specified frame rate with nvidia's FRUC library"),
+    .priv_size      = sizeof(FRUCContext),
+    .priv_class     = &nvoffruc_class,
+    .init           = init,
+    .uninit         = uninit,
+    .activate       = activate,
+    FILTER_INPUTS(nvoffruc_inputs),
+    FILTER_OUTPUTS(nvoffruc_outputs),
+    FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
-- 
2.37.2