* [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library
@ 2023-01-02 23:21 Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra Philip Langdale
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Philip Langdale @ 2023-01-02 23:21 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Philip Langdale
This filter implements frame rate down/upsampling using nvidia's
Optical Flow FRUC (Frame Rate Up Conversion) library. It's neat because
you get realtime interpolation with a decent level of quality. It's
impractical because of licensing.
I have no actual intention to merge this, as it doesn't even meet our
bar for a nonfree filter, and given the EULA restrictions with the SDK,
anyone who would want to use it can easily cherry-pick it into the
build they have to make anyway. But I figured I'd send it to the list
as a way
of announcing that it exists.
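For anyone who does cherry-pick it, the local build would look roughly like
this (a sketch; the SDK path is a placeholder, and the header fixups are
described in patch 2/2):

```shell
# Sketch of a local nonfree build after cherry-picking the two patches.
# The Optical Flow SDK location is an assumption - adjust to wherever
# you unpacked it, and apply the header fixes from patch 2/2 first.
cp /path/to/Optical_Flow_SDK/NvOFFRUC/Interface/NvOFFRUC.h libavfilter/
./configure --enable-nonfree
make -j"$(nproc)"
```

The resulting binary is, per the EULA, nonfree and unredistributable.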
How nice would it be if nvidia had sane licensing on this stuff?
I'll keep a branch at: https://github.com/philipl/FFmpeg/tree/fruc-me
--phil
Philip Langdale (2):
lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra
avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC
library
configure | 7 +-
libavfilter/Makefile | 1 +
libavfilter/allfilters.c | 1 +
libavfilter/vf_nvoffruc.c | 644 +++++++++++++++++++++++++++++++++++++
libavutil/hwcontext_cuda.c | 4 +
5 files changed, 654 insertions(+), 3 deletions(-)
create mode 100644 libavfilter/vf_nvoffruc.c
--
2.37.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra
2023-01-02 23:21 [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Philip Langdale
@ 2023-01-02 23:21 ` Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library Philip Langdale
2023-01-02 23:39 ` [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Dennis Mungai
2 siblings, 0 replies; 7+ messages in thread
From: Philip Langdale @ 2023-01-02 23:21 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Philip Langdale
These can be useful.
Signed-off-by: Philip Langdale <philipl@overt.org>
---
libavutil/hwcontext_cuda.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 5ae7711c94..22eb9f5513 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -45,6 +45,10 @@ static const enum AVPixelFormat supported_formats[] = {
AV_PIX_FMT_YUV444P16,
AV_PIX_FMT_0RGB32,
AV_PIX_FMT_0BGR32,
+ AV_PIX_FMT_ARGB,
+ AV_PIX_FMT_ABGR,
+ AV_PIX_FMT_RGBA,
+ AV_PIX_FMT_BGRA,
#if CONFIG_VULKAN
AV_PIX_FMT_VULKAN,
#endif
--
2.37.2
* [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library
2023-01-02 23:21 [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra Philip Langdale
@ 2023-01-02 23:21 ` Philip Langdale
2023-01-02 23:29 ` Dennis Mungai
2023-01-02 23:39 ` [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Dennis Mungai
2 siblings, 1 reply; 7+ messages in thread
From: Philip Langdale @ 2023-01-02 23:21 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Philip Langdale
The NvOFFRUC library provides a GPU accelerated interpolation feature
based on nvidia's Optical Flow functionality. It's able to provide
reasonably high quality realtime interpolation to increase the frame
rate of a video stream - as opposed to vf_framerate that just does a
linear blend or vf_minterpolate that is anything but realtime.
As interesting as that sounds, there are a lot of limitations that
mean this filter is mostly just a toy.
1. The licensing is useless. The library and header are distributed as
part of the Optical Flow SDK which has a proprietary EULA, so anyone
wanting to build the filter must obtain the SDK for both build and
runtime and the resulting binaries will be nonfree and
unredistributable.
2. The NvOFFRUC.h header doesn't even compile in pure C without
modification.
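The two fixups needed (also noted in a comment in the filter source) can be
scripted. The snippet below demonstrates them on a stand-in fragment, since
the real NvOFFRUC.h has to come from the SDK:

```shell
# Demonstrate the two header fixes on a stand-in fragment:
# drop `using namespace std;` and turn `bool *` into `void *`.
cat > /tmp/NvOFFRUC_demo.h <<'EOF'
using namespace std;
typedef struct { bool *bHasFrameRepetitionOccurred; } demo_t;
EOF
sed -i -e '/using namespace std;/d' -e 's/bool \*/void */g' /tmp/NvOFFRUC_demo.h
cat /tmp/NvOFFRUC_demo.h
```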
3. The library can only handle NV12 and "ARGB" (which really means any
single plane, four channel, 8 bit format). This means it can't help
with our inevitable future dominated by 10+ bit formats.
4. The pitch handling logic in the library is very inflexible, and it
assumes that for NV12, the UV plane is contiguous with the Y plane.
This actually ends up making it incompatible with nvdec output for
certain frame sizes. To avoid constantly fighting edge cases, I took
the brute force approach and copy the input and output frames
to/from CUarrays (which the library can accept) to give me a way to
ensure the correct layout is used.
5. The library is stateful in an unhelpful way. It is called with one
input frame, and one output buffer and always interpolates between
the passed input frame and the frame from the previous call. This
both requires special handling for the first frame, and also
prevents generating more than one intermediate frame. If you want
to do 3x or 4x etc interpolation, this approach doesn't work.
So, again, I brute forced it by treating every interpolation as a
new session - calling it twice with each input frame, even if the
first frame happens to be the same as the last frame we called it
with. This allows us to generate as many intermediate frames as we
want, but it presumably consumes more GPU resources.
6. The library always creates a `NvOFFRUC` directory with an empty log
file in it in $PWD. What a nuisance.
But with all those caveats and limitations, it does actually work. I
was able to upsample a 24fps file to 144fps (my monitor limit) with
respectable results. In some situations, it starts bogging down, and
I'm not entirely sure where those limits are - certainly I can see it
consuming a significant percentage of GPU resources for large scaling
factors.
The implementation here is heavily based on vf_framerate with the
blending function ripped out and replaced by NvOFFRUC. That means we
have all the nice properties in terms of being able to do non-integer
scaling, and downsampling via interpolation as well.
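For reference, an invocation would look something like this (illustrative
only; the `fps` option matches the filter as posted, but the file names and
encoder choice are placeholders):

```shell
# Hypothetical invocation: decode on the GPU, interpolate up to 144fps
# with the FRUC filter, then encode with NVENC. Requires a nonfree
# build with the Optical Flow SDK available at runtime.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
       -vf nvoffruc=fps=144 -c:v h264_nvenc output.mp4
```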
Is this mergeable? No - but it was an interesting exercise and maybe
folks in narrow circumstances may find some genuine use from it.
Signed-off-by: Philip Langdale <philipl@overt.org>
---
configure | 7 +-
libavfilter/Makefile | 1 +
libavfilter/allfilters.c | 1 +
libavfilter/vf_nvoffruc.c | 644 ++++++++++++++++++++++++++++++++++++++
4 files changed, 650 insertions(+), 3 deletions(-)
create mode 100644 libavfilter/vf_nvoffruc.c
diff --git a/configure b/configure
index 675dc84f56..6ea9f89f97 100755
--- a/configure
+++ b/configure
@@ -3691,6 +3691,7 @@ mptestsrc_filter_deps="gpl"
negate_filter_deps="lut_filter"
nlmeans_opencl_filter_deps="opencl"
nnedi_filter_deps="gpl"
+nvoffruc_filter_deps="ffnvcodec nonfree"
ocr_filter_deps="libtesseract"
ocv_filter_deps="libopencv"
openclsrc_filter_deps="opencl"
@@ -6450,9 +6451,9 @@ fi
if ! disabled ffnvcodec; then
ffnv_hdr_list="ffnvcodec/nvEncodeAPI.h ffnvcodec/dynlink_cuda.h ffnvcodec/dynlink_cuviddec.h ffnvcodec/dynlink_nvcuvid.h"
check_pkg_config ffnvcodec "ffnvcodec >= 12.0.16.0" "$ffnv_hdr_list" "" || \
- check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.2 ffnvcodec < 12.0" "$ffnv_hdr_list" "" || \
- check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.2 ffnvcodec < 11.1" "$ffnv_hdr_list" "" || \
- check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.14 ffnvcodec < 8.2" "$ffnv_hdr_list" ""
+ check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.3 ffnvcodec < 12.0" "$ffnv_hdr_list" "" || \
+ check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.3 ffnvcodec < 11.1" "$ffnv_hdr_list" "" || \
+ check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.15 ffnvcodec < 8.2" "$ffnv_hdr_list" ""
fi
if enabled_all libglslang libshaderc; then
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index cb41ccc622..292597f3a8 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -389,6 +389,7 @@ OBJS-$(CONFIG_NOFORMAT_FILTER) += vf_format.o
OBJS-$(CONFIG_NOISE_FILTER) += vf_noise.o
OBJS-$(CONFIG_NORMALIZE_FILTER) += vf_normalize.o
OBJS-$(CONFIG_NULL_FILTER) += vf_null.o
+OBJS-$(CONFIG_NVOFFRUC_FILTER) += vf_nvoffruc.o
OBJS-$(CONFIG_OCR_FILTER) += vf_ocr.o
OBJS-$(CONFIG_OCV_FILTER) += vf_libopencv.o
OBJS-$(CONFIG_OSCILLOSCOPE_FILTER) += vf_datascope.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 52741b60e4..84f102806e 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -368,6 +368,7 @@ extern const AVFilter ff_vf_noformat;
extern const AVFilter ff_vf_noise;
extern const AVFilter ff_vf_normalize;
extern const AVFilter ff_vf_null;
+extern const AVFilter ff_vf_nvoffruc;
extern const AVFilter ff_vf_ocr;
extern const AVFilter ff_vf_ocv;
extern const AVFilter ff_vf_oscilloscope;
diff --git a/libavfilter/vf_nvoffruc.c b/libavfilter/vf_nvoffruc.c
new file mode 100644
index 0000000000..e3a9f9e553
--- /dev/null
+++ b/libavfilter/vf_nvoffruc.c
@@ -0,0 +1,644 @@
+/*
+ * Copyright (C) 2022 Philip Langdale <philipl@overt.org>
+ * Based on vf_framerate - Copyright (C) 2012 Mark Himsley
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * filter upsamples the frame rate of a source using the nvidia Optical Flow
+ * FRUC library.
+ */
+
+#include <dlfcn.h>
+#include "libavutil/avassert.h"
+#include "libavutil/cuda_check.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "avfilter.h"
+#include "filters.h"
+#include "internal.h"
+/*
+ * This cannot be distributed with the filter due to licensing. If you want to
+ * compile this filter, you will need to obtain it from nvidia and then fix it
+ * to work in a pure C environment:
+ * * Remove the `using namespace std;`
+ * * Replace the `bool *` with `void *`
+ */
+#include "NvOFFRUC.h"
+
+typedef struct FRUCContext {
+ const AVClass *class;
+
+ AVCUDADeviceContext *hwctx;
+ AVBufferRef *device_ref;
+
+ CUcontext cu_ctx;
+ CUstream stream;
+ CUarray c0; ///< CUarray for f0
+ CUarray c1; ///< CUarray for f1
+ CUarray cw; ///< CUarray for work
+
+ AVRational dest_frame_rate;
+ int interp_start; ///< start of range to apply interpolation
+ int interp_end; ///< end of range to apply interpolation
+
+ AVRational srce_time_base; ///< timebase of source
+ AVRational dest_time_base; ///< timebase of destination
+
+ int blend_factor_max;
+ AVFrame *work;
+ enum AVPixelFormat format;
+
+ AVFrame *f0; ///< last frame
+ AVFrame *f1; ///< current frame
+ int64_t pts0; ///< last frame pts in dest_time_base
+ int64_t pts1; ///< current frame pts in dest_time_base
+ int64_t delta; ///< pts1 to pts0 delta
+ int flush; ///< 1 if the filter is being flushed
+ int64_t start_pts; ///< pts of the first output frame
+ int64_t n; ///< output frame counter
+
+ void *fruc_dl;
+ PtrToFuncNvOFFRUCCreate NvOFFRUCCreate;
+ PtrToFuncNvOFFRUCRegisterResource NvOFFRUCRegisterResource;
+ PtrToFuncNvOFFRUCUnregisterResource NvOFFRUCUnregisterResource;
+ PtrToFuncNvOFFRUCProcess NvOFFRUCProcess;
+ PtrToFuncNvOFFRUCDestroy NvOFFRUCDestroy;
+ NvOFFRUCHandle fruc;
+} FRUCContext;
+
+#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, s->hwctx->internal->cuda_dl, x)
+#define OFFSET(x) offsetof(FRUCContext, x)
+#define V AV_OPT_FLAG_VIDEO_PARAM
+#define F AV_OPT_FLAG_FILTERING_PARAM
+#define FRAMERATE_FLAG_SCD 01
+
+static const AVOption nvoffruc_options[] = {
+ {"fps", "required output frames per second rate", OFFSET(dest_frame_rate), AV_OPT_TYPE_VIDEO_RATE, {.str="50"}, 0, INT_MAX, V|F },
+
+ {"interp_start", "point to start linear interpolation", OFFSET(interp_start), AV_OPT_TYPE_INT, {.i64=15}, 0, 255, V|F },
+ {"interp_end", "point to end linear interpolation", OFFSET(interp_end), AV_OPT_TYPE_INT, {.i64=240}, 0, 255, V|F },
+
+ {NULL}
+};
+
+AVFILTER_DEFINE_CLASS(nvoffruc);
+
+static int blend_frames(AVFilterContext *ctx, int64_t work_pts)
+{
+ FRUCContext *s = ctx->priv;
+ AVFilterLink *outlink = ctx->outputs[0];
+
+ CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+ CUDA_MEMCPY2D cpy_params = {0,};
+ NvOFFRUC_PROCESS_IN_PARAMS in = {0,};
+ NvOFFRUC_PROCESS_OUT_PARAMS out = {0,};
+ NvOFFRUC_STATUS status;
+
+ int num_channels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
+ int ret;
+ uint64_t ignored;
+
+ // get work-space for output frame
+ s->work = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+ if (!s->work)
+ return AVERROR(ENOMEM);
+
+ av_frame_copy_props(s->work, s->f0);
+
+ cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+ cpy_params.srcDevice = (CUdeviceptr)s->f0->data[0],
+ cpy_params.srcPitch = s->f0->linesize[0],
+ cpy_params.srcY = 0,
+ cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+ cpy_params.dstArray = s->c0,
+ cpy_params.dstY = 0,
+ cpy_params.WidthInBytes = s->f0->width * num_channels,
+ cpy_params.Height = s->f0->height,
+ ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+ if (ret < 0)
+ return ret;
+
+ if (s->f0->data[1]) {
+ cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+ cpy_params.srcDevice = (CUdeviceptr)s->f0->data[1],
+ cpy_params.srcPitch = s->f0->linesize[1],
+ cpy_params.srcY = 0,
+ cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+ cpy_params.dstArray = s->c0,
+ cpy_params.dstY = s->f0->height,
+ cpy_params.WidthInBytes = s->f0->width * num_channels,
+ cpy_params.Height = s->f0->height * 0.5,
+ ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+ if (ret < 0)
+ return ret;
+ }
+
+ cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+ cpy_params.srcDevice = (CUdeviceptr)s->f1->data[0],
+ cpy_params.srcPitch = s->f1->linesize[0],
+ cpy_params.srcY = 0,
+ cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+ cpy_params.dstArray = s->c1,
+ cpy_params.dstY = 0,
+ cpy_params.WidthInBytes = s->f1->width * num_channels,
+ cpy_params.Height = s->f1->height,
+ ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+ if (ret < 0)
+ return ret;
+
+ if (s->f1->data[1]) {
+ cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+ cpy_params.srcDevice = (CUdeviceptr)s->f1->data[1],
+ cpy_params.srcPitch = s->f1->linesize[1],
+ cpy_params.srcY = 0,
+ cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+ cpy_params.dstArray = s->c1,
+ cpy_params.dstY = s->f1->height,
+ cpy_params.WidthInBytes = s->f1->width * num_channels,
+ cpy_params.Height = s->f1->height * 0.5,
+ ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+ if (ret < 0)
+ return ret;
+ }
+
+ in.stFrameDataInput.pFrame = s->c0;
+ in.stFrameDataInput.nTimeStamp = s->pts0;
+ out.stFrameDataOutput.pFrame = s->cw,
+ out.stFrameDataOutput.nTimeStamp = s->pts0;
+ out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
+ status = s->NvOFFRUCProcess(s->fruc, &in, &out);
+ if (status) {
+ av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
+ return AVERROR(ENOSYS);
+ }
+
+ in.stFrameDataInput.pFrame = s->c1;
+ in.stFrameDataInput.nTimeStamp = s->pts1;
+ out.stFrameDataOutput.pFrame = s->cw,
+ out.stFrameDataOutput.nTimeStamp = work_pts;
+ out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
+ status = s->NvOFFRUCProcess(s->fruc, &in, &out);
+ if (status) {
+ av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
+ return AVERROR(ENOSYS);
+ }
+
+ cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY,
+ cpy_params.srcArray = s->cw,
+ cpy_params.srcY = 0,
+ cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE,
+ cpy_params.dstDevice = (CUdeviceptr)s->work->data[0],
+ cpy_params.dstPitch = s->work->linesize[0],
+ cpy_params.dstY = 0,
+ cpy_params.WidthInBytes = s->work->width * num_channels,
+ cpy_params.Height = s->work->height,
+ ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+ if (ret < 0)
+ return ret;
+
+ if (s->work->data[1]) {
+ cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY,
+ cpy_params.srcArray = s->cw,
+ cpy_params.srcY = s->work->height,
+ cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE,
+ cpy_params.dstDevice = (CUdeviceptr)s->work->data[1],
+ cpy_params.dstPitch = s->work->linesize[1],
+ cpy_params.dstY = 0,
+ cpy_params.WidthInBytes = s->work->width * num_channels,
+ cpy_params.Height = s->work->height * 0.5,
+ ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int process_work_frame(AVFilterContext *ctx)
+{
+ FRUCContext *s = ctx->priv;
+ int64_t work_pts;
+ int64_t interpolate, interpolate8;
+ int ret;
+
+ if (!s->f1)
+ return 0;
+ if (!s->f0 && !s->flush)
+ return 0;
+
+ work_pts = s->start_pts + av_rescale_q(s->n, av_inv_q(s->dest_frame_rate), s->dest_time_base);
+
+ if (work_pts >= s->pts1 && !s->flush)
+ return 0;
+
+ if (!s->f0) {
+ av_assert1(s->flush);
+ s->work = s->f1;
+ s->f1 = NULL;
+ } else {
+ if (work_pts >= s->pts1 + s->delta && s->flush)
+ return 0;
+
+ interpolate = av_rescale(work_pts - s->pts0, s->blend_factor_max, s->delta);
+ interpolate8 = av_rescale(work_pts - s->pts0, 256, s->delta);
+ ff_dlog(ctx, "process_work_frame() interpolate: %"PRId64"/256\n", interpolate8);
+ if (interpolate >= s->blend_factor_max || interpolate8 > s->interp_end) {
+ av_log(ctx, AV_LOG_DEBUG, "Matched f1: pts %"PRId64"\n", work_pts);
+ s->work = av_frame_clone(s->f1);
+ } else if (interpolate <= 0 || interpolate8 < s->interp_start) {
+ av_log(ctx, AV_LOG_DEBUG, "Matched f0: pts %"PRId64"\n", work_pts);
+ s->work = av_frame_clone(s->f0);
+ } else {
+ av_log(ctx, AV_LOG_DEBUG, "Unmatched pts: %"PRId64"\n", work_pts);
+ ret = blend_frames(ctx, work_pts);
+ if (ret < 0)
+ return ret;
+ }
+ }
+
+ if (!s->work)
+ return AVERROR(ENOMEM);
+
+ s->work->pts = work_pts;
+ s->n++;
+
+ return 1;
+}
+
+static av_cold int init(AVFilterContext *ctx)
+{
+ FRUCContext *s = ctx->priv;
+ s->start_pts = AV_NOPTS_VALUE;
+
+ // TODO: Need windows equivalent symbol loading
+ s->fruc_dl = dlopen("libNvOFFRUC.so", RTLD_LAZY);
+ if (!s->fruc_dl) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to load FRUC: %s\n", dlerror());
+ return AVERROR(EINVAL);
+ }
+
+ s->NvOFFRUCCreate = (PtrToFuncNvOFFRUCCreate)
+ dlsym(s->fruc_dl, "NvOFFRUCCreate");
+ s->NvOFFRUCRegisterResource = (PtrToFuncNvOFFRUCRegisterResource)
+ dlsym(s->fruc_dl, "NvOFFRUCRegisterResource");
+ s->NvOFFRUCUnregisterResource = (PtrToFuncNvOFFRUCUnregisterResource)
+ dlsym(s->fruc_dl, "NvOFFRUCUnregisterResource");
+ s->NvOFFRUCProcess = (PtrToFuncNvOFFRUCProcess)
+ dlsym(s->fruc_dl, "NvOFFRUCProcess");
+ s->NvOFFRUCDestroy = (PtrToFuncNvOFFRUCDestroy)
+ dlsym(s->fruc_dl, "NvOFFRUCDestroy");
+ return 0;
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+ FRUCContext *s = ctx->priv;
+ CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+ CUcontext dummy;
+
+ CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+ if (s->fruc) {
+ NvOFFRUC_UNREGISTER_RESOURCE_PARAM in_param = {
+ .pArrResource = {s->c0, s->c1, s->cw},
+ .uiCount = 3,
+ };
+ NvOFFRUC_STATUS nv_status = s->NvOFFRUCUnregisterResource(s->fruc, &in_param);
+ if (nv_status) {
+ av_log(ctx, AV_LOG_WARNING, "Could not unregister: %d\n", nv_status);
+ }
+ s->NvOFFRUCDestroy(s->fruc);
+ }
+ if (s->c0)
+ CHECK_CU(cu->cuArrayDestroy(s->c0));
+ if (s->c1)
+ CHECK_CU(cu->cuArrayDestroy(s->c1));
+ if (s->cw)
+ CHECK_CU(cu->cuArrayDestroy(s->cw));
+
+ CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+
+ if (s->fruc_dl)
+ dlclose(s->fruc_dl);
+ av_frame_free(&s->f0);
+ av_frame_free(&s->f1);
+ av_buffer_unref(&s->device_ref);
+}
+
+static const enum AVPixelFormat supported_formats[] = {
+ AV_PIX_FMT_NV12,
+ // Actually any single plane, four channel, 8bit format will work.
+ AV_PIX_FMT_ARGB,
+ AV_PIX_FMT_ABGR,
+ AV_PIX_FMT_RGBA,
+ AV_PIX_FMT_BGRA,
+ AV_PIX_FMT_NONE
+};
+
+static int format_is_supported(enum AVPixelFormat fmt)
+{
+ int i;
+
+ for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
+ if (supported_formats[i] == fmt)
+ return 1;
+ return 0;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+ int ret, status;
+ AVFilterLink *inlink = ctx->inputs[0];
+ AVFilterLink *outlink = ctx->outputs[0];
+ FRUCContext *s = ctx->priv;
+ AVFrame *inpicref;
+ int64_t pts;
+
+ CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+ CUcontext dummy;
+
+ FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+ CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+retry:
+ ret = process_work_frame(ctx);
+ if (ret < 0) {
+ goto exit;
+ } else if (ret == 1) {
+ ret = ff_filter_frame(outlink, s->work);
+ goto exit;
+ }
+
+ ret = ff_inlink_consume_frame(inlink, &inpicref);
+ if (ret < 0)
+ goto exit;
+
+ if (inpicref) {
+ if (inpicref->interlaced_frame)
+ av_log(ctx, AV_LOG_WARNING, "Interlaced frame found - the output will not be correct.\n");
+
+ if (inpicref->pts == AV_NOPTS_VALUE) {
+ av_log(ctx, AV_LOG_WARNING, "Ignoring frame without PTS.\n");
+ av_frame_free(&inpicref);
+ }
+ }
+
+ if (inpicref) {
+ pts = av_rescale_q(inpicref->pts, s->srce_time_base, s->dest_time_base);
+
+ if (s->f1 && pts == s->pts1) {
+ av_log(ctx, AV_LOG_WARNING, "Ignoring frame with same PTS.\n");
+ av_frame_free(&inpicref);
+ }
+ }
+
+ if (inpicref) {
+ av_frame_free(&s->f0);
+ s->f0 = s->f1;
+ s->pts0 = s->pts1;
+
+ s->f1 = inpicref;
+ s->pts1 = pts;
+ s->delta = s->pts1 - s->pts0;
+
+ if (s->delta < 0) {
+ av_log(ctx, AV_LOG_WARNING, "PTS discontinuity.\n");
+ s->start_pts = s->pts1;
+ s->n = 0;
+ av_frame_free(&s->f0);
+ }
+
+ if (s->start_pts == AV_NOPTS_VALUE)
+ s->start_pts = s->pts1;
+
+ goto retry;
+ }
+
+ if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+ if (!s->flush) {
+ s->flush = 1;
+ goto retry;
+ }
+ ff_outlink_set_status(outlink, status, pts);
+ ret = 0;
+ goto exit;
+ }
+
+ FF_FILTER_FORWARD_WANTED(outlink, inlink);
+
+ return FFERROR_NOT_READY;
+
+exit:
+ CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+ return ret;
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+ AVFilterContext *ctx = inlink->dst;
+ FRUCContext *s = ctx->priv;
+
+ s->srce_time_base = inlink->time_base;
+ s->blend_factor_max = 1 << (8 - 1);
+
+ return 0;
+}
+
+static int config_output(AVFilterLink *outlink)
+{
+ AVFilterContext *ctx = outlink->src;
+ AVFilterLink *inlink = outlink->src->inputs[0];
+ AVHWFramesContext *in_frames_ctx;
+ AVHWFramesContext *output_frames;
+ FRUCContext *s = ctx->priv;
+ CudaFunctions *cu;
+ CUcontext dummy;
+ CUDA_ARRAY_DESCRIPTOR desc = {0,};
+ NvOFFRUC_CREATE_PARAM create_param = {0,};
+ NvOFFRUC_REGISTER_RESOURCE_PARAM register_param = {0,};
+ NvOFFRUC_STATUS status;
+ int exact;
+ int ret;
+
+ ff_dlog(ctx, "config_output()\n");
+
+ ff_dlog(ctx,
+ "config_output() input time base:%u/%u (%f)\n",
+ ctx->inputs[0]->time_base.num,ctx->inputs[0]->time_base.den,
+ av_q2d(ctx->inputs[0]->time_base));
+
+ // make sure timebase is small enough to hold the framerate
+
+ exact = av_reduce(&s->dest_time_base.num, &s->dest_time_base.den,
+ av_gcd((int64_t)s->srce_time_base.num * s->dest_frame_rate.num,
+ (int64_t)s->srce_time_base.den * s->dest_frame_rate.den ),
+ (int64_t)s->srce_time_base.den * s->dest_frame_rate.num, INT_MAX);
+
+ av_log(ctx, AV_LOG_INFO,
+ "time base:%u/%u -> %u/%u exact:%d\n",
+ s->srce_time_base.num, s->srce_time_base.den,
+ s->dest_time_base.num, s->dest_time_base.den, exact);
+ if (!exact) {
+ av_log(ctx, AV_LOG_WARNING, "Timebase conversion is not exact\n");
+ }
+
+ outlink->frame_rate = s->dest_frame_rate;
+ outlink->time_base = s->dest_time_base;
+
+ ff_dlog(ctx,
+ "config_output() output time base:%u/%u (%f) w:%d h:%d\n",
+ outlink->time_base.num, outlink->time_base.den,
+ av_q2d(outlink->time_base),
+ outlink->w, outlink->h);
+
+
+ av_log(ctx, AV_LOG_INFO, "fps -> fps:%u/%u\n",
+ s->dest_frame_rate.num, s->dest_frame_rate.den);
+
+ /* check that we have a hw context */
+ if (!inlink->hw_frames_ctx) {
+ av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
+ return AVERROR(EINVAL);
+ }
+ in_frames_ctx = (AVHWFramesContext*)inlink->hw_frames_ctx->data;
+ s->format = in_frames_ctx->sw_format;
+
+ if (!format_is_supported(s->format)) {
+ av_log(ctx, AV_LOG_ERROR, "Unsupported input format: %s\n",
+ av_get_pix_fmt_name(s->format));
+ return AVERROR(ENOSYS);
+ }
+
+ s->device_ref = av_buffer_ref(in_frames_ctx->device_ref);
+ if (!s->device_ref)
+ return AVERROR(ENOMEM);
+
+ s->hwctx = ((AVHWDeviceContext*)s->device_ref->data)->hwctx;
+ s->cu_ctx = s->hwctx->cuda_ctx;
+ s->stream = s->hwctx->stream;
+ cu = s->hwctx->internal->cuda_dl;
+ outlink->hw_frames_ctx = av_hwframe_ctx_alloc(s->device_ref);
+ if (!outlink->hw_frames_ctx)
+ return AVERROR(ENOMEM);
+
+ output_frames = (AVHWFramesContext*)outlink->hw_frames_ctx->data;
+
+ output_frames->format = AV_PIX_FMT_CUDA;
+ output_frames->sw_format = s->format;
+ output_frames->width = ctx->inputs[0]->w;
+ output_frames->height = ctx->inputs[0]->h;
+
+ output_frames->initial_pool_size = 3;
+
+ ret = ff_filter_init_hw_frames(ctx, outlink, 0);
+ if (ret < 0)
+ return ret;
+
+ ret = av_hwframe_ctx_init(outlink->hw_frames_ctx);
+ if (ret < 0) {
+ av_log(ctx, AV_LOG_ERROR, "Failed to initialise CUDA frame "
+ "context for output: %d\n", ret);
+ return ret;
+ }
+
+ outlink->w = inlink->w;
+ outlink->h = inlink->h;
+
+ ret = CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+ if (ret < 0)
+ return ret;
+
+ desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
+ desc.Height = inlink->h * (s->format == AV_PIX_FMT_NV12 ? 1.5 : 1);
+ desc.Width = inlink->w;
+ desc.NumChannels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
+ ret = CHECK_CU(cu->cuArrayCreate(&s->c0, &desc));
+ if (ret < 0)
+ goto exit;
+ ret = CHECK_CU(cu->cuArrayCreate(&s->c1, &desc));
+ if (ret < 0)
+ goto exit;
+ ret = CHECK_CU(cu->cuArrayCreate(&s->cw, &desc));
+ if (ret < 0)
+ goto exit;
+
+ create_param.uiWidth = inlink->w;
+ create_param.uiHeight = inlink->h;
+ create_param.pDevice = NULL;
+ create_param.eResourceType = CudaResource;
+ create_param.eSurfaceFormat = s->format == AV_PIX_FMT_NV12 ? NV12Surface : ARGBSurface;
+ create_param.eCUDAResourceType = CudaResourceCuArray;
+ status = s->NvOFFRUCCreate(&create_param, &s->fruc);
+ if (status) {
+ av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to create: %d\n", status);
+ ret = AVERROR(ENOSYS);
+ goto exit;
+ }
+
+ register_param.pArrResource[0] = s->c0;
+ register_param.pArrResource[1] = s->c1;
+ register_param.pArrResource[2] = s->cw;
+ register_param.uiCount = 3;
+ status = s->NvOFFRUCRegisterResource(s->fruc, &register_param);
+ if (status) {
+ av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to register: %d\n", status);
+ ret = AVERROR(ENOSYS);
+ goto exit;
+ }
+
+ ret = 0;
+exit:
+ CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+ return ret;
+}
+
+static const AVFilterPad framerate_inputs[] = {
+ {
+ .name = "default",
+ .type = AVMEDIA_TYPE_VIDEO,
+ .config_props = config_input,
+ },
+};
+
+static const AVFilterPad framerate_outputs[] = {
+ {
+ .name = "default",
+ .type = AVMEDIA_TYPE_VIDEO,
+ .config_props = config_output,
+ },
+};
+
+const AVFilter ff_vf_nvoffruc = {
+ .name = "nvoffruc",
+ .description = NULL_IF_CONFIG_SMALL("Upsamples progressive source to specified frame rates with nvidia FRUC"),
+ .priv_size = sizeof(FRUCContext),
+ .priv_class = &nvoffruc_class,
+ .init = init,
+ .uninit = uninit,
+ .activate = activate,
+ FILTER_INPUTS(framerate_inputs),
+ FILTER_OUTPUTS(framerate_outputs),
+ FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
+ .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
--
2.37.2
* Re: [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library Philip Langdale
@ 2023-01-02 23:29 ` Dennis Mungai
0 siblings, 0 replies; 7+ messages in thread
From: Dennis Mungai @ 2023-01-02 23:29 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Philip Langdale
On Tue, 3 Jan 2023 at 02:22, Philip Langdale <philipl@overt.org> wrote:
> The NvOFFRUC library provides a GPU accelerated interpolation feature
> based on nvidia's Optical Flow functionality. It's able to provide
> reasonably high quality realtime interpolation to increase the frame
> rate of a video stream - as opposed to vf_framerate that just does a
> linear blend or vf_minterpolate that is anything but realtime.
>
> As interesting as that sounds, there are a lot of limitations that
> mean this filter is mostly just a toy.
>
> 1. The licensing is useless. The library and header and distributed as
> part of the Optical Flow SDK which has a proprietary EULA, so anyone
> wanting to build the filter must obtain the SDK for both build and
> runtime and the resulting binaries will be nonfree and
> unredistributable.
>
> 2. The NvOFFRUC.h header doesn't even compile in pure C without
> modification.
>
> 3. The library can only handle NV12 and "ARGB" (which realy means any
> single plane, four channel, 8 bit format). This means it can't help
> with our inevitable future dominated by 10+ bit formats.
>
> 4. The pitch handling logic in the library is very inflexiable, and it
> assumes that for NV12, the UV plane is contiguous with the Y plane.
> This actually ends up making it incompatible with nvdec output for
> certain frame sizes. To avoid constantly fighting edge cases, I took
> the brute force approach and copy the input and output frames
> to/from CUarrays (which the library can accept) to give me a way to
> ensure the correct layout is used.
>
> 5. The library is stateful in an unhelpful way. It is called with one
> input frame, and one output buffer and always interpolates between
> the passed input frame and the frame from the previous call. This
> both requires special handling for the first frame, and also
> prevents generating more than one intermediate frame. If you want
> to do 3x or 4x etc interpolation, this approach doesn't work.
>
> So, again, I brute forced it by treating every interpolation as a
> new session - calling it twice with each input frame, even if the
> first frame happens to be the same as the last frame we called it
> with. This allows us to generate as many intermediate frames as we
> want, but it presumably consumes more GPU resources.
>
> 6. The library always creates a `NvOFFRUC` directory with an empty log
> file in it in $PWD. What a niusance.
>
> But with all those caveats and limitations, it does actually work. I
> was able to upsample a 24fps file to 144fps (my monitor limit) with
> respectable results. In some situations, it starts bogging down, and
> I'm not entirely sure where those limits are - certainly I can see it
> consuming a significant percentage of GPU resources for large scaling
> factors.
>
> The implementation here is heavily based on vf_framerate with the
> blending function ripped out and replaced by NvOFFRUC. That means we
> have all the nice properties in terms of being able to do non-integer
> scaling, and downsampling via interpolation as well.
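For context, driving the filter would look like any other full-CUDA-pipeline filter chain. The invocation below is a sketch against a hypothetical build with the filter enabled; only the `fps`, `interp_start` and `interp_end` options exist, and the file names and encoder choice are placeholders:

```shell
# Hypothetical: decode on the GPU, interpolate 24fps -> 60fps, encode with NVENC.
# Requires a nonfree build and libNvOFFRUC.so available at runtime.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input_24fps.mp4 \
       -vf nvoffruc=fps=60 -c:v h264_nvenc output_60fps.mp4
```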
>
> Is this mergeable? No - but it was an interesting exercise, and
> folks in narrow circumstances may find some genuine use for it.
>
> Signed-off-by: Philip Langdale <philipl@overt.org>
> ---
> configure | 7 +-
> libavfilter/Makefile | 1 +
> libavfilter/allfilters.c | 1 +
> libavfilter/vf_nvoffruc.c | 644 ++++++++++++++++++++++++++++++++++++++
> 4 files changed, 650 insertions(+), 3 deletions(-)
> create mode 100644 libavfilter/vf_nvoffruc.c
>
> diff --git a/configure b/configure
> index 675dc84f56..6ea9f89f97 100755
> --- a/configure
> +++ b/configure
> @@ -3691,6 +3691,7 @@ mptestsrc_filter_deps="gpl"
> negate_filter_deps="lut_filter"
> nlmeans_opencl_filter_deps="opencl"
> nnedi_filter_deps="gpl"
> +nvoffruc_filter_deps="ffnvcodec nonfree"
> ocr_filter_deps="libtesseract"
> ocv_filter_deps="libopencv"
> openclsrc_filter_deps="opencl"
> @@ -6450,9 +6451,9 @@ fi
> if ! disabled ffnvcodec; then
> ffnv_hdr_list="ffnvcodec/nvEncodeAPI.h ffnvcodec/dynlink_cuda.h
> ffnvcodec/dynlink_cuviddec.h ffnvcodec/dynlink_nvcuvid.h"
> check_pkg_config ffnvcodec "ffnvcodec >= 12.0.16.0" "$ffnv_hdr_list"
> "" || \
> - check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.2 ffnvcodec < 12.0"
> "$ffnv_hdr_list" "" || \
> - check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.2 ffnvcodec <
> 11.1" "$ffnv_hdr_list" "" || \
> - check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.14 ffnvcodec < 8.2"
> "$ffnv_hdr_list" ""
> + check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.3 ffnvcodec < 12.0"
> "$ffnv_hdr_list" "" || \
> + check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.3 ffnvcodec <
> 11.1" "$ffnv_hdr_list" "" || \
> + check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.15 ffnvcodec < 8.2"
> "$ffnv_hdr_list" ""
> fi
>
> if enabled_all libglslang libshaderc; then
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index cb41ccc622..292597f3a8 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -389,6 +389,7 @@ OBJS-$(CONFIG_NOFORMAT_FILTER) +=
> vf_format.o
> OBJS-$(CONFIG_NOISE_FILTER) += vf_noise.o
> OBJS-$(CONFIG_NORMALIZE_FILTER) += vf_normalize.o
> OBJS-$(CONFIG_NULL_FILTER) += vf_null.o
> +OBJS-$(CONFIG_NVOFFRUC_FILTER) += vf_nvoffruc.o
> OBJS-$(CONFIG_OCR_FILTER) += vf_ocr.o
> OBJS-$(CONFIG_OCV_FILTER) += vf_libopencv.o
> OBJS-$(CONFIG_OSCILLOSCOPE_FILTER) += vf_datascope.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 52741b60e4..84f102806e 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -368,6 +368,7 @@ extern const AVFilter ff_vf_noformat;
> extern const AVFilter ff_vf_noise;
> extern const AVFilter ff_vf_normalize;
> extern const AVFilter ff_vf_null;
> +extern const AVFilter ff_vf_nvoffruc;
> extern const AVFilter ff_vf_ocr;
> extern const AVFilter ff_vf_ocv;
> extern const AVFilter ff_vf_oscilloscope;
> diff --git a/libavfilter/vf_nvoffruc.c b/libavfilter/vf_nvoffruc.c
> new file mode 100644
> index 0000000000..e3a9f9e553
> --- /dev/null
> +++ b/libavfilter/vf_nvoffruc.c
> @@ -0,0 +1,644 @@
> +/*
> + * Copyright (C) 2022 Philip Langdale <philipl@overt.org>
> + * Based on vf_framerate - Copyright (C) 2012 Mark Himsley
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Filter that up/downsamples the frame rate of a source using the
> + * nvidia Optical Flow FRUC library.
> + */
> +
> +#include <dlfcn.h>
> +#include "libavutil/avassert.h"
> +#include "libavutil/cuda_check.h"
> +#include "libavutil/hwcontext.h"
> +#include "libavutil/hwcontext_cuda_internal.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/pixdesc.h"
> +
> +#include "avfilter.h"
> +#include "filters.h"
> +#include "internal.h"
> +/*
> + * This cannot be distributed with the filter due to licensing. If you
> want to
> + * compile this filter, you will need to obtain it from nvidia and then
> fix it
> + * to work in a pure C environment:
> + * * Remove the `using namespace std;`
> + * * Replace the `bool *` with `void *`
> + */
> +#include "NvOFFRUC.h"
> +
> +typedef struct FRUCContext {
> + const AVClass *class;
> +
> + AVCUDADeviceContext *hwctx;
> + AVBufferRef *device_ref;
> +
> + CUcontext cu_ctx;
> + CUstream stream;
> + CUarray c0; ///< CUarray for f0
> + CUarray c1; ///< CUarray for f1
> + CUarray cw; ///< CUarray for work
> +
> + AVRational dest_frame_rate;
> + int interp_start; ///< start of range to apply
> interpolation
> + int interp_end; ///< end of range to apply
> interpolation
> +
> + AVRational srce_time_base; ///< timebase of source
> + AVRational dest_time_base; ///< timebase of destination
> +
> + int blend_factor_max;
> + AVFrame *work;
> + enum AVPixelFormat format;
> +
> + AVFrame *f0; ///< last frame
> + AVFrame *f1; ///< current frame
> + int64_t pts0; ///< last frame pts in
> dest_time_base
> + int64_t pts1; ///< current frame pts in
> dest_time_base
> + int64_t delta; ///< pts1 to pts0 delta
> + int flush; ///< 1 if the filter is being
> flushed
> + int64_t start_pts; ///< pts of the first output frame
> + int64_t n; ///< output frame counter
> +
> + void *fruc_dl;
> + PtrToFuncNvOFFRUCCreate NvOFFRUCCreate;
> + PtrToFuncNvOFFRUCRegisterResource NvOFFRUCRegisterResource;
> + PtrToFuncNvOFFRUCUnregisterResource NvOFFRUCUnregisterResource;
> + PtrToFuncNvOFFRUCProcess NvOFFRUCProcess;
> + PtrToFuncNvOFFRUCDestroy NvOFFRUCDestroy;
> + NvOFFRUCHandle fruc;
> +} FRUCContext;
> +
> +#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, s->hwctx->internal->cuda_dl, x)
> +#define OFFSET(x) offsetof(FRUCContext, x)
> +#define V AV_OPT_FLAG_VIDEO_PARAM
> +#define F AV_OPT_FLAG_FILTERING_PARAM
> +#define FRAMERATE_FLAG_SCD 01
> +
> +static const AVOption nvoffruc_options[] = {
> + {"fps", "required output frames per second rate",
> OFFSET(dest_frame_rate), AV_OPT_TYPE_VIDEO_RATE, {.str="50"},
> 0, INT_MAX, V|F },
> +
> + {"interp_start", "point to start linear interpolation",
> OFFSET(interp_start), AV_OPT_TYPE_INT, {.i64=15},
> 0, 255, V|F },
> + {"interp_end", "point to end linear interpolation",
> OFFSET(interp_end), AV_OPT_TYPE_INT, {.i64=240},
> 0, 255, V|F },
> +
> + {NULL}
> +};
> +
> +AVFILTER_DEFINE_CLASS(nvoffruc);
> +
> +static int blend_frames(AVFilterContext *ctx, int64_t work_pts)
> +{
> + FRUCContext *s = ctx->priv;
> + AVFilterLink *outlink = ctx->outputs[0];
> +
> + CudaFunctions *cu = s->hwctx->internal->cuda_dl;
> + CUDA_MEMCPY2D cpy_params = {0,};
> + NvOFFRUC_PROCESS_IN_PARAMS in = {0,};
> + NvOFFRUC_PROCESS_OUT_PARAMS out = {0,};
> + NvOFFRUC_STATUS status;
> +
> + int num_channels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
> + int ret;
> + uint64_t ignored;
> +
> + // get work-space for output frame
> + s->work = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> + if (!s->work)
> + return AVERROR(ENOMEM);
> +
> + av_frame_copy_props(s->work, s->f0);
> +
> + cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
> + cpy_params.srcDevice = (CUdeviceptr)s->f0->data[0],
> + cpy_params.srcPitch = s->f0->linesize[0],
> + cpy_params.srcY = 0,
> + cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
> + cpy_params.dstArray = s->c0,
> + cpy_params.dstY = 0,
> + cpy_params.WidthInBytes = s->f0->width * num_channels,
> + cpy_params.Height = s->f0->height,
> + ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
> + if (ret < 0)
> + return ret;
> +
> + if (s->f0->data[1]) {
> + cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
> + cpy_params.srcDevice = (CUdeviceptr)s->f0->data[1],
> + cpy_params.srcPitch = s->f0->linesize[1],
> + cpy_params.srcY = 0,
> + cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
> + cpy_params.dstArray = s->c0,
> + cpy_params.dstY = s->f0->height,
> + cpy_params.WidthInBytes = s->f0->width * num_channels,
> + cpy_params.Height = s->f0->height * 0.5,
> +        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
> + if (ret < 0)
> + return ret;
> + }
> +
> + cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
> + cpy_params.srcDevice = (CUdeviceptr)s->f1->data[0],
> + cpy_params.srcPitch = s->f1->linesize[0],
> + cpy_params.srcY = 0,
> + cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
> + cpy_params.dstArray = s->c1,
> + cpy_params.dstY = 0,
> + cpy_params.WidthInBytes = s->f1->width * num_channels,
> + cpy_params.Height = s->f1->height,
> +    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
> + if (ret < 0)
> + return ret;
> +
> + if (s->f1->data[1]) {
> + cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
> + cpy_params.srcDevice = (CUdeviceptr)s->f1->data[1],
> + cpy_params.srcPitch = s->f1->linesize[1],
> + cpy_params.srcY = 0,
> + cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
> + cpy_params.dstArray = s->c1,
> + cpy_params.dstY = s->f1->height,
> + cpy_params.WidthInBytes = s->f1->width * num_channels,
> + cpy_params.Height = s->f1->height * 0.5,
> +        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
> + if (ret < 0)
> + return ret;
> + }
> +
> + in.stFrameDataInput.pFrame = s->c0;
> + in.stFrameDataInput.nTimeStamp = s->pts0;
> + out.stFrameDataOutput.pFrame = s->cw,
> + out.stFrameDataOutput.nTimeStamp = s->pts0;
> + out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
> + status = s->NvOFFRUCProcess(s->fruc, &in, &out);
> + if (status) {
> + av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
> + return AVERROR(ENOSYS);
> + }
> +
> + in.stFrameDataInput.pFrame = s->c1;
> + in.stFrameDataInput.nTimeStamp = s->pts1;
> + out.stFrameDataOutput.pFrame = s->cw,
> + out.stFrameDataOutput.nTimeStamp = work_pts;
> + out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
> + status = s->NvOFFRUCProcess(s->fruc, &in, &out);
> + if (status) {
> + av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
> + return AVERROR(ENOSYS);
> + }
> +
> + cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY,
> + cpy_params.srcArray = s->cw,
> + cpy_params.srcY = 0,
> + cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE,
> + cpy_params.dstDevice = (CUdeviceptr)s->work->data[0],
> + cpy_params.dstPitch = s->work->linesize[0],
> + cpy_params.dstY = 0,
> + cpy_params.WidthInBytes = s->work->width * num_channels,
> + cpy_params.Height = s->work->height,
> +    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
> + if (ret < 0)
> + return ret;
> +
> + if (s->work->data[1]) {
> + cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY,
> + cpy_params.srcArray = s->cw,
> + cpy_params.srcY = s->work->height,
> + cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE,
> + cpy_params.dstDevice = (CUdeviceptr)s->work->data[1],
> + cpy_params.dstPitch = s->work->linesize[1],
> + cpy_params.dstY = 0,
> + cpy_params.WidthInBytes = s->work->width * num_channels,
> + cpy_params.Height = s->work->height * 0.5,
> +        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
> + if (ret < 0)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int process_work_frame(AVFilterContext *ctx)
> +{
> + FRUCContext *s = ctx->priv;
> + int64_t work_pts;
> + int64_t interpolate, interpolate8;
> + int ret;
> +
> + if (!s->f1)
> + return 0;
> + if (!s->f0 && !s->flush)
> + return 0;
> +
> + work_pts = s->start_pts + av_rescale_q(s->n,
> av_inv_q(s->dest_frame_rate), s->dest_time_base);
> +
> + if (work_pts >= s->pts1 && !s->flush)
> + return 0;
> +
> + if (!s->f0) {
> + av_assert1(s->flush);
> + s->work = s->f1;
> + s->f1 = NULL;
> + } else {
> + if (work_pts >= s->pts1 + s->delta && s->flush)
> + return 0;
> +
> + interpolate = av_rescale(work_pts - s->pts0, s->blend_factor_max,
> s->delta);
> + interpolate8 = av_rescale(work_pts - s->pts0, 256, s->delta);
> + ff_dlog(ctx, "process_work_frame() interpolate: %"PRId64"/256\n",
> interpolate8);
> + if (interpolate >= s->blend_factor_max || interpolate8 >
> s->interp_end) {
> +            av_log(ctx, AV_LOG_DEBUG, "Matched f1: pts %"PRId64"\n", work_pts);
> +            s->work = av_frame_clone(s->f1);
> +        } else if (interpolate <= 0 || interpolate8 < s->interp_start) {
> +            av_log(ctx, AV_LOG_DEBUG, "Matched f0: pts %"PRId64"\n", work_pts);
> +            s->work = av_frame_clone(s->f0);
> + } else {
> +            av_log(ctx, AV_LOG_DEBUG, "Unmatched pts: %"PRId64"\n", work_pts);
> + ret = blend_frames(ctx, work_pts);
> + if (ret < 0)
> + return ret;
> + }
> + }
> +
> + if (!s->work)
> + return AVERROR(ENOMEM);
> +
> + s->work->pts = work_pts;
> + s->n++;
> +
> + return 1;
> +}
> +
> +static av_cold int init(AVFilterContext *ctx)
> +{
> + FRUCContext *s = ctx->priv;
> + s->start_pts = AV_NOPTS_VALUE;
> +
> + // TODO: Need windows equivalent symbol loading
> + s->fruc_dl = dlopen("libNvOFFRUC.so", RTLD_LAZY);
> + if (!s->fruc_dl) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to load FRUC: %s\n", dlerror());
> + return AVERROR(EINVAL);
> + }
> +
> + s->NvOFFRUCCreate = (PtrToFuncNvOFFRUCCreate)
> + dlsym(s->fruc_dl, "NvOFFRUCCreate");
> + s->NvOFFRUCRegisterResource = (PtrToFuncNvOFFRUCRegisterResource)
> + dlsym(s->fruc_dl, "NvOFFRUCRegisterResource");
> + s->NvOFFRUCUnregisterResource = (PtrToFuncNvOFFRUCUnregisterResource)
> + dlsym(s->fruc_dl, "NvOFFRUCUnregisterResource");
> + s->NvOFFRUCProcess = (PtrToFuncNvOFFRUCProcess)
> + dlsym(s->fruc_dl, "NvOFFRUCProcess");
> + s->NvOFFRUCDestroy = (PtrToFuncNvOFFRUCDestroy)
> + dlsym(s->fruc_dl, "NvOFFRUCDestroy");
> + return 0;
> +}
> +
> +static av_cold void uninit(AVFilterContext *ctx)
> +{
> + FRUCContext *s = ctx->priv;
> + CudaFunctions *cu = s->hwctx->internal->cuda_dl;
> + CUcontext dummy;
> +
> + CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
> +
> + if (s->fruc) {
> + NvOFFRUC_UNREGISTER_RESOURCE_PARAM in_param = {
> + .pArrResource = {s->c0, s->c1, s->cw},
> +            .uiCount = 3,
> + };
> + NvOFFRUC_STATUS nv_status =
> s->NvOFFRUCUnregisterResource(s->fruc, &in_param);
> + if (nv_status) {
> + av_log(ctx, AV_LOG_WARNING, "Could not unregister: %d\n",
> nv_status);
> + }
> + s->NvOFFRUCDestroy(s->fruc);
> + }
> + if (s->c0)
> + CHECK_CU(cu->cuArrayDestroy(s->c0));
> + if (s->c1)
> + CHECK_CU(cu->cuArrayDestroy(s->c1));
> + if (s->cw)
> + CHECK_CU(cu->cuArrayDestroy(s->cw));
> +
> + CHECK_CU(cu->cuCtxPopCurrent(&dummy));
> +
> + if (s->fruc_dl)
> + dlclose(s->fruc_dl);
> + av_frame_free(&s->f0);
> + av_frame_free(&s->f1);
> + av_buffer_unref(&s->device_ref);
> +}
> +
> +static const enum AVPixelFormat supported_formats[] = {
> + AV_PIX_FMT_NV12,
> + // Actually any single plane, four channel, 8bit format will work.
> + AV_PIX_FMT_ARGB,
> + AV_PIX_FMT_ABGR,
> + AV_PIX_FMT_RGBA,
> + AV_PIX_FMT_BGRA,
> + AV_PIX_FMT_NONE
> +};
> +
> +static int format_is_supported(enum AVPixelFormat fmt)
> +{
> + int i;
> +
> + for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
> + if (supported_formats[i] == fmt)
> + return 1;
> + return 0;
> +}
> +
> +static int activate(AVFilterContext *ctx)
> +{
> + int ret, status;
> + AVFilterLink *inlink = ctx->inputs[0];
> + AVFilterLink *outlink = ctx->outputs[0];
> + FRUCContext *s = ctx->priv;
> + AVFrame *inpicref;
> + int64_t pts;
> +
> + CudaFunctions *cu = s->hwctx->internal->cuda_dl;
> + CUcontext dummy;
> +
> + FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
> +
> + CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
> +
> +retry:
> + ret = process_work_frame(ctx);
> + if (ret < 0) {
> + goto exit;
> + } else if (ret == 1) {
> + ret = ff_filter_frame(outlink, s->work);
> + goto exit;
> + }
> +
> + ret = ff_inlink_consume_frame(inlink, &inpicref);
> + if (ret < 0)
> + goto exit;
> +
> + if (inpicref) {
> + if (inpicref->interlaced_frame)
> + av_log(ctx, AV_LOG_WARNING, "Interlaced frame found - the
> output will not be correct.\n");
> +
> + if (inpicref->pts == AV_NOPTS_VALUE) {
> + av_log(ctx, AV_LOG_WARNING, "Ignoring frame without PTS.\n");
> + av_frame_free(&inpicref);
> + }
> + }
> +
> + if (inpicref) {
> + pts = av_rescale_q(inpicref->pts, s->srce_time_base,
> s->dest_time_base);
> +
> + if (s->f1 && pts == s->pts1) {
> + av_log(ctx, AV_LOG_WARNING, "Ignoring frame with same
> PTS.\n");
> + av_frame_free(&inpicref);
> + }
> + }
> +
> + if (inpicref) {
> + av_frame_free(&s->f0);
> + s->f0 = s->f1;
> + s->pts0 = s->pts1;
> +
> + s->f1 = inpicref;
> + s->pts1 = pts;
> + s->delta = s->pts1 - s->pts0;
> +
> + if (s->delta < 0) {
> + av_log(ctx, AV_LOG_WARNING, "PTS discontinuity.\n");
> + s->start_pts = s->pts1;
> + s->n = 0;
> + av_frame_free(&s->f0);
> + }
> +
> + if (s->start_pts == AV_NOPTS_VALUE)
> + s->start_pts = s->pts1;
> +
> + goto retry;
> + }
> +
> + if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
> + if (!s->flush) {
> + s->flush = 1;
> + goto retry;
> + }
> + ff_outlink_set_status(outlink, status, pts);
> + ret = 0;
> + goto exit;
> + }
> +
> + FF_FILTER_FORWARD_WANTED(outlink, inlink);
> +
> + return FFERROR_NOT_READY;
> +
> +exit:
> + CHECK_CU(cu->cuCtxPopCurrent(&dummy));
> + return ret;
> +}
> +
> +static int config_input(AVFilterLink *inlink)
> +{
> + AVFilterContext *ctx = inlink->dst;
> + FRUCContext *s = ctx->priv;
> +
> + s->srce_time_base = inlink->time_base;
> +    s->blend_factor_max = 1 << (8 - 1);
> +
> + return 0;
> +}
> +
> +static int config_output(AVFilterLink *outlink)
> +{
> + AVFilterContext *ctx = outlink->src;
> + AVFilterLink *inlink = outlink->src->inputs[0];
> + AVHWFramesContext *in_frames_ctx;
> + AVHWFramesContext *output_frames;
> + FRUCContext *s = ctx->priv;
> + CudaFunctions *cu;
> + CUcontext dummy;
> + CUDA_ARRAY_DESCRIPTOR desc = {0,};
> + NvOFFRUC_CREATE_PARAM create_param = {0,};
> + NvOFFRUC_REGISTER_RESOURCE_PARAM register_param = {0,};
> + NvOFFRUC_STATUS status;
> + int exact;
> + int ret;
> +
> + ff_dlog(ctx, "config_output()\n");
> +
> + ff_dlog(ctx,
> + "config_output() input time base:%u/%u (%f)\n",
> + ctx->inputs[0]->time_base.num,ctx->inputs[0]->time_base.den,
> + av_q2d(ctx->inputs[0]->time_base));
> +
> + // make sure timebase is small enough to hold the framerate
> +
> + exact = av_reduce(&s->dest_time_base.num, &s->dest_time_base.den,
> + av_gcd((int64_t)s->srce_time_base.num *
> s->dest_frame_rate.num,
> + (int64_t)s->srce_time_base.den *
> s->dest_frame_rate.den ),
> + (int64_t)s->srce_time_base.den *
> s->dest_frame_rate.num, INT_MAX);
> +
> + av_log(ctx, AV_LOG_INFO,
> + "time base:%u/%u -> %u/%u exact:%d\n",
> + s->srce_time_base.num, s->srce_time_base.den,
> + s->dest_time_base.num, s->dest_time_base.den, exact);
> + if (!exact) {
> + av_log(ctx, AV_LOG_WARNING, "Timebase conversion is not exact\n");
> + }
> +
> + outlink->frame_rate = s->dest_frame_rate;
> + outlink->time_base = s->dest_time_base;
> +
> + ff_dlog(ctx,
> + "config_output() output time base:%u/%u (%f) w:%d h:%d\n",
> + outlink->time_base.num, outlink->time_base.den,
> + av_q2d(outlink->time_base),
> + outlink->w, outlink->h);
> +
> +
> + av_log(ctx, AV_LOG_INFO, "fps -> fps:%u/%u\n",
> + s->dest_frame_rate.num, s->dest_frame_rate.den);
> +
> + /* check that we have a hw context */
> + if (!inlink->hw_frames_ctx) {
> + av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
> + return AVERROR(EINVAL);
> + }
> + in_frames_ctx = (AVHWFramesContext*)inlink->hw_frames_ctx->data;
> + s->format = in_frames_ctx->sw_format;
> +
> + if (!format_is_supported(s->format)) {
> + av_log(ctx, AV_LOG_ERROR, "Unsupported input format: %s\n",
> + av_get_pix_fmt_name(s->format));
> + return AVERROR(ENOSYS);
> + }
> +
> + s->device_ref = av_buffer_ref(in_frames_ctx->device_ref);
> + if (!s->device_ref)
> + return AVERROR(ENOMEM);
> +
> + s->hwctx = ((AVHWDeviceContext*)s->device_ref->data)->hwctx;
> + s->cu_ctx = s->hwctx->cuda_ctx;
> + s->stream = s->hwctx->stream;
> + cu = s->hwctx->internal->cuda_dl;
> + outlink->hw_frames_ctx = av_hwframe_ctx_alloc(s->device_ref);
> +    if (!outlink->hw_frames_ctx)
> + return AVERROR(ENOMEM);
> +
> + output_frames = (AVHWFramesContext*)outlink->hw_frames_ctx->data;
> +
> + output_frames->format = AV_PIX_FMT_CUDA;
> + output_frames->sw_format = s->format;
> + output_frames->width = ctx->inputs[0]->w;
> + output_frames->height = ctx->inputs[0]->h;
> +
> + output_frames->initial_pool_size = 3;
> +
> + ret = ff_filter_init_hw_frames(ctx, outlink, 0);
> + if (ret < 0)
> + return ret;
> +
> + ret = av_hwframe_ctx_init(outlink->hw_frames_ctx);
> + if (ret < 0) {
> + av_log(ctx, AV_LOG_ERROR, "Failed to initialise CUDA frame "
> + "context for output: %d\n", ret);
> + return ret;
> + }
> +
> + outlink->w = inlink->w;
> + outlink->h = inlink->h;
> +
> + ret = CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
> + if (ret < 0)
> + return ret;
> +
> + desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
> + desc.Height = inlink->h * (s->format == AV_PIX_FMT_NV12 ? 1.5 : 1);
> + desc.Width = inlink->w;
> + desc.NumChannels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
> + ret = CHECK_CU(cu->cuArrayCreate(&s->c0, &desc));
> + if (ret < 0)
> + goto exit;
> + ret = CHECK_CU(cu->cuArrayCreate(&s->c1, &desc));
> + if (ret < 0)
> + goto exit;
> + ret = CHECK_CU(cu->cuArrayCreate(&s->cw, &desc));
> + if (ret < 0)
> + goto exit;
> +
> + create_param.uiWidth = inlink->w;
> + create_param.uiHeight = inlink->h;
> + create_param.pDevice = NULL;
> + create_param.eResourceType = CudaResource;
> + create_param.eSurfaceFormat = s->format == AV_PIX_FMT_NV12 ?
> NV12Surface : ARGBSurface;
> + create_param.eCUDAResourceType = CudaResourceCuArray;
> + status = s->NvOFFRUCCreate(&create_param, &s->fruc);
> + if (status) {
> + av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to create: %d\n", status);
> + ret = AVERROR(ENOSYS);
> + goto exit;
> + }
> +
> + register_param.pArrResource[0] = s->c0;
> + register_param.pArrResource[1] = s->c1;
> + register_param.pArrResource[2] = s->cw;
> + register_param.uiCount = 3;
> +    status = s->NvOFFRUCRegisterResource(s->fruc, &register_param);
> + if (status) {
> + av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to register: %d\n",
> status);
> + ret = AVERROR(ENOSYS);
> + goto exit;
> + }
> +
> + ret = 0;
> +exit:
> + CHECK_CU(cu->cuCtxPopCurrent(&dummy));
> + return ret;
> +}
> +
> +static const AVFilterPad framerate_inputs[] = {
> + {
> + .name = "default",
> + .type = AVMEDIA_TYPE_VIDEO,
> + .config_props = config_input,
> + },
> +};
> +
> +static const AVFilterPad framerate_outputs[] = {
> + {
> + .name = "default",
> + .type = AVMEDIA_TYPE_VIDEO,
> + .config_props = config_output,
> + },
> +};
> +
> +const AVFilter ff_vf_nvoffruc = {
> + .name = "nvoffruc",
> + .description = NULL_IF_CONFIG_SMALL("Upsamples progressive source
> to specified frame rates with nvidia FRUC"),
> + .priv_size = sizeof(FRUCContext),
> + .priv_class = &nvoffruc_class,
> + .init = init,
> + .uninit = uninit,
> + .activate = activate,
> + FILTER_INPUTS(framerate_inputs),
> + FILTER_OUTPUTS(framerate_outputs),
> + FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
> + .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> +};
> --
> 2.37.2
Fascinating, what would it take to get this merged? Also, does it have any
restrictions on supported hardware? Optical Flow Acceleration was
introduced with Turing, so I'm not sure how that would work across older
GPUs.
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library
2023-01-02 23:21 [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library Philip Langdale
@ 2023-01-02 23:39 ` Dennis Mungai
2023-01-03 0:13 ` Philip Langdale
2 siblings, 1 reply; 7+ messages in thread
From: Dennis Mungai @ 2023-01-02 23:39 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Philip Langdale
On Tue, 3 Jan 2023 at 02:22, Philip Langdale <philipl@overt.org> wrote:
> This filter implements frame rate down/upsampling using nvidia's
> Optical Flow FRUC (Frame Rate Up Conversion) library. It's neat because
> you get realtime interpolation with a decent level of quality. It's
> impractical because of licensing.
>
> I have no actual intention to merge this, as it doesn't even meet our
> bar for a nonfree filter, and given the EULA restrictions with the SDK,
> anyone who would want to use it can easily cherry-pick it into the
> build they have to anyway. But I figured I'd send it to list as a way
> of announcing that it exists.
>
> How nice would it be if nvidia had sane licensing on this stuff?
>
> I'll keep a branch at: https://github.com/philipl/FFmpeg/tree/fruc-me
>
> --phil
>
> Philip Langdale (2):
> lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra
> avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC
> library
>
> configure | 7 +-
> libavfilter/Makefile | 1 +
> libavfilter/allfilters.c | 1 +
> libavfilter/vf_nvoffruc.c | 644 +++++++++++++++++++++++++++++++++++++
> libavutil/hwcontext_cuda.c | 4 +
> 5 files changed, 654 insertions(+), 3 deletions(-)
> create mode 100644 libavfilter/vf_nvoffruc.c
>
> --
> 2.37.2
Related,
If this were to be implemented in mpv, can libplacebo pick up this feature
spec as a filter in ffmpeg? Perhaps that would make such a feature easier
to merge down the line, instead of re-implementing it directly in ffmpeg as
an additional filter.
Adding Niklaas to the thread.
* Re: [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library
2023-01-02 23:39 ` [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Dennis Mungai
@ 2023-01-03 0:13 ` Philip Langdale
2023-01-03 0:15 ` Dennis Mungai
0 siblings, 1 reply; 7+ messages in thread
From: Philip Langdale @ 2023-01-03 0:13 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Dennis Mungai
On Tue, 3 Jan 2023 02:39:19 +0300
Dennis Mungai <dmngaie@gmail.com> wrote:
> Related,
>
> If this were to be implemented in mpv, can libplacebo pick up this
> feature spec as a filter in ffmpeg? Perhaps that would make such a
> feature easier to merge down the line, instead of re-implementing it
> directly in ffmpeg as an additional filter.
>
> Adding Niklaas to the thread.
It doesn't make a difference. The licensing is fundamentally unusable
for an open-source project (and there are engineers at nvidia who know
this and wish they could write filters leveraging all their various
capabilities). The only thing with any nuance is what level of
`nonfree` a project is willing to have sitting in their repo. Most
projects (including mpv and libplacebo) would say "none", because it's
not worth the trouble. ffmpeg has gone back and forth on what exact
criteria have to be met to qualify as mergeable vs unmergeable nonfree.
In the past we have accepted filters based around nvidia libraries with
prohibitive licensing - see the libnpp based filters, but I don't think
we have the appetite for that now. If we were to decide that this
filter was ok on that basis, I'd merge it, but honestly, the usability
benefit of it being in master is tiny vs all the other hoops you have
to jump through.
Anyway - punchline: it is not easier to get this kind of thing merged
into other projects.
--phil
* Re: [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library
2023-01-03 0:13 ` Philip Langdale
@ 2023-01-03 0:15 ` Dennis Mungai
0 siblings, 0 replies; 7+ messages in thread
From: Dennis Mungai @ 2023-01-03 0:15 UTC (permalink / raw)
To: Philip Langdale; +Cc: FFmpeg development discussions and patches
On Tue, 3 Jan 2023 at 03:13, Philip Langdale <philipl@overt.org> wrote:
> On Tue, 3 Jan 2023 02:39:19 +0300
> Dennis Mungai <dmngaie@gmail.com> wrote:
>
> > Related,
> >
> > If this were to be implemented in mpv, can libplacebo pick up this
> > feature spec as a filter in ffmpeg? Perhaps that would make such a
> > feature easier to merge down the line, instead of re-implementing it
> > directly in ffmpeg as an additional filter.
> >
> > Adding Niklaas to the thread.
>
> It doesn't make a difference. The licensing is fundamentally unusable
> for an open-source project (and there are engineers at nvidia who know
> this and wish they could write filters leveraging all their various
> capabilities). The only thing with any nuance is what level of
> `nonfree` a project is willing to have sitting in their repo. Most
> projects (including mpv and libplacebo) would say "none", because it's
> not worth the trouble. ffmpeg has gone back and forth on what exact
> criteria have to be met to qualify as mergeable vs unmergeable nonfree.
> In the past we have accepted filters based around nvidia libraries with
> prohibitive licensing - see the libnpp based filters, but I don't think
> we have the appetite for that now. If we were to decide that this
> filter was ok on that basis, I'd merge it, but honestly, the usability
> benefit of it being in master is tiny vs all the other hoops you have
> to jump through.
>
> Anyway - punchline: it is not easier to get this kind of thing merged
> into other projects.
>
> --phil
>
Got it, thanks for the clarifications.
end of thread, other threads:[~2023-01-03 0:15 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-02 23:21 [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra Philip Langdale
2023-01-02 23:21 ` [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library Philip Langdale
2023-01-02 23:29 ` Dennis Mungai
2023-01-02 23:39 ` [FFmpeg-devel] [PATCH 0/2] Interpolation filter using nvidia OFFRUC Library Dennis Mungai
2023-01-03 0:13 ` Philip Langdale
2023-01-03 0:15 ` Dennis Mungai