Message-ID: <74ca77b7-5b84-9d4d-2baa-0223418fd9d6@rothenpieler.org>
Date: Mon, 28 Aug 2023 20:59:00 +0200
From: Timo Rothenpieler
To: ffmpeg-devel@ffmpeg.org
References: <97c138ad-457f-d098-e145-983bdd3e7f69@rothenpieler.org>
Subject: Re: [FFmpeg-devel] [PATCH] avfilter: add libvmaf_cuda

> From f6f0afffadfc5fae97b11b0feb7c1d740b7c86ab Mon Sep 17 00:00:00 2001
> From: Kyle Swanson
> Date: Mon, 28 Aug 2023 11:49:34 -0700
> Subject: [PATCH] avfilter: add libvmaf_cuda
>
> ---
>  configure                |   4 +
>  doc/filters.texi         |  26 +++++
>  libavfilter/Makefile     |   1 +
>  libavfilter/allfilters.c |   1 +
>  libavfilter/vf_libvmaf.c | 210 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 242 insertions(+)
>
> diff --git a/configure b/configure
> index bd7f7697c8..6f6c6aaf22 100755
> --- a/configure
> +++ b/configure
> @@ -286,6 +286,7 @@ External library support:
>    --enable-libv4l2         enable libv4l2/v4l-utils [no]
>    --enable-libvidstab      enable video stabilization using vid.stab [no]
>    --enable-libvmaf         enable vmaf filter via libvmaf [no]
> +  --enable-libvmaf-cuda    enable cuda vmaf filter via libvmaf [no]
>    --enable-libvo-amrwbenc  enable AMR-WB encoding via libvo-amrwbenc [no]
>    --enable-libvorbis       enable Vorbis en/decoding via libvorbis,
>                             native implementation exists [no]
> @@ -1902,6 +1903,7 @@ EXTERNAL_LIBRARY_LIST="
>      libuavs3d
>      libv4l2
>      libvmaf
> +    libvmaf_cuda
>      libvorbis
>      libvpx
>      libwebp
> @@ -3831,6 +3833,7 @@ vflip_vulkan_filter_deps="vulkan spirv_compiler"
>  vidstabdetect_filter_deps="libvidstab"
>  vidstabtransform_filter_deps="libvidstab"
>  libvmaf_filter_deps="libvmaf"
> +libvmaf_cuda_filter_deps="libvmaf cuda_nvcc"

Does this really depend on nvcc? Does it not work with only ffnvcodec?
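
To make the question concrete: if the filter only needs the CUDA driver API types and the dynamically loaded functions, a dependency line along these lines might be enough. This is an untested sketch. `ffnvcodec` is the existing configure dependency name for the nv-codec-headers, and the idea that nothing from nvcc is needed when building FFmpeg against libvmaf's CUDA support is an assumption, not something verified here.

```sh
# Hypothetical alternative to the line quoted above (untested):
# depend on the ffnvcodec headers instead of the nvcc compiler.
libvmaf_cuda_filter_deps="libvmaf ffnvcodec"
```

If that holds, the documented configure invocation would presumably no longer need `--enable-cuda-nvcc` either.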

>  zmq_filter_deps="libzmq"
>  zoompan_filter_deps="swscale"
>  zscale_filter_deps="libzimg const_nan"
> @@ -6811,6 +6814,7 @@ enabled libuavs3d && require_pkg_config libuavs3d "uavs3d >= 1.1.41" uav
>  enabled libv4l2           && require_pkg_config libv4l2 libv4l2 libv4l2.h v4l2_ioctl
>  enabled libvidstab        && require_pkg_config libvidstab "vidstab >= 0.98" vid.stab/libvidstab.h vsMotionDetectInit
>  enabled libvmaf           && require_pkg_config libvmaf "libvmaf >= 2.0.0" libvmaf.h vmaf_init
> +enabled libvmaf_cuda      && require_pkg_config libvmaf "libvmaf >= 2.0.0" libvmaf_cuda.h vmaf_cuda_state_init
>  enabled libvo_amrwbenc    && require libvo_amrwbenc vo-amrwbenc/enc_if.h E_IF_init -lvo-amrwbenc
>  enabled libvorbis         && require_pkg_config libvorbis vorbis vorbis/codec.h vorbis_info_init &&
>                               require_pkg_config libvorbisenc vorbisenc vorbis/vorbisenc.h vorbis_encode_init
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 14a6be49ac..eaff3f1ddc 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -16928,6 +16928,32 @@ ffmpeg -i distorted.mpg -i reference.mkv -lavfi "[0:v]settb=AVTB,setpts=PTS-STAR
>  @end example
>  @end itemize
>
> +@section libvmaf_cuda
> +
> +This is the CUDA variant of the @ref{libvmaf} filter. It only accepts CUDA frames.
> +
> +It requires Netflix's vmaf library (libvmaf) as a pre-requisite.
> +After installing the library it can be enabled using:
> +@code{./configure --enable-nonfree --enable-cuda-nvcc --enable-libvmaf-cuda}.

see above

> +@subsection Examples
> +@itemize
> +
> +@item
> +Basic usage showing CUVID hardware decoding and CUDA scaling with @ref{scale_cuda}:
> +@example
> +ffmpeg \
> +    -hwaccel cuda -hwaccel_output_format cuda -codec:v av1_cuvid -i dis.obu \
> +    -hwaccel cuda -hwaccel_output_format cuda -codec:v av1_cuvid -i ref.obu \
> +    -filter_complex "
> +        [0:v]scale_cuda=format=yuv420p[ref]; \
> +        [1:v]scale_cuda=format=yuv420p[dis]; \
> +        [dis][ref]libvmaf_cuda=log_fmt=json:log_path=output.json
> +    " \
> +    -f null -
> +@end example
> +@end itemize
> +
>  @section limitdiff
>  Apply limited difference filter using second and optionally third video stream.
>
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 2fe0033b21..57f5809acb 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -363,6 +363,7 @@ OBJS-$(CONFIG_LENSCORRECTION_FILTER) += vf_lenscorrection.o
>  OBJS-$(CONFIG_LENSFUN_FILTER)                += vf_lensfun.o
>  OBJS-$(CONFIG_LIBPLACEBO_FILTER)             += vf_libplacebo.o vulkan.o vulkan_filter.o
>  OBJS-$(CONFIG_LIBVMAF_FILTER)                += vf_libvmaf.o framesync.o
> +OBJS-$(CONFIG_LIBVMAF_CUDA_FILTER)           += vf_libvmaf.o framesync.o
>  OBJS-$(CONFIG_LIMITDIFF_FILTER)              += vf_limitdiff.o framesync.o
>  OBJS-$(CONFIG_LIMITER_FILTER)                += vf_limiter.o
>  OBJS-$(CONFIG_LOOP_FILTER)                   += f_loop.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index d4184d6e80..aa49703c6e 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -339,6 +339,7 @@ extern const AVFilter ff_vf_lenscorrection;
>  extern const AVFilter ff_vf_lensfun;
>  extern const AVFilter ff_vf_libplacebo;
>  extern const AVFilter ff_vf_libvmaf;
> +extern const AVFilter ff_vf_libvmaf_cuda;
>  extern const AVFilter ff_vf_limitdiff;
>  extern const AVFilter ff_vf_limiter;
>  extern const AVFilter ff_vf_loop;
> diff --git a/libavfilter/vf_libvmaf.c b/libavfilter/vf_libvmaf.c
> index 2586f37d99..d7d853ac3e 100644
> --- a/libavfilter/vf_libvmaf.c
> +++ b/libavfilter/vf_libvmaf.c
> @@ -24,6 +24,8 @@
>   * Calculate the VMAF between two input videos.
>   */
>
> +#include "config.h"
> +
>  #include
>
>  #include "libavutil/avstring.h"
> @@ -36,6 +38,13 @@
>  #include "internal.h"
>  #include "video.h"
>
> +#ifdef CONFIG_LIBVMAF_CUDA
> +#include

Does this include cuda.h or something like that?
If so, it should probably be included after the cuda hwcontext, to avoid
it doing that.

> +#include "libavutil/hwcontext.h"
> +#include "libavutil/hwcontext_cuda_internal.h"
> +#endif
> +
>  typedef struct LIBVMAFContext {
>      const AVClass *class;
>      FFFrameSync fs;
> @@ -58,6 +67,9 @@ typedef struct LIBVMAFContext {
>      unsigned model_cnt;
>      unsigned frame_cnt;
>      unsigned bpc;
> +#ifdef CONFIG_LIBVMAF_CUDA
> +    VmafCudaState *cu_state;
> +#endif
>  } LIBVMAFContext;
>
>  #define OFFSET(x) offsetof(LIBVMAFContext, x)
> @@ -710,3 +722,201 @@ const AVFilter ff_vf_libvmaf = {
>      FILTER_OUTPUTS(libvmaf_outputs),
>      FILTER_PIXFMTS_ARRAY(pix_fmts),
>  };
> +
> +#ifdef CONFIG_LIBVMAF_CUDA
> +static const enum AVPixelFormat supported_formats[] = {
> +    AV_PIX_FMT_YUV420P,
> +    AV_PIX_FMT_YUV444P16,
> +};
> +
> +static int format_is_supported(enum AVPixelFormat fmt)
> +{
> +    int i;
> +
> +    for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
> +        if (supported_formats[i] == fmt)
> +            return 1;
> +    return 0;
> +}
> +
> +static int config_props_cuda(AVFilterLink *outlink)
> +{
> +    int err;
> +    AVFilterContext *ctx = outlink->src;
> +    LIBVMAFContext *s = ctx->priv;
> +    AVFilterLink *inlink = ctx->inputs[0];
> +    AVHWFramesContext *frames_ctx = (AVHWFramesContext*) inlink->hw_frames_ctx->data;
> +    AVCUDADeviceContext *device_hwctx = frames_ctx->device_ctx->hwctx;
> +    CUcontext cu_ctx = device_hwctx->cuda_ctx;
> +    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(frames_ctx->sw_format);
> +
> +    VmafConfiguration cfg = {
> +        .log_level = log_level_map(av_log_get_level()),
> +        .n_subsample = s->n_subsample,
> +        .n_threads = s->n_threads,
> +    };
> +
> +    VmafCudaPictureConfiguration cuda_pic_cfg = {
> +        .pic_params = {
> +            .bpc = desc->comp[0].depth,
> +            .w = inlink->w,
> +            .h = inlink->h,
> +            .pix_fmt = pix_fmt_map(frames_ctx->sw_format),
> +        },
> +        .pic_prealloc_method = VMAF_CUDA_PICTURE_PREALLOCATION_METHOD_DEVICE,
> +    };
> +
> +    VmafCudaConfiguration cuda_cfg = {
> +        .cu_ctx = cu_ctx,
> +    };
> +
> +    if (!format_is_supported(frames_ctx->sw_format)) {
> +        av_log(s, AV_LOG_ERROR,
> +               "Unsupported input format: %s\n", desc->name);
> +        return AVERROR(EINVAL);
> +    }
> +
> +    err = vmaf_init(&s->vmaf, cfg);
> +    if (err)
> +        return AVERROR(EINVAL);
> +
> +    err = vmaf_cuda_state_init(&s->cu_state, cuda_cfg);
> +    if (err)
> +        return AVERROR(EINVAL);
> +
> +    err = vmaf_cuda_import_state(s->vmaf, s->cu_state);
> +    if (err)
> +        return AVERROR(EINVAL);
> +
> +    err = vmaf_cuda_preallocate_pictures(s->vmaf, cuda_pic_cfg);
> +    if (err < 0)
> +        return err;
> +
> +    err = parse_deprecated_options(ctx);
> +    if (err)
> +        return err;
> +
> +    err = parse_models(ctx);
> +    if (err)
> +        return err;
> +
> +    err = parse_features(ctx);
> +    if (err)
> +        return err;
> +
> +    return config_output(outlink);
> +}
> +
> +static int copy_picture_data_cuda(VmafContext* vmaf,
> +                                  AVCUDADeviceContext* device_hwctx,
> +                                  AVFrame* src, VmafPicture* dst,
> +                                  enum AVPixelFormat pix_fmt)
> +{
> +    const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(pix_fmt);
> +    CudaFunctions *cu = device_hwctx->internal->cuda_dl;
> +
> +    CUDA_MEMCPY2D m = {
> +        .srcMemoryType = CU_MEMORYTYPE_DEVICE,
> +        .dstMemoryType = CU_MEMORYTYPE_DEVICE,
> +    };
> +
> +    int err = vmaf_cuda_fetch_preallocated_picture(vmaf, dst);
> +    if (err)
> +        return AVERROR(ENOMEM);
> +
> +    err = cu->cuCtxPushCurrent(device_hwctx->cuda_ctx);
> +    if (err)
> +        return AVERROR_EXTERNAL;
> +
> +    for (unsigned i = 0; i < pix_desc->nb_components; i++) {
> +        m.srcDevice = (CUdeviceptr) src->data[i];
> +        m.srcPitch = src->linesize[i];
> +        m.dstDevice = (CUdeviceptr) dst->data[i];
> +        m.dstPitch = dst->stride[i];
> +        m.WidthInBytes = dst->w[i] * ((dst->bpc + 7) / 8);
> +        m.Height = dst->h[i];
> +
> +        err = cu->cuMemcpy2D(&m);
> +        if (err)
> +            return AVERROR_EXTERNAL;
> +        break;
> +    }
> +
> +    err = cu->cuCtxPopCurrent(NULL);
> +    if (err)
> +        return AVERROR_EXTERNAL;
> +
> +    return 0;
> +}
> +
> +static int do_vmaf_cuda(FFFrameSync* fs)
> +{
> +    AVFilterContext* ctx = fs->parent;
> +    LIBVMAFContext* s = ctx->priv;
> +    AVFilterLink *inlink = ctx->inputs[0];
> +    AVHWFramesContext *frames_ctx = (AVHWFramesContext*) inlink->hw_frames_ctx->data;
> +    AVCUDADeviceContext *device_hwctx = frames_ctx->device_ctx->hwctx;
> +    VmafPicture pic_ref, pic_dist;
> +    AVFrame *ref, *dist;
> +
> +    int err = 0;
> +
> +    err = ff_framesync_dualinput_get(fs, &dist, &ref);
> +    if (err < 0)
> +        return err;
> +    if (ctx->is_disabled || !ref)
> +        return ff_filter_frame(ctx->outputs[0], dist);
> +
> +    err = copy_picture_data_cuda(s->vmaf, device_hwctx, ref, &pic_ref,
> +                                 frames_ctx->sw_format);
> +    if (err) {
> +        av_log(s, AV_LOG_ERROR, "problem during copy_picture_data_cuda.\n");
> +        return AVERROR(ENOMEM);
> +    }
> +
> +    err = copy_picture_data_cuda(s->vmaf, device_hwctx, dist, &pic_dist,
> +                                 frames_ctx->sw_format);
> +    if (err) {
> +        av_log(s, AV_LOG_ERROR, "problem during copy_picture_data_cuda.\n");
> +        return AVERROR(ENOMEM);
> +    }
> +
> +    err = vmaf_read_pictures(s->vmaf, &pic_ref, &pic_dist, s->frame_cnt++);
> +    if (err) {
> +        av_log(s, AV_LOG_ERROR, "problem during vmaf_read_pictures.\n");
> +        return AVERROR(EINVAL);
> +    }
> +
> +    return ff_filter_frame(ctx->outputs[0], dist);
> +}
> +
> +static av_cold int init_cuda(AVFilterContext *ctx)
> +{
> +    LIBVMAFContext *s = ctx->priv;
> +    s->fs.on_event = do_vmaf_cuda;
> +    return 0;
> +}
> +
> +static const AVFilterPad libvmaf_outputs_cuda[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_VIDEO,
> +        .config_props = config_props_cuda,
> +    },
> +};
> +
> +const AVFilter ff_vf_libvmaf_cuda = {
> +    .name           = "libvmaf_cuda",
> +    .description    = NULL_IF_CONFIG_SMALL("Calculate the VMAF between two video streams."),
> +    .preinit        = libvmaf_framesync_preinit,
> +    .init           = init_cuda,
> +    .uninit         = uninit,
> +    .activate       = activate,
> +    .priv_size      = sizeof(LIBVMAFContext),
> +    .priv_class     = &libvmaf_class,
> +    FILTER_INPUTS(libvmaf_inputs),
> +    FILTER_OUTPUTS(libvmaf_outputs_cuda),
> +    FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
> +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> +};
> +#endif
> --
> 2.24.3 (Apple Git-128)