Re: [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext

From: averne <averne381@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext
Date: Sat, 29 Jun 2024 21:35:39 +0200
Message-ID: <1f96eb24-1e09-4aed-bfdd-3fb8dff6d12c@gmail.com> (raw)
In-Reply-To: <000f9f24-a963-4a00-9a58-e78af043ee7b@jkqxz.net>

Le 05/06/2024 à 22:47, Mark Thompson a écrit :> 
> On 30/05/2024 20:43, averne wrote:
>> This includes hwdevice and hwframes objects.
>> As the multimedia engines work with tiled surfaces (block linear in nvidia jargon), two frame transfer methods are implemented.
>> The first makes use of the VIC to perform the copy. Since some revisions of the VIC (such as the one found in the tegra X1) did not support 10+ bit formats, these go through two separate copy steps for the luma and chroma planes.
>> The second method copies on the CPU, and is used as a fallback if the VIC constraints are not satisfied.
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  libavutil/Makefile             |   7 +-
>>  libavutil/hwcontext.c          |   4 +
>>  libavutil/hwcontext.h          |   1 +
>>  libavutil/hwcontext_internal.h |   1 +
>>  libavutil/hwcontext_nvtegra.c  | 880 +++++++++++++++++++++++++++++++++
>>  libavutil/hwcontext_nvtegra.h  |  85 ++++
>>  6 files changed, 976 insertions(+), 2 deletions(-)
>>  create mode 100644 libavutil/hwcontext_nvtegra.c
>>  create mode 100644 libavutil/hwcontext_nvtegra.h
>>
>> ...> +
>> +static int nvtegra_transfer_data(AVHWFramesContext *ctx, AVFrame *dst, const AVFrame *src) {
>> +    const AVFrame *swframe;
>> +    bool from;
>> +    int num_planes, i;
>> +
>> +    from    = !dst->hw_frames_ctx;
>> +    swframe = from ? dst : src;
>> +
>> +    if (swframe->hw_frames_ctx)
>> +        return AVERROR(ENOSYS);
>> +
>> +    num_planes = av_pix_fmt_count_planes(swframe->format);
>> +
>> +    for (i = 0; i < num_planes; ++i) {
>> +        if (((uintptr_t)swframe->data[i] & 0xff) || (swframe->linesize[i] & 0xff)) {
>> +            av_log(ctx, AV_LOG_WARNING, "Frame address/pitch not aligned to 256, "
>> +                                        "falling back to cpu transfer\n");
>> +            return nvtegra_cpu_transfer_data(ctx, dst, src, num_planes, from);
>
> Are you doing something somewhere to avoid this case?  It seems like it should be the normal one (given alignment is typically set signficantly lower than 256), so this warning would be very spammy.

Nothing is done on the FFmpeg side. I could print this warning 
just once per frames context.

>> +        }
>> +    }
>> +
>> +    return nvtegra_vic_transfer_data(ctx, dst, src, num_planes, from);
>> +}
>> +
>> +const HWContextType ff_hwcontext_type_nvtegra = {
>> +    .type                   = AV_HWDEVICE_TYPE_NVTEGRA,
>> +    .name                   = "nvtegra",
>> +
>> +    .device_hwctx_size      = sizeof(NVTegraDevicePriv),
>> +    .device_hwconfig_size   = 0,
>> +    .frames_hwctx_size      = 0,
>> +
>> +    .device_create          = &nvtegra_device_create,
>> +    .device_init            = &nvtegra_device_init,
>> +    .device_uninit          = &nvtegra_device_uninit,
>> +
>> +    .frames_get_constraints = &nvtegra_frames_get_constraints,
>> +    .frames_init            = &nvtegra_frames_init,
>> +    .frames_uninit          = &nvtegra_frames_uninit,
>> +    .frames_get_buffer      = &nvtegra_get_buffer,
>> +
>> +    .transfer_get_formats   = &nvtegra_transfer_get_formats,
>> +    .transfer_data_to       = &nvtegra_transfer_data,
>> +    .transfer_data_from     = &nvtegra_transfer_data,
>> +
>> +    .pix_fmts = (const enum AVPixelFormat[]) {
>> +        AV_PIX_FMT_NVTEGRA,
>> +        AV_PIX_FMT_NONE,
>> +    },
>> +};
>
> What controls whether frames are linear or not?
>
> It seems like the linear case could be exposed directly to the user rather than having to wrap it like this - the decoder could return read-only NV12 (or whatever) frames directly and they would work with other components.

NVDEC can only output in block linear format (tiled layout). I'm not 
very keen on exposing that data, considering cache management can be 
somewhat tricky. Typically, after a decode operations, the CPU cache
must be invalidated. 

>> diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
>> new file mode 100644
>> index 0000000000..8a2383d304
>> --- /dev/null
>> +++ b/libavutil/hwcontext_nvtegra.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * Copyright (c) 2024 averne <averne381@gmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#ifndef AVUTIL_HWCONTEXT_NVTEGRA_H
>> +#define AVUTIL_HWCONTEXT_NVTEGRA_H
>> +
>> +#include <stdint.h>
>> +
>> +#include "hwcontext.h"
>> +#include "buffer.h"
>> +#include "frame.h"

Le 05/06/2024 à 22:47, Mark Thompson a écrit :
> On 30/05/2024 20:43, averne wrote:
>> This includes hwdevice and hwframes objects.
>> As the multimedia engines work with tiled surfaces (block linear in nvidia jargon), two frame transfer methods are implemented.
>> The first makes use of the VIC to perform the copy. Since some revisions of the VIC (such as the one found in the tegra X1) did not support 10+ bit formats, these go through two separate copy steps for the luma and chroma planes.
>> The second method copies on the CPU, and is used as a fallback if the VIC constraints are not satisfied.
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  libavutil/Makefile             |   7 +-
>>  libavutil/hwcontext.c          |   4 +
>>  libavutil/hwcontext.h          |   1 +
>>  libavutil/hwcontext_internal.h |   1 +
>>  libavutil/hwcontext_nvtegra.c  | 880 +++++++++++++++++++++++++++++++++
>>  libavutil/hwcontext_nvtegra.h  |  85 ++++
>>  6 files changed, 976 insertions(+), 2 deletions(-)
>>  create mode 100644 libavutil/hwcontext_nvtegra.c
>>  create mode 100644 libavutil/hwcontext_nvtegra.h
>>
>> ...> +
>> +static int nvtegra_transfer_data(AVHWFramesContext *ctx, AVFrame *dst, const AVFrame *src) {
>> +    const AVFrame *swframe;
>> +    bool from;
>> +    int num_planes, i;
>> +
>> +    from    = !dst->hw_frames_ctx;
>> +    swframe = from ? dst : src;
>> +
>> +    if (swframe->hw_frames_ctx)
>> +        return AVERROR(ENOSYS);
>> +
>> +    num_planes = av_pix_fmt_count_planes(swframe->format);
>> +
>> +    for (i = 0; i < num_planes; ++i) {
>> +        if (((uintptr_t)swframe->data[i] & 0xff) || (swframe->linesize[i] & 0xff)) {
>> +            av_log(ctx, AV_LOG_WARNING, "Frame address/pitch not aligned to 256, "
>> +                                        "falling back to cpu transfer\n");
>> +            return nvtegra_cpu_transfer_data(ctx, dst, src, num_planes, from);
> 
> Are you doing something somewhere to avoid this case?  It seems like it should be the normal one (given alignment is typically set signficantly lower than 256), so this warning would be very spammy.
> 
>> +        }
>> +    }
>> +
>> +    return nvtegra_vic_transfer_data(ctx, dst, src, num_planes, from);
>> +}
>> +
>> +const HWContextType ff_hwcontext_type_nvtegra = {
>> +    .type                   = AV_HWDEVICE_TYPE_NVTEGRA,
>> +    .name                   = "nvtegra",
>> +
>> +    .device_hwctx_size      = sizeof(NVTegraDevicePriv),
>> +    .device_hwconfig_size   = 0,
>> +    .frames_hwctx_size      = 0,
>> +
>> +    .device_create          = &nvtegra_device_create,
>> +    .device_init            = &nvtegra_device_init,
>> +    .device_uninit          = &nvtegra_device_uninit,
>> +
>> +    .frames_get_constraints = &nvtegra_frames_get_constraints,
>> +    .frames_init            = &nvtegra_frames_init,
>> +    .frames_uninit          = &nvtegra_frames_uninit,
>> +    .frames_get_buffer      = &nvtegra_get_buffer,
>> +
>> +    .transfer_get_formats   = &nvtegra_transfer_get_formats,
>> +    .transfer_data_to       = &nvtegra_transfer_data,
>> +    .transfer_data_from     = &nvtegra_transfer_data,
>> +
>> +    .pix_fmts = (const enum AVPixelFormat[]) {
>> +        AV_PIX_FMT_NVTEGRA,
>> +        AV_PIX_FMT_NONE,
>> +    },
>> +};
> 
> What controls whether frames are linear or not?
> 
> It seems like the linear case could be exposed directly to the user rather than having to wrap it like this - the decoder could return read-only NV12 (or whatever) frames directly and they would work with other components.
> 
>> diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
>> new file mode 100644
>> index 0000000000..8a2383d304
>> --- /dev/null
>> +++ b/libavutil/hwcontext_nvtegra.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * Copyright (c) 2024 averne <averne381@gmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#ifndef AVUTIL_HWCONTEXT_NVTEGRA_H
>> +#define AVUTIL_HWCONTEXT_NVTEGRA_H
>> +
>> +#include <stdint.h>
>> +
>> +#include "hwcontext.h"
>> +#include "buffer.h"
>> +#include "frame.h"
>> +#include "pixfmt.h"
>> +
>> +#include "nvtegra.h"
>> +
>> +/*
>> + * Encode a hardware revision into a version number
>> + */
>> +#define AV_NVTEGRA_ENCODE_REV(maj, min) (((maj & 0xff) << 8) | (min & 0xff))
>> +
>> +/*
>> + * Decode a version number
>> + */
>> +static inline void av_nvtegra_decode_rev(int rev, int *maj, int *min) {
>> +    *maj = (rev >> 8) & 0xff;
>> +    *min = (rev >> 0) & 0xff;
>> +}
>> +
>> +/**
>> + * @file
>> + * API-specific header for AV_HWDEVICE_TYPE_NVTEGRA.
>> + *
>> + * For user-allocated pools, AVHWFramesContext.pool must return AVBufferRefs
>> + * with the data pointer set to an AVNVTegraMap.
>> + */
>> +
>> +typedef struct AVNVTegraDeviceContext {
>> +    /*
>> +     * Hardware multimedia engines
>> +     */
>> +    AVNVTegraChannel nvdec_channel, nvenc_channel, nvjpg_channel, vic_channel;
> 
> Does a user need to supply all of these when making a device?
> 
>> +
>> +    /*
>> +     * Hardware revisions for associated engines, or 0 if invalid
>> +     */
>> +    int nvdec_version, nvenc_version, nvjpg_version, vic_version;
> 
> Why does a user setting up a device context need to supply the version numbers for each thing?
> 
>> +} AVNVTegraDeviceContext;
>> +
>> +typedef struct AVNVTegraFrame {
>> +    /*
>> +     * Reference to an AVNVTegraMap object
>> +     */
>> +    AVBufferRef *map_ref;
>> +} AVNVTegraFrame;
> 
> What is the indirection doing here?  Can't it return the buffer inside this structure instead of making an intermediate structure?
> 
>> +
>> +/*
>> + * Helper to retrieve a map object from the corresponding frame
>> + */
>> +static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) {
>> +    return (AVNVTegraMap *)((AVNVTegraFrame *)frame->buf[0]->data)->map_ref->data;
>> +}
>> +
>> +/*
>> + * Converts a pixel format to the equivalent code for the VIC engine
>> + */
>> +int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
>> +
>> +#endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */
> 
> The mix of normal implementation code and weird specific detail for the particular platform is pretty nasty.  It does seem like exporting more of this into a separate library rather than embedding it in ffmpeg would be a good idea.
> 
> Thanks,
> 
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>> +#include "pixfmt.h"
>> +
>> +#include "nvtegra.h"
>> +
>> +/*
>> + * Encode a hardware revision into a version number
>> + */
>> +#define AV_NVTEGRA_ENCODE_REV(maj, min) (((maj & 0xff) << 8) | (min & 0xff))
>> +
>> +/*
>> + * Decode a version number
>> + */
>> +static inline void av_nvtegra_decode_rev(int rev, int *maj, int *min) {
>> +    *maj = (rev >> 8) & 0xff;
>> +    *min = (rev >> 0) & 0xff;
>> +}
>> +
>> +/**
>> + * @file
>> + * API-specific header for AV_HWDEVICE_TYPE_NVTEGRA.
>> + *
>> + * For user-allocated pools, AVHWFramesContext.pool must return AVBufferRefs
>> + * with the data pointer set to an AVNVTegraMap.
>> + */
>> +
>> +typedef struct AVNVTegraDeviceContext {
>> +    /*
>> +     * Hardware multimedia engines
>> +     */
>> +    AVNVTegraChannel nvdec_channel, nvenc_channel, nvjpg_channel, vic_channel;
>
> Does a user need to supply all of these when making a device?

These are filled out by the library when creating the hardware device. 

>> +
>> +    /*
>> +     * Hardware revisions for associated engines, or 0 if invalid
>> +     */
>> +    int nvdec_version, nvenc_version, nvjpg_version, vic_version;
>
> Why does a user setting up a device context need to supply the version numbers for each thing?

Same as above, the version is filled when creating the context.
Knowledge of the hardware revision is useful when determining hardware
capabilities.

>> +} AVNVTegraDeviceContext;
>> +
>> +typedef struct AVNVTegraFrame {
>> +    /*
>> +     * Reference to an AVNVTegraMap object
>> +     */
>> +    AVBufferRef *map_ref;
>> +} AVNVTegraFrame;
>
> What is the indirection doing here?  Can't it return the buffer inside this structure instead of making an intermediate structure?

That's an artifact of an optimization regarding my mpv graphics 
backend.
Since mapping/unmapping data in the GPU is sort of expensive (gmmu 
configuration), the mpv mapper would keep a reference to the frame 
object to avoid constantly doing mapping operations on the same 
memory ranges.
I'll remove that.

>> +
>> +/*
>> + * Helper to retrieve a map object from the corresponding frame
>> + */
>> +static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) {
>> +    return (AVNVTegraMap *)((AVNVTegraFrame *)frame->buf[0]->data)->map_ref->data;
>> +}
>> +
>> +/*
>> + * Converts a pixel format to the equivalent code for the VIC engine
>> + */
>> +int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
>> +
>> +#endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */
>
> The mix of normal implementation code and weird specific detail for the particular platform is pretty nasty.  It does seem like exporting more of this into a separate library rather than embedding it in ffmpeg would be a good idea.
>
> Thanks,
>
> - Mark
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".